seg_id0 is duplicated for the root segment for big files when multiple files are loaded #710

@yruslan

Describe the bug

seg_id0 should never be duplicated for the root segment.

However, when multiple big files are loaded, root segment records with duplicated seg_id0 values appear.

Code snippet that caused the issue

This may be happening only when a record length field is used:

  .option("record_format", "F")
  .option("record_length_field", "REC_LENGTH + 17")
  .option("segment_field", "SEGMENT-ID")

Expected behavior

seg_id0 should never be duplicated for the root segment.
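
One way to check this expectation on a loaded DataFrame is sketched below. It is only an illustration under assumptions: the helper name `duplicatedRootIds` is hypothetical, the generated ID column is assumed to be called `Seg_Id0`, and the root segment value (`"C"` in the usage note) is a placeholder, since the copybook is not given here. If the segment field is nested under a root struct, a dotted path would need to be passed instead.

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.{col, count, lit}

object SegIdCheck {
  /** Returns the segment ID values that occur on more than one root-segment record.
    *
    * Assumptions (not taken from the issue):
    *  - segmentCol holds the segment identifier of each record,
    *  - rootSegmentValue is the value that marks a root segment record,
    *  - segIdCol is the generated root-level ID column (assumed "Seg_Id0").
    */
  def duplicatedRootIds(df: DataFrame,
                        segmentCol: String,
                        rootSegmentValue: String,
                        segIdCol: String = "Seg_Id0"): DataFrame =
    df.filter(col(segmentCol) === lit(rootSegmentValue)) // keep root records only
      .groupBy(col(segIdCol))                            // one group per generated ID
      .agg(count(lit(1)).as("occurrences"))              // records per ID
      .filter(col("occurrences") > 1)                    // IDs shared by several roots
}
```

For example, `SegIdCheck.duplicatedRootIds(df, "SEGMENT_ID", "C").show(false)` should print nothing if the expected behavior holds; any row it returns is a seg_id0 value shared by more than one root record. Child records are filtered out first because they legitimately carry the ID of their parent root record.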

Context

  • Cobrix version: 2.7.5
  • Spark version: 3.3.4
  • Scala version: 2.12
  • Operating system: --

Copybook (if possible)

--

Attach a small data file that can help reproduce the issue, if possible.

Labels: bug (Something isn't working)
