ForEachItem - Stream input items to the subflow instead of splitting the input file into multiple files #5161

@johnkm516

Feature description

One of my biggest problems with the current ForEachItem is that, by default, it splits the input file into multiple files, one per batch.
Consider the following flow setup:

  • A batch job must handle ~100,000 rows, stored in an ION file produced by a database query. The subflow must take in one row per execution.
  • ForEachItem is given the 100,000 rows: {{outputs.query.uri}} is the value passed as the ForEachItem input.
  • Result: ForEachItem splits the 100,000 rows into 100,000 files in internal storage. This takes a very long time because one file is being split into 100,000 files, and disk I/O becomes a major bottleneck.

Instead of splitting the 100,000-row ION file into 100,000 separate files, I would like an option for ForEachItem to stream the original ION file through memory in batches (the amount held in memory at any time would depend on the batch property's rows attribute), creating executions as it moves through the file in a "sliding window" fashion.
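
To illustrate the idea, here is a minimal sketch in Python. It is not how ForEachItem is implemented, only the pattern I have in mind. It assumes the input can be read as one record per line, and the names `stream_batches` and `run_subflows` are hypothetical. Only one batch of at most `rows` records is held in memory at a time, and no intermediate files are written.

```python
import io
from itertools import islice
from typing import Iterator, List, TextIO


def stream_batches(source: TextIO, rows: int) -> Iterator[List[str]]:
    """Lazily yield lists of at most `rows` records read from `source`.

    Assumption: one record per line; blank lines are skipped.
    """
    if rows < 1:
        raise ValueError("rows must be >= 1")
    # Generator expression: lines are pulled from the file only on demand.
    records = (line.rstrip("\n") for line in source if line.strip())
    while True:
        batch = list(islice(records, rows))
        if not batch:
            return
        yield batch


def run_subflows(source: TextIO, rows: int) -> int:
    """Hypothetical driver: one 'execution' per batch, returns the count."""
    executions = 0
    for batch in stream_batches(source, rows):
        # Placeholder: a real implementation would start a subflow
        # execution here and pass `batch` as its input.
        executions += 1
    return executions


if __name__ == "__main__":
    # Ten fake records standing in for rows of the query result.
    sample = io.StringIO("".join(f"{{id:{i}}}\n" for i in range(10)))
    print(run_subflows(sample, rows=3))
```

With `rows=1` this gives one execution per row, as in the scenario above, while the source file is only read once from start to end.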

I believe this would make the way ForEachItem passes inputs to subflows far more efficient.
