Adds a `limit` parameter to the Extract, Map, Filter, and Reduce operations to control the number of processed items.
Co-authored-by: ss.shankar505 <[email protected]>
| `limit` | Maximum number of documents to extract from before stopping | Processes all data |
When `limit` is set, Extract only reformats and submits the first _N_ documents. This is handy when the upstream dataset is large and you want to cap cost while previewing results.
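As a rough sketch, a capped extract step could be configured like this; apart from `limit`, the field names (including `document_keys`) are assumptions modeled on the surrounding extract docs rather than a definitive schema:

```yaml
- name: extract_key_findings    # hypothetical operation name
  type: extract
  prompt: |
    Pull out every key finding stated in the report.
  document_keys:                # assumed field naming the input keys to extract from
    - report_text
  limit: 50                     # stop after the first 50 documents
```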
docs/operators/filter.md: 4 additions & 0 deletions
@@ -83,6 +83,10 @@ This example demonstrates how the Filter operation distinguishes between high-im
See [map optional parameters](./map.md#optional-parameters) for additional configuration options, including `batch_prompt` and `max_batch_size`.
### Limiting filtered outputs
`limit` behaves slightly differently for filter operations than for map operations. Because filter drops documents whose predicate evaluates to `false`, the limit counts only the documents that would be retained (i.e., the ones whose boolean output is `true`). DocETL will continue evaluating additional inputs until it has collected `limit` passing documents and then stop scheduling further LLM calls. This ensures you can request “the first N matches” without paying to score the entire dataset.
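For example, a sketch of a capped filter, assuming the single-boolean-output shape used elsewhere on this page (operation and key names are placeholders):

```yaml
- name: keep_high_impact_issues   # hypothetical operation name
  type: filter
  prompt: |
    Is the following issue high impact? {{ input.issue_text }}
  output:
    schema:
      is_high_impact: boolean     # filter keeps documents where this is true
  limit: 25                       # stop once 25 documents have passed the filter
```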
!!! info "Validation"
For more details on validation techniques and implementation, see [operators](../concepts/operators.md#validation).
docs/operators/map.md: 5 additions & 0 deletions
@@ -140,6 +140,7 @@ This example demonstrates how the Map operation can transform long, unstructured
| `optimize` | Flag to enable operation optimization | `True` |
| `recursively_optimize` | Flag to enable recursive optimization of operators synthesized as part of rewrite rules | `false` |
| `sample` | Number of samples to use for the operation | Processes all data |
| `limit` | Maximum number of outputs to produce before stopping | Processes all data |
| `tools` | List of tool definitions for LLM use | None |
| `validate` | List of Python expressions to validate the output | None |
| `flush_partial_results` | Write results of individual batches of map operation to disk for faster inspection | False |
@@ -158,6 +159,10 @@ This example demonstrates how the Map operation can transform long, unstructured
Note: If `drop_keys` is specified, `prompt` and `output` become optional parameters.
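For illustration, a map step that only prunes fields might look like this sketch; the key names are made up:

```yaml
- name: strip_raw_fields   # hypothetical operation name
  type: map
  drop_keys:               # no prompt or output needed when only dropping keys
    - raw_html
    - scratch_notes
```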
### Limiting execution
Set `limit` when you only need the first _N_ map results or want to cap LLM spend. The operation slices the processed dataset to the first `limit` entries and also stops scheduling new prompts once that many outputs have been produced, even if a prompt returns multiple records. Filter operations inherit this behavior but redefine the count so the limit only applies to records whose filter predicate evaluates to `true` (see [Filter](./filter.md#optional-parameters)).
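A minimal sketch of a capped map operation, using the standard map fields shown above (the prompt and output schema are placeholders):

```yaml
- name: summarize_documents   # hypothetical operation name
  type: map
  prompt: |
    Summarize the following document in two sentences: {{ input.text }}
  output:
    schema:
      summary: string
  limit: 100                  # stop scheduling prompts once 100 outputs exist
```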
docs/operators/reduce.md

| `sample` | Number of samples to use for the operation | None |
| `limit` | Maximum number of groups to process before stopping | All groups |
| `synthesize_resolve` | If false, won't synthesize a resolve operation between map and reduce | true |
| `model` | The language model to use | Falls back to default_model |
| `input` | Specifies the schema or keys to subselect from each item | All keys from input items |
@@ -67,6 +68,10 @@ This Reduce operation processes customer feedback grouped by department:
| `litellm_completion_kwargs` | Additional parameters to pass to LiteLLM completion calls. | {} |
| `bypass_cache` | If true, bypass the cache for this operation. | False |
### Limiting group processing
Set `limit` to short-circuit the reduce phase after the first _N_ groups have been aggregated. This is useful for previewing results or capping LLM usage when you only need the earliest groups (according to the original input order). Groups beyond the limit are never scheduled, so you avoid extra fold/merge calls. If a grouped reduce returns more than one record per group, the final output list is truncated to `limit`.
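As a sketch, building on the department-feedback example referenced above, the cap sits alongside `reduce_key` (names are placeholders):

```yaml
- name: summarize_department_feedback   # hypothetical operation name
  type: reduce
  reduce_key: department
  prompt: |
    Summarize the customer feedback below:
    {% for item in inputs %}
    - {{ item.feedback }}
    {% endfor %}
  output:
    schema:
      department_summary: string
  limit: 5                              # aggregate only the first 5 groups
```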