Recommended way to handle long text inputs (Sliding Window for Token Classification in DJL) #3809
Replies: 2 comments
@ivrisivris
@frankfliu - slightly related, but what about
Hi DJL team,
We're using DJL for token classification on long text inputs that often exceed the model's maximum sequence length (e.g., 512 tokens).
The HuggingFaceTokenizer supports parameters such as stride and withOverflowingTokens, but TokenClassificationTranslator does not appear to handle the overflowing segments automatically.
Could you please advise on the recommended approach in DJL for processing long texts, ideally a sliding-window or chunking mechanism that merges the overlapping predictions?
Also, are there any plans to add native sliding-window support in a future version of the library?
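For concreteness, the kind of merging we have in mind could be sketched roughly like this (plain Java, independent of DJL; `classifyWindow` stands in for a hypothetical per-window model call, and keeping the label from whichever window the token sits farther from the edge of is just one possible heuristic):

```java
import java.util.Arrays;
import java.util.function.Function;

public class SlidingWindowMerge {

    /**
     * Splits a token sequence into overlapping windows of at most maxLen
     * tokens, classifies each window, and merges the per-token labels back
     * into one sequence. For tokens covered by two windows, the label from
     * the window in which the token is farther from a window edge wins,
     * since edge tokens see less context.
     */
    public static String[] classifyLong(
            String[] tokens,
            int maxLen,
            int stride, // number of overlapping tokens between consecutive windows
            Function<String[], String[]> classifyWindow) { // hypothetical model call
        int step = maxLen - stride; // how far each window advances
        if (step <= 0) {
            throw new IllegalArgumentException("stride must be smaller than maxLen");
        }
        String[] merged = new String[tokens.length];
        int[] bestDist = new int[tokens.length]; // edge distance of the kept label
        Arrays.fill(bestDist, -1);

        for (int start = 0; start < tokens.length; start += step) {
            int end = Math.min(start + maxLen, tokens.length);
            String[] window = Arrays.copyOfRange(tokens, start, end);
            String[] labels = classifyWindow.apply(window);
            for (int i = 0; i < labels.length; i++) {
                int pos = start + i;
                int dist = Math.min(i, labels.length - 1 - i); // distance to nearer edge
                if (dist > bestDist[pos]) {
                    merged[pos] = labels[i];
                    bestDist[pos] = dist;
                }
            }
            if (end == tokens.length) {
                break; // last window reached the end of the input
            }
        }
        return merged;
    }
}
```

A real implementation would presumably operate on token IDs from the tokenizer's overflowing encodings rather than strings, and would merge logits (or pick by confidence) instead of labels, but the windowing and overlap-resolution shape would be the same.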
Thanks!
Kamil Kočí