Recommended way to handle long text inputs (Sliding Window for Token Classification in DJL) #3809
Replies: 2 comments
@ivrisivris
@frankfliu - slightly related, but what about
Hi DJL team,
We're using DJL for token classification on long text inputs that often exceed the model's maximum sequence length (e.g., 512 tokens).
The HuggingFaceTokenizer supports parameters such as stride and withOverflowingTokens, but TokenClassificationTranslator does not appear to handle the overflowing segments automatically.
Could you please advise on the recommended approach in DJL for processing long texts, ideally a sliding-window or chunking mechanism that merges the overlapping predictions?
Also, are there any plans to add native sliding-window support in a future version of the library?
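For concreteness, the kind of merging we have in mind could be sketched roughly like this (plain Java, independent of DJL; `classifyWindow` stands in for a hypothetical per-window model call, and keeping the label from whichever window the token sits farther from the edge of is just one possible heuristic):

```java
import java.util.Arrays;
import java.util.function.Function;

public class SlidingWindowMerge {

    /**
     * Splits a token sequence into overlapping windows of at most maxLen
     * tokens, classifies each window, and merges the per-token labels back
     * into one sequence. For tokens covered by two windows, the label from
     * the window in which the token is farther from a window edge wins,
     * since edge tokens see less context.
     */
    public static String[] classifyLong(
            String[] tokens,
            int maxLen,
            int stride, // number of overlapping tokens between consecutive windows
            Function<String[], String[]> classifyWindow) { // hypothetical model call
        int step = maxLen - stride; // how far each window advances
        if (step <= 0) {
            throw new IllegalArgumentException("stride must be smaller than maxLen");
        }
        String[] merged = new String[tokens.length];
        int[] bestDist = new int[tokens.length]; // edge distance of the kept label
        Arrays.fill(bestDist, -1);

        for (int start = 0; start < tokens.length; start += step) {
            int end = Math.min(start + maxLen, tokens.length);
            String[] window = Arrays.copyOfRange(tokens, start, end);
            String[] labels = classifyWindow.apply(window);
            for (int i = 0; i < labels.length; i++) {
                int pos = start + i;
                int dist = Math.min(i, labels.length - 1 - i); // distance to nearer edge
                if (dist > bestDist[pos]) {
                    merged[pos] = labels[i];
                    bestDist[pos] = dist;
                }
            }
            if (end == tokens.length) {
                break; // last window reached the end of the input
            }
        }
        return merged;
    }
}
```

A real implementation would presumably operate on token IDs from the tokenizer's overflowing encodings rather than strings, and would merge logits (or pick by confidence) instead of labels, but the windowing and overlap-resolution shape would be the same.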
Thanks!
Kamil Kočí