Disaggregated Tokenization (Tokenization + Chat-Templating)

## Objectives

Today tokenization is done by an embedded HuggingFace (HF) Rust tokenizer bindings. Chat-templating is done by an embedded Python interpreter that calls the relevant HF libraries.

For production we would like to support a deployment of disaggregated preprocessing as well. A service that handles prompt -> tokens for all APIs (the entire preprocessing step), ideally using code and features from vLLM.

## Phased approach

1. Service for tokenization and templating that runs HF (transformers) code
2. A vLLM-based efficient and lightweight preprocessing service
    - This is the most scalable/future-proof approach, since preprocessing is getting more complex by time and is efficiently implemented by vLLM maintainers

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Disaggregated Tokenization (Tokenization + Chat-Templating) #126

Objectives

Phased approach

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Disaggregated Tokenization (Tokenization + Chat-Templating) #126

Description

Objectives

Phased approach

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions