[Feature] Add /tokenize and /detokenize OpenAI-compatible endpoints #9545
Conversation
Summary of Changes
Hello @adarshxs, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request integrates new OpenAI-compatible `/tokenize` and `/detokenize` endpoints into the API server. This enhancement allows users to programmatically convert text into token IDs and token IDs back into human-readable text, addressing a common need for interacting with large language models. The changes involve adding new API routes, defining request/response protocols, implementing the core tokenization/detokenization logic, and providing comprehensive unit tests to ensure reliability.
Highlights
- New API Endpoints: This PR introduces two new OpenAI-compatible API endpoints: `/tokenize` and `/detokenize`. These endpoints allow users to convert text into token IDs and vice versa, providing essential utilities for working with language models.
- Core Logic Implementation: The core logic for handling tokenization and detokenization requests is implemented in the new `serving_tokenize.py` file. This includes robust handling for various input formats (single strings, lists of strings/tokens) and options for managing special tokens.
- API Protocol Definition: New data models (`TokenizeRequest`, `TokenizeResponse`, `DetokenizeRequest`, `DetokenizeResponse`) have been added to `protocol.py` to define a clear structure for requests and responses, ensuring compatibility and ease of use; a sketch of these models follows the list.
- Thorough Unit Testing: Comprehensive unit tests have been added to `test_srt_endpoint.py` to validate the functionality of the new endpoints, covering various scenarios including valid inputs, edge cases, and error handling for invalid data.
Code Review
This pull request introduces `/tokenize` and `/detokenize` endpoints to the OpenAI-compatible API, which is a valuable feature enhancement. The implementation is well-structured, following the existing patterns for API endpoints, and includes a comprehensive set of unit tests covering various input types and edge cases. My review includes one suggestion to improve the robustness of input validation in the detokenization logic to handle malformed requests more gracefully.
We need to discuss further whether to add these endpoints.
Understood. It was initially requested in #5653, and we received a lot of requests in this thread as well: #5711. Additionally, similar endpoints are supported in vLLM: https://docs.vllm.ai/en/latest/serving/openai_compatible_server.html?h=%2Ftokenize#tokenizer-api_1
Thanks! We plan to merge this PR; please resolve the comments ~
"""Request schema for the /tokenize endpoint.""" | ||
|
||
model: str = DEFAULT_MODEL_NAME | ||
prompt: Union[str, List[str]] |
Shall we keep the batched option? cc @slin1237 @CatherineSue
I think we can keep it, as this is not an official OpenAI endpoint, and it directly uses the tokenizer, so there are no performance or compatibility concerns.
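For reference, a batched request under the `Union[str, List[str]]` prompt type would look something like this (the payload shape is inferred from the protocol excerpt above; the values are illustrative):

```python
# Batched tokenize request: one entry per prompt in the list.
payload = {
    "model": "default",
    "prompt": ["first prompt", "second prompt"],
}
```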
I think these lines could be removed, as we already routed them.
Yeah, my bad, I forgot to remove that. Updated the same.
@CatherineSue, could you please check the OpenAI endpoint-related modifications?
```python
and request.tokens
and isinstance(request.tokens[0], int)
):
    if not all(isinstance(t, int) for t in request.tokens):
```
nit: Would this be better moved to `_validate_request`?
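For illustration, a minimal sketch of what hoisting this check into a `_validate_request` method could look like. Only the method name comes from the comment above; the signature and body are assumptions, reusing the `DetokenizeRequest` sketch from earlier:

```python
from typing import Optional


# Assumed to be a method on the serving class handling /detokenize.
def _validate_request(self, request: DetokenizeRequest) -> Optional[str]:
    """Return an error message for malformed input, or None if the request is valid."""
    tokens = request.tokens
    # Accept either a flat list of ints or a batch (list of lists of ints).
    batches = tokens if tokens and isinstance(tokens[0], list) else [tokens]
    for batch in batches:
        if not all(isinstance(t, int) for t in batch):
            return "Invalid input: 'tokens' must be a list of integers."
    return None
```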
```python
    return self.create_error_response(
        "Invalid input: 'tokens' must be a list of integers."
    )
tokens_to_decode = [int(t) for t in request.tokens]
```
nit: Why do we need `int(t)` here? I assume the `if` check above already makes sure the tokens are ints?
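If the `isinstance` check above already guarantees integer tokens, the cast could indeed be dropped; a sketch of the simplification:

```python
# The all(isinstance(t, int) ...) check above already rejects non-int
# entries, so the per-element int(t) cast is redundant.
tokens_to_decode = list(request.tokens)
```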
Overall LGTM. Left some nit comments.
@ispobock @slin1237 @merrymercy Could you please review this PR when you have a moment? This is important for our workflow. Thanks!
@adarshxs Could you check the failed CI test? https://github.com/sgl-project/sglang/actions/runs/18284367523/job/52107776495?pr=9545#step:5:14449
@ispobock Should be fixed, thanks!
Motivation
Multiple users have requested tokenize/detokenize endpoints: #5711 (comment)
Adds `/tokenize` and `/detokenize` endpoints to the OpenAI-compatible API server. It works like this:

Checklist