Conversation

@adarshxs (Collaborator)

Motivation

Multiple users have been requesting tokenize/detokenize endpoints: #5711 (comment)

Adds /tokenize and /detokenize endpoints to the OpenAI-compatible API server.

It works like this:

[screenshot demonstrating the /tokenize and /detokenize endpoints]
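For reference, a minimal usage sketch in Python; the server address and the response field names ("tokens", "text") are assumptions, not confirmed by this PR:

import requests

BASE_URL = "http://localhost:30000"  # assumed server address

# Tokenize a prompt into token IDs.
resp = requests.post(f"{BASE_URL}/tokenize", json={"prompt": "Hello, world!"})
resp.raise_for_status()
token_ids = resp.json()["tokens"]  # assumed response field

# Detokenize the IDs back into text.
resp = requests.post(f"{BASE_URL}/detokenize", json={"tokens": token_ids})
resp.raise_for_status()
text = resp.json()["text"]  # assumed response field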


@gemini-code-assist (bot) left a comment

Summary of Changes

Hello @adarshxs, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request integrates new OpenAI-compatible /tokenize and /detokenize endpoints into the API server. This enhancement allows users to programmatically convert text into token IDs and token IDs back into human-readable text, addressing a common need for interacting with large language models. The changes involve adding new API routes, defining request/response protocols, implementing the core tokenization/detokenization logic, and providing comprehensive unit tests to ensure reliability.

Highlights

  • New API Endpoints: This PR introduces two new OpenAI-compatible API endpoints: /tokenize and /detokenize. These endpoints allow users to convert text into token IDs and vice versa, providing essential utilities for working with language models.
  • Core Logic Implementation: The core logic for handling tokenization and detokenization requests is implemented in the new serving_tokenize.py file. This includes robust handling for various input formats (single strings, lists of strings/tokens) and options for managing special tokens.
  • API Protocol Definition: New data models (TokenizeRequest, TokenizeResponse, DetokenizeRequest, DetokenizeResponse) have been added to protocol.py to define a clear structure for requests and responses, ensuring compatibility and ease of use; a sketch of these shapes follows this list.
  • Thorough Unit Testing: Comprehensive unit tests have been added to test_srt_endpoint.py to validate the functionality of the new endpoints, covering various scenarios including valid inputs, edge cases, and error handling for invalid data.
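To make the shapes above concrete, here is a minimal sketch of the protocol models; only model and prompt are confirmed by the review excerpts below, while the remaining fields and the response classes are illustrative assumptions:

from typing import List, Union

from pydantic import BaseModel

DEFAULT_MODEL_NAME = "default"  # placeholder value


class TokenizeRequest(BaseModel):
    """Request schema for the /tokenize endpoint."""

    model: str = DEFAULT_MODEL_NAME
    prompt: Union[str, List[str]]
    add_special_tokens: bool = True  # assumed special-token option


class TokenizeResponse(BaseModel):
    tokens: Union[List[int], List[List[int]]]  # nested lists for batched prompts


class DetokenizeRequest(BaseModel):
    """Request schema for the /detokenize endpoint."""

    model: str = DEFAULT_MODEL_NAME
    tokens: Union[List[int], List[List[int]]]
    skip_special_tokens: bool = True  # assumed special-token option


class DetokenizeResponse(BaseModel):
    text: Union[str, List[str]]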

@gemini-code-assist (bot) left a comment

Code Review

This pull request introduces /tokenize and /detokenize endpoints to the OpenAI-compatible API, which is a valuable feature enhancement. The implementation is well-structured, following the existing patterns for API endpoints, and includes a comprehensive set of unit tests covering various input types and edge cases. My review includes one suggestion to improve the robustness of input validation in the detokenization logic to handle malformed requests more gracefully.

@adarshxs (Collaborator, Author)

cc @CatherineSue

@JustinTong0323 (Collaborator) left a comment

We need to discuss further whether to add these endpoints.

@adarshxs (Collaborator, Author) commented Aug 31, 2025

> We need to discuss further whether to add these endpoints.

Understood. It was initially requested in #5653, and we received a lot of requests in this thread as well: #5711. Additionally, similar endpoints are supported in vLLM: https://docs.vllm.ai/en/latest/serving/openai_compatible_server.html?h=%2Ftokenize#tokenizer-api_1

@JustinTong0323 (Collaborator) left a comment

Thanks! We plan to merge this PR; please resolve the comments.

"""Request schema for the /tokenize endpoint."""

model: str = DEFAULT_MODEL_NAME
prompt: Union[str, List[str]]
@JustinTong0323 (Collaborator) commented Aug 31, 2025

Shall we keep the batched option? cc @slin1237 @CatherineSue

A collaborator replied:

I think we can keep it, as this is not an official OpenAI endpoint and it directly uses the tokenizer, so there are no performance or compatibility concerns.
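For illustration, a batched request could look like this; the nested response shape is an assumption based on prompt accepting Union[str, List[str]]:

import requests

resp = requests.post(
    "http://localhost:30000/tokenize",  # assumed server address
    json={"prompt": ["Hello, world!", "How are you?"]},
)
resp.raise_for_status()
# Each prompt would get its own list of token IDs, e.g.
# {"tokens": [[9906, 11, 1917, 0], [4438, 527, 499, 30]]}
print(resp.json())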

@JustinTong0323 (Collaborator) commented Sep 2, 2025

I think these lines could be removed

@app.post(
    "/tokenize",
    response_class=ORJSONResponse,
    dependencies=[Depends(validate_json_request)],
    include_in_schema=False,
)
async def openai_tokenize(request: TokenizeRequest, raw_request: Request):
    return await openai_v1_tokenize(request, raw_request)


@app.post(
    "/detokenize",
    response_class=ORJSONResponse,
    dependencies=[Depends(validate_json_request)],
    include_in_schema=False,
)
async def openai_detokenize(request: DetokenizeRequest, raw_request: Request):
    return await openai_v1_detokenize(request, raw_request)

as /tokenize and /detokenize are already routed.
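For reference, one way to keep each path registered exactly once is to stack the route decorators on the shared handler. This is only a sketch: the /v1/tokenize path is inferred from the handler name, and the PR's actual fix may simply delete the duplicate block.

from fastapi import Depends, FastAPI, Request
from fastapi.responses import ORJSONResponse
from pydantic import BaseModel

app = FastAPI()


class TokenizeRequest(BaseModel):  # simplified stand-in for the real model
    prompt: str


async def validate_json_request(raw_request: Request):
    ...  # stand-in for the server's JSON validator


@app.post("/tokenize", response_class=ORJSONResponse,
          dependencies=[Depends(validate_json_request)])
@app.post("/v1/tokenize", response_class=ORJSONResponse,
          dependencies=[Depends(validate_json_request)])
async def openai_v1_tokenize(request: TokenizeRequest, raw_request: Request):
    ...  # delegate to the tokenize serving logic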

@adarshxs (Collaborator, Author) commented Sep 2, 2025

Yeah, my bad, I forgot to remove that. Updated.

@hnyls2002 hnyls2002 enabled auto-merge (squash) September 10, 2025 04:31
@hnyls2002 (Collaborator)

@CatherineSue, could you please check the OpenAI-endpoint-related modifications?

and request.tokens
and isinstance(request.tokens[0], int)
):
if not all(isinstance(t, int) for t in request.tokens):
A collaborator commented:

nit: Would this be better moved to _validate_request?

return self.create_error_response(
"Invalid input: 'tokens' must be a list of integers."
)
tokens_to_decode = [int(t) for t in request.tokens]
A collaborator commented:

nit: Why do we need int(t) here? The if check above should already ensure the tokens are ints.
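Taken together, the two nits suggest something like the following; this is a rough sketch in which _validate_request and create_error_response are taken from the excerpts above and everything else is assumed:

from typing import Optional


class DetokenizeServingSketch:
    """Illustrative only; not the PR's actual class."""

    def create_error_response(self, message: str) -> dict:
        # Stand-in for the server's real error-response helper.
        return {"error": {"message": message, "type": "invalid_request_error"}}

    def _validate_request(self, request) -> Optional[str]:
        # Centralized input validation, per the first nit.
        tokens = getattr(request, "tokens", None)
        if not tokens or not all(isinstance(t, int) for t in tokens):
            return "Invalid input: 'tokens' must be a list of integers."
        return None

    def handle(self, request):
        if (error := self._validate_request(request)) is not None:
            return self.create_error_response(error)
        # The isinstance check already guarantees ints, so the int(t)
        # re-cast from the diff is unnecessary (the second nit).
        tokens_to_decode = request.tokens
        return tokens_to_decode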

@CatherineSue (Collaborator) left a comment

Overall LGTM. Left some nit comments.

@anonymousmaharaj

@ispobock @slin1237 @merrymercy Could you please review this PR when you have a moment? This is important for our workflow. Thanks!

@ispobock ispobock disabled auto-merge October 6, 2025 14:31
@ispobock ispobock added the run-ci label Oct 6, 2025
@ispobock (Collaborator) commented Oct 7, 2025

@adarshxs (Collaborator, Author) commented Oct 7, 2025

@ispobock Should be fixed, thanks.

@ispobock ispobock merged commit 7c3f07d into sgl-project:main Oct 8, 2025
164 of 173 checks passed