
[serve][llm] Add tokenize/detokenize to SGLang example engine#61446

Open
eureka928 wants to merge 1 commit into ray-project:master from eureka928:feature/sglang-tokenize-detokenize

Conversation

@eureka928

Description

Add tokenize() and detokenize() endpoints to the SGLang example engine, completing the remaining items from #61113 (follow-up to #61159 which added embeddings).

Changes:

  • tokenize() — uses self.engine.tokenizer_manager.tokenizer.encode(), supports both TokenizeCompletionRequest (prompt) and TokenizeChatRequest (messages). Returns tokens, count, and max_model_len.
  • detokenize() — uses tokenizer.decode(request.tokens), returns the decoded prompt string.
  • Updated readme.md to reflect the new supported endpoints.
  • Added test_sglang_tokenize and test_sglang_detokenize tests with round-trip verification.
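The changes above can be sketched as follows. This is an illustrative, self-contained approximation, not the PR's actual code: `StubTokenizer` stands in for the real `self.engine.tokenizer_manager.tokenizer`, the request dataclasses only mirror the field names mentioned in the description, and the chat template in `_render_chat_prompt()` is simplified (the real engine applies the model's chat template).

```python
from dataclasses import dataclass
from typing import List

class StubTokenizer:
    """Stand-in for self.engine.tokenizer_manager.tokenizer (illustrative)."""

    def encode(self, text: str) -> List[int]:
        # Toy scheme: one "token" per character code point.
        return [ord(c) for c in text]

    def decode(self, tokens: List[int]) -> str:
        return "".join(chr(t) for t in tokens)

@dataclass
class TokenizeCompletionRequest:  # field names assumed from the PR description
    prompt: str

@dataclass
class TokenizeChatRequest:  # field names assumed from the PR description
    messages: List[dict]

class SGLangServerSketch:
    def __init__(self, tokenizer, max_model_len: int):
        self.tokenizer = tokenizer
        # The real code reads this from self.engine.server_args.context_length.
        self.max_model_len = max_model_len

    def _render_chat_prompt(self, messages: List[dict]) -> str:
        # Simplified; the real engine renders the model's chat template.
        return "\n".join(f"{m['role']}: {m['content']}" for m in messages)

    def tokenize(self, request) -> dict:
        if isinstance(request, TokenizeChatRequest):
            prompt = self._render_chat_prompt(request.messages)
        else:
            prompt = request.prompt
        tokens = self.tokenizer.encode(prompt)
        return {"tokens": tokens, "count": len(tokens), "max_model_len": self.max_model_len}

    def detokenize(self, tokens: List[int]) -> dict:
        return {"prompt": self.tokenizer.decode(tokens)}

server = SGLangServerSketch(StubTokenizer(), max_model_len=4096)
result = server.tokenize(TokenizeCompletionRequest(prompt="hi"))
round_trip = server.detokenize(result["tokens"])
print(result["count"], round_trip["prompt"])  # prints: 2 hi
```

The round-trip property the tests check is visible here: `detokenize(tokenize(p)["tokens"])` recovers the original prompt.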

Related issues

Closes remaining checklist items from #61113.
Follow-up to #61159 (embeddings).

Additional information

  • Tests use httpx for direct HTTP calls since the OpenAI Python client doesn't expose tokenize/detokenize endpoints.
  • max_model_len is read from self.engine.server_args.context_length.
  • The tokenize endpoint reuses _render_chat_prompt() for chat tokenize requests, consistent with the chat completions flow.
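Since the OpenAI client does not expose these endpoints, the tests post raw JSON over HTTP. The sketch below shows what the round-trip verification might look like, factored into pure helpers so the check itself needs no server; the endpoint paths and payload field names are assumptions based on the PR description, not confirmed route definitions.

```python
# Hedged sketch of the round-trip check performed by the HTTP tests.
# Field names ("tokens", "count", "prompt") follow the PR description.

def build_tokenize_payload(model: str, prompt: str) -> dict:
    return {"model": model, "prompt": prompt}

def build_detokenize_payload(model: str, tokens: list) -> dict:
    return {"model": model, "tokens": tokens}

def verify_round_trip(original_prompt: str, tokenize_resp: dict, detokenize_resp: dict) -> bool:
    # tokenize must report a count consistent with its token list, and
    # detokenize must recover the original prompt exactly.
    return (
        tokenize_resp.get("count") == len(tokenize_resp.get("tokens", []))
        and detokenize_resp.get("prompt") == original_prompt
    )

# Against a live Serve app the flow would be roughly (paths assumed):
#   r1 = httpx.post(f"{base_url}/tokenize",
#                   json=build_tokenize_payload(model, prompt)).json()
#   r2 = httpx.post(f"{base_url}/detokenize",
#                   json=build_detokenize_payload(model, r1["tokens"])).json()
#   assert verify_round_trip(prompt, r1, r2)
```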

@eureka928 eureka928 requested a review from a team as a code owner March 3, 2026 13:10

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request adds tokenize and detokenize endpoints to the SGLang example engine, along with corresponding tests. The implementation is mostly correct, but it does not check whether the tokenizer is available, which could cause a runtime error when SGLang is initialized with --skip-tokenizer-init. I've left review comments suggesting an availability check for the tokenizer in both new methods.

@eureka928 eureka928 force-pushed the feature/sglang-tokenize-detokenize branch 2 times, most recently from e1eef59 to 23df4e0 Compare March 3, 2026 13:14
@ray-gardener ray-gardener bot added serve Ray Serve Related Issue docs An issue or change related to documentation llm community-contribution Contributed by the community labels Mar 3, 2026
@eicherseiji eicherseiji added the go add ONLY when ready to merge, run all tests label Mar 3, 2026

@eicherseiji eicherseiji left a comment


[serve][llm] Add tokenize/detokenize to SGLang example engine

Implement tokenize() and detokenize() methods on SGLangServer using
the tokenizer from self.engine.tokenizer_manager.tokenizer. Tokenize
supports both completion and chat request formats. Add round-trip
tests and update the readme.

Closes remaining items from ray-project#61113.

Signed-off-by: leonace924 <meobius123@gmail.com>
@eureka928 eureka928 force-pushed the feature/sglang-tokenize-detokenize branch from 23df4e0 to 2f3345c Compare March 3, 2026 21:42
