feat: add max-model-len configuration and validation for context window (#82) #85
Conversation
Thanks a lot for the PR! Very thorough. Please see a minor comment
pkg/llm-d-inference-sim/utils.go
Outdated
```go
func validateContextWindow(promptTokens int, maxCompletionTokens *int64, maxModelLen int) error {
	if maxModelLen <= 0 {
		return nil // no limit configured
	}
```
Is it possible that maxModelLen is <= 0? You added a check in the configuration `validate()`:
`if c.MaxModelLen < 1 { return errors.New("max model len cannot be less than 1") }`
Hey @irar2, you're right. This is redundant. Removing this.
Done, can you please check now?
It("should pass when no max model length is configured", func() { | ||
promptTokens := 1000 | ||
maxCompletionTokens := int64(1000) | ||
maxModelLen := 0 |
Likewise
Removing this.
```go
Entry(tests[9].name, tests[9].args),
Entry(tests[10].name, tests[10].args),
Entry(tests[11].name, tests[11].args),
Entry(tests[11].name, tests[11].args),
```
The merge is incorrect: there are 11 tests in main and you added another one, so there should be an additional line with `Entry(tests[12].name, tests[12].args)`.
Missed this one, thanks.
resolved
Fixed static lint errors
pkg/llm-d-inference-sim/utils.go
Outdated
```go
totalTokens := int64(promptTokens) + completionTokens
if totalTokens > int64(maxModelLen) {
	return fmt.Errorf("this model's maximum context length is %d tokens. However, you requested %d tokens (%d in the messages, %d in the completion). Please reduce the length of the messages or completion",
```
We want to be consistent with vLLM's error messages, which start with a capital letter.
Please return the error message to be sent to the client and use it in the HTTP response creation.
I changed that to lowercase to eliminate a static lint error, but I guess that check only applies to error messages ending with punctuation. Nevertheless, I've fixed it now.
Let me know if there is any other issue.
pkg/llm-d-inference-sim/utils.go
Outdated
```go
totalTokens := int64(promptTokens) + completionTokens
if totalTokens > int64(maxModelLen) {
	return fmt.Errorf("This model's maximum context length is %d tokens. However, you requested %d tokens (%d in the messages, %d in the completion). Please reduce the length of the messages or completion",
```
Hi @mohitpalsingh, the error string cannot be capitalized (the lint error is "error strings should not be capitalized"), and actually there is no need for an error here; we only need the message that should be returned in the HTTP response. This is why I suggested returning only the error message and using it in the HTTP response payload.
You can run "make lint" locally to check for lint issues.
@mayabar I get the issue now. I've refactored the validation function to return a bool instead and to form the error message inside the caller function. This solves both the lint and coherence issues.
Let me know if this approach works.
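For illustration, a minimal sketch of the shape described above, with hypothetical caller-side handling; this is not the PR's exact code:

```go
package main

import "fmt"

// Sketch of the refactored validation: it only reports whether the request
// fits the configured context window; the caller builds the client-facing
// message and sends it in an HTTP 400 response.
func validateContextWindow(promptTokens int, maxCompletionTokens *int64, maxModelLen int) bool {
	completionTokens := int64(0)
	if maxCompletionTokens != nil {
		completionTokens = *maxCompletionTokens
	}
	return int64(promptTokens)+completionTokens <= int64(maxModelLen)
}

func main() {
	maxTokens := int64(600)
	if !validateContextWindow(1500, &maxTokens, 2048) {
		// In the simulator, the caller would format the message here
		// (max model length, prompt, completion, and total token counts)
		// and return it in an HTTP 400 response.
		fmt.Println("request exceeds the model's maximum context length")
	}
}
```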
@mohitpalsingh can you please update the README file with the new command line option, thanks ;)
…ssages and update README
README is updated with the new flag 🐼
max-model-len Feature Implementation
Overview
This implementation adds support for the `max-model-len` parameter, which defines the model's context window: the maximum number of tokens in a single request, including both input and output tokens.
Feature Details
Configuration
New `--max-model-len` command line flag (also settable via the configuration file) that sets the model's maximum context length in tokens.
Validation Logic
When a request is received, the simulator validates that:
`prompt tokens + max completion tokens <= max-model-len`
If this condition is violated, the request is rejected with an HTTP 400 status code.
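For example (illustrative numbers, not taken from the PR): with `--max-model-len 2048`, a request whose messages tokenize to 1500 tokens and that sets `max_tokens` to 600 totals 2100 tokens and is rejected, while the same request with `max_tokens` set to 500 (2000 tokens in total) is accepted.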
Error Response Format
When the context window limit is exceeded, the following error response is returned:
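The response body itself is not reproduced on this page; reconstructed from the `fmt.Errorf` call quoted in the review above, the message text follows this pattern:

```text
This model's maximum context length is <Z> tokens. However, you requested <X+Y> tokens (<X> in the messages, <Y> in the completion). Please reduce the length of the messages or completion
```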
Where:
- `<Z>` = max_model_len value
- `<X>` = number of tokens in the prompt/messages
- `<Y>` = max_tokens requested for completion
Implementation Details
Files Modified
- config.go: added a `MaxModelLen int` field to the configuration struct, with corresponding handling in `newConfig()` and a check in the `validate()` method (see the sketch after this list)
- simulator.go: added the `--max-model-len` command line flag, wired the validation into `handleCompletions()`, and updated `sendCompletionError()` to use the correct HTTP status code
- utils.go: added the `validateContextWindow()` function that holds the validation logic
- request.go: added the `getMaxCompletionTokens()` method to the completionRequest interface
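A minimal sketch of the config.go side, combining the `MaxModelLen` field named above with the range check quoted earlier in the review; the struct name and the YAML key are assumptions:

```go
package config

import "errors"

// configuration is a stand-in for the simulator's config struct; only the
// max-model-len parts are shown.
type configuration struct {
	// MaxModelLen is the model's context window: the maximum total number of
	// tokens (prompt + completion) allowed in a single request.
	MaxModelLen int `yaml:"max-model-len"` // yaml key name is an assumption
}

func (c *configuration) validate() error {
	if c.MaxModelLen < 1 {
		return errors.New("max model len cannot be less than 1")
	}
	return nil
}
```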
Test Coverage
Added comprehensive tests covering the new context window validation.
Usage Examples
Command Line
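The original command block isn't preserved here; an illustrative invocation, assuming the simulator binary name and its `--model`/`--port` flags (all values are examples only):

```bash
# Illustrative; binary name, --model/--port flags, and values are assumptions.
./bin/llm-d-inference-sim \
  --model "Qwen/Qwen2-0.5B" \
  --port 8000 \
  --max-model-len 2048
```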
Configuration File
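Likewise for the configuration file, assuming a YAML config whose key mirrors the command line flag name:

```yaml
# Illustrative snippet; only the new key is shown, name assumed to mirror the flag.
max-model-len: 2048
```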
API Requests
Request that exceeds context window:
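A sketch of such a request, assuming the simulator listens on localhost:8000; the model name is illustrative and the prompt is assumed to tokenize to roughly 1500 tokens:

```bash
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "Qwen/Qwen2-0.5B",
    "messages": [{"role": "user", "content": "<a prompt of about 1500 tokens>"}],
    "max_tokens": 2048
  }'
```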
Response (HTTP 400):
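And a sketch of the corresponding rejection for the request above; only the message text comes from the code in this PR, while the envelope fields are an assumption based on the OpenAI-compatible error format:

```json
{
  "error": {
    "message": "This model's maximum context length is 2048 tokens. However, you requested 3548 tokens (1500 in the messages, 2048 in the completion). Please reduce the length of the messages or completion",
    "type": "BadRequestError",
    "param": null,
    "code": 400
  }
}
```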
Compatibility
- Works with both chat completion (`/v1/chat/completions`) and text completion (`/v1/completions`) endpoints
- Supports both the `max_tokens` and `max_completion_tokens` parameters (see the sketch below)
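The dual max_tokens / max_completion_tokens support suggests an accessor along these lines; the field names are assumptions, only the method name appears in the PR:

```go
package main

// chatCompletionRequest is a stand-in for the simulator's request type;
// field names are assumptions.
type chatCompletionRequest struct {
	MaxTokens           *int64 `json:"max_tokens"`
	MaxCompletionTokens *int64 `json:"max_completion_tokens"`
}

// getMaxCompletionTokens prefers max_completion_tokens (the newer OpenAI
// field) and falls back to max_tokens when it is not set.
func (r *chatCompletionRequest) getMaxCompletionTokens() *int64 {
	if r.MaxCompletionTokens != nil {
		return r.MaxCompletionTokens
	}
	return r.MaxTokens
}
```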
Testing
Run the test suite to verify the implementation:
```bash
go test ./pkg/llm-d-inference-sim/ -v
```
All tests pass (87/89 specs), including the new context window validation tests.