Conversation

@mohitpalsingh (Contributor) commented Jul 12, 2025

max-model-len Feature Implementation

Overview

This implementation adds support for the max-model-len parameter, which defines the model's context window - the maximum number of tokens in a single request including both input and output tokens.

Feature Details

Configuration

  • Parameter: --max-model-len
  • Default Value: 1024 tokens
  • Type: Integer
  • Description: Defines the model's context window size

Validation Logic

When a request is received, the simulator validates:

prompt_tokens + max_completion_tokens <= max_model_len

If this condition is violated, the request is rejected with an HTTP 400 status code.
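
For clarity, here is a condensed sketch of the check. The helper signature follows the utils.go snippet reviewed later in this thread; the merged code was further refined during review, so treat this as illustrative:

// Sketch only; requires: import "fmt"
func validateContextWindow(promptTokens int, maxCompletionTokens *int64, maxModelLen int) error {
	completionTokens := int64(0)
	if maxCompletionTokens != nil {
		completionTokens = *maxCompletionTokens
	}
	totalTokens := int64(promptTokens) + completionTokens
	if totalTokens > int64(maxModelLen) {
		return fmt.Errorf("this model's maximum context length is %d tokens. However, you requested %d tokens (%d in the messages, %d in the completion). Please reduce the length of the messages or completion",
			maxModelLen, totalTokens, promptTokens, completionTokens)
	}
	return nil
}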

Error Response Format

When the context window limit is exceeded, the following error response is returned:

{
  "object": "error",
  "message": "This model's maximum context length is <Z> tokens. However, you requested <Y> tokens (<X> in the messages, <Y> in the completion). Please reduce the length of the messages or completion.",
  "type": "BadRequestError",
  "param": null,
  "code": 400
}

Where:

  • <Z> = max_model_len value
  • <X> = number of tokens in the prompt/messages
  • <Y> = max_tokens (or max_completion_tokens) requested for the completion

Implementation Details

Files Modified

  1. config.go

    • Added MaxModelLen int field to configuration struct
    • Added default value (1024) in newConfig()
    • Added validation in validate() method
  2. simulator.go

    • Added command line flag --max-model-len
    • Added context window validation in handleCompletions()
    • Fixed sendCompletionError() to use correct HTTP status code
  3. utils.go

    • Added validateContextWindow() function for validation logic
  4. request.go

    • Added getMaxCompletionTokens() method to the completionRequest interface
    • Implemented the method for both chat and text completion requests (a sketch follows this list)
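
A rough sketch of what getMaxCompletionTokens() could look like (the struct and field names below are illustrative assumptions, not the simulator's actual types):

// Hypothetical request types for illustration; the real definitions live in request.go.
type chatCompletionRequest struct {
	MaxTokens           *int64 `json:"max_tokens,omitempty"`
	MaxCompletionTokens *int64 `json:"max_completion_tokens,omitempty"`
}

type textCompletionRequest struct {
	MaxTokens *int64 `json:"max_tokens,omitempty"`
}

// Chat completions accept both fields; prefer max_completion_tokens when set.
func (r *chatCompletionRequest) getMaxCompletionTokens() *int64 {
	if r.MaxCompletionTokens != nil {
		return r.MaxCompletionTokens
	}
	return r.MaxTokens
}

// Text completions only carry max_tokens.
func (r *textCompletionRequest) getMaxCompletionTokens() *int64 {
	return r.MaxTokens
}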

Test Coverage

Added comprehensive tests covering:

  • Configuration validation for invalid max-model-len values
  • Context window validation logic (unit tests; a sketch of one such spec follows this list)
  • Integration tests for both chat and text completion APIs
  • HTTP response format validation
  • Successful requests within context window limits
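
For illustration, one of the unit specs could look roughly like this (Ginkgo style, matching the snippets reviewed below; exact values and assertions in the merged tests may differ):

// Sketch of a spec inside the existing Ginkgo suite.
It("should reject a request that exceeds the context window", func() {
	promptTokens := 1000
	maxCompletionTokens := int64(100)
	maxModelLen := 1024
	err := validateContextWindow(promptTokens, &maxCompletionTokens, maxModelLen)
	Expect(err).To(HaveOccurred())
})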

Usage Examples

Command Line

# Start simulator with 2048 token context window
./llm-d-inference-sim --model my-model --max-model-len 2048

# Default 1024 token context window
./llm-d-inference-sim --model my-model

Configuration File

max-model-len: 2048

API Requests

Request that exceeds context window:

curl -X POST http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "my-model",
    "messages": [{"role": "user", "content": "This is a very long prompt..."}],
    "max_tokens": 100
  }'

Response (HTTP 400):

{
  "object": "error",
  "message": "This model's maximum context length is 50 tokens. However, you requested 120 tokens (20 in the messages, 100 in the completion). Please reduce the length of the messages or completion.",
  "type": "BadRequestError",
  "param": null,
  "code": 400
}

Compatibility

  • Works with both chat completion (/v1/chat/completions) and text completion (/v1/completions) endpoints
  • Supports both max_tokens and max_completion_tokens parameters
  • Compatible with streaming and non-streaming requests
  • Maintains backward compatibility (feature is optional with sensible default)

Testing

Run the test suite to verify the implementation:

go test ./pkg/llm-d-inference-sim/ -v

All tests pass (87 of 89 specs ran), including the new context window validation tests.

@irar2 (Collaborator) left a comment

Thanks a lot for the PR! Very thorough. Please see a minor comment

func validateContextWindow(promptTokens int, maxCompletionTokens *int64, maxModelLen int) error {
	if maxModelLen <= 0 {
		return nil // no limit configured
	}
Collaborator:

Is it possible that maxModelLen is <= 0? You added a check in configuration validate()
if c.MaxModelLen < 1 { return errors.New("max model len cannot be less than 1") }

Contributor Author:

Hey @irar2, you're right. This is redundant. Removing this.

Contributor Author:

Done, can you please check now?

It("should pass when no max model length is configured", func() {
promptTokens := 1000
maxCompletionTokens := int64(1000)
maxModelLen := 0
Collaborator:

Likewise

Contributor Author:

Removing this.

Entry(tests[9].name, tests[9].args),
Entry(tests[10].name, tests[10].args),
Entry(tests[11].name, tests[11].args),
Entry(tests[11].name, tests[11].args),
Collaborator:

The merge is incorrect, there are 11 tests in main, and you added another one, so there should be an additional line with
Entry(tests[12].name, tests[12].args)

Contributor Author:

Missed this one, thanks.

Contributor Author:

resolved

@mohitpalsingh requested a review from irar2 on July 13, 2025 at 07:51
@mohitpalsingh (Contributor Author):

Fixed static lint errors


	totalTokens := int64(promptTokens) + completionTokens
	if totalTokens > int64(maxModelLen) {
		return fmt.Errorf("this model's maximum context length is %d tokens. However, you requested %d tokens (%d in the messages, %d in the completion). Please reduce the length of the messages or completion",
Collaborator:

We want to be consistent with vLLM's error messages, which start with a capital letter.
Please return the error message to be sent to the client and use it when creating the HTTP response.

Contributor Author:

I changed that to lowercase to fix a static lint error, but I guess that was only for error messages ending with punctuation. Nevertheless, I fixed it now.

Let me know if there is any other issue.

@mohitpalsingh requested a review from mayabar on July 13, 2025 at 11:13

	totalTokens := int64(promptTokens) + completionTokens
	if totalTokens > int64(maxModelLen) {
		return fmt.Errorf("This model's maximum context length is %d tokens. However, you requested %d tokens (%d in the messages, %d in the completion). Please reduce the length of the messages or completion",
Collaborator:

Hi @mohitpalsingh, the error message cannot be capitalized (the lint error is "error strings should not be capitalized"), and there is actually no need for an error here; we only need the message that is returned in the HTTP response. This is why I suggested returning only the error message and using it in the HTTP response payload.
You can run "make lint" locally to check for lint issues.

Contributor Author:

@mayabar I got the issue now. I've refactored the validation function to return a bool instead and build the error message inside the caller function. This solves both the lint and coherency issues.

Let me know if this approach works.
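
Roughly, the refactored shape is the following (a sketch only; exact names and return values in the merged code may differ):

// Sketch: the helper now just answers "does the request fit?",
// and the caller builds the error message for the HTTP 400 response.
func validateContextWindow(promptTokens int, maxCompletionTokens *int64, maxModelLen int) (int64, int64, bool) {
	completionTokens := int64(0)
	if maxCompletionTokens != nil {
		completionTokens = *maxCompletionTokens
	}
	totalTokens := int64(promptTokens) + completionTokens
	return completionTokens, totalTokens, totalTokens <= int64(maxModelLen)
}

The caller in handleCompletions() then formats the capitalized message and sends it in the HTTP 400 payload, so the lint rule about error strings no longer applies.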

@mayabar (Collaborator) commented Jul 14, 2025

@mohitpalsingh can you please update the readme file with the new command line option, thanks ;)

@mohitpalsingh (Contributor Author):

README is updated with the new flag 🐼

@mohitpalsingh requested a review from mayabar on July 14, 2025 at 08:30
@irar2 merged commit 9f3d093 into llm-d:main on Jul 14, 2025
2 checks passed