prompt_tokens support in message [anthropic vertex ai] #1187


Conversation

arturfromtabnine
Contributor

@arturfromtabnine commented Jun 27, 2025

Description

This PR improves token usage tracking in Google Vertex AI streaming responses by storing prompt token information in the stream state and including token usage metrics in response chunks.

Motivation

The current implementation doesn't properly maintain token usage information across streaming chunks, which leads to missing token metrics in the chunk responses. This change ensures that prompt tokens are stored in the stream state and that total token counts are accurately calculated and included in the response.
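
A minimal sketch of the message_start half of the change, with simplified shapes and a hypothetical helper name (the actual transform, VertexAnthropicChatCompleteStreamChunkTransform in src/providers/google-vertex-ai/chatComplete.ts, differs):

// Simplified stream state; the provider's real types are richer.
interface StreamState {
  usage?: { prompt_tokens?: number };
}

interface AnthropicStartChunk {
  type: 'message_start';
  message?: { usage?: { input_tokens?: number } };
}

// Hypothetical helper: persist the prompt token count reported in
// message_start so later message_delta / message_stop chunks can use it.
function recordPromptTokens(
  chunk: AnthropicStartChunk,
  streamState: StreamState
): void {
  const inputTokens = chunk.message?.usage?.input_tokens;
  if (inputTokens !== undefined) {
    streamState.usage = { prompt_tokens: inputTokens };
  }
}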

Type of Change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Documentation update
  • Refactoring (no functional changes)

How Has This Been Tested?

  • Unit Tests
  • Integration Tests
  • Manual Testing

Screenshots (if applicable)

Checklist

  • My code follows the style guidelines of this project
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes

Related Issues

Before vs after

Before:
{
  id: 'vertex-ai-1751013934585',
  object: 'chat.completion.chunk',
  created: 1751013934,
  model: 'claude-3-7-sonnet-20250219',
  provider: 'vertex-ai',
  choices: [ { index: 0, delta: {}, finish_reason: 'end_turn' } ],
  usage: { completion_tokens: 29 }
}

After:
{
  id: 'vertex-ai-1751013762551',
  object: 'chat.completion.chunk',
  created: 1751013762,
  model: 'claude-3-7-sonnet-20250219',
  provider: 'vertex-ai',
  choices: [ { index: 0, delta: {}, finish_reason: 'end_turn' } ],
  usage: { completion_tokens: 39, prompt_tokens: 9, total_tokens: 48 }
}


matter-code-review bot commented Jun 27, 2025

Code Quality: new feature, bug fix

Description

Summary By MatterAI

🔄 What Changed

This Pull Request enhances the token usage tracking for Google Vertex AI streaming responses. It introduces a mechanism to store prompt_tokens from the initial message_start event into the streamState. Subsequently, these stored prompt_tokens are consistently included in message_delta and message_stop chunks. Additionally, the message_stop event now calculates and includes total_tokens by summing prompt_tokens and completion_tokens.
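
For the intermediate events, the stored value is simply echoed back into each chunk's usage block. A hedged sketch of the message_delta branch, again with simplified shapes and a hypothetical helper name:

interface StreamState {
  usage?: { prompt_tokens?: number };
}

// Hypothetical helper: merge the previously stored prompt_tokens into a
// message_delta chunk's usage metrics.
function withPromptTokens(
  usage: { completion_tokens?: number },
  streamState: StreamState
) {
  return {
    ...usage,
    prompt_tokens: streamState.usage?.prompt_tokens,
  };
}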

🔍 Impact of the Change

This change significantly improves the accuracy and completeness of token usage reporting in streaming scenarios, especially for Anthropic models on Vertex AI. Previously, prompt_tokens might not have been consistently available or correctly aggregated across all stream events. Now, consumers of the stream will receive accurate prompt_tokens and a final total_tokens count, which is crucial for billing, monitoring, and cost analysis.

📁 Total Files Changed

1 file changed (src/providers/google-vertex-ai/chatComplete.ts).

🧪 Test Added

No specific tests were mentioned in the PR description. It is assumed that manual testing was performed to verify the token usage tracking.

🔒 Security Vulnerabilities

No security vulnerabilities were detected in the provided code changes.


Tip

Quality Recommendations

  1. Consider adding unit tests to verify the correct extraction, storage, and reporting of prompt_tokens and total_tokens across different stream chunk types (message_start, message_delta, message_stop); a test sketch follows this list. This ensures robustness and prevents regressions.

  2. While optional chaining is used, ensure that parsedChunk.message?.usage and parsedChunk.usage are always expected to be present when needed, or add explicit error handling/logging for unexpected missing usage data to improve debugging.
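
In the spirit of recommendation 1, a unit test could assert the end-to-end propagation. This is only a sketch, assuming a Jest-style runner and the simplified stream-state shape used above, not a test from the PR:

import { describe, it, expect } from '@jest/globals';

describe('prompt_tokens propagation across stream chunks', () => {
  it('carries prompt_tokens from message_start into the final usage', () => {
    const streamState: { usage?: { prompt_tokens?: number } } = {};

    // message_start: the transform stores prompt tokens in stream state.
    streamState.usage = { prompt_tokens: 9 };

    // message_stop: stored prompt tokens are combined with output tokens.
    const outputTokens = 39;
    const promptTokens = streamState.usage.prompt_tokens ?? 0;
    const usage = {
      completion_tokens: outputTokens,
      prompt_tokens: promptTokens,
      total_tokens: promptTokens + outputTokens,
    };

    expect(usage).toEqual({
      completion_tokens: 39,
      prompt_tokens: 9,
      total_tokens: 48,
    });
  });
});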

Sequence Diagram

sequenceDiagram
    participant SI as Stream Input
    participant VT as VertexAnthropicChatCompleteStreamChunkTransform
    participant SS as Stream State
    participant TO as Transformed Output

    SI->>VT: parsedChunk (raw stream data)
    activate VT

    alt parsedChunk.type === 'message_start'
        VT->>VT: Extract input_tokens from parsedChunk.message.usage
        VT->>SS: Store prompt_tokens (parsedChunk.message.usage.input_tokens)
        VT->>TO: Send JSON (id, type, message, usage: {prompt_tokens: parsedChunk.message.usage.input_tokens})
    else parsedChunk.type === 'message_delta'
        VT->>SS: Retrieve prompt_tokens
        VT->>TO: Send JSON (id, type, delta, usage: {prompt_tokens: streamState.usage.prompt_tokens})
    else parsedChunk.type === 'message_stop'
        VT->>SS: Retrieve prompt_tokens
        VT->>VT: Extract output_tokens from parsedChunk.usage
        VT->>VT: Calculate total_tokens (prompt_tokens + output_tokens)
        VT->>TO: Send JSON (id, type, usage: {completion_tokens, prompt_tokens, total_tokens})
    end

    deactivate VT
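
To make the message_stop branch of the diagram concrete: a hedged TypeScript sketch of the total_tokens calculation, with simplified shapes and a hypothetical helper name (the merged code differs):

interface StreamState {
  usage?: { prompt_tokens?: number };
}

interface AnthropicStopChunk {
  type: 'message_stop';
  usage?: { output_tokens?: number };
}

// Hypothetical helper: build the final usage object, matching the
// "after" example above (prompt 9 + completion 39 = total 48).
function buildFinalUsage(chunk: AnthropicStopChunk, streamState: StreamState) {
  const promptTokens = streamState.usage?.prompt_tokens ?? 0;
  const completionTokens = chunk.usage?.output_tokens ?? 0;
  return {
    completion_tokens: completionTokens,
    prompt_tokens: promptTokens,
    total_tokens: promptTokens + completionTokens,
  };
}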


@matter-code-review bot left a comment


This PR adds support for prompt_tokens in the Google Vertex AI provider, which is a good enhancement for tracking token usage. I've identified a few minor improvements that could make the implementation more robust.

@arturfromtabnine changed the title from "prompt_tokens support in message" to "prompt_tokens support in message [anthropic vertex ai]" on Jun 27, 2025
Collaborator

@narengogi left a comment


pretty good for a bot tbh


Important

PR Review Skipped

PR review skipped as per the configuration setting. Run a review manually by commenting /matter review.

💡 Tips to use Matter AI

Command List

  • /matter summary: Generate AI Summary for the PR
  • /matter review: Generate AI Reviews for the latest commit in the PR
  • /matter review-full: Generate AI Reviews for the complete PR
  • /matter release-notes: Generate AI release-notes for the PR
  • /matter : Chat with your PR with Matter AI Agent
  • /matter remember : Generate AI memories for the PR
  • /matter explain: Get an explanation of the PR
  • /matter help: Show the list of available commands and documentation
  • Need help? Join our Discord server: https://discord.gg/fJU5DvanU3

@VisargD VisargD merged commit 76676ba into Portkey-AI:main Jul 2, 2025
2 checks passed