
Conversation

miuosz commented Oct 10, 2025

Description

  1. (vertex) Add cached content token count (cachedContentTokenCount) to Google model responses (supported by Gemini models)
  2. (vertex) Add cache usage to Anthropic model responses

Motivation

Without proper usage details, it's impossible to accurately count tokens or estimate the cost.

1. claude-opus-4

before

"usage": {
    "prompt_tokens": 4,
    "completion_tokens": 317,
    "total_tokens": 321
}

after

"usage": {
    "prompt_tokens": 4,
    "completion_tokens": 310,
    "total_tokens": 2078,
    "cache_read_input_tokens": 0,
    "cache_creation_input_tokens": 1764
}

2. gemini-2.5-flash

before

"usage": {
    "prompt_tokens": 2924,
    "completion_tokens": 4,
    "total_tokens": 3036,
    "completion_tokens_details": {
        "reasoning_tokens": 108
    }
}

after

"usage": {
    "prompt_tokens": 2924,
    "completion_tokens": 3,
    "total_tokens": 3078,
    "prompt_tokens_details": {
        "cached_tokens": 2532
    },
    "completion_tokens_details": {
        "reasoning_tokens": 151
    }
}
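
For orientation, here is a minimal sketch (not the gateway's actual transform) of how Gemini's usageMetadata could map onto the usage shape shown above. cachedContentTokenCount is the field named in the description; the other usageMetadata field names (promptTokenCount, candidatesTokenCount, totalTokenCount, thoughtsTokenCount) are assumptions about the Vertex response.

// Sketch only: map Vertex AI Gemini usage metadata to the OpenAI-style
// usage block shown in the gemini-2.5-flash example above.
interface GeminiUsageMetadata {
  promptTokenCount?: number;
  candidatesTokenCount?: number;
  totalTokenCount?: number;
  cachedContentTokenCount?: number; // cache read count surfaced by this PR
  thoughtsTokenCount?: number; // reasoning tokens (assumed field name)
}

function toOpenAiUsage(meta: GeminiUsageMetadata) {
  return {
    prompt_tokens: meta.promptTokenCount ?? 0,
    completion_tokens: meta.candidatesTokenCount ?? 0,
    total_tokens: meta.totalTokenCount ?? 0,
    prompt_tokens_details: {
      cached_tokens: meta.cachedContentTokenCount ?? 0,
    },
    completion_tokens_details: {
      reasoning_tokens: meta.thoughtsTokenCount ?? 0,
    },
  };
}

Under these assumptions, the gemini-2.5-flash example above would correspond to a cachedContentTokenCount of 2532 and a reasoning count of 151.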

Type of Change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Documentation update
  • Refactoring (no functional changes)

How Has This Been Tested?

  • Unit Tests
  • Integration Tests
  • Manual Testing

Screenshots (if applicable)

Checklist

  • My code follows the style guidelines of this project
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes

Related Issues


matter-code-review bot commented Oct 10, 2025

Labels: Code Quality, new feature, maintainability

Description

Summary by MatterAI

🔄 What Changed

This pull request introduces a prompt_tokens_details object with a cached_tokens field into the usage reported for Anthropic and Google Vertex AI (Gemini) chat completion responses, for both non-streaming and streaming API calls. The AnthropicUsage type definition has been updated to reflect the new structure. The shouldSendCacheUsage condition for Google Vertex AI has also been refined so that raw cache fields are only sent when strict OpenAI compliance is off and cache tokens are actually present.
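
As a rough illustration of the mapping described here (a sketch under assumptions, not the PR's actual code), the Anthropic side could look like the following. The helper and interface names are hypothetical; the OpenAI-style field names and the totals come from the claude-opus-4 example above, and the gating mirrors the shouldSendCacheUsage idea referenced in the review comments below.

// Sketch only: build an OpenAI-style usage object from an Anthropic usage
// payload, exposing cached tokens under prompt_tokens_details.
interface AnthropicApiUsage {
  input_tokens: number;
  output_tokens: number;
  cache_read_input_tokens?: number;
  cache_creation_input_tokens?: number;
}

interface OpenAiStyleUsage {
  prompt_tokens: number;
  completion_tokens: number;
  total_tokens: number;
  prompt_tokens_details: { cached_tokens: number };
  cache_read_input_tokens?: number;
  cache_creation_input_tokens?: number;
}

function buildUsage(
  usage: AnthropicApiUsage,
  strictOpenAiCompliance: boolean
): OpenAiStyleUsage {
  const cacheRead = usage.cache_read_input_tokens ?? 0;
  const cacheCreation = usage.cache_creation_input_tokens ?? 0;
  // Keep the raw Anthropic cache counters only outside strict OpenAI
  // compliance and only when cache tokens are actually present.
  const shouldSendCacheUsage =
    !strictOpenAiCompliance && (cacheRead > 0 || cacheCreation > 0);
  return {
    prompt_tokens: usage.input_tokens,
    completion_tokens: usage.output_tokens,
    // Assumed total: input + output + cache read + cache creation,
    // which matches the claude-opus-4 example (4 + 310 + 0 + 1764 = 2078).
    total_tokens:
      usage.input_tokens + usage.output_tokens + cacheRead + cacheCreation,
    prompt_tokens_details: { cached_tokens: cacheRead },
    ...(shouldSendCacheUsage && {
      cache_read_input_tokens: cacheRead,
      cache_creation_input_tokens: cacheCreation,
    }),
  };
}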

🔍 Impact of the Change

This enhancement provides more granular visibility into token consumption by explicitly reporting cached token usage. This improves cost analysis, observability, and understanding of how caching mechanisms affect token usage for these AI models.

📁 Total Files Changed

  • src/providers/anthropic/chatComplete.ts: Added prompt_tokens_details to usage object in direct and streaming responses.
  • src/providers/anthropic/types.ts: Added prompt_tokens_details to the AnthropicUsage interface (a rough shape is sketched after this list).
  • src/providers/google-vertex-ai/chatComplete.ts: Added prompt_tokens_details to usage object in direct and streaming responses, and refined shouldSendCacheUsage condition.
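
As a data-shape reference, the types.ts change might look roughly like this. Only the prompt_tokens_details.cached_tokens addition is confirmed by the summary above; the surrounding fields and their optionality are assumptions about the existing interface.

// Hypothetical shape of the updated interface in src/providers/anthropic/types.ts.
export interface AnthropicUsage {
  input_tokens: number;
  output_tokens: number;
  cache_read_input_tokens?: number;
  cache_creation_input_tokens?: number;
  // Added by this PR (per the summary above); exact optionality is assumed.
  prompt_tokens_details?: {
    cached_tokens?: number;
  };
}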

🧪 Test Added

N/A

🔒Security Vulnerabilities

N/A

Motivation

To provide comprehensive and granular token usage information, including cached tokens, for Vertex AI Gemini and Anthropic models, thereby improving cost analysis, observability, and understanding of model interactions.

Type of Change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Documentation update
  • Refactoring (no functional changes)

How Has This Been Tested?

  • Unit Tests
  • Integration Tests
  • Manual Testing

Screenshots (if applicable)

N/A

Checklist

  • My code follows the style guidelines of this project
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes

Related Issues

N/A

Tip

Quality Recommendations

  1. Implement unit and/or integration tests to verify the accurate calculation and reporting of cached_tokens in prompt_tokens_details for both Anthropic and Google Vertex AI models, covering both direct and streaming response scenarios (a minimal test sketch follows this list).

  2. Update the public API documentation for Anthropic and Google Vertex AI chat completion responses to clearly describe the new prompt_tokens_details field and its cached_tokens sub-field, including its purpose and expected values.
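
One possible shape for such a test, sketched against the hypothetical buildUsage mapping shown earlier and using Node's built-in assert so no particular test runner is assumed:

import assert from 'node:assert';
// buildUsage is the hypothetical Anthropic usage mapping sketched above,
// assumed here to live in a local module.
import { buildUsage } from './buildUsage';

// Numbers taken from the claude-opus-4 example in the PR description.
const usage = buildUsage(
  {
    input_tokens: 4,
    output_tokens: 310,
    cache_read_input_tokens: 0,
    cache_creation_input_tokens: 1764,
  },
  false // strictOpenAiCompliance disabled
);

assert.strictEqual(usage.prompt_tokens_details.cached_tokens, 0);
assert.strictEqual(usage.cache_creation_input_tokens, 1764);
assert.strictEqual(usage.total_tokens, 2078);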

Tanka Poem ♫

Cache tokens now seen,
Usage clear, a new insight,
Models learn and grow.
Data flows, a scientist's joy,
Efficiency's sweet reward. 📊✨

Sequence Diagram

sequenceDiagram
    participant Client
    participant AnthropicChatComplete as Anthropic Chat Complete
    participant GoogleVertexAIChatComplete as Google Vertex AI Chat Complete
    participant AnthropicAPI as Anthropic API
    participant GoogleVertexAIApi as Google Vertex AI API

    Client->>AnthropicChatComplete: chatComplete(request)
    AnthropicChatComplete->>AnthropicAPI: POST /messages (payload)
    AnthropicAPI-->>AnthropicChatComplete: API Response (usage, output_tokens, cache_read_input_tokens)
    Note over AnthropicChatComplete: Calculate total_tokens, prompt_tokens, completion_tokens, and populate usage.prompt_tokens_details.cached_tokens
    AnthropicChatComplete-->>Client: Chat Completion Response (with usage.prompt_tokens_details.cached_tokens)

    Client->>GoogleVertexAIChatComplete: chatComplete(request)
    GoogleVertexAIChatComplete->>GoogleVertexAIApi: POST /generateContent (payload)
    GoogleVertexAIApi-->>GoogleVertexAIChatComplete: API Response (usageMetadata with cachedContentTokenCount)
    Note over GoogleVertexAIChatComplete: Calculate total_tokens, prompt_tokens, completion_tokens, and populate usage.prompt_tokens_details.cached_tokens
    GoogleVertexAIChatComplete-->>Client: Chat Completion Response (with usage.prompt_tokens_details.cached_tokens)

    Note over AnthropicChatComplete,GoogleVertexAIChatComplete: Also applies to streaming responses, where usage is updated per chunk.

matter-code-review bot left a comment


Added cache token details for Vertex AI and Anthropic models to improve usage tracking.

matter-code-review bot left a comment

Added cache and audio token details to Vertex AI and Google models usage metadata

Comment on lines +701 to +709
audio_tokens:
  parsedChunk.usageMetadata?.candidatesTokensDetails?.reduce(
    (acc, curr) => {
      if (curr.modality === VERTEX_MODALITY.AUDIO)
        return acc + curr.tokenCount;
      return acc;
    },
    0
  ),

🔴 Security Issue

Issue: When usageMetadata or candidatesTokensDetails is missing, the optional chain short-circuits and audio_tokens ends up undefined instead of 0.
Fix: Fall back to an empty array (?? []) before calling reduce.
Impact: Keeps audio_tokens numeric even when usage details are absent.

Suggested change
audio_tokens:
  parsedChunk.usageMetadata?.candidatesTokensDetails?.reduce(
    (acc, curr) => {
      if (curr.modality === VERTEX_MODALITY.AUDIO)
        return acc + curr.tokenCount;
      return acc;
    },
    0
  ),
audio_tokens:
  (parsedChunk.usageMetadata?.candidatesTokensDetails ?? []).reduce(
    (acc, curr) => {
      if (curr?.modality === VERTEX_MODALITY.AUDIO)
        return acc + (curr?.tokenCount ?? 0);
      return acc;
    },
    0
  ),

narengogi (Collaborator) commented:

@miuosz I've replicated your changes in the google provider (vertex-ai is the hosted GCP service). Additionally, I've added changes to support audio_tokens in both the request (prompt) and response (completion) token details.

references:

matter-code-review bot left a comment

Fix null dereference in audio token calculation and improve type safety

Comment on lines +659 to 669
audio_tokens: parsedChunk.usageMetadata?.promptTokensDetails?.reduce(
  (acc, curr) => {
    if (curr.modality === VERTEX_MODALITY.AUDIO)
      return acc + curr.tokenCount;
    return acc;
  },
  0
),
},
};
}

🔴 Security Issue

Issue: When usageMetadata or promptTokensDetails is missing, the optional chain short-circuits and audio_tokens ends up undefined instead of 0.
Fix: Fall back to an empty array (?? []) before calling reduce.
Impact: Keeps audio_tokens numeric even when usage details are absent.

Suggested change
audio_tokens: parsedChunk.usageMetadata?.promptTokensDetails?.reduce(
  (acc, curr) => {
    if (curr.modality === VERTEX_MODALITY.AUDIO)
      return acc + curr.tokenCount;
    return acc;
  },
  0
),
},
};
}
audio_tokens: (parsedChunk.usageMetadata?.promptTokensDetails ?? []).reduce(
  (
    acc: number,
    curr: { modality: VERTEX_MODALITY; tokenCount: number }
  ) => {
    if (curr?.modality === VERTEX_MODALITY.AUDIO)
      return acc + (curr?.tokenCount ?? 0);
    return acc;
  },
  0
),

matter-code-review bot left a comment

Add cache details to usage for Anthropic and Vertex AI models

Comment on lines 720 to 726
...streamState.usage,
completion_tokens: parsedChunk.usage?.output_tokens,
total_tokens: totalTokens,
prompt_tokens_details: {
  cached_tokens: streamState.usage?.cache_read_input_tokens ?? 0,
},
},

🟡 Code Quality Issue

Issue: Inconsistent handling of cache_read_input_tokens in usage object. In one place it's included in the spread operator, in another it's explicitly set.
Fix: Standardize the usage object construction to avoid redundancy.
Impact: Improves code maintainability and reduces potential confusion.

Suggested change
...streamState.usage,
completion_tokens: parsedChunk.usage?.output_tokens,
total_tokens: totalTokens,
prompt_tokens_details: {
  cached_tokens: streamState.usage?.cache_read_input_tokens ?? 0,
},
},
usage: {
  ...streamState.usage,
  completion_tokens: parsedChunk.usage?.output_tokens,
  total_tokens: totalTokens,
  prompt_tokens_details: {
    cached_tokens: streamState.usage?.cache_read_input_tokens ?? 0,
  },
  ...(shouldSendCacheUsage && {
    cache_read_input_tokens: cache_read_input_tokens,
    cache_creation_input_tokens: cache_creation_input_tokens,
  }),
},

Comment on lines +1013 to 1019
...streamState.usage,
completion_tokens: parsedChunk.usage?.output_tokens,
prompt_tokens: streamState.usage?.prompt_tokens,
total_tokens:
  (streamState.usage?.prompt_tokens || 0) +
  (parsedChunk.usage?.output_tokens || 0),
total_tokens: totalTokens,
prompt_tokens_details: {
  cached_tokens: streamState.usage?.cache_read_input_tokens ?? 0,
},
},

🟡 Code Quality Issue

Issue: Inconsistent handling of cache_read_input_tokens in usage object. In one place it's included in the spread operator, in another it's explicitly set.
Fix: Standardize the usage object construction to avoid redundancy.
Impact: Improves code maintainability and reduces potential confusion.

Suggested change
...streamState.usage,
completion_tokens: parsedChunk.usage?.output_tokens,
prompt_tokens: streamState.usage?.prompt_tokens,
total_tokens:
  (streamState.usage?.prompt_tokens || 0) +
  (parsedChunk.usage?.output_tokens || 0),
total_tokens: totalTokens,
prompt_tokens_details: {
  cached_tokens: streamState.usage?.cache_read_input_tokens ?? 0,
},
},
usage: {
  ...streamState.usage,
  completion_tokens: parsedChunk.usage?.output_tokens,
  total_tokens: totalTokens,
  prompt_tokens_details: {
    cached_tokens: streamState.usage?.cache_read_input_tokens ?? 0,
  },
  ...(shouldSendCacheUsage && {
    cache_read_input_tokens: cache_read_input_tokens,
    cache_creation_input_tokens: cache_creation_input_tokens,
  }),
},
