
Conversation

@steebchen
Member

@steebchen steebchen commented Oct 26, 2025

Summary

  • Tracks and exposes cached input tokens (cachedContentTokenCount) for Google model usage
  • Propagates cached tokens through extraction, parsing, and payload transformation

Changes

Token extraction

  • Updated extract-token-usage.ts to read usageMetadata.cachedContentTokenCount and store it as cachedTokens

Provider response parsing

  • Updated parse-provider-response.ts to read json.usageMetadata.cachedContentTokenCount and store it as cachedTokens

Streaming payload transformation

  • Updated transform-streaming-to-openai.ts to include prompt_tokens_details.cached_tokens when a cached token count is present, across all relevant payload branches (a sketch of the pattern follows below)
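All three files follow the same pass-through pattern. As an illustration only (not the actual diff), a minimal TypeScript sketch of the extraction step could look like the following; the GoogleUsageMetadata and TokenUsage shapes and the extractGoogleTokenUsage name are assumptions made for this example:

// Sketch only: read Google-style usage metadata and surface cachedContentTokenCount
// as cachedTokens. Type and function names are assumptions, not the repository's code.
interface GoogleUsageMetadata {
	promptTokenCount?: number;
	candidatesTokenCount?: number;
	thoughtsTokenCount?: number;
	cachedContentTokenCount?: number;
}

interface TokenUsage {
	promptTokens: number | null;
	completionTokens: number | null;
	reasoningTokens: number | null;
	cachedTokens: number | null;
}

function extractGoogleTokenUsage(usageMetadata?: GoogleUsageMetadata): TokenUsage {
	return {
		promptTokens: usageMetadata?.promptTokenCount ?? null,
		completionTokens: usageMetadata?.candidatesTokenCount ?? null,
		reasoningTokens: usageMetadata?.thoughtsTokenCount ?? null,
		// `?? null` keeps an explicit 0 and only maps undefined/null to null.
		cachedTokens: usageMetadata?.cachedContentTokenCount ?? null,
	};
}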

Rationale

  • Improves accuracy of token accounting by including cached content tokens from Google models

Test plan

  • Verify that when a Google model returns cachedContentTokenCount, it is captured in the token usage stats
  • Verify that the OpenAI-formatted payload includes prompt_tokens_details.cached_tokens when a cached count is available
  • Ensure behavior remains unchanged when cachedContentTokenCount is absent (a test sketch follows below)
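As a sketch of how the first and last checks could be automated (the vitest harness, the extractTokenUsage name, and its input shape are assumptions, not the repository's actual test setup):

import { describe, expect, it } from "vitest";
// Hypothetical import path and helper name, used only to illustrate the test plan.
import { extractTokenUsage } from "./extract-token-usage";

describe("Google cached token tracking", () => {
	it("captures cachedContentTokenCount when present", () => {
		const usage = extractTokenUsage({
			usageMetadata: { promptTokenCount: 120, cachedContentTokenCount: 80 },
		});
		expect(usage.cachedTokens).toBe(80);
	});

	it("keeps behavior unchanged when cachedContentTokenCount is absent", () => {
		const usage = extractTokenUsage({ usageMetadata: { promptTokenCount: 120 } });
		expect(usage.cachedTokens).toBeNull();
	});
});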

🌿 Generated by Terry


ℹ️ Tag @terragon-labs to ask questions and address PR feedback

📎 Task: https://www.terragonlabs.com/task/637c3997-7ae3-43a6-9fdf-e95d69b2bfeb

Summary by CodeRabbit

New Features

  • Token usage tracking now includes cached content token counts from Google and Vertex AI providers, exposing cached token information alongside existing prompt and total token counts in usage metrics.

- Extract and parse cachedContentTokenCount from usageMetadata
- Include cached_tokens details in transformStreamingToOpenai output

This enables detailed tracking of cached content tokens for better usage analytics.

Co-authored-by: terragon-labs[bot] <terragon-labs[bot]@users.noreply.github.com>
@bunnyshell

bunnyshell bot commented Oct 26, 2025

❗ Preview Environment deployment failed on Bunnyshell

See: Environment Details | Pipeline Logs

Available commands (reply to this comment):

  • 🚀 /bns:deploy to redeploy the environment
  • /bns:delete to remove the environment

@coderabbitai
Contributor

coderabbitai bot commented Oct 26, 2025

Walkthrough

The pull request extends token usage tracking across three token-handling modules to capture cachedContentTokenCount from Google and Vertex AI provider responses, making cached token metrics available in token usage metadata and OpenAI-formatted streaming responses.

Changes

Google cached token count support
Files: apps/gateway/src/chat/tools/extract-token-usage.ts, parse-provider-response.ts, transform-streaming-to-openai.ts
Summary: Adds extraction of cachedContentTokenCount from Google provider metadata and surfaces it as cachedTokens in token usage structures. In transform-streaming-to-openai.ts, cached token details are conditionally embedded into usage.tokens.prompt_tokens_details.cached_tokens when the field is present in usage metadata.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

  • Changes follow a consistent, homogeneous pattern of extracting and passing through a new field across the token pipeline
  • All modifications are additive and non-breaking, gated on field existence
  • Low logic density with straightforward data flow transformations


Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
  • Docstring Coverage (⚠️ Warning): Docstring coverage is 66.67%, which is below the required threshold of 80.00%. Resolution: run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
  • Description Check (✅ Passed): Check skipped because CodeRabbit's high-level summary is enabled.
  • Title Check (✅ Passed): The pull request title "feat(google): track and expose cached input tokens" directly and accurately summarizes the main change across all three modified files. The changes consistently focus on capturing cached token counts (cachedContentTokenCount) from Google's responses and propagating this data through the token extraction, parsing, and streaming transformation layers. The title is concise, uses conventional commit format, includes the relevant provider scope (Google), and clearly conveys the primary objective without vagueness or unnecessary details.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch terragon/fix-google-model-cached-tokens-71qih2

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.


Comment @coderabbitai help to get the list of available commands and usage tips.

@github-actions github-actions bot changed the title from "Track and expose cached input tokens for Google model" to "feat(google): track and expose cached input tokens" on Oct 26, 2025
Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 09f8a8f and 4fe1e56.

📒 Files selected for processing (3)
  • apps/gateway/src/chat/tools/extract-token-usage.ts (1 hunks)
  • apps/gateway/src/chat/tools/parse-provider-response.ts (1 hunks)
  • apps/gateway/src/chat/tools/transform-streaming-to-openai.ts (3 hunks)
🧰 Additional context used
📓 Path-based instructions (4)
**/*.{ts,tsx,js,jsx}

📄 CodeRabbit inference engine (AGENTS.md)

Always use top-level import; never use require() or dynamic import()

Files:

  • apps/gateway/src/chat/tools/parse-provider-response.ts
  • apps/gateway/src/chat/tools/extract-token-usage.ts
  • apps/gateway/src/chat/tools/transform-streaming-to-openai.ts
apps/{gateway,api}/**/*.ts

📄 CodeRabbit inference engine (AGENTS.md)

apps/{gateway,api}/**/*.ts: Use Hono for HTTP routing in Gateway and API services
Use Zod schemas for request/response validation in server routes

Files:

  • apps/gateway/src/chat/tools/parse-provider-response.ts
  • apps/gateway/src/chat/tools/extract-token-usage.ts
  • apps/gateway/src/chat/tools/transform-streaming-to-openai.ts
**/*.{ts,tsx}

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.{ts,tsx}: Never use any or as any in this TypeScript project unless absolutely necessary
Always use top-level import; do not use require or dynamic import()

Files:

  • apps/gateway/src/chat/tools/parse-provider-response.ts
  • apps/gateway/src/chat/tools/extract-token-usage.ts
  • apps/gateway/src/chat/tools/transform-streaming-to-openai.ts
{apps/{api,gateway}/**/*.ts,packages/db/**/*.ts}

📄 CodeRabbit inference engine (CLAUDE.md)

For read operations, use db().query.<table>.findMany() or db().query.<table>.findFirst()

Files:

  • apps/gateway/src/chat/tools/parse-provider-response.ts
  • apps/gateway/src/chat/tools/extract-token-usage.ts
  • apps/gateway/src/chat/tools/transform-streaming-to-openai.ts
🔇 Additional comments (2)
apps/gateway/src/chat/tools/extract-token-usage.ts (1)

27-27: LGTM! Correct null coalescing operator used.

The use of ?? null correctly preserves 0 as a valid cached token count, only defaulting to null when the field is undefined or null. This is consistent with the surrounding token extraction logic (lines 23, 24, 26).

apps/gateway/src/chat/tools/transform-streaming-to-openai.ts (1)

299-303: LGTM! Consistent pattern for optional token details.

The conditional spreading correctly follows the existing pattern used for reasoning_tokens (lines 296-298, 350-352, 390-392), where optional fields are only included when present and non-zero. This maintains backward compatibility and aligns with OpenAI's convention of omitting optional fields when not applicable.

The identical block appears in three streaming contexts (content delta, finish reason, and fallback), which is appropriate given the different scenarios being handled.

Also applies to: 353-357, 393-397
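For illustration, the conditional-spread pattern described above could be sketched as follows (values and variable names are made up for the example; this is not the file's code):

// Only attach prompt_tokens_details when a cached token count is present and non-zero,
// mirroring how reasoning_tokens is handled elsewhere in the transformer.
const cachedTokens: number | null = 80; // e.g. taken from the extracted usage metadata

const usage = {
	prompt_tokens: 120,
	completion_tokens: 45,
	total_tokens: 165,
	...(cachedTokens
		? { prompt_tokens_details: { cached_tokens: cachedTokens } }
		: {}),
};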

In apps/gateway/src/chat/tools/parse-provider-response.ts:

promptTokens = json.usageMetadata?.promptTokenCount || null;
completionTokens = json.usageMetadata?.candidatesTokenCount || null;
reasoningTokens = json.usageMetadata?.thoughtsTokenCount || null;
cachedTokens = json.usageMetadata?.cachedContentTokenCount || null;

⚠️ Potential issue | 🟡 Minor

Inconsistent nullish handling - use ?? null instead of || null.

The || null operator treats 0 as falsy and converts it to null, which is incorrect for token counts. A cached token count of 0 (no cached tokens) is distinct from null (not available) and should be preserved for accurate token accounting.

In extract-token-usage.ts line 27, the same field extraction correctly uses ?? null. Consider aligning this file with that pattern.
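A three-line illustration of the difference:

const count = 0;                    // provider reported zero cached tokens
const withOr = count || null;      // null: the legitimate 0 is discarded
const withNullish = count ?? null; // 0: preserved; only undefined/null become null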

Apply this diff:

-			cachedTokens = json.usageMetadata?.cachedContentTokenCount || null;
+			cachedTokens = json.usageMetadata?.cachedContentTokenCount ?? null;

Note: This same pattern appears throughout this file (lines 181-183, 215-218, etc.), so consider addressing it more broadly for consistency with extract-token-usage.ts.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
-cachedTokens = json.usageMetadata?.cachedContentTokenCount || null;
+cachedTokens = json.usageMetadata?.cachedContentTokenCount ?? null;
🤖 Prompt for AI Agents
In apps/gateway/src/chat/tools/parse-provider-response.ts around line 184 (and
similarly lines ~181-183, 215-218), the code uses `|| null` which converts valid
zero counts to null; change these uses to the nullish coalescing operator `??
null` so that 0 is preserved while undefined/null become null, and scan the file
to replace other occurrences of `|| null` for token-count or usage fields to
match the pattern used in extract-token-usage.ts.

