improvement: add cache details to usage for (vertex ai) gemini and anthropic models #1373
base: main

Conversation
Added cache token details for Vertex AI and Anthropic models to improve usage tracking.
…th vertex and gemini
Added cache and audio token details to Vertex AI and Google models usage metadata
audio_tokens:
  parsedChunk.usageMetadata?.candidatesTokensDetails?.reduce(
    (acc, curr) => {
      if (curr.modality === VERTEX_MODALITY.AUDIO)
        return acc + curr.tokenCount;
      return acc;
    },
    0
  ),
🔴 Security Issue
Issue: Potential null dereference when accessing parsedChunk.usageMetadata?.candidatesTokensDetails without validating its existence.
Fix: Fall back to an empty array and guard each entry before reducing.
Impact: Prevents runtime errors when usage data is missing.
Suggested change:
- audio_tokens:
-   parsedChunk.usageMetadata?.candidatesTokensDetails?.reduce(
-     (acc, curr) => {
-       if (curr.modality === VERTEX_MODALITY.AUDIO)
-         return acc + curr.tokenCount;
-       return acc;
-     },
-     0
-   ),
+ audio_tokens:
+   (parsedChunk.usageMetadata?.candidatesTokensDetails ?? []).reduce(
+     (acc, curr) => {
+       if (curr?.modality === VERTEX_MODALITY.AUDIO)
+         return acc + (curr?.tokenCount ?? 0);
+       return acc;
+     },
+     0
+   ),
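A shared helper could make this fallback-and-guard pattern reusable for both the candidates and prompt token details. A minimal sketch; sumModalityTokens and the TokenDetail shape are hypothetical names inferred from this diff, not part of this PR:

// Hypothetical helper; the TokenDetail shape is inferred from this diff.
interface TokenDetail {
  modality: string; // e.g. VERTEX_MODALITY.AUDIO
  tokenCount?: number;
}

// Sums tokenCount over entries matching the given modality,
// tolerating a missing details array and missing tokenCount values.
function sumModalityTokens(
  details: TokenDetail[] | undefined,
  modality: string
): number {
  return (details ?? []).reduce(
    (acc, curr) =>
      curr?.modality === modality ? acc + (curr.tokenCount ?? 0) : acc,
    0
  );
}

// Possible call site:
// audio_tokens: sumModalityTokens(
//   parsedChunk.usageMetadata?.candidatesTokensDetails,
//   VERTEX_MODALITY.AUDIO
// ),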
@miuosz I've replicated your changes in the references:
…th vertex and gemini
Fix null dereference in audio token calculation and improve type safety
audio_tokens: parsedChunk.usageMetadata?.promptTokensDetails?.reduce(
  (acc, curr) => {
    if (curr.modality === VERTEX_MODALITY.AUDIO)
      return acc + curr.tokenCount;
    return acc;
  },
  0
),
},
};
}
🔴 Security Issue
Issue: Potential null dereference when accessing parsedChunk.usageMetadata?.promptTokensDetails without validating its existence.
Fix: Fall back to an empty array and guard each entry before reducing.
Impact: Prevents runtime errors when usage data is missing.
Suggested change:
- audio_tokens: parsedChunk.usageMetadata?.promptTokensDetails?.reduce(
-   (acc, curr) => {
-     if (curr.modality === VERTEX_MODALITY.AUDIO)
-       return acc + curr.tokenCount;
-     return acc;
-   },
-   0
- ),
- },
- };
- }
+ audio_tokens: (parsedChunk.usageMetadata?.promptTokensDetails ?? []).reduce(
+   (
+     acc: number,
+     curr: { modality: VERTEX_MODALITY; tokenCount: number }
+   ) => {
+     if (curr?.modality === VERTEX_MODALITY.AUDIO)
+       return acc + (curr?.tokenCount ?? 0);
+     return acc;
+   },
+   0
+ ),
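One nuance in the suggested types: tokenCount is declared as a required number, yet the body still guards with ?? 0. If the API can omit the field, marking it optional keeps the type honest. A sketch of that adjustment; the enum values shown are assumptions inferred from this diff, not the repository's actual definition:

// Illustrative only; actual VERTEX_MODALITY values may differ.
enum VERTEX_MODALITY {
  TEXT = 'TEXT',
  AUDIO = 'AUDIO',
}

// tokenCount is optional so the `?? 0` guard in the reducer matches the type.
type ModalityTokenDetail = {
  modality: VERTEX_MODALITY;
  tokenCount?: number;
};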
Add cache details to usage for Anthropic and Vertex AI models
...streamState.usage,
completion_tokens: parsedChunk.usage?.output_tokens,
total_tokens: totalTokens,
prompt_tokens_details: {
  cached_tokens: streamState.usage?.cache_read_input_tokens ?? 0,
},
},
🟡 Code Quality Issue
Issue: Inconsistent handling of cache_read_input_tokens in the usage object: in one place it is carried over via the spread operator, in another it is set explicitly.
Fix: Standardize the usage object construction to avoid redundancy.
Impact: Improves code maintainability and reduces potential confusion.
Suggested change:
- ...streamState.usage,
- completion_tokens: parsedChunk.usage?.output_tokens,
- total_tokens: totalTokens,
- prompt_tokens_details: {
-   cached_tokens: streamState.usage?.cache_read_input_tokens ?? 0,
- },
- },
+ usage: {
+   ...streamState.usage,
+   completion_tokens: parsedChunk.usage?.output_tokens,
+   total_tokens: totalTokens,
+   prompt_tokens_details: {
+     cached_tokens: streamState.usage?.cache_read_input_tokens ?? 0,
+   },
+   ...(shouldSendCacheUsage && {
+     cache_read_input_tokens: cache_read_input_tokens,
+     cache_creation_input_tokens: cache_creation_input_tokens,
+   }),
+ },
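The ...(shouldSendCacheUsage && { ... }) idiom in the suggestion works because spreading a falsy value into an object literal is a no-op in JavaScript and TypeScript. A minimal standalone sketch of the pattern; the flag and numbers are illustrative:

// Spreading `false` adds nothing, so the cache fields appear only when the flag is set.
const shouldSendCacheUsage = true; // illustrative flag
const usage = {
  prompt_tokens: 10,
  completion_tokens: 5,
  ...(shouldSendCacheUsage && {
    cache_read_input_tokens: 128,
    cache_creation_input_tokens: 0,
  }),
};
// With the flag false, `usage` would contain only prompt_tokens and completion_tokens.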
  ...streamState.usage,
  completion_tokens: parsedChunk.usage?.output_tokens,
  prompt_tokens: streamState.usage?.prompt_tokens,
- total_tokens:
-   (streamState.usage?.prompt_tokens || 0) +
-   (parsedChunk.usage?.output_tokens || 0),
+ total_tokens: totalTokens,
+ prompt_tokens_details: {
+   cached_tokens: streamState.usage?.cache_read_input_tokens ?? 0,
+ },
  },
🟡 Code Quality Issue
Issue: Inconsistent handling of cache_read_input_tokens in the usage object: in one place it is carried over via the spread operator, in another it is set explicitly.
Fix: Standardize the usage object construction to avoid redundancy.
Impact: Improves code maintainability and reduces potential confusion.
Suggested change:
- ...streamState.usage,
- completion_tokens: parsedChunk.usage?.output_tokens,
- prompt_tokens: streamState.usage?.prompt_tokens,
- total_tokens:
-   (streamState.usage?.prompt_tokens || 0) +
-   (parsedChunk.usage?.output_tokens || 0),
- total_tokens: totalTokens,
- prompt_tokens_details: {
-   cached_tokens: streamState.usage?.cache_read_input_tokens ?? 0,
- },
- },
+ usage: {
+   ...streamState.usage,
+   completion_tokens: parsedChunk.usage?.output_tokens,
+   total_tokens: totalTokens,
+   prompt_tokens_details: {
+     cached_tokens: streamState.usage?.cache_read_input_tokens ?? 0,
+   },
+   ...(shouldSendCacheUsage && {
+     cache_read_input_tokens: cache_read_input_tokens,
+     cache_creation_input_tokens: cache_creation_input_tokens,
+   }),
+ },
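One way to standardize the construction, as the review suggests, is a small builder used by every stream branch. A sketch under the assumption that streamState.usage carries the Anthropic cache counters; buildUsage and StreamUsage are hypothetical names, not part of this PR:

// Hypothetical types reflecting the fields used in this diff.
interface StreamUsage {
  prompt_tokens?: number;
  cache_read_input_tokens?: number;
  cache_creation_input_tokens?: number;
  [key: string]: unknown;
}

// Builds the outgoing usage object in one place so every branch stays consistent.
function buildUsage(
  streamUsage: StreamUsage | undefined,
  outputTokens: number | undefined,
  totalTokens: number,
  shouldSendCacheUsage: boolean
) {
  return {
    ...streamUsage,
    completion_tokens: outputTokens,
    total_tokens: totalTokens,
    prompt_tokens_details: {
      cached_tokens: streamUsage?.cache_read_input_tokens ?? 0,
    },
    ...(shouldSendCacheUsage && {
      cache_read_input_tokens: streamUsage?.cache_read_input_tokens ?? 0,
      cache_creation_input_tokens: streamUsage?.cache_creation_input_tokens ?? 0,
    }),
  };
}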
Description
Motivation
Without proper usage details, it's impossible to accurately count tokens or estimate the cost.

1. claude-opus-4
before
after

2. gemini-2.5-flash
before
after
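For reference, the enriched usage object this change targets might look like the following; all values here are made up for illustration, not taken from a real response:

// Illustrative shape of the usage object after this change.
const usage = {
  prompt_tokens: 2048,
  completion_tokens: 512,
  total_tokens: 2560,
  prompt_tokens_details: {
    cached_tokens: 1024, // populated from the provider's cache counters
  },
  // Anthropic-specific counters, included when cache usage is sent:
  cache_read_input_tokens: 1024,
  cache_creation_input_tokens: 0,
};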
Type of Change
How Has This Been Tested?
Screenshots (if applicable)
Checklist
Related Issues