-
-
Notifications
You must be signed in to change notification settings - Fork 1.7k
fix(next): Wrap all Random APIs with a safe runner #18700
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: develop
Are you sure you want to change the base?
fix(next): Wrap all Random APIs with a safe runner #18700
Conversation
size-limit report 📦
|
node-overhead report 🧳Note: This is a synthetic benchmark with a minimal express app and does not necessarily reflect the real-world performance impact in an application.
|
8138056 to
a5c2c86
Compare
| export function runInRandomSafeContext<T>(cb: () => T): T { | ||
| // Skips future symbol lookups if we've already resolved the runner once | ||
| if (RESOLVED_RUNNER) { | ||
| return RESOLVED_RUNNER(cb); | ||
| } | ||
|
|
||
| const sym = Symbol.for('__SENTRY_SAFE_RANDOM_ID_WRAPPER__'); | ||
| const globalWithSymbol: typeof GLOBAL_OBJ & { [sym]?: SafeRandomContextRunner } = GLOBAL_OBJ; | ||
| if (!(sym in globalWithSymbol) || typeof globalWithSymbol[sym] !== 'function') { | ||
| return cb(); | ||
| } | ||
|
|
||
| RESOLVED_RUNNER = globalWithSymbol[sym]; | ||
|
|
||
| return globalWithSymbol[sym](cb); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For reviewer: I think we can reduce global symbol lookups overhead by instead having a setter API here, but that needs to be exposed and won't be a "secret" escape hatch anymore even if we prefix it with __INTERNAL.
would it be worth it? given that the random functions are used frequently.
Another concern is serverless runtimes, I think the current approach is safe against them because the lookup happens every time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You mean setting the snapshot on some internal property directly in nextjs?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, so something like this:
// core
export function setRandomSafeContextRunner(wrapper) {
// still sets on global but we don't need to lookup the symbol anymore
global['__sentry_rnd_runner'] = wrapper;
}Then the Next.js SDK can do:
import { _INTERNAL_setRandomSafeContextRunner } from '@sentry/core';
_INTERNAL_setRandomSafeContextRunner(AsyncLocalStorage.snapshot());The global symbol register should be fast tho, so maybe this isn't needed especially since I don't want users to be able to use this.
chargome
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As discussed offline I'm really happy that this indeed works! 🚀
I generally wanted to avoid having to adapt any core logic to accommodate for cacheComponents, but we could revert once Vercel maybe ships an escape hatch.
| * Which will generate and set a trace id in the propagation context, which should trigger the random API error if unpatched | ||
| * See: https://github.com/getsentry/sentry-javascript/issues/18392 | ||
| */ | ||
| export function generateMetadata() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we expand this test case, or better duplicate it and run some async computation within generateMetadata?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure thing, I will add some logic there
| export function runInRandomSafeContext<T>(cb: () => T): T { | ||
| // Skips future symbol lookups if we've already resolved the runner once | ||
| if (RESOLVED_RUNNER) { | ||
| return RESOLVED_RUNNER(cb); | ||
| } | ||
|
|
||
| const sym = Symbol.for('__SENTRY_SAFE_RANDOM_ID_WRAPPER__'); | ||
| const globalWithSymbol: typeof GLOBAL_OBJ & { [sym]?: SafeRandomContextRunner } = GLOBAL_OBJ; | ||
| if (!(sym in globalWithSymbol) || typeof globalWithSymbol[sym] !== 'function') { | ||
| return cb(); | ||
| } | ||
|
|
||
| RESOLVED_RUNNER = globalWithSymbol[sym]; | ||
|
|
||
| return globalWithSymbol[sym](cb); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You mean setting the snapshot on some internal property directly in nextjs?
c581f30 to
f379ed5
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR introduces a safe runner mechanism to wrap random and time-based APIs (Date.now, Math.random, performance.now, crypto.*) to prevent errors in Next.js cache components. It adds a new ESLint rule to enforce consistent usage of these wrapped APIs across the codebase.
Key Changes:
- Implemented
runInRandomSafeContext()wrapper and safe helper functions (safeDateNow(),safeMathRandom()) in@sentry/core - Added Next.js-specific initialization that prepares AsyncLocalStorage snapshot for safe context execution
- Created ESLint rule
no-unsafe-random-apisto enforce usage of safe wrappers in designated packages - Updated all existing random/time API calls to use the safe wrappers across core, node, opentelemetry, and Next.js packages
Reviewed changes
Copilot reviewed 31 out of 31 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
| packages/core/src/utils/safeRandomGeneratorRunner.ts | Core implementation of safe runner mechanism with helper functions for Date.now() and Math.random() |
| packages/core/src/index.ts | Exports safe runner APIs with _INTERNAL prefix for use by other SDK packages |
| packages/core/src/utils/time.ts | Updates time-related functions to use runInRandomSafeContext wrapper |
| packages/core/src/utils/tracing.ts | Updates trace propagation to use safeMathRandom() |
| packages/core/src/scope.ts | Updates propagation context generation to use safeMathRandom() |
| packages/core/src/tracing/trace.ts | Updates trace initialization to use safeMathRandom() |
| packages/core/src/utils/misc.ts | Updates UUID generation to use safe wrappers |
| packages/core/src/utils/ratelimit.ts | Updates rate limiting to use safeDateNow() |
| packages/core/src/client.ts | Updates error sampling to use safeMathRandom() |
| packages/core/src/integrations/mcp-server/correlation.ts | Updates span storage to use safeDateNow() |
| packages/nextjs/src/server/prepareSafeIdGeneratorContext.ts | Next.js-specific implementation that prepares AsyncLocalStorage snapshot as safe context runner |
| packages/nextjs/src/server/index.ts | Calls prepareSafeIdGeneratorContext() at SDK initialization |
| packages/nextjs/src/common/pages-router-instrumentation/wrapApiHandlerWithSentryVercelCrons.ts | Updates cron monitoring to use _INTERNAL_safeDateNow() |
| packages/nextjs/src/config/withSentryConfig.ts | Adds eslint-disable comment for intentional Math.random() usage in tunnel route generation |
| packages/nextjs/src/config/polyfills/perf_hooks.js | Adds eslint-disable comment for polyfill code |
| packages/node-core/src/integrations/context.ts | Updates context integration to use _INTERNAL_safeDateNow() |
| packages/opentelemetry/src/spanExporter.ts | Updates span exporter timestamp handling to use _INTERNAL_safeDateNow() |
| packages/opentelemetry/src/sampler.ts | Updates sampling to use _INTERNAL_safeMathRandom() |
| packages/eslint-plugin-sdk/src/rules/no-unsafe-random-apis.js | New ESLint rule to enforce wrapping of random/time APIs |
| packages/eslint-plugin-sdk/src/index.js | Registers the new ESLint rule |
| packages/eslint-plugin-sdk/test/lib/rules/no-unsafe-random-apis.test.ts | Test suite for the new ESLint rule |
| packages/core/.eslintrc.js | Enables no-unsafe-random-apis rule with test file exceptions |
| packages/nextjs/.eslintrc.js | Enables no-unsafe-random-apis rule with test file exceptions |
| packages/node/.eslintrc.js | Enables no-unsafe-random-apis rule with test file exceptions |
| packages/node-core/.eslintrc.js | Enables no-unsafe-random-apis rule with test file exceptions |
| packages/opentelemetry/.eslintrc.js | Enables no-unsafe-random-apis rule with test file exceptions |
| packages/vercel-edge/.eslintrc.js | Enables no-unsafe-random-apis rule with test file exceptions |
| dev-packages/e2e-tests/test-applications/nextjs-16-cacheComponents/tests/cacheComponents.spec.ts | Adds E2E tests for metadata generation in cache components |
| dev-packages/e2e-tests/test-applications/nextjs-16-cacheComponents/app/metadata/page.tsx | Test page for synchronous metadata generation with cache components |
| dev-packages/e2e-tests/test-applications/nextjs-16-cacheComponents/app/metadata-async/page.tsx | Test page for asynchronous metadata generation with cache components |
| .size-limit.js | Updates bundle size limits to account for new safe runner code |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
packages/eslint-plugin-sdk/test/lib/rules/no-unsafe-random-apis.test.ts
Outdated
Show resolved
Hide resolved
02878ec to
f3c7b35
Compare
|
|
||
| /** Inits the Sentry NextJS SDK on node. */ | ||
| export function init(options: NodeOptions): NodeClient | undefined { | ||
| prepareSafeIdGeneratorContext(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Edge runtime missing safe random context setup
Medium Severity
The server runtime calls prepareSafeIdGeneratorContext() at the start of init() to set up the safe random context for cache components, but the edge runtime's init() function does not make this call. Both runtimes export the cache-aware span methods from nextSpan.ts, suggesting cache components are supported in both. If AsyncLocalStorage.snapshot() is available in the Vercel Edge runtime, random API calls in cache component contexts would still trigger errors because the safe context isn't set up. The edge runtime should also attempt to call prepareSafeIdGeneratorContext(), which will gracefully handle cases where the required APIs aren't available.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is intentional, we don't want to set that up there.
So the general approach seems sound but I worry about this kind of lint rule. Exiting out of the context to generate arbitrary random values is potentially wrong when prerendering with Cache Components in Next.js. The reason it makes sense for Span ids is that these ideally don't ever leak into the render context and thus cannot cause a prerender to observe anything non-deterministic. (They're side effects for telemetry only) This is actually not always true though b/c if you pass a Cache Function to startActiveSpan then the Span object will get passed into the Cache Function and you've just created a situation where the Cache Function will never be a cache hit b/c it always receives an argument with a random value. The best remediation for non-determinism is to guard it behind something async. Once you're past the first task of the prerender you can do all the non-deterministic stuff you want. By linting to ensure you bypass the deterministic check for sync APIs you may create a situation in the future where someone without understanding the full implications adds something that does leak into the render context or flows into a Cache Function and breaks something. The idea with erroring in these cases is it allows you to respond before you ship something broken to production but if you disable the ability to error by exiting the ALS it'll be much harder to detect that this has happened. I was originally thinking only the Span creation would be exempted. But it seems there are things in here for event processing and more. Are these all necessary? |
|
@gnoff Thanks for taking a look!
I thought it would be hard for future maintainers to know when they should exit the ALS as they may not be even working on Next.js related stuff.
Thanks for the explanation! I suppose then a better approach is to NOT patch the random value call sites like I did here and instead wrap the APIs that do need them. Like span ID, trace ID, timestamps for logs and spans, etc. Even if it is duplicated around the codebase, since in those cases we know for a fact it won't be used for a side-effect. Would that be a better approach? It would require implicit understanding how a certain API interacts with Next.js but would better match the intention of wrapping the non-side effect-y APIs.
I can revert them one by one to see which ones are necessary, from my tests I needed to patch only those initially:
It was a cat and mouse game, every time I patched one call, there was another which prompted me to see if eslint can catch those. I will try reverting and keep only the minimal changes and we can give it another look to see if it makes sense instead of sprinkling the wrapper everywhere. |
|
|
||
| if (!traceparentData?.traceId) { | ||
| return { | ||
| traceId: generateTraceId(), | ||
| sampleRand: Math.random(), | ||
| traceId: withRandomSafeContext(generateTraceId), | ||
| sampleRand: withRandomSafeContext(() => Math.random()), | ||
| }; | ||
| } | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Bug: Unwrapped Math.random() calls in getSampleRandFromTraceparentAndDsc will crash Next.js cache components during distributed tracing, as this function is called when a sentry-trace header is present.
Severity: CRITICAL | Confidence: High
🔍 Detailed Analysis
The function getSampleRandFromTraceparentAndDsc contains unwrapped calls to Math.random(). This function is invoked by propagationContextFromHeaders when an incoming request includes a sentry-trace header, a common scenario for distributed tracing. In a Next.js application, if this request processing occurs within a cache component context (e.g., during generateMetadata), the use of the unwrapped Math.random() will violate Next.js's restrictions and cause the application to crash. This oversight undermines the PR's goal of making random API calls safe in such environments.
💡 Suggested Fix
Wrap the Math.random() calls within the getSampleRandFromTraceparentAndDsc function using the withRandomSafeContext utility. This will ensure that random number generation is compatible with Next.js cache component restrictions.
🤖 Prompt for AI Agent
Review the code at the location below. A potential bug has been identified by an AI
agent.
Verify if this is a real issue. If it is, propose a fix; if not, explain why it's not
valid.
Location: packages/core/src/utils/tracing.ts#L65-L72
Potential issue: The function `getSampleRandFromTraceparentAndDsc` contains unwrapped
calls to `Math.random()`. This function is invoked by `propagationContextFromHeaders`
when an incoming request includes a `sentry-trace` header, a common scenario for
distributed tracing. In a Next.js application, if this request processing occurs within
a cache component context (e.g., during `generateMetadata`), the use of the unwrapped
`Math.random()` will violate Next.js's restrictions and cause the application to crash.
This oversight undermines the PR's goal of making random API calls safe in such
environments.
Did we get this right? 👍 / 👎 to inform future reviews.
Reference ID: 8300189
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this fixes it 🤦♂️
Currently the cache components feature in Next.js prevents us from using any random value APIs like:
Date.nowperformance.nowMath.randomcrypto.*We tried resolving this by patching several span methods, but then we have plenty of other instances where we use those APIs, like in trace propagation, timestamp generation for logs, and more.
Running around and patching them one by one in the Next.js SDK isn't a viable solution since most of those functionalities are strictly internal and cannot be patched from the outside, and adding escape hatches for each of them is not maintainable.
So I'm testing out the other way around, by hunting those APIs down and wrapping them with a safe runner that acts as an escape hatch. Some of the Vercel engineers suggested doing that, but we need to do it for
almost every call(see Josh comment below).The idea is an SDK can "turn on" the safe runner by injecting a global function that executes a callback and returns its results. I
How does this fix it for Next.js?
The Next.js SDK case, a safe runner would be an
AsyncLocalStoragesnapshot which is captured at the server runtime init, way before any rendering is done.I kept the API internal as much as possible to avoid users messing up with it, but the
@sentry/opentelemetrySDK also needed this functionality so I exported the API with_INTERNALprefix as we already do.I tested this in a simple Next.js app and it no longer errors out, and all current tests pass. I still need to take a look at the traces and see how would they look in cached component cases.
Charly is already working on this and may have a proper solution, but I thought to just see if we can ship a stopgap until then.
On the bright side, this seems to fix it as well for Webpack.
closes #18392
closes #18340