Skip to content

Conversation

@andrewlock
Copy link
Member

@andrewlock andrewlock commented Sep 24, 2025

Summary of changes

Fixes flakiness in the TraceAnnotation tests

Reason for change

#7287 incidentally introduced flakiness in the test when the profiler is not available (i.e. only on Windows). We temporarily disabled the test in #7551 and #7552. This reinstates the test, and removes the flake.

Implementation details

The important thing is that we do a "real" instrumentation in the app, to insure that we send instrumentation telemetry. Without this, there's a race condition between us instrumenting our "own" HttpClient usages and the app ending.

In more detail, we see this flake because:

  • The integration telemetry is only sent when something is instrumented (or subsequently errors)
  • The delay in profiling introduced in [Profiler] Support Stable Config #7287 causes a delay in sending the calltarget definitions
    • We will address that delay in a separate PR
  • The telemetry thread runs in the background, and is already started before we P/Invoke into the profiler
  • The delay causes us to not instrument the HttpClient calls that the background telemetry thread makes initially - if we skip the P/Invoke, then they are instrumented at this point
  • The app shuts down, which causes another telemetry flush
  • At this point the HttpClient calls are instrumented, and the instrumentation details are collected, but now we're shutting down, so this data is never sent.

By forcing an instrumentation in the app, we bypass the race condition entirely.

Test coverage

Excluded the span from the tests so it's effectively the same. We don't care about the contents of that span (or whether the request passes or fails) - we just want to make sure we have instrumentation.

Other details

By forcing an instrumentation, which in turn forces integration telemetry to be sent
@andrewlock andrewlock added area:tests unit tests, integration tests type:flake-fix labels Sep 24, 2025
@andrewlock andrewlock requested review from a team as code owners September 24, 2025 10:58
Copy link

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Codex Review: Here are some suggestions.

Reply with @codex fix comments to fix any unresolved comments.

About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you open a pull request for review, mark a draft as ready, or comment "@codex review". If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex fix this CI failure" or "@codex address that feedback".

Comment on lines 112 to 119
HttpRequestMessage message = new HttpRequestMessage();
message.Method = HttpMethod.Get;

// trigger a "standard" instrumented integration to ensure that integration telemetry is always sent,
// even when trace annotations are disabled
message.RequestUri = new Uri("http://www.google.com");
var httpClient = new HttpClient();
using var response = await httpClient.SendAsync(message);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[P1] Remove external HTTP dependency or swallow failures

The sample now performs a real HttpClient.SendAsync call to http://www.google.com but any network failure will throw an exception and abort the sample process. Many CI environments block outbound traffic or resolve slowly, so this introduces a new source of test failures and long timeouts despite the comment that the request success doesn’t matter. Consider wrapping the call in a try/catch (and ignoring failures) or using a local stubbed handler so instrumentation telemetry is triggered without relying on external connectivity.

Useful? React with 👍 / 👎.

@dd-trace-dotnet-ci-bot
Copy link

dd-trace-dotnet-ci-bot bot commented Sep 24, 2025

Execution-Time Benchmarks Report ⏱️

Execution-time results for samples comparing the following branches/commits:

Execution-time benchmarks measure the whole time it takes to execute a program. And are intended to measure the one-off costs. Cases where the execution time results for the PR are worse than latest master results are shown in red. The following thresholds were used for comparing the execution times:

  • Welch test with statistical test for significance of 5%
  • Only results indicating a difference greater than 5% and 5 ms are considered.

Note that these results are based on a single point-in-time result for each branch. For full results, see the dashboard.

Graphs show the p99 interval based on the mean and StdDev of the test run, as well as the mean value of the run (shown as a diamond below the graph).

gantt
    title Execution time (ms) FakeDbCommand (.NET Framework 4.8) 
    dateFormat  X
    axisFormat %s
    todayMarker off
    section Bailout
    This PR (7554) - mean (72ms)  : 71, 73
     .   : milestone, 72,
    master - mean (72ms)  : 71, 73
     .   : milestone, 72,

    section Baseline
    This PR (7554) - mean (68ms)  : 66, 70
     .   : milestone, 68,
    master - mean (68ms)  : 66, 70
     .   : milestone, 68,

    section CallTarget+Inlining+NGEN
    This PR (7554) - mean (1,052ms)  : 994, 1110
     .   : milestone, 1052,
    master - mean (1,049ms)  : 988, 1110
     .   : milestone, 1049,

Loading
gantt
    title Execution time (ms) FakeDbCommand (.NET Core 3.1) 
    dateFormat  X
    axisFormat %s
    todayMarker off
    section Bailout
    This PR (7554) - mean (106ms)  : 105, 108
     .   : milestone, 106,
    master - mean (106ms)  : 105, 108
     .   : milestone, 106,

    section Baseline
    This PR (7554) - mean (106ms)  : 103, 109
     .   : milestone, 106,
    master - mean (105ms)  : 103, 108
     .   : milestone, 105,

    section CallTarget+Inlining+NGEN
    This PR (7554) - mean (744ms)  : 723, 765
     .   : milestone, 744,
    master - mean (745ms)  : 716, 774
     .   : milestone, 745,

Loading
gantt
    title Execution time (ms) FakeDbCommand (.NET 6) 
    dateFormat  X
    axisFormat %s
    todayMarker off
    section Bailout
    This PR (7554) - mean (94ms)  : 93, 95
     .   : milestone, 94,
    master - mean (94ms)  : 93, 95
     .   : milestone, 94,

    section Baseline
    This PR (7554) - mean (94ms)  : 91, 96
     .   : milestone, 94,
    master - mean (93ms)  : 91, 95
     .   : milestone, 93,

    section CallTarget+Inlining+NGEN
    This PR (7554) - mean (705ms)  : 685, 725
     .   : milestone, 705,
    master - mean (706ms)  : 677, 735
     .   : milestone, 706,

Loading
gantt
    title Execution time (ms) FakeDbCommand (.NET 8) 
    dateFormat  X
    axisFormat %s
    todayMarker off
    section Bailout
    This PR (7554) - mean (93ms)  : 92, 94
     .   : milestone, 93,
    master - mean (92ms)  : 91, 93
     .   : milestone, 92,

    section Baseline
    This PR (7554) - mean (92ms)  : 90, 94
     .   : milestone, 92,
    master - mean (92ms)  : 90, 94
     .   : milestone, 92,

    section CallTarget+Inlining+NGEN
    This PR (7554) - mean (658ms)  : 643, 673
     .   : milestone, 658,
    master - mean (664ms)  : 647, 680
     .   : milestone, 664,

Loading
gantt
    title Execution time (ms) HttpMessageHandler (.NET Framework 4.8) 
    dateFormat  X
    axisFormat %s
    todayMarker off
    section Bailout
    This PR (7554) - mean (202ms)  : 197, 206
     .   : milestone, 202,
    master - mean (199ms)  : 197, 202
     .   : milestone, 199,

    section Baseline
    This PR (7554) - mean (198ms)  : 193, 203
     .   : milestone, 198,
    master - mean (196ms)  : 193, 199
     .   : milestone, 196,

    section CallTarget+Inlining+NGEN
    This PR (7554) - mean (1,181ms)  : 1127, 1235
     .   : milestone, 1181,
    master - mean (1,170ms)  : 1101, 1240
     .   : milestone, 1170,

Loading
gantt
    title Execution time (ms) HttpMessageHandler (.NET Core 3.1) 
    dateFormat  X
    axisFormat %s
    todayMarker off
    section Bailout
    This PR (7554) - mean (285ms)  : 279, 290
     .   : milestone, 285,
    master - mean (282ms)  : 275, 289
     .   : milestone, 282,

    section Baseline
    This PR (7554) - mean (285ms)  : 276, 295
     .   : milestone, 285,
    master - mean (281ms)  : 276, 286
     .   : milestone, 281,

    section CallTarget+Inlining+NGEN
    This PR (7554) - mean (951ms)  : 911, 992
     .   : milestone, 951,
    master - mean (945ms)  : 900, 989
     .   : milestone, 945,

Loading
gantt
    title Execution time (ms) HttpMessageHandler (.NET 6) 
    dateFormat  X
    axisFormat %s
    todayMarker off
    section Bailout
    This PR (7554) - mean (279ms)  : 274, 284
     .   : milestone, 279,
    master - mean (275ms)  : 269, 281
     .   : milestone, 275,

    section Baseline
    This PR (7554) - mean (279ms)  : 270, 288
     .   : milestone, 279,
    master - mean (275ms)  : 270, 279
     .   : milestone, 275,

    section CallTarget+Inlining+NGEN
    This PR (7554) - mean (948ms)  : 895, 1001
     .   : milestone, 948,
    master - mean (930ms)  : 895, 965
     .   : milestone, 930,

Loading
gantt
    title Execution time (ms) HttpMessageHandler (.NET 8) 
    dateFormat  X
    axisFormat %s
    todayMarker off
    section Bailout
    This PR (7554) - mean (277ms)  : 271, 282
     .   : milestone, 277,
    master - mean (275ms)  : 265, 286
     .   : milestone, 275,

    section Baseline
    This PR (7554) - mean (276ms)  : 271, 280
     .   : milestone, 276,
    master - mean (273ms)  : 266, 281
     .   : milestone, 273,

    section CallTarget+Inlining+NGEN
    This PR (7554) - mean (868ms)  : 846, 891
     .   : milestone, 868,
    master - mean (861ms)  : 840, 883
     .   : milestone, 861,

Loading

Copy link
Collaborator

@bouwkast bouwkast left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@andrewlock andrewlock merged commit 7955e93 into master Sep 24, 2025
153 checks passed
@andrewlock andrewlock deleted the andrew/fix-trace-annotations branch September 24, 2025 15:42
andrewlock added a commit that referenced this pull request Sep 24, 2025
… env var (#7555)

## Summary of changes

Update profiler availability helper algorithm on Windows

## Reason for change

- If the Continuous Profiler isn't available, the native loader does not
rewrite P/Invokes and does not set
`DD_INTERNAL_PROFILING_NATIVE_ENGINE_PATH`
- If it is available, the native loader sets the
`DD_INTERNAL_PROFILING_NATIVE_ENGINE_PATH` value
- Currently, the `ProfilerAvailabilityHelper` is treating the presence
of the variable as indication the profiler is available, but it's not
treating the _absence_ as an indication the profiler is _not_ available.
- This PR changes that to treat the
`DD_INTERNAL_PROFILING_NATIVE_ENGINE_PATH` variable as the deciding
factor on Windows

(Note that his variable is _not_ available today on non-Windows as it
doesn't propagate from the native side. We hope to fix that in the
future, but in the mean time we must rely on heuristics).

## Implementation details

Update the logic of the helper on Windows to only rely on the
`DD_INTERNAL_PROFILING_NATIVE_ENGINE_PATH` variable.

Note that I also removed the AAS Extension check, as that only runs on
Windows anyway, so would never return `true` (as it's caught in the
`IsWindows()` branch).

## Test coverage

Updated the unit tests with new behaviour. Basically we need to skip
more tests on Windows, as it's only governed by that flag

## Other details

Related to the following:
- #7554
- #7287
- #7552
@github-actions github-actions bot added this to the vNext-v3 milestone Sep 24, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area:tests unit tests, integration tests type:flake-fix

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants