-
Notifications
You must be signed in to change notification settings - Fork 150
[OTEL] Metrics API Support - Metrics Reader & OTLP Exporter #7514
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
This comment has been minimized.
This comment has been minimized.
Execution-Time Benchmarks Report ⏱️Execution-time results for samples comparing the following branches/commits: Execution-time benchmarks measure the whole time it takes to execute a program. And are intended to measure the one-off costs. Cases where the execution time results for the PR are worse than latest master results are shown in red. The following thresholds were used for comparing the execution times:
Note that these results are based on a single point-in-time result for each branch. For full results, see the dashboard. Graphs show the p99 interval based on the mean and StdDev of the test run, as well as the mean value of the run (shown as a diamond below the graph). gantt
title Execution time (ms) FakeDbCommand (.NET Framework 4.8)
dateFormat X
axisFormat %s
todayMarker off
section Bailout
This PR (7514) - mean (72ms) : 71, 73
. : milestone, 72,
master - mean (72ms) : 71, 73
. : milestone, 72,
section Baseline
This PR (7514) - mean (68ms) : 66, 71
. : milestone, 68,
master - mean (68ms) : 67, 69
. : milestone, 68,
section CallTarget+Inlining+NGEN
This PR (7514) - mean (1,049ms) : 1010, 1089
. : milestone, 1049,
master - mean (1,054ms) : 1000, 1108
. : milestone, 1054,
gantt
title Execution time (ms) FakeDbCommand (.NET Core 3.1)
dateFormat X
axisFormat %s
todayMarker off
section Bailout
This PR (7514) - mean (107ms) : 105, 108
. : milestone, 107,
master - mean (106ms) : 105, 108
. : milestone, 106,
section Baseline
This PR (7514) - mean (106ms) : 103, 109
. : milestone, 106,
master - mean (106ms) : 104, 108
. : milestone, 106,
section CallTarget+Inlining+NGEN
This PR (7514) - mean (745ms) : 722, 769
. : milestone, 745,
master - mean (746ms) : 724, 769
. : milestone, 746,
gantt
title Execution time (ms) FakeDbCommand (.NET 6)
dateFormat X
axisFormat %s
todayMarker off
section Bailout
This PR (7514) - mean (101ms) : 100, 102
. : milestone, 101,
master - mean (101ms) : 100, 101
. : milestone, 101,
section Baseline
This PR (7514) - mean (100ms) : 98, 102
. : milestone, 100,
master - mean (100ms) : 98, 102
. : milestone, 100,
section CallTarget+Inlining+NGEN
This PR (7514) - mean (776ms) : 730, 822
. : milestone, 776,
master - mean (775ms) : 727, 822
. : milestone, 775,
gantt
title Execution time (ms) FakeDbCommand (.NET 8)
dateFormat X
axisFormat %s
todayMarker off
section Bailout
This PR (7514) - mean (93ms) : 91, 94
. : milestone, 93,
master - mean (93ms) : 92, 94
. : milestone, 93,
section Baseline
This PR (7514) - mean (92ms) : 90, 95
. : milestone, 92,
master - mean (92ms) : 90, 94
. : milestone, 92,
section CallTarget+Inlining+NGEN
This PR (7514) - mean (656ms) : 642, 670
. : milestone, 656,
master - mean (664ms) : 648, 679
. : milestone, 664,
gantt
title Execution time (ms) HttpMessageHandler (.NET Framework 4.8)
dateFormat X
axisFormat %s
todayMarker off
section Bailout
This PR (7514) - mean (198ms) : 194, 202
. : milestone, 198,
master - mean (200ms) : 196, 205
. : milestone, 200,
section Baseline
This PR (7514) - mean (193ms) : 189, 196
. : milestone, 193,
master - mean (196ms) : 190, 202
. : milestone, 196,
section CallTarget+Inlining+NGEN
This PR (7514) - mean (1,170ms) : 1096, 1244
. : milestone, 1170,
master - mean (1,197ms) : 1120, 1273
. : milestone, 1197,
gantt
title Execution time (ms) HttpMessageHandler (.NET Core 3.1)
dateFormat X
axisFormat %s
todayMarker off
section Bailout
This PR (7514) - mean (279ms) : 274, 284
. : milestone, 279,
master - mean (282ms) : 276, 287
. : milestone, 282,
section Baseline
This PR (7514) - mean (278ms) : 273, 283
. : milestone, 278,
master - mean (281ms) : 275, 287
. : milestone, 281,
section CallTarget+Inlining+NGEN
This PR (7514) - mean (933ms) : 899, 967
. : milestone, 933,
master - mean (948ms) : 904, 992
. : milestone, 948,
gantt
title Execution time (ms) HttpMessageHandler (.NET 6)
dateFormat X
axisFormat %s
todayMarker off
section Bailout
This PR (7514) - mean (281ms) : 277, 286
. : milestone, 281,
master - mean (285ms) : 280, 291
. : milestone, 285,
section Baseline
This PR (7514) - mean (281ms) : 275, 287
. : milestone, 281,
master - mean (286ms) : 279, 294
. : milestone, 286,
section CallTarget+Inlining+NGEN
This PR (7514) - mean (985ms) : 933, 1037
. : milestone, 985,
master - mean (1,008ms) : 965, 1051
. : milestone, 1008,
gantt
title Execution time (ms) HttpMessageHandler (.NET 8)
dateFormat X
axisFormat %s
todayMarker off
section Bailout
This PR (7514) - mean (269ms) : 264, 274
. : milestone, 269,
master - mean (276ms) : 266, 286
. : milestone, 276,
section Baseline
This PR (7514) - mean (269ms) : 265, 273
. : milestone, 269,
master - mean (275ms) : 269, 281
. : milestone, 275,
section CallTarget+Inlining+NGEN
This PR (7514) - mean (847ms) : 826, 867
. : milestone, 847,
master - mean (869ms) : 841, 898
. : milestone, 869,
|
…trics-api-exporter
…trics-api-exporter
…trics-api-exporter
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll just leave a file-level comment to make sure it doesn't get lost, but as a follow-up task we should vendor this code so that it's easy to update how we emit protobuf if we need to update to a newer proto file to implement. In general it seems like we're creating a bunch of intermediate byte arrays which could be optimized away, but it's not worth spending time on this if we're going to change the code soon. Otherwise, as long as we are able to emit correct OTLP, this is good for this PR
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The last overall comment I have is that we should remove the gRPC emission from this PR and focus on the http/protobuf serialization, but nothing else major from me!
8a89f6a to
57b1eb5
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks! We've got some follow-up items so let's make sure to track all of them, and please make sure system-tests for the http/protobuf exporter tests pass.
|
@codex review |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💡 Codex Review
Here are some automated review suggestions for this pull request.
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💡 Codex Review
Here are some automated review suggestions for this pull request.
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting
| private async Task<bool> SendWithRetry(HttpRequestMessage httpRequest) | ||
| { | ||
| const int maxRetries = 3; | ||
| var retryDelay = TimeSpan.FromMilliseconds(100); | ||
|
|
||
| for (var attempt = 0; attempt <= maxRetries; attempt++) | ||
| { | ||
| try | ||
| { | ||
| using var cts = new CancellationTokenSource(TimeSpan.FromMilliseconds(_timeoutMs)); | ||
| var response = await _httpClient.SendAsync(httpRequest, cts.Token).ConfigureAwait(false); | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Recreate requests between retries
The retry loop reuses the same HttpRequestMessage instance across attempts. After _httpClient.SendAsync sends a request once, the framework throws InvalidOperationException: The request message was already sent on any subsequent call, so none of the configured retries ever execute and transient 5xx/429 failures immediately fail the export. Cloning the request or rebuilding it inside the loop is required for the retry policy to work.
Useful? React with 👍 / 👎.
| switch (_identity.InstrumentType) | ||
| { | ||
| case InstrumentType.Counter: | ||
| case InstrumentType.ObservableCounter: | ||
| case InstrumentType.UpDownCounter: | ||
| case InstrumentType.ObservableUpDownCounter: | ||
| point.UpdateCounter(value); | ||
| break; | ||
| case InstrumentType.ObservableCounter: | ||
| case InstrumentType.ObservableUpDownCounter: | ||
| point.UpdateObservableCounter(value); | ||
| break; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Delta exports for ObservableCounter report cumulative totals
For asynchronous counters the measurement callback writes the current cumulative value into _runningDoubleValue, and CreateSnapshotAndReset() zeroes that value whenever the temporality is delta (the default). Because no previous value is preserved, each export emits the full cumulative reading rather than the increment since the last export (e.g., 120 is exported instead of 20 when a counter rises from 100 to 120). This doubles counts every interval for any ObservableCounter<T> when OTEL_EXPORTER_OTLP_METRICS_TEMPORALITY_PREFERENCE=delta.
Useful? React with 👍 / 👎.
| if (metric.InstrumentType is InstrumentType.Counter | ||
| or InstrumentType.UpDownCounter | ||
| or InstrumentType.ObservableCounter | ||
| or InstrumentType.ObservableUpDownCounter) | ||
| { | ||
| WriteTag(writer, FieldNumbers.NumberDataPointAsInt, Fixed64); | ||
| writer.Write((long)metric.SnapshotSum); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Double counters are truncated to integers during OTLP serialization
When serializing sum metrics the code always writes counters to the protobuf as_int field and casts SnapshotSum to long. Any counter based on double (e.g., instruments incremented by 0.1 or 1.5) loses its fractional component, and large values can overflow the 64‑bit cast. The serializer should choose NumberDataPointAsDouble whenever the metric was recorded as a floating‑point value or, at minimum, avoid casting away precision.
Useful? React with 👍 / 👎.
Reverts #7582 This is preventing us to run the action as expected as the regex appears to be incorrect. Since we must trigger this ourselves there really isn't a risk here at the moment, will let the SINT team follow up to resolve. This is blocking the testing of #7514 https://dev.azure.com/datadoghq/dd-trace-dotnet/_build/results?buildId=188595&view=logs&j=8299a7f3-61e2-5626-a12e-85b7b20deab1&t=e26f8d61-df24-5d93-564c-322f7221200d
ee91524 to
2c59b2c
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a few comments but LGTM overall
Note that I didn't inspect the snapshots very closely as I'm not super sure on what they should be 😅
tracer/test/Datadog.Trace.ClrProfiler.IntegrationTests/OpenTelemetrySdkTests.cs
Show resolved
Hide resolved
…/dd-trace-dotnet into maximo/otel-metrics-api-exporter
Summary of changes
This PR implements the OTLP Metrics Exporter for OpenTelemetry Metrics API support, completing the final piece of the RFC implementation. Key additions include:
MetricExporterabstract class andOtlpExporterconcrete implementation initially with onlyHTTP/protobuftransport support ATM.metrics_export_attempts,metrics_export_successes,metrics_export_failures) with protocol tagThis completes the series of 3 PRs for OpenTelemetry Metrics API support.
Reason for change
The [RFC] OpenTelemetry Metrics API support requires the .NET tracer to act as a drop-in replacement for the OpenTelemetry SDK. This PR implements the OTLP export functionality, enabling customers to migrate from OTel SDK to Datadog SDK with zero friction.
Business Impact: Enables customers using OpenTelemetry Metrics API to switch to Datadog without code changes, supporting our strategic initiative to be OTel-native.
Implementation details
✅ OTLP Metrics Exporter Components:
✅ Key Features:
✅ Sample App Enhancements:
Test coverage
Added comprehensive integration test
SubmitsOtlpMetricsthat:NOTE: Hoping to add Grpc and HTTP/Json tests along with the current
SubmitsOtlpMetricsprotobuf one and also benchmarks if possible to compare OTEL vs Datadog overhead.Other details
Jira: APMAPI-1573