@ArthurSens
Member

@ArthurSens ArthurSens commented May 13, 2025

Description

This PR is a PoC for the spec change described at open-telemetry/opentelemetry-specification#4223.

The specification change proposes that Prometheus exporters stop generating the otel_scope_info metric, which carried all the scope attributes, and instead add all scope information as labels on every metric, making the scope information "identifying".

For the receiver, this means that Scope should no longer be populated from otel_scope_info; instead, it should be built from the labels prefixed with otel_scope_ on each metric.

For users who still haven't updated to newer versions of the SDK that adhere to the spec change, ignoring the otel_scope_info metric would be a breaking change from this receiver's perspective. For that reason, this change is being made through feature flags.

If an exporter exposes otel_scope_info and the feature gate receiver.prometheusreceiver.RemoteScopeInfo is enabled, then otel_scope_info will be transformed into a metric like all others instead of being cached and used to generate ScopeMetrics information.
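
As a rough illustration of the label-based approach described above, the sketch below splits a metric's labels into scope information and regular attributes. It is not the code in this PR: the `Scope` struct is a simplified stand-in for `pcommon.InstrumentationScope`, `splitScopeLabels` is a hypothetical helper, and the `otel_scope_schema_url` label name is an assumption.

```go
package main

import "strings"

// scopePrefix is the label prefix that carries scope information.
const scopePrefix = "otel_scope_"

// Scope is a simplified, hypothetical stand-in for pcommon.InstrumentationScope.
type Scope struct {
	Name       string
	Version    string
	SchemaURL  string
	Attributes map[string]string
}

// splitScopeLabels separates otel_scope_* labels from all other labels.
// Name, version and schema URL fill dedicated fields; every other
// otel_scope_<attr> label becomes a scope attribute named <attr>.
func splitScopeLabels(labels map[string]string) (Scope, map[string]string) {
	scope := Scope{Attributes: map[string]string{}}
	rest := map[string]string{}
	for k, v := range labels {
		attr, ok := strings.CutPrefix(k, scopePrefix)
		if !ok {
			rest[k] = v // regular datapoint attribute
			continue
		}
		switch attr {
		case "name":
			scope.Name = v
		case "version":
			scope.Version = v
		case "schema_url": // assumed label name
			scope.SchemaURL = v
		default:
			scope.Attributes[attr] = v
		}
	}
	return scope, rest
}
```

With this shape, a series labeled `otel_scope_name="lib"`, `otel_scope_region="eu"` and `method="GET"` would yield a scope named `lib` with one attribute `region`, and a single remaining datapoint attribute `method`.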

Fixes #41502

@github-actions github-actions bot added the receiver/prometheus Prometheus receiver label May 13, 2025
@ArthurSens ArthurSens force-pushed the spec#4223-prom-receiver branch 2 times, most recently from 82b3f12 to 85a566c Compare May 24, 2025 18:16
@ArthurSens ArthurSens marked this pull request as ready for review May 25, 2025 12:36
@ArthurSens ArthurSens requested a review from a team as a code owner May 25, 2025 12:36
@ArthurSens
Member Author

PoC ready for review!

@ArthurSens ArthurSens changed the title from "[wip]receiver/prometheus: Populate scope attributes from labels instead of otel_scope_info" to "receiver/prometheus: Populate scope attributes from labels instead of otel_scope_info" May 25, 2025
@pellared
Member

Has it been tested together with open-telemetry/opentelemetry-go#5947?

This is what I guess from: open-telemetry/opentelemetry-go#5947 (comment)

CC @jade-guiton-dd

@jade-guiton-dd
Contributor

@pellared What I tested was that the Go SDK's Prometheus exporter runs properly when the Collector emits internal metrics that only differ in their scope attributes. I haven't tested Contrib's receiver.

@ArthurSens
Member Author

I'll run some tests today with the full pipeline: Go SDK -> Prometheus Receiver -> Prometheus Exporter / Debug Exporter

@ArthurSens
Member Author

ArthurSens commented May 26, 2025

Manual tests show that I broke the current behavior that extracts attributes from otel_scope_info metrics, and when the feature gate is enabled I'm getting nil pointers xD

Back to draft

@ArthurSens ArthurSens marked this pull request as draft May 26, 2025 15:10
@ArthurSens ArthurSens force-pushed the spec#4223-prom-receiver branch from 176be90 to 9faa9a7 Compare May 30, 2025 00:07
@ArthurSens
Member Author

Still working on this and still struggling to fix it. I've changed the test strategy to something more end-to-end, and it triggers the nil pointers I'm seeing in manual tests with the whole pipeline: go-sdk -> prom-receiver -> debug exporter.

github-merge-queue bot pushed a commit to open-telemetry/opentelemetry-specification that referenced this pull request Jun 5, 2025
Fixes
#4223

Prototypes:
- open-telemetry/opentelemetry-go#5947
- open-telemetry/opentelemetry-go#6770
- open-telemetry/opentelemetry-java#7356
- open-telemetry/opentelemetry-collector-contrib#40060
- open-telemetry/opentelemetry-collector-contrib#40004



## Changes

Currently (before this PR) [Prometheus and OpenMetrics
Compatibility](https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/compatibility/prometheus_and_openmetrics.md)
assumes that only scope name and scope version are identifying.

However, with
#4161
this is no longer true.

Therefore, this PR updates the Prometheus and OpenMetrics Compatibility
specification to add the scope name, version, schema URL, and scope
attributes to all metrics.

This also removes the `otel_scope_info` metric, as it looks like it won't be
useful. See:
#4223 (comment).

This change is important for the Collector:
open-telemetry/opentelemetry-go#5846 (comment).
It is also necessary for the stabilization of OTel-Prom/OpenMetrics
compatibility and the Prometheus exporter.

_Initially, I thought about [splitting it into a few
PRs](#4223 (comment)).
However, it looks like doing it in one PR would be a more complete
approach (also there are not that many changes)._

---------

Co-authored-by: Jade Guiton <[email protected]>
Co-authored-by: Carlos Alberto Cortez <[email protected]>
@github-actions
Contributor

This PR was marked stale due to lack of activity. It will be closed in 14 days.

@github-actions github-actions bot added the Stale label Jun 13, 2025
@github-actions
Contributor

Closed as inactive. Feel free to reopen if this PR is still being worked on.

@github-actions github-actions bot added the Stale label Aug 1, 2025
@ArthurSens ArthurSens force-pushed the spec#4223-prom-receiver branch from 9faa9a7 to f9d9614 Compare August 12, 2025 00:01
@ArthurSens
Member Author

ArthurSens commented Aug 12, 2025

Ok, I finally found some time to get back to this. I ended up fixing the previous problems by introducing an interface. I was afraid this would introduce a performance penalty, so I implemented a benchmark before proceeding. CPU profiles show an increase in runtime operations from ~20s to ~22s between benchmarks. Allocations go mostly to the new function getScopeIdentifier and to hashing the scope key.

goos: darwin
goarch: arm64
pkg: github.com/open-telemetry/opentelemetry-collector-contrib/receiver/prometheusreceiver/internal
cpu: Apple M2 Pro
                                       │ internal/bench-main.txt │         benchmark-new.txt          │
                                       │         sec/op          │   sec/op     vs base               │
TransactionAppendAndCommit/Floats-2                  2.668µ ± 2%   2.872µ ± 2%   +7.63% (p=0.002 n=6)
TransactionAppendAndCommit/Histogram-2               6.550µ ± 1%   7.263µ ± 2%  +10.88% (p=0.002 n=6)
geomean                                              4.181µ        4.567µ        +9.24%

                                       │ internal/bench-main.txt │         benchmark-new.txt          │
                                       │          B/op           │     B/op      vs base              │
TransactionAppendAndCommit/Floats-2                 2.410Ki ± 0%   2.555Ki ± 0%  +6.00% (p=0.002 n=6)
TransactionAppendAndCommit/Histogram-2              5.465Ki ± 0%   5.918Ki ± 0%  +8.29% (p=0.002 n=6)
geomean                                             3.629Ki        3.888Ki       +7.14%

                                       │ internal/bench-main.txt │         benchmark-new.txt         │
                                       │        allocs/op        │ allocs/op   vs base               │
TransactionAppendAndCommit/Floats-2                   22.00 ± 0%   26.00 ± 0%  +18.18% (p=0.002 n=6)
TransactionAppendAndCommit/Histogram-2                48.00 ± 0%   64.00 ± 0%  +33.33% (p=0.002 n=6)
geomean                                               32.50        40.79       +25.53%

Not sure if we're in the position to care too much about the performance, but please let me know if I could be taking another approach!

@ArthurSens ArthurSens marked this pull request as ready for review August 12, 2025 00:12
@ArthurSens ArthurSens removed the Stale label Aug 12, 2025
@ArthurSens ArthurSens force-pushed the spec#4223-prom-receiver branch from f9d9614 to 3f8b8cc Compare August 12, 2025 13:59
@dashpole
Contributor

I think we should proceed with whatever you have working, and open an issue with your profiles, etc to look into optimizing that code path.

@ArthurSens ArthurSens force-pushed the spec#4223-prom-receiver branch 2 times, most recently from c721b63 to f40b998 Compare August 12, 2025 15:47
@ArthurSens ArthurSens force-pushed the spec#4223-prom-receiver branch from 0b034b8 to f49d021 Compare August 13, 2025 18:03
@ArthurSens
Member Author

ArthurSens commented Aug 13, 2025

> I think we should proceed with whatever you have working, and open an issue with your profiles, etc to look into optimizing that code path.

Once the otel_scope_info functionality is fully removed, we could remove the interface again and then the compiler will be able to inline a lot of the codepath again 🤔

@ArthurSens ArthurSens force-pushed the spec#4223-prom-receiver branch from 979c23d to eb9b37c Compare August 18, 2025 12:56
var emptyScopeID scopeID
// ScopeIdentifier represents an identifier for a metric scope, which can include
// just the basic scope information or also include scope attributes.
type ScopeIdentifier interface {
Contributor

I would rather define a hash function on a metric scope if we can, and use maps of uint64. String appending is very expensive--especially compared to hashing. Maybe we could add it to https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/5404f7c2434387d8b68f3be6010a72fded822d85/pkg/pdatautil/hash.go?

Member Author

I'm not sure I understand what you mean here 😬, could you give a code example?

Contributor

func ScopeHash64(scope pcommon.InstrumentationScope) uint64 {
    return pdatautil.Hash64(pdatautil.WithString(scope.Name()), pdatautil.WithString(scope.Version()), pdatautil.WithMap(scope.Attributes()))
}

Your code would use the uint64 hash as the map key for the scope:

  1. Construct the InstrumentationScope from the metric's labels.
  2. Compute the hash of the scope.
  3. Look up the ScopeMetrics using the hash.
  4. Append the metric to that ScopeMetrics.

Does that make sense?
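
The four steps above can be sketched with the standard library only. This is a hedged illustration, not the receiver's code: FNV-1a from `hash/fnv` stands in for `pdatautil.Hash64`, and `Scope`, `ScopeMetrics`, `scopeIndex` and `appendMetric` are hypothetical names. Like any hash-keyed map, it treats two scopes with the same hash as the same scope, so collisions are not handled here.

```go
package main

import (
	"hash/fnv"
	"sort"
)

// Scope is a simplified, hypothetical stand-in for pcommon.InstrumentationScope.
type Scope struct {
	Name       string
	Version    string
	Attributes map[string]string
}

// ScopeMetrics is a simplified, hypothetical stand-in for pmetric.ScopeMetrics.
type ScopeMetrics struct {
	Scope   Scope
	Metrics []string
}

// scopeHash64 hashes name, version and attributes. Attribute keys are
// sorted first so that Go's random map iteration order does not change
// the result.
func scopeHash64(s Scope) uint64 {
	h := fnv.New64a()
	write := func(str string) {
		h.Write([]byte(str))
		h.Write([]byte{0}) // separator: ("ab","c") must differ from ("a","bc")
	}
	write(s.Name)
	write(s.Version)
	keys := make([]string, 0, len(s.Attributes))
	for k := range s.Attributes {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	for _, k := range keys {
		write(k)
		write(s.Attributes[k])
	}
	return h.Sum64()
}

// scopeIndex maps a scope hash to the ScopeMetrics collecting its metrics.
type scopeIndex map[uint64]*ScopeMetrics

// appendMetric computes the scope hash, looks up (or creates) the matching
// ScopeMetrics, and appends the metric to it.
func (idx scopeIndex) appendMetric(s Scope, metric string) *ScopeMetrics {
	key := scopeHash64(s)
	sm, ok := idx[key]
	if !ok {
		sm = &ScopeMetrics{Scope: s}
		idx[key] = sm
	}
	sm.Metrics = append(sm.Metrics, metric)
	return sm
}
```

Two metrics whose scopes differ only in one attribute value land in separate `ScopeMetrics` entries, while repeated appends for the same scope reuse the existing entry.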

Contributor

Map lookups with a uint64 are MUCH faster than with a data structure, although collisions are possible. We could do the same for the ResourceMetrics lookup if we made a hash function for the resource.

Comment on lines +78 to +80
scopeMap map[resourceKey]map[string]ScopeIdentifier
scopeAttributes map[resourceKey]map[LegacyScopeID]pcommon.Map // Legacy mode only
Contributor

Why do we need both of these? Can't we just have a map[resourceKey]map[string]ScopeIdentifier?

Contributor

The "ID" string can just be constructed differently depending on whether the feature gate is enabled or not. For legacy, the id string is just the name + version. For the new approach, it would be constructed with the entire scope.
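
A minimal sketch of that suggestion, assuming a single string key per scope: `scopeID` and its `fullScope` parameter are hypothetical names, and how the caller derives `fullScope` from the feature gate is left open.

```go
package main

import (
	"sort"
	"strings"
)

// scopeID builds the map key for a scope. In legacy mode (fullScope false)
// only name and version are identifying; otherwise the attributes are
// included as well, in sorted key order so the ID is deterministic.
func scopeID(name, version string, attrs map[string]string, fullScope bool) string {
	var b strings.Builder
	b.WriteString(name)
	b.WriteByte(0) // NUL separator keeps ("ab","c") distinct from ("a","bc")
	b.WriteString(version)
	if !fullScope {
		return b.String()
	}
	keys := make([]string, 0, len(attrs))
	for k := range attrs {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	for _, k := range keys {
		b.WriteByte(0)
		b.WriteString(k)
		b.WriteByte('=')
		b.WriteString(attrs[k])
	}
	return b.String()
}
```

With one ID function, a single `map[resourceKey]map[string]ScopeIdentifier` could serve both modes, since only the way the key is built changes.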

Contributor

@dashpole dashpole left a comment

I imagine people will have a mix of clients for some time. Some will use the new scope attributes on metrics, and others will still be using otel_scope_info. I think we should always add otel_scope_foo attributes as scope attributes, and use the feature gate to remove special handling of otel_scope_info.

@ArthurSens
Copy link
Member Author

> I imagine people will have a mix of clients for some time. Some will use the new scope attributes on metrics, and others will still be using otel_scope_info. I think we should always add otel_scope_foo attributes as scope attributes, and use the feature gate to remove special handling of otel_scope_info.

Hmmm, good point. Let me change that

Signed-off-by: Arthur Silva Sens <[email protected]>
Signed-off-by: Arthur Silva Sens <[email protected]>
Signed-off-by: Arthur Silva Sens <[email protected]>
… labels from otel_scope_info metric

Signed-off-by: Arthur Silva Sens <[email protected]>
@ArthurSens ArthurSens force-pushed the spec#4223-prom-receiver branch from eb9b37c to 0d4282d Compare September 1, 2025 22:51
@ArthurSens
Member Author

ArthurSens commented Sep 1, 2025

Okay, I've added a commit changing the behavior of the feature gate. Now, we're always parsing attributes from labels prefixed with otel_scope_, and the feature gate controls whether we merge attributes from the otel_scope_info metric or not.

Could we first review this part, and if necessary, we can refactor the duplicate map (#40060 (comment)) after this review?
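
The revised gate behavior described in this comment could look roughly like the sketch below. The function and parameter names are hypothetical, and the precedence on conflicting keys (labels on the metric win over otel_scope_info) is an assumption, not something stated in this thread.

```go
package main

// resolveScopeAttributes returns the attributes for one scope.
//
// fromLabels holds attributes parsed from otel_scope_* labels on the metric
// itself; these are always applied. fromScopeInfo holds attributes cached
// from an otel_scope_info series. mergeScopeInfo says whether the legacy
// otel_scope_info handling is still active; a caller would presumably
// derive it from the feature gate state.
func resolveScopeAttributes(fromLabels, fromScopeInfo map[string]string, mergeScopeInfo bool) map[string]string {
	out := make(map[string]string, len(fromLabels)+len(fromScopeInfo))
	if mergeScopeInfo {
		for k, v := range fromScopeInfo {
			out[k] = v
		}
	}
	// Written last so that labels win on conflicting keys (assumed precedence).
	for k, v := range fromLabels {
		out[k] = v
	}
	return out
}
```

Clients that already emit otel_scope_* labels get the same result in both modes, while clients that still rely on otel_scope_info lose those attributes only once merging is turned off.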

@dashpole
Contributor

dashpole commented Sep 2, 2025

> Could we first review this part, and if necessary, we can refactor the duplicate map (#40060 (comment)) after this review?

Sure

@github-actions
Contributor

This PR was marked stale due to lack of activity. It will be closed in 14 days.

@github-actions github-actions bot added the Stale label Sep 17, 2025
@github-actions
Contributor

github-actions bot commented Oct 2, 2025

Closed as inactive. Feel free to reopen if this PR is still being worked on.

@github-actions github-actions bot closed this Oct 2, 2025
Labels

receiver/prometheus Prometheus receiver Stale

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

[receiver/prometheus] Add support for otel_scope_<attribute-name> labels

7 participants