Skip to content

Conversation

atoulme
Copy link
Contributor

@atoulme atoulme commented Jun 9, 2025

This is not ready - opening for discussion and to help move the effort forward.

@github-actions github-actions bot requested a review from MovieStoreGuy June 9, 2025 07:07
@atoulme atoulme force-pushed the pprofreceiver_impl branch 3 times, most recently from aa49ba8 to 856302b Compare June 9, 2025 23:21
@atoulme atoulme force-pushed the pprofreceiver_impl branch from 856302b to 9ed9b02 Compare June 9, 2025 23:29
Copy link
Contributor

@florianl florianl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a first quick look

func pprofToPprofile(parsed *profile.Profile, originalPayload []byte) pprofile.Profiles {
result := pprofile.NewProfiles()
// create a new profile record
resultProfile := result.ResourceProfiles().AppendEmpty().ScopeProfiles().AppendEmpty().Profiles().AppendEmpty()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A single Resource Profile in OTel Profiling can have multiple Scope Profiles. And OTel eBPF Profiler is already using multiple Scope Profiles to differentiate between on CPU and off CPU profiling. Mixing these profiles will cause issues.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is that defined somewhere I can reuse?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To add more complexity 🙈 open-telemetry/opentelemetry-ebpf-profiler#517 will implement the recent changes in the OTel Profiling signal to generate a single ResourceProfile per container and an additional ResourceProfile for all non-containerized processes. As, I think, google/pprof does not differentiate between containerized elements and others, I think it is hard to map this correctly. Google/pprof might be able to set attributes accordingly, but they might not be standartized.

But I think, there should be one OTel/ScopeProfile per Google/pprof Profile.sample_type. This will then also be according to open-telemetry/opentelemetry-proto#649.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, can this be clarified in a spec? This is a bit of a departure from other signals where scope typically is to identify the instrumentation used.

Comment on lines +22 to +23
// per https://github.com/open-telemetry/semantic-conventions/blob/main/model/profile/registry.yaml
resultProfile.SetOriginalPayloadFormat("go")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is no direct correlation between the frame types that are reported with OTel Profiling (https://github.com/open-telemetry/semantic-conventions/blob/main/model/profile/registry.yaml) and the SetOriginalPayloadFormat. Regular OTel Profiles will also contain python and frames from other programing languges.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't understand, sorry. Is this value invalid?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I was confused about this element.

The best value, when converting Google/pprof to OTel Profiling should be pprof or something similar, as it should indicate the original resource (google/pprof in this case).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that's not what the spec says though - pointing back at the schema you linked, the exhaustive list of enums has go: https://github.com/open-telemetry/semantic-conventions/blob/main/model/profile/registry.yaml#L71

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The spec you are referring to is about profile.frame.type and Describes the interpreter or compiler of a single frame.. The known list of frame types is in no relation to SetOriginalPayloadFormat.

The type of a single frame is different to the format of SetOriginalPayloadFormat. To my knowledge, there is no enum/list that describes known values for SetOriginalPayloadFormat.

The field original_payload_format is closely tight to the field original_payload. And in this context, JFR is named as a potential payload (and respective format value) in the proto file:

https://github.com/open-telemetry/opentelemetry-proto/blob/fe18c4b149ed308a7af4b47075c32742e30ebd59/opentelemetry/proto/profiles/v1development/profiles.proto#L264-L273

Copy link
Member

@christos68k christos68k Sep 4, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Original payload refers to profiling data serialized under a different format (e.g. JFR or pprof) that was converted into the OTel profiling format. Note also how the documentation explicitly states that if the original data is in pprof format then it SHOULD not be included in this field (and thus one wouldn't set original_payload_format either), presumably because OTel profiling format is a superset of pprof and conversions can take place without loss of information.

I think you got confused because you probably read this bit (that is vague and doesn't specifically refer to profiling semantic conventions) and extrapolated that the semantic conventions it refers to (which AFAIK do not yet exist) are the profiling frame type conventions (which are completely unrelated). So to avoid this confusion in the future, we need to either reword this documentation comment about a semantic conventions reference that seemingly doesn't exist or introduce it.

hash = md5.Sum(originalPayload) //nolint:gosec

// set period
resultProfile.SetPeriod(parsed.Period)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here is also a conflict with off CPU profiling as off CPU profiling does not have a regular period but is triggered by scheduler events. So its cadence can change depending on the workload and is not constant, like sampling based on CPU profiling.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should I put something else here? Or not set it in some situations?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It depends, I guess 😅

If the input originates from a sampling based approach, then this value should be set. Otherwise, it should be left out.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I cannot really know that, can I? Whatever pprof provides can be passed in and pprof can make that assertion?

func pprofToPprofile(parsed *profile.Profile, originalPayload []byte) pprofile.Profiles {
result := pprofile.NewProfiles()
// create a new profile record
resultProfile := result.ResourceProfiles().AppendEmpty().ScopeProfiles().AppendEmpty().Profiles().AppendEmpty()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To add more complexity 🙈 open-telemetry/opentelemetry-ebpf-profiler#517 will implement the recent changes in the OTel Profiling signal to generate a single ResourceProfile per container and an additional ResourceProfile for all non-containerized processes. As, I think, google/pprof does not differentiate between containerized elements and others, I think it is hard to map this correctly. Google/pprof might be able to set attributes accordingly, but they might not be standartized.

But I think, there should be one OTel/ScopeProfile per Google/pprof Profile.sample_type. This will then also be according to open-telemetry/opentelemetry-proto#649.

Comment on lines +22 to +23
// per https://github.com/open-telemetry/semantic-conventions/blob/main/model/profile/registry.yaml
resultProfile.SetOriginalPayloadFormat("go")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I was confused about this element.

The best value, when converting Google/pprof to OTel Profiling should be pprof or something similar, as it should indicate the original resource (google/pprof in this case).

hash = md5.Sum(originalPayload) //nolint:gosec

// set period
resultProfile.SetPeriod(parsed.Period)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It depends, I guess 😅

If the input originates from a sampling based approach, then this value should be set. Otherwise, it should be left out.

resultL := result.ProfilesDictionary().LocationTable().AppendEmpty()
resultL.SetAddress(l.Address)
resultL.SetIsFolded(l.IsFolded)
for i, m := range result.ProfilesDictionary().MappingTable().All() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This might not scale well, if OTels result.ProfilesDictionary().MappingTable() is prepopulated and for each Location all populated elements are iterated. Maybe just add the mapping from google/pprof into ProfilesDictionary().MappingTable at this point and set the OTel Location.mapping_index accordingly.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we are not trying to scale, just make a roundtrip pprof -> otlp -> pprof. After that works we can fine tune.

resultS.SetLocationsLength(int32(len(parsed.Location)))
}

for _, f := range parsed.Function {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar to mappings, the resulting connection between OTel Sample.Location.Line.Function is missing. So I would also suggest to add the google/pprof Function to the OTel Function Dictionary while encountering it while iterating over google Samples.

// TODO set aggregation temporality
}

commentsIndices := make([]int32, len(parsed.Comments))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As google/pprof mixes sample types and profiles in general, it might be hard to assign the google/Profile.comment to the corresponding OTel Profile.comment_strindices, as there could be multiple OTel Profiles per google/Profile.

Copy link
Contributor

github-actions bot commented Jul 1, 2025

This PR was marked stale due to lack of activity. It will be closed in 14 days.

@github-actions github-actions bot added the Stale label Jul 1, 2025
Copy link
Contributor

Closed as inactive. Feel free to reopen if this PR is still being worked on.

@github-actions github-actions bot closed this Jul 15, 2025
@atoulme atoulme reopened this Jul 23, 2025
@github-actions github-actions bot removed the Stale label Jul 23, 2025
Copy link
Contributor

github-actions bot commented Aug 6, 2025

This PR was marked stale due to lack of activity. It will be closed in 14 days.

@github-actions github-actions bot added the Stale label Aug 6, 2025
@florianl
Copy link
Contributor

florianl commented Aug 6, 2025

I'm interested in picking up this task, once the OTel Profiling signal settled more. Maybe with the next v1.8.0 release of OTel proto.

@github-actions github-actions bot removed the Stale label Aug 7, 2025
Copy link
Contributor

This PR was marked stale due to lack of activity. It will be closed in 14 days.

@felixge
Copy link
Member

felixge commented Sep 4, 2025

I started a thread for this in slack, but I agree with @florianl that we need to wait for the collector to upgrade to v1.8.0 before proceeding here.

Copy link
Contributor

This PR was marked stale due to lack of activity. It will be closed in 14 days.

@github-actions github-actions bot added the Stale label Sep 20, 2025
florianl added a commit to florianl/opentelemetry-collector-contrib that referenced this pull request Sep 23, 2025
Follow up to open-telemetry#40548

TODO:
- [ ] verify result
- [ ] more tests

Signed-off-by: Florian Lehner <[email protected]>
@atoulme atoulme closed this Oct 4, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants