[receiverhelper] Add metric for requests that failed to be received

**Is your feature request related to a problem? Please describe.**

The [observability requirements for stable components](https://github.com/open-telemetry/opentelemetry-collector/blob/main/docs/component-stability.md#observability-requirements) recommend emitting telemetry in a way that allows users to differentiate between errors originating from a component and errors propagated from downstream components. This is currently somewhat complicated to do in receivers that use `receiverhelper`, notably the OTLP receiver (see [OTLP receiver telemetry review](https://github.com/open-telemetry/opentelemetry-collector/issues/11139#issuecomment-2582962557)), for two reasons:
- All errors are surfaced as the same `otelcol_receiver_refused_x` metric;
- If an internal error happens before the telemetry payload has been fully received and parsed, we cannot determine the number of telemetry items involved, and thus cannot properly surface the error with `ObsReport.EndXOp`. In practice, receivers may therefore delay `StartXOp` until everything is parsed (as the OTLP receiver does), which means internal failures are never surfaced through metrics.

**Describe the solution you'd like**

Following the precedent of the [pipeline auto-instrumentation RFC](https://github.com/open-telemetry/opentelemetry-collector/blob/main/docs/rfcs/component-universal-telemetry.md#auto-instrumented-metrics), I believe we should differentiate between payloads that were "refused" by downstream components and requests that "failed".

Telemetry-wise, this would mean restricting the `otelcol_receiver_refused_x` metric to downstream errors (ones returned from `nextConsumer.ConsumeX`; this is already the de facto behavior in the OTLP receiver), and adding a new metric to account for internal errors:
- Either a simple `otelcol_receiver_failed_requests` metric (maybe `_operations` if we want to account for scrapers?);
- Or a generic `otelcol_receiver_requests` metric which counts all receiver operations, with an `outcome: success / failure / refused` attribute, following the convention in the above RFC.

API-wise, with the goal of avoiding breakage, I think the simplest way to implement this would be to add a new method to `ObsReport` that can be called in place of `EndXOp` and emits a "failure" metric instead of a "refused" metric, and to encourage component authors to call `StartXOp` as early in processing as possible. (Note: this could also be used to improve the timing information provided by tracing, by adding a span event signifying the end of internal processing.) Under the assumption that most receivers behave like the OTLP receiver and mostly only wrap downstream processing in `Start/EndXOp`, components that haven't been updated would continue to behave as before.

**Describe alternatives you've considered**
We could also leave things as-is, and let receiver component authors add their own internal failure metrics.


[receiverhelper] Add metric for requests that failed to be received #12207
