Make OkHttp3 instrumentation more reliable #2894

agoallikmaa · 2021-04-30T18:58:02Z

Currently the OkHttp3 instrumentation adds a new interceptor to the OkHttpClient.Builder when OkHttpClient constructor is invoked. This interceptor remains there and if the builder is reused, another one is added, which eventually caused the stack overflow in the linked issue.

This change fixes that specific issue, but also aims to prevent future bugs by making the following guarantees:

The state of the builder is the same before and after OkHttpClient constructor - any changes are only applied temporarily and removed on method exit. This prevents changes to the builder that might not be expected by the application code.
In the constructed OkHttpClient, there is exactly one tracing interceptor and it is the last one in the chain (assuming no other instrumentations targeting the same method). This prevents any inconsistencies between monitored and real requests caused by interceptors that could alter the request after the tracing interceptor.
Builder created via OkHttpClient#newBuilder does not contain the tracing interceptor, to prevent that interceptor from leaking into application code. However, the tracing interceptor can still be obtained by OkHttpClient#interceptors.

...nt/src/test/groovy/io/opentelemetry/javaagent/instrumentation/okhttp/v3_0/OkHttp3Test.groovy

...ava/io/opentelemetry/javaagent/instrumentation/okhttp/v3_0/OkHttp3InstrumentationModule.java

anuraaga

Since interceptors are invoked in order I think we need the opposite, ours has to be first so any rpcs within interceptors are traced.

I think just adding three lines to the current advice without the rest should work fine?

if (builder.interceptors().contains(SINGLETON) ) {
  return;
}

anuraaga · 2021-05-03T03:09:59Z

...ava/io/opentelemetry/javaagent/instrumentation/okhttp/v3_0/OkHttp3InstrumentationModule.java

+        @Advice.Local("otelOriginalInterceptors") List<Interceptor> originalInterceptors) {
+
+      if (builder.interceptors().contains(OkHttp3Interceptors.TRACING_INTERCEPTOR)) {
+        // Potential corner case - the tracing interceptor may be in the builder due to the builder


What's the difference vs just calling return without the state management?

A possible case is that the tracing interceptor may be in the middle, therefore it captures a "request" that may not be the actual request at all due to some later interceptor deciding to do something different.

I don't think it can be in the middle if we always add it in the constructor (unless already present which is only when constructing a builder from another client, where it again couldn't be in the middle).

It is possible - the third test in OkHttp3Test produces this scenario.

anuraaga · 2021-05-03T03:12:05Z

...ava/io/opentelemetry/javaagent/instrumentation/okhttp/v3_0/OkHttp3InstrumentationModule.java


-  public static class OkHttp3Advice {
+  // This advice makes two guarantees:
+  // 1) The state of the builder (specifically interceptor list) is the same before and after the


Is this relying on an implementation detail that the interceptor is applied directly to a client? Otherwise I don't see how it can work otherwise if we remove it right away. But I think we shouldn't rely on such a special implementation detail, it seems correct for the builder to have our interceptor when it's constructed.

Changing the builder not affecting any objects that have already been built in the past by this builder definitely does not sound like an implementation detail to me, but a very logical part of how the builder pattern works. Unless I misunderstood what you meant by the interceptor being applied to the client directly.

Also, I think it is highly unexpected that a call to build mutates the builder itself. This is not usually done by builders and therefore adding such a side effect to that method would not be good instrumentation etiquette.

Yeah if this was the build method I'd get it but it's the builder constructor isn't it?

The constructor is effectively filling the role of what is usually done by a build method (in cases where an instance can also be created without a builder), and is private and only called from the build method and the zero-parameter constructor. However, being a constructor does not change that it should leave the builder in the same state, and that the builder can be modified and reused later to build a different client instance.

Oh shoot I had been misreading the constructor to be of the builder, not the client. Sorry for the confusion makes sense now. Is it possible to simplify by changing to instrument the builder constructor to prepopulate interceptors if it doesn't already have it and not instrument the client itself at all?

agoallikmaa · 2021-05-03T07:00:34Z

Since interceptors are invoked in order I think we need the opposite, ours has to be first so any rpcs within interceptors are traced.

The idea is that since interceptors may modify anything about the request, we would want to capture the request that is actually performed, if one was performed at all. So if the interceptor is last, then if no request was performed due to another interceptor deciding to abort it, no span would be created. Similarly, if there is an interceptor that performs retries, then all request attempts would be captured as spans. Whether we want to have some synthetic wrapper span as a parent in case interceptors cause multiple requests (such as for a retry) to be made is a different matter, but every real outgoing request should still be captured as a separate span.

anuraaga

The point about aborting is interesting and may need discussion in the spec for the best modeling for it.

For now though this doesn't seem consistent with other instrumentation. For example if DNS resolution fails we will not send a request but will record a failed span in most if not all clients, a circuit breaker would be a similar failure IMO. Also capturing timing is the most important - while we have a spec issue open on exactly how to deal with it for now we are assuming all the time for a client request is in a client span, so we can't miss out on auth requests, etc.

anuraaga · 2021-05-03T07:38:22Z

We could perhaps add a config property to choose last or first though, users may know of cases for both and want to pick.

agoallikmaa · 2021-05-03T10:50:04Z

None of the standard HTTP steps (such as DNS resolve, connect) are part of this interceptor list - those are performed after the last interceptor anyway. Thus these interceptors are only for custom steps that are unlikely to be covered by the specification anyway, and in that case any additional operations done in them showing up as separate spans directly under the server span should not be a problem.

I'm not opposed to adding a configuration option - but considering the above, do you think it is still necessary?

anuraaga · 2021-05-04T05:35:30Z

I use DNS as an example, but I think interceptors making RPCs or doing circuit breaker are very similar - and so far our practice has generally been to intercept as early as possible to make sure we capture all the processing for a request.

I would recommend either

Just keep interceptor at front for consistency with our other instrumentation.
Add an option - it seems ok but might be confusing anyways.

I would lean towards 1) to keep things simple but interested in hearing what others think. We should do a large scale redesign if we ever progress on the spec for the nested client span stuff, which should cover things like retry. Until that I think simpler, just one overall span, is safer.

trask · 2021-05-06T05:16:28Z

I would suggest that the default behavior should be for CLIENT spans to represent just the underlying library call, not including any user configured pre- or post- callbacks/interceptors.

We do this already with "on response" callbacks, by ending the CLIENT span before calling them.

For interceptor style hooks, it makes sense to me to change our existing behavior and put our interceptor last (like in this PR), so the CLIENT spans don't include timings from any user configured interceptors (and don't hide any nested CLIENT spans that those interceptors may trigger).

iNikem · 2021-05-06T11:39:05Z

First, I think this is one of the questions that Otel recommendations about span modelling should answer. Alas, that work is happening much slower as hoped for.

Second, I slightly prefer having CLIENT spans as close to user code as possible. Meaning having such interceptors first. For me this is the same rule of thumb as in the case of e.g. DynamoDB requests: CLIENT span is the outermost (with db semantic), actual network calls with http semantic are underneath and separate.

trask · 2021-05-07T05:33:45Z

I discussed with @anuraaga today, and wrote up summary in #2923.

Suggest we change this PR to put the interceptor in front, following the reasoning in that issue.

As far as adding configuration options to support other behavior, it would probably be best to wait for spec discussions on nested client spans to unfold, unless there is a pressing need, in which case we can add an option under the experimental namespace.

agoallikmaa · 2021-05-14T17:18:30Z

As the main issue of duplicating interceptors was fixed by the other PR, and since having the correct details for the outgoing request is apparently of lower priority than nesting extra spans caused by interceptors under the client span, this PR is no longer needed, so will close.

Make OkHttp3 instrumentation more reliable

2827c40

agoallikmaa requested review from anuraaga, iNikem, jkwatson, laurit, mateuszrzeszutek, pavolloffay, trask and tylerbenson as code owners April 30, 2021 18:58

Fix Codenarc violation

263f6c1

iNikem reviewed May 2, 2021

View reviewed changes

agoallikmaa added 2 commits May 2, 2021 20:10

Addressed PR comments

b106b29

Addressed more PR comments

ede5156

iNikem approved these changes May 2, 2021

View reviewed changes

anuraaga reviewed May 3, 2021

View reviewed changes

trask mentioned this pull request May 7, 2021

Ordering of tracing "interceptors" #2923

Closed

anuraaga mentioned this pull request May 10, 2021

Instrument okhttp builder constructor instead of client #2946

Merged

agoallikmaa closed this May 14, 2021

Make OkHttp3 instrumentation more reliable #2894

Make OkHttp3 instrumentation more reliable #2894

Uh oh!

Conversation

agoallikmaa commented Apr 30, 2021

Uh oh!

Uh oh!

Uh oh!

Uh oh!

anuraaga left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

anuraaga May 4, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

agoallikmaa commented May 3, 2021

Uh oh!

anuraaga left a comment

Choose a reason for hiding this comment

Uh oh!

anuraaga commented May 3, 2021

Uh oh!

agoallikmaa commented May 3, 2021

Uh oh!

anuraaga commented May 4, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

trask commented May 6, 2021

Uh oh!

iNikem commented May 6, 2021

Uh oh!

trask commented May 7, 2021

Uh oh!

agoallikmaa commented May 14, 2021

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

anuraaga May 4, 2021 •

edited

Loading

anuraaga commented May 4, 2021 •

edited

Loading