-
Notifications
You must be signed in to change notification settings - Fork 1k
Make OkHttp3 instrumentation more reliable #2894
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Make OkHttp3 instrumentation more reliable #2894
Conversation
...nt/src/test/groovy/io/opentelemetry/javaagent/instrumentation/okhttp/v3_0/OkHttp3Test.groovy
Show resolved
Hide resolved
...nt/src/test/groovy/io/opentelemetry/javaagent/instrumentation/okhttp/v3_0/OkHttp3Test.groovy
Show resolved
Hide resolved
...ava/io/opentelemetry/javaagent/instrumentation/okhttp/v3_0/OkHttp3InstrumentationModule.java
Outdated
Show resolved
Hide resolved
anuraaga
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since interceptors are invoked in order I think we need the opposite, ours has to be first so any rpcs within interceptors are traced.
I think just adding three lines to the current advice without the rest should work fine?
if (builder.interceptors().contains(SINGLETON) ) {
return;
}
| @Advice.Local("otelOriginalInterceptors") List<Interceptor> originalInterceptors) { | ||
|
|
||
| if (builder.interceptors().contains(OkHttp3Interceptors.TRACING_INTERCEPTOR)) { | ||
| // Potential corner case - the tracing interceptor may be in the builder due to the builder |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the difference vs just calling return without the state management?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A possible case is that the tracing interceptor may be in the middle, therefore it captures a "request" that may not be the actual request at all due to some later interceptor deciding to do something different.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think it can be in the middle if we always add it in the constructor (unless already present which is only when constructing a builder from another client, where it again couldn't be in the middle).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is possible - the third test in OkHttp3Test produces this scenario.
|
|
||
| public static class OkHttp3Advice { | ||
| // This advice makes two guarantees: | ||
| // 1) The state of the builder (specifically interceptor list) is the same before and after the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this relying on an implementation detail that the interceptor is applied directly to a client? Otherwise I don't see how it can work otherwise if we remove it right away. But I think we shouldn't rely on such a special implementation detail, it seems correct for the builder to have our interceptor when it's constructed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changing the builder not affecting any objects that have already been built in the past by this builder definitely does not sound like an implementation detail to me, but a very logical part of how the builder pattern works. Unless I misunderstood what you meant by the interceptor being applied to the client directly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, I think it is highly unexpected that a call to build mutates the builder itself. This is not usually done by builders and therefore adding such a side effect to that method would not be good instrumentation etiquette.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah if this was the build method I'd get it but it's the builder constructor isn't it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The constructor is effectively filling the role of what is usually done by a build method (in cases where an instance can also be created without a builder), and is private and only called from the build method and the zero-parameter constructor. However, being a constructor does not change that it should leave the builder in the same state, and that the builder can be modified and reused later to build a different client instance.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh shoot I had been misreading the constructor to be of the builder, not the client. Sorry for the confusion makes sense now. Is it possible to simplify by changing to instrument the builder constructor to prepopulate interceptors if it doesn't already have it and not instrument the client itself at all?
The idea is that since interceptors may modify anything about the request, we would want to capture the request that is actually performed, if one was performed at all. So if the interceptor is last, then if no request was performed due to another interceptor deciding to abort it, no span would be created. Similarly, if there is an interceptor that performs retries, then all request attempts would be captured as spans. Whether we want to have some synthetic wrapper span as a parent in case interceptors cause multiple requests (such as for a retry) to be made is a different matter, but every real outgoing request should still be captured as a separate span. |
anuraaga
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The point about aborting is interesting and may need discussion in the spec for the best modeling for it.
For now though this doesn't seem consistent with other instrumentation. For example if DNS resolution fails we will not send a request but will record a failed span in most if not all clients, a circuit breaker would be a similar failure IMO. Also capturing timing is the most important - while we have a spec issue open on exactly how to deal with it for now we are assuming all the time for a client request is in a client span, so we can't miss out on auth requests, etc.
|
We could perhaps add a config property to choose last or first though, users may know of cases for both and want to pick. |
|
None of the standard HTTP steps (such as DNS resolve, connect) are part of this interceptor list - those are performed after the last interceptor anyway. Thus these interceptors are only for custom steps that are unlikely to be covered by the specification anyway, and in that case any additional operations done in them showing up as separate spans directly under the server span should not be a problem. I'm not opposed to adding a configuration option - but considering the above, do you think it is still necessary? |
|
I use DNS as an example, but I think interceptors making RPCs or doing circuit breaker are very similar - and so far our practice has generally been to intercept as early as possible to make sure we capture all the processing for a request. I would recommend either
I would lean towards 1) to keep things simple but interested in hearing what others think. We should do a large scale redesign if we ever progress on the spec for the nested client span stuff, which should cover things like retry. Until that I think simpler, just one overall span, is safer. |
|
I would suggest that the default behavior should be for CLIENT spans to represent just the underlying library call, not including any user configured pre- or post- callbacks/interceptors. We do this already with "on response" callbacks, by ending the CLIENT span before calling them. For interceptor style hooks, it makes sense to me to change our existing behavior and put our interceptor last (like in this PR), so the CLIENT spans don't include timings from any user configured interceptors (and don't hide any nested CLIENT spans that those interceptors may trigger). |
|
First, I think this is one of the questions that Otel recommendations about span modelling should answer. Alas, that work is happening much slower as hoped for. Second, I slightly prefer having CLIENT spans as close to user code as possible. Meaning having such interceptors first. For me this is the same rule of thumb as in the case of e.g. DynamoDB requests: CLIENT span is the outermost (with db semantic), actual network calls with http semantic are underneath and separate. |
|
I discussed with @anuraaga today, and wrote up summary in #2923. Suggest we change this PR to put the interceptor in front, following the reasoning in that issue. As far as adding configuration options to support other behavior, it would probably be best to wait for spec discussions on nested client spans to unfold, unless there is a pressing need, in which case we can add an option under the |
|
As the main issue of duplicating interceptors was fixed by the other PR, and since having the correct details for the outgoing request is apparently of lower priority than nesting extra spans caused by interceptors under the client span, this PR is no longer needed, so will close. |
Fixes #2886
Currently the OkHttp3 instrumentation adds a new interceptor to the
OkHttpClient.BuilderwhenOkHttpClientconstructor is invoked. This interceptor remains there and if the builder is reused, another one is added, which eventually caused the stack overflow in the linked issue.This change fixes that specific issue, but also aims to prevent future bugs by making the following guarantees:
OkHttpClientconstructor - any changes are only applied temporarily and removed on method exit. This prevents changes to the builder that might not be expected by the application code.OkHttpClient, there is exactly one tracing interceptor and it is the last one in the chain (assuming no other instrumentations targeting the same method). This prevents any inconsistencies between monitored and real requests caused by interceptors that could alter the request after the tracing interceptor.OkHttpClient#newBuilderdoes not contain the tracing interceptor, to prevent that interceptor from leaking into application code. However, the tracing interceptor can still be obtained byOkHttpClient#interceptors.