Skip to content

Conversation

@agoallikmaa
Copy link
Contributor

Fixes #2886

Currently the OkHttp3 instrumentation adds a new interceptor to the OkHttpClient.Builder when OkHttpClient constructor is invoked. This interceptor remains there and if the builder is reused, another one is added, which eventually caused the stack overflow in the linked issue.

This change fixes that specific issue, but also aims to prevent future bugs by making the following guarantees:

  • The state of the builder is the same before and after OkHttpClient constructor - any changes are only applied temporarily and removed on method exit. This prevents changes to the builder that might not be expected by the application code.
  • In the constructed OkHttpClient, there is exactly one tracing interceptor and it is the last one in the chain (assuming no other instrumentations targeting the same method). This prevents any inconsistencies between monitored and real requests caused by interceptors that could alter the request after the tracing interceptor.
  • Builder created via OkHttpClient#newBuilder does not contain the tracing interceptor, to prevent that interceptor from leaking into application code. However, the tracing interceptor can still be obtained by OkHttpClient#interceptors.

Copy link
Contributor

@anuraaga anuraaga left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since interceptors are invoked in order I think we need the opposite, ours has to be first so any rpcs within interceptors are traced.

I think just adding three lines to the current advice without the rest should work fine?

if (builder.interceptors().contains(SINGLETON) ) {
  return;
}

@Advice.Local("otelOriginalInterceptors") List<Interceptor> originalInterceptors) {

if (builder.interceptors().contains(OkHttp3Interceptors.TRACING_INTERCEPTOR)) {
// Potential corner case - the tracing interceptor may be in the builder due to the builder
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the difference vs just calling return without the state management?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A possible case is that the tracing interceptor may be in the middle, therefore it captures a "request" that may not be the actual request at all due to some later interceptor deciding to do something different.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think it can be in the middle if we always add it in the constructor (unless already present which is only when constructing a builder from another client, where it again couldn't be in the middle).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is possible - the third test in OkHttp3Test produces this scenario.


public static class OkHttp3Advice {
// This advice makes two guarantees:
// 1) The state of the builder (specifically interceptor list) is the same before and after the
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this relying on an implementation detail that the interceptor is applied directly to a client? Otherwise I don't see how it can work otherwise if we remove it right away. But I think we shouldn't rely on such a special implementation detail, it seems correct for the builder to have our interceptor when it's constructed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changing the builder not affecting any objects that have already been built in the past by this builder definitely does not sound like an implementation detail to me, but a very logical part of how the builder pattern works. Unless I misunderstood what you meant by the interceptor being applied to the client directly.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, I think it is highly unexpected that a call to build mutates the builder itself. This is not usually done by builders and therefore adding such a side effect to that method would not be good instrumentation etiquette.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah if this was the build method I'd get it but it's the builder constructor isn't it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The constructor is effectively filling the role of what is usually done by a build method (in cases where an instance can also be created without a builder), and is private and only called from the build method and the zero-parameter constructor. However, being a constructor does not change that it should leave the builder in the same state, and that the builder can be modified and reused later to build a different client instance.

Copy link
Contributor

@anuraaga anuraaga May 4, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh shoot I had been misreading the constructor to be of the builder, not the client. Sorry for the confusion makes sense now. Is it possible to simplify by changing to instrument the builder constructor to prepopulate interceptors if it doesn't already have it and not instrument the client itself at all?

@agoallikmaa
Copy link
Contributor Author

Since interceptors are invoked in order I think we need the opposite, ours has to be first so any rpcs within interceptors are traced.

The idea is that since interceptors may modify anything about the request, we would want to capture the request that is actually performed, if one was performed at all. So if the interceptor is last, then if no request was performed due to another interceptor deciding to abort it, no span would be created. Similarly, if there is an interceptor that performs retries, then all request attempts would be captured as spans. Whether we want to have some synthetic wrapper span as a parent in case interceptors cause multiple requests (such as for a retry) to be made is a different matter, but every real outgoing request should still be captured as a separate span.

Copy link
Contributor

@anuraaga anuraaga left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The point about aborting is interesting and may need discussion in the spec for the best modeling for it.

For now though this doesn't seem consistent with other instrumentation. For example if DNS resolution fails we will not send a request but will record a failed span in most if not all clients, a circuit breaker would be a similar failure IMO. Also capturing timing is the most important - while we have a spec issue open on exactly how to deal with it for now we are assuming all the time for a client request is in a client span, so we can't miss out on auth requests, etc.

@anuraaga
Copy link
Contributor

anuraaga commented May 3, 2021

We could perhaps add a config property to choose last or first though, users may know of cases for both and want to pick.

@agoallikmaa
Copy link
Contributor Author

None of the standard HTTP steps (such as DNS resolve, connect) are part of this interceptor list - those are performed after the last interceptor anyway. Thus these interceptors are only for custom steps that are unlikely to be covered by the specification anyway, and in that case any additional operations done in them showing up as separate spans directly under the server span should not be a problem.

I'm not opposed to adding a configuration option - but considering the above, do you think it is still necessary?

@anuraaga
Copy link
Contributor

anuraaga commented May 4, 2021

I use DNS as an example, but I think interceptors making RPCs or doing circuit breaker are very similar - and so far our practice has generally been to intercept as early as possible to make sure we capture all the processing for a request.

I would recommend either

  1. Just keep interceptor at front for consistency with our other instrumentation.

  2. Add an option - it seems ok but might be confusing anyways.

I would lean towards 1) to keep things simple but interested in hearing what others think. We should do a large scale redesign if we ever progress on the spec for the nested client span stuff, which should cover things like retry. Until that I think simpler, just one overall span, is safer.

@trask
Copy link
Member

trask commented May 6, 2021

I would suggest that the default behavior should be for CLIENT spans to represent just the underlying library call, not including any user configured pre- or post- callbacks/interceptors.

We do this already with "on response" callbacks, by ending the CLIENT span before calling them.

For interceptor style hooks, it makes sense to me to change our existing behavior and put our interceptor last (like in this PR), so the CLIENT spans don't include timings from any user configured interceptors (and don't hide any nested CLIENT spans that those interceptors may trigger).

@iNikem
Copy link
Contributor

iNikem commented May 6, 2021

First, I think this is one of the questions that Otel recommendations about span modelling should answer. Alas, that work is happening much slower as hoped for.

Second, I slightly prefer having CLIENT spans as close to user code as possible. Meaning having such interceptors first. For me this is the same rule of thumb as in the case of e.g. DynamoDB requests: CLIENT span is the outermost (with db semantic), actual network calls with http semantic are underneath and separate.

@trask
Copy link
Member

trask commented May 7, 2021

I discussed with @anuraaga today, and wrote up summary in #2923.

Suggest we change this PR to put the interceptor in front, following the reasoning in that issue.

As far as adding configuration options to support other behavior, it would probably be best to wait for spec discussions on nested client spans to unfold, unless there is a pressing need, in which case we can add an option under the experimental namespace.

@agoallikmaa
Copy link
Contributor Author

As the main issue of duplicating interceptors was fixed by the other PR, and since having the correct details for the outgoing request is apparently of lower priority than nesting extra spans caused by interceptors under the client span, this PR is no longer needed, so will close.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

StackOverflow observed in open telemetry version 1.0.1

4 participants