xds: implement xDS timeout #7481
// Make newly added clusters selectable by config selector and deleted clusters no longer
// selectable.
currentRoutes = routes;
fallbackTimeoutNano = httpMaxStreamDurationNano;
Note there is a race between updating currentRoutes and fallbackTimeoutNano. The config selector might select a route from the updated currentRoutes list while still using the old fallbackTimeoutNano. That is, these two updates should be atomic, but they are not. The race window is extremely small, almost negligible.
One option is to create a data structure that groups these two fields so that the update can be atomic. But given how small the race window is, it may not be necessary to do that.
Alright, eliminated the race by grouping the list of routes and fallback timeout value into RoutingConfig.
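A minimal sketch of the grouping idea described above, not the PR's actual code. Only the name RoutingConfig comes from this thread; the wrapper class RoutingConfigSketch, the field layout, the use of String as a stand-in for the real Route type, and the update()/snapshot() methods are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical holder class; only the name RoutingConfig appears in the thread.
final class RoutingConfigSketch {

  /** Immutable pair, so the route list and its fallback timeout always travel together. */
  static final class RoutingConfig {
    final List<String> routes;       // String stands in for the real Route type (assumption)
    final long fallbackTimeoutNano;

    RoutingConfig(List<String> routes, long fallbackTimeoutNano) {
      this.routes = Collections.unmodifiableList(new ArrayList<>(routes));
      this.fallbackTimeoutNano = fallbackTimeoutNano;
    }
  }

  // A single volatile reference: one write publishes both values at once.
  private volatile RoutingConfig routingConfig =
      new RoutingConfig(Collections.<String>emptyList(), 0L);

  /** Called when a new route list and HTTP max stream duration arrive. */
  void update(List<String> routes, long httpMaxStreamDurationNano) {
    routingConfig = new RoutingConfig(routes, httpMaxStreamDurationNano);
  }

  /** The config selector reads the reference once and uses only that snapshot. */
  RoutingConfig snapshot() {
    return routingConfig;
  }
}
```

Because a reader takes one snapshot and reads both fields from it, it can never pair a new route list with an old fallback timeout, or the reverse.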
…out_with_max_stream_duration
…ration that falls back to LDS's http protocol option max stream duration.
}
if (timeoutNano == 0) {
timeoutNano = Long.MAX_VALUE;
long timeoutNano = 0L;
I think we should expose hasMaxStreamDuration() based on the 3rd bullet point in https://github.com/grpc/proposal/blob/master/A31-xds-timeout-support-and-config-selector.md#supported-fields
We cannot simply check timeoutNano == 0 instead of maxStreamDuration.hasMaxStreamDuration(). There are two cases where that does not behave as specified:
- Case 1: maxStreamDuration.hasGrpcTimeoutHeaderMax() and maxStreamDuration.getGrpcTimeoutHeaderMax() == 0. In this case, we should use 0 regardless of the value of the fallback timeout.
- Case 2: !maxStreamDuration.hasGrpcTimeoutHeaderMax() and maxStreamDuration.hasMaxStreamDuration() and maxStreamDuration.getMaxStreamDuration() == 0. In this case, we should use 0 regardless of the value of the fallback timeout.
Nope, if you look at the table in the gRFC, a value of 0 for any of those fields is effectively equivalent to the field not being set.
So the order of preference is:
- If maxStreamDuration.hasGrpcTimeoutHeaderMax() and maxStreamDuration.getGrpcTimeoutHeaderMax() != 0: use it. Otherwise, go to next step.
- If maxStreamDuration.hasMaxStreamDuration() and maxStreamDuration.getMaxStreamDuration() != 0: use it. Otherwise, go to next step.
- If max_stream_duration from the HttpConnectionManager.common_http_options is not 0, use this as timeout. Otherwise, use application's timeout.
The table is after applying the fallback if possible.
"Note that the max_stream_duration column refers to the effective setting based on both RouteAction.max_stream_duration.max_stream_duration and the max_stream_duration from the HTTP Connection Manager's common_http_options."
The condition for when to apply fallback does not change.
See my updated comment. I don't understand your point. But a value of 0 is effectively equivalent to the field not being set, do you agree with that?
Case 1 corresponds to row 4 of the table.
If maxStreamDuration.hasGrpcTimeoutHeaderMax() and maxStreamDuration.getGrpcTimeoutHeaderMax() != 0: use it. Otherwise, go to next step.
This is completely broken for case 1.
Ah, that table is problematic. If you interpret it in that way, the second last row is a counter-example...
the second last row is a counter-example
The second from the last row does not conflict with any other row, it just takes min(20, infinite).
Okay, fixed. Falling back to max_stream_duration from the HttpConnectionManager.common_http_options will only take place if both !maxStreamDuration.hasGrpcTimeoutHeaderMax() and !maxStreamDuration.hasMaxStreamDuration() hold. We don't need to make the last fallback value nullable as its value of 0 and unset are treated equivalently (aka, as infinite) when it is indeed used.
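The rule agreed on in this comment can be sketched roughly as follows. This is an illustration under stated assumptions, not the merged implementation: the class XdsTimeoutSketch and method resolveTimeoutNano are hypothetical names, a null Long models a field for which the corresponding has...() check is false, and a resolved value of 0 is mapped to Long.MAX_VALUE (no xDS-imposed limit), mirroring the timeoutNano == 0 handling visible in the diff under review.

```java
// Hypothetical helper; names and the nullable-Long encoding are assumptions.
final class XdsTimeoutSketch {

  /**
   * @param grpcTimeoutHeaderMaxNano null if grpc_timeout_header_max is not set on the RouteAction
   * @param maxStreamDurationNano    null if max_stream_duration is not set on the RouteAction
   * @param fallbackTimeoutNano      max_stream_duration from the HttpConnectionManager's
   *                                 common_http_options; 0 and unset are treated the same
   */
  static long resolveTimeoutNano(
      Long grpcTimeoutHeaderMaxNano, Long maxStreamDurationNano, long fallbackTimeoutNano) {
    long timeoutNano;
    if (grpcTimeoutHeaderMaxNano != null) {
      // Set on the route (including an explicit 0): no fallback happens.
      timeoutNano = grpcTimeoutHeaderMaxNano;
    } else if (maxStreamDurationNano != null) {
      // Set on the route (including an explicit 0): no fallback happens.
      timeoutNano = maxStreamDurationNano;
    } else {
      // Neither route-level field is set: fall back to the listener-level value.
      timeoutNano = fallbackTimeoutNano;
    }
    // Assumption: 0 means "no limit", represented here as Long.MAX_VALUE.
    return timeoutNano == 0L ? Long.MAX_VALUE : timeoutNano;
  }
}
```

Under this sketch, Case 1 from earlier in the thread (grpc_timeout_header_max set to 0) resolves to no limit even when the fallback is non-zero, which is the behavior the reviewer asked for.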
…lback should not happen if the timeout value in RouteAction is set (including 0).
…out_with_max_stream_duration
The xDS timeout takes the per-route timeout value from RouteAction.max_stream_duration.grpc_timeout_header_max, or from RouteAction.max_stream_duration.max_stream_duration if the former is not set. If neither is set, it falls back to the max_stream_duration setting in HttpConnectionManager.common_http_options retrieved from the Route's upstream Listener resource. The final timeout value applied to the call is the minimum of the xDS timeout value and the per-call timeout set by the application.
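As a small worked example of the last sentence, the sketch below combines an xDS timeout with an application timeout by taking the minimum. The class EffectiveTimeoutSketch, the method name, and the use of Long.MAX_VALUE to stand for "no timeout" are assumptions for illustration only.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical helper; Long.MAX_VALUE stands for "no timeout" (assumption).
final class EffectiveTimeoutSketch {
  static final long NO_TIMEOUT_NANO = Long.MAX_VALUE;

  /** The stricter (smaller) of the xDS timeout and the application's per-call timeout. */
  static long effectiveTimeoutNano(long xdsTimeoutNano, long appTimeoutNano) {
    return Math.min(xdsTimeoutNano, appTimeoutNano);
  }

  /** Example values only: a 20s xDS timeout and an application that set no timeout. */
  static long example() {
    long xdsTimeoutNano = TimeUnit.SECONDS.toNanos(20);
    return effectiveTimeoutNano(xdsTimeoutNano, NO_TIMEOUT_NANO);
  }
}
```

With an xDS timeout of 20s and no application timeout, the result is 20s, matching the min(20, infinite) reading given earlier in the thread.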
Implements the xDS timeout feature. See details in gRFC-A31.
Each Route (converted) contains the per-route timeout setting, which comes from RouteAction.max_stream_duration.grpc_timeout_header_max, or from RouteAction.max_stream_duration.max_stream_duration if the former is not set in the xDS protocol. It falls back to the max_stream_duration setting in HttpConnectionManager.common_http_options if neither of the previous two is set.
Some caveats:
Note that calls started between t1 and t3 should have their timeout set to 2s. Only after t3 will calls started with the latest config selector have their timeout set to 5s. That is, the timeout value of 2s should be configured for the list of routes L1 and the timeout value of 5s for the list of routes L2. Until the config selector starts selecting from L2, the fallback timeout value should not have been updated.