Skip to content

Conversation

@voidzcy
Copy link
Contributor

@voidzcy voidzcy commented Oct 2, 2020

Implements the xDS timeout feature. See details in gRFC-A31.

Each Route (converted) contains the per-route timeout setting that is from RouteAction.max_stream_duration.grpc_timeout_header_max or RouteAction.max_stream_duration.max_stream_duration if the former is not set in the xDS protocol. It falls back to the max_stream_duration setting in HttpConnectionManager.common_http_options if neither of the previous two is set.


Some caveats:

  • The effective fallback httpMaxStreamDuration value should be consistent with routes selected by ConfigSelector (aka, RPC). Consider the following scenario:
    • t=t1, receives an LDS update with httpMaxStreamDuration = 2s and a list of routes (L1) for the targeted authority. None of the routes have per-route timeout value set. So after calls are able to pick up the latest config selector, they will have timeout value of 2s.
    • t=t2, receives an LDS update with httpMaxStreamDuration = 5s and an RDS resource name. So the resolver starts subscribing to the corresponding RDS resource.
    • t=t3, receives an RDS update for the subscribed RDS resource containing a list of routes (L2). None of the routes have per-route timeout value set.

Note calls started between t1 and t3 should have timeout value set to 2s. Only after t3, the calls started with the latest config selector will have timeout value set to 5s. That is, the timeout value of 2s should be configured for the list of routes L1 and the timeout value of 5s should be configured for the list of routes L2. Before the config selector updates to select from L2, the fallback timeout value should have not been updated.

@voidzcy voidzcy force-pushed the impl/xds_timeout_with_max_stream_duration branch from 24e1886 to 58e6038 Compare October 2, 2020 23:03
@voidzcy voidzcy force-pushed the impl/xds_timeout_with_max_stream_duration branch from 58e6038 to e103a9e Compare October 5, 2020 20:31
@voidzcy voidzcy marked this pull request as ready for review October 8, 2020 19:21
@voidzcy voidzcy requested a review from dapengzhang0 October 8, 2020 19:32
@voidzcy voidzcy force-pushed the impl/xds_timeout_with_max_stream_duration branch from 0642c4a to d2c53f6 Compare October 8, 2020 21:18
// Make newly added clusters selectable by config selector and deleted clusters no longer
// selectable.
currentRoutes = routes;
fallbackTimeoutNano = httpMaxStreamDurationNano;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note there is a race between updating currentRoutes and fallbackTimeoutNano. The config selector might select a route from the updated currentRoutes list while using the old fallbackTimeoutNano. That is, these two updates should be atomic but actually it is not. The race window should be extremely small (should be almost negligible).

One option is to create a data structure that groups these two fields so that the update can be atomic. But given the race window is small, may not be necessary to do that.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alright, eliminated the race by grouping the list of routes and fallback timeout value into RoutingConfig.

@voidzcy voidzcy force-pushed the impl/xds_timeout_with_max_stream_duration branch from 18b23af to bd82d8e Compare October 12, 2020 22:52
@voidzcy voidzcy force-pushed the impl/xds_timeout_with_max_stream_duration branch from bd82d8e to 49c14d2 Compare October 12, 2020 23:36
}
if (timeoutNano == 0) {
timeoutNano = Long.MAX_VALUE;
long timeoutNano = 0L;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should expose hasMaxStreamDuration() based on the 3rd bullet point in https://github.com/grpc/proposal/blob/master/A31-xds-timeout-support-and-config-selector.md#supported-fields

We can not simply check timeoutNano == 0 instead of maxStreamDuration.hasMaxStreamDuration()). Two cases that's not working as spec:

  • Case 1: maxStreamDuration.hasGrpcTimeoutHeaderMax() and maxStreamDuration.getGrpcTimeoutHeaderMax() == 0. In this case, we should use 0 regardless of value of fallback timeout.

  • Case 2: !maxStreamDuration.hasGrpcTimeoutHeaderMax() and maxStreamDuration.hasMaxStreamDuration() and maxStreamDuration.getMaxStreamDuration() == 0. In this case, we should use 0 regardless of value of fallback timeout.

Copy link
Contributor Author

@voidzcy voidzcy Oct 14, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nope, if you look at the table in the gRFC, a value of 0 for any of those fields is effectively equivalent to the field is not set.

So the order of preference is:

  • If maxStreamDuration.hasGrpcTimeoutHeaderMax() and maxStreamDuration.getGrpcTimeoutHeaderMax() != 0: use it. Otherwise, go to next step.
  • If maxStreamDuration.hasMaxStreamDuration() and maxStreamDuration.getMaxStreamDuration() != 0: use it. Otherwise, go to next step.
  • If max_stream_duration from the HttpConnectionManager.common_http_options is not 0, use this as timeout. Otherwise, use application's timeout.

Copy link
Contributor

@dapengzhang0 dapengzhang0 Oct 14, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The table is after applying the fallback if possible.
"Note that the max_stream_duration column refers to the effective setting based on both RouteAction.max_stream_duration.max_stream_duration and the max_stream_duration from the HTTP Connection Manager's common_http_options."

The condition for when to apply fallback does not change.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See my updated comment. I don't understand your point. But, a value of 0 is effectively equivalent to it is not set, do you agree with that?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Case 1 corresponds to row 4 of the table.

If maxStreamDuration.hasGrpcTimeoutHeaderMax() and maxStreamDuration.getGrpcTimeoutHeaderMax() != 0: use it. Otherwise, go to next step.

This is completely broken for case 1.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, that table is problematic. If you interpret it in that way, the second last row is a counter-example...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the second last row is a counter-example

The second from the last row does not conflict with any other row, it just takes min(20, infinite).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, fixed. Falling back to max_stream_duration from the HttpConnectionManager.common_http_options will only take place if both !maxStreamDuration.hasGrpcTimeoutHeaderMax() and !maxStreamDuration.hasMaxStreamDuration() hold. We don't need to make the last fallback value nullable as its value of 0 and unset are treated equivalently (aka, as infinite) when it is indeed used.

…lback should not happen if the timeout value in RouteAction is set (including 0).
@voidzcy voidzcy force-pushed the impl/xds_timeout_with_max_stream_duration branch from 076adf2 to d488707 Compare October 15, 2020 00:24
@voidzcy voidzcy merged commit d25f5ac into grpc:master Oct 15, 2020
dfawley pushed a commit to dfawley/grpc-java that referenced this pull request Jan 15, 2021
The xDS timeout retrieves per-route timeout value from RouteAction.max_stream_duration.grpc_timeout_header_max or RouteAction.max_stream_duration.max_stream_duration if the former is not set. If neither is set, it eventually falls back to the max_stream_duration setting in HttpConnectionManager.common_http_options retrieved from the Route's upstream Listener resource. The final timeout value applied to the call is the minimum of the xDS timeout value and the per-call timeout set by application.
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Jun 6, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants