Skip to content

Conversation

@pitrou
Copy link
Member

@pitrou pitrou commented May 14, 2020

No description provided.

@pitrou
Copy link
Member Author

pitrou commented May 14, 2020

@fsaintjacques @bkietz Comments welcome on this draft.

@github-actions
Copy link

@pitrou pitrou force-pushed the ARROW-8732-cancel branch from 9fb2b36 to 615e7c7 Compare May 20, 2020 15:04
Copy link
Contributor

@fsaintjacques fsaintjacques left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Blocking until some more documentation on which methods blocks.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At the very least, document which methods are blocking. Do you think we should always make this explicit in the naming, e.g. RequestStopBlocking()

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I should add docstrings in any case.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason why this is not part of the constructor, do you foresee cases where we need to change the callback dynamically?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the producer is sequencing several different operations with their own stop callbacks, this should make it easier than having to write a single overall callback that will dispatch to the right sub-callback.

Or at least that's the thinking. We may want to revisit this later, once we start using this concretely :-)

@pitrou
Copy link
Member Author

pitrou commented Dec 9, 2020

@bkietz @westonpace I updated this draft PR and added integration with futures. I'm not sure about the overall API and control flow, feel free to discuss.

@pitrou pitrou force-pushed the ARROW-8732-cancel branch from 13ae181 to f7aa129 Compare December 9, 2020 19:54
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lidavidm This PR, if accepted, will change the Executor interface slightly. You may want to take a look.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the heads up. This looks good, we don't yet use the Executor interface but will move to it once we start looking at upstreaming/open-sourcing our project.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why expose the stop token to the executor at all? If I'm writing a custom executor what do I do with the stop token? I see that the thread pool uses this to skip the task if it has been cancelled prior to scheduling but conceivably this could be achieved by wrapping the task as well in the same way that the TaskGroup::AppendReal does. Do you envision certain classes of custom executor will want to do something different with the stop token?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe I answered my own question. The executor might get some benefit from freeing the resources sooner rather than later (e.g. to keep queues down and more efficient) and the executor can choose to completely ignore the stop token without any consequence so it isn't like it is putting much burden on the implementor of the executor interface.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, the idea is that it gives more freedom to the executor like this.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the producer is creating the stop token wouldn't the correct interface be

Result<StopToken*> SpawnReal(TaskHints hints, std::function<void()> task);

The executor is responsible for creating and returning a stop token. For example, in the existing model, how does the executor inform the caller that it is, in fact, cancellable.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, the stop token is created in Executor::Submit since it's embedded in the Future.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if the future has already been marked completed? This could happen with a timing issue (user requests cancel at the same time the operation is finishing) or when we mix this in with continuations it could happen where a chain of futures is cancelled from the end, the cancellation will propagate backwards through the chain, and the first few futures may have already completed.

This second case we could address when merging that change in too.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a good question. In Java and Python, a completed Future cannot be cancelled (note the Python API is inspired from the Java API, AFAIK).

@westonpace
Copy link
Member

So for discussion & comparison's sake, I will point out that this approach is a little different than C#'s "unified" approach [1] and is more in line with Java's approach (minus Java's added complexity around "interrupts").

In the C# model the stop token is provided by the consumer which is a little counter-intuitive but it does have some advantages. An intermediate layer's only responsibility is to receive the cancellation token and pass it down to all lower layer calls. For example, consider some kind of intermediate parsing-aggregation layer. It makes multiple concurrent calls to load and parse a column of data. With the current approach it would have to keep track of several stop tokens, provide its own stop token (which it returns to the caller), listen to its stop token, and propagate any cancellation to all of the child stop tokens.

This model also means that arrow-10182 will have to propagate cancellation up the callback chain. In other words, when a child future is cancelled, it should cancel the parent future that caused the child future to be created. This is the opposite direction that errors propagate. I'm certain it's something that can be done, but it adds more complexity. In the unified model all futures in the chain would be cancelled at once (since they share the same cancellation source) and there is no need to propagate.

On the other hand, to play devil's advocate, the unified cancellation model can be confusing to the consumer. As I mentioned earlier, it is counter-intuitive for the consumer to provide the cancellation token. People are used to the idea of getting some kind of handle back from a long running operation that they cancel.

[1] https://docs.microsoft.com/en-us/dotnet/standard/threading/cancellation-in-managed-threads

@pitrou
Copy link
Member Author

pitrou commented Dec 15, 2020

Thanks for the insight @westonpace . I agree the "unified" model looks attractive. Let me think about it.

@pitrou
Copy link
Member Author

pitrou commented Dec 15, 2020

One possible limitation is that it doesn't seem easy for the consumer to know whether the producer will actually notice a cancellation request (some producers may be unable to?). Unless we expose a StopToken::Accept method that the producer calls to signal that it is able to react to cancellation requests.

@pitrou
Copy link
Member Author

pitrou commented Dec 15, 2020

Note also that in this PR, StopToken can have a single callback. We would have to change that if we want a single token to be potentially listened to by multiple producers.

@pitrou
Copy link
Member Author

pitrou commented Dec 15, 2020

An advantage of the "unified" model is that it should be less expensive to create a single token (even if it can handle multiple callbacks) that one at each delegation level.

@westonpace
Copy link
Member

One possible limitation is that it doesn't seem easy for the consumer to know whether the producer will actually notice a cancellation request (some producers may be unable to?). Unless we expose a StopToken::Accept method that the producer calls to signal that it is able to react to cancellation requests.

Correct, although I don't know of any cancellation framework that does support this. I'm not sure what the caller is supposed to do with the information that cancel is not supported.

Potentially one could imagine that the user's "cancel" button is disabled if an operation cannot be cancelled but it's hard to imagine a non-cancelable operation making it all the way up to the user layer. For example, the moment a future is chained the resulting future will be cancellable (the second half of the chain might not run if cancelled at the first step). Any intermediate layer that looks at the stop token to prevent follow-up work will hide the underlying non-cancellability.

If a producer doesn't support cancel (this could be very common, for example, there is typically no way to cancel an ongoing blocking I/O operation) then calling cancel can still just be a no-op.

@pitrou
Copy link
Member Author

pitrou commented Dec 16, 2020

I'll probably try to submit another PR with the "unified" model (i.e. consumer-issued stop token), though that may be after my end-of-year vacation.

@github-actions github-actions bot added the needs-rebase A PR that needs to be rebased by the author label Dec 18, 2020
@pitrou
Copy link
Member Author

pitrou commented Feb 18, 2021

I rebased this on git master, but I will file a separate PR for the consumer-issued approach.

@pitrou
Copy link
Member Author

pitrou commented Mar 3, 2021

Closed, superseded by #9528.

@pitrou pitrou closed this Mar 3, 2021
@pitrou pitrou deleted the ARROW-8732-cancel branch March 3, 2021 19:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Component: C++ needs-rebase A PR that needs to be rebased by the author

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants