[FEAT] Add stateful actor context and set CUDA_VISIBLE_DEVICES #3002
CodSpeed Performance Report
Merging #3002 will not alter performance
Took a quick first pass, but I'll probably need to give it a much more thorough review, given how much logic is changing in the PyRunner.
Codecov Report
Attention: Patch coverage is [...]
Additional details and impacted files
@@ Coverage Diff @@
## main #3002 +/- ##
==========================================
- Coverage 78.39% 78.34% -0.05%
==========================================
Files 603 611 +8
Lines 71443 72515 +1072
==========================================
+ Hits 56005 56813 +808
- Misses 15438 15702 +264
Still not 100% sure about the PR yet; I saw quite a few gaps on this pass-through. Could we find some time to do a live review in person?
daft/runners/pyrunner.py (Outdated)
- future.add_done_callback(lambda _: self._release_resources(resource_request))
+ future.add_done_callback(create_resource_release_callback(resources))
I don't quite understand why this needs to be an inline function.
Could we not do:
def _release_resources_callback(self, resources):
    return lambda _: self._resources.release(resources)
...
future.add_done_callback(self._release_resources_callback(resources))
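A minimal, runnable sketch of the suggestion above, for illustration only. `Runner`, `FakeResources`, and `run_all` are hypothetical stand-ins, not Daft's actual classes; only `_release_resources_callback` and `self._resources.release` mirror the names in the comment. It shows the callback factory written as a method, which binds `resources` per call so each future releases its own request:

```python
from concurrent.futures import ThreadPoolExecutor


class FakeResources:
    """Hypothetical stand-in for the runner's resource tracker."""

    def __init__(self):
        self.released = []

    def release(self, resources):
        self.released.append(resources)


class Runner:
    """Hypothetical runner; only the callback wiring matters here."""

    def __init__(self):
        self._resources = FakeResources()

    def _release_resources_callback(self, resources):
        # `resources` is bound as a parameter, so reassigning the caller's
        # loop variable later cannot change what this callback releases.
        return lambda _future: self._resources.release(resources)

    def run_all(self, requests):
        with ThreadPoolExecutor(max_workers=2) as pool:
            for resources in requests:
                future = pool.submit(lambda: None)
                future.add_done_callback(self._release_resources_callback(resources))
        # Leaving the `with` block waits for the workers, and done-callbacks
        # run either in a worker or immediately in this thread.
        return self._resources.released


runner = Runner()
released = runner.run_all(["cpu:1", "cpu:2", "gpu:0"])
```

A bare `lambda _: self._resources.release(resources)` written inside the loop would look up `resources` when the callback fires, not when it is registered, which is the usual reason to introduce a factory of either kind.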
Resolves #2896
Some details about this PR:
- [...] PyActorPool into a specialized class PyStatefulActorSingleton
- [...] AcquiredResources to store the resources used by a task or actor. The runner resources include not only the amounts of CPU and memory, but also the exact GPUs that each task/actor is using, which enables setting CUDA_VISIBLE_DEVICES in actors.
- [...] PyRunnerResources class
- [...] actor_resource_requests * num_workers anymore, so the actor pool context now asks for them individually.
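To make the description above concrete, here is a rough sketch of how a tracker could hand out exact GPU ids per acquisition. It is an assumption-laden illustration, not the PR's implementation: the `RunnerResources` class, its `try_acquire`/`release` methods, the fields of `AcquiredResources`, and the `cuda_visible_devices` helper are all hypothetical.

```python
import threading
from dataclasses import dataclass


@dataclass(frozen=True)
class AcquiredResources:
    """Resources held by one task or actor (field names are assumptions)."""
    num_cpus: float
    gpus: tuple  # exact GPU ids, e.g. (0, 2)
    memory_bytes: int


class RunnerResources:
    """Hypothetical tracker; not the PR's PyRunnerResources."""

    def __init__(self, num_cpus, gpu_ids, memory_bytes):
        self._lock = threading.Lock()
        self._cpus = num_cpus
        self._free_gpus = list(gpu_ids)
        self._memory = memory_bytes

    def try_acquire(self, num_cpus, num_gpus, memory_bytes):
        """Return AcquiredResources, or None if the request does not fit."""
        with self._lock:
            if (num_cpus > self._cpus
                    or num_gpus > len(self._free_gpus)
                    or memory_bytes > self._memory):
                return None
            gpus = tuple(self._free_gpus[:num_gpus])
            del self._free_gpus[:num_gpus]
            self._cpus -= num_cpus
            self._memory -= memory_bytes
            return AcquiredResources(num_cpus, gpus, memory_bytes)

    def release(self, acquired):
        with self._lock:
            self._cpus += acquired.num_cpus
            self._free_gpus.extend(acquired.gpus)
            self._memory += acquired.memory_bytes


def cuda_visible_devices(acquired):
    """Format the held GPU ids as a CUDA_VISIBLE_DEVICES value."""
    return ",".join(str(g) for g in acquired.gpus)


# Acquire once per actor instead of once for (request * num_workers), so
# every actor knows exactly which GPU it owns.
resources = RunnerResources(num_cpus=4, gpu_ids=[0, 1], memory_bytes=1 << 30)
per_actor = [resources.try_acquire(1, 1, 1 << 20) for _ in range(2)]
env_values = [cuda_visible_devices(a) for a in per_actor]
third = resources.try_acquire(1, 1, 1 << 20)  # None: no GPU left
```

In an actor process, the resulting string would be assigned to `os.environ["CUDA_VISIBLE_DEVICES"]` before any CUDA library initializes. A single bulk request could only say how many GPUs the pool holds, not which one belongs to which actor, which is why this sketch acquires per actor.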