Skip to content

Conversation

kevinzwang
Copy link
Member

@kevinzwang kevinzwang commented Oct 5, 2024

Resolves #2896

Some details about this PR:

  • I moved the actor-local singleton out of PyActorPool into a specialized class PyStatefulActorSingleton
  • I changed GPU resources to be accounted on a per-device level. That resulted in creating the data class AcquiredResources to store the resources used by a task or actor. The runner resources includes not only amount of CPU and memory resources, but the exact GPUs that each task/actor is using, which enables setting CUDA_VISIBLE_DEVICES in actors.
  • I also moved resource acquisition and releasing logic into a PyRunnerResources class
  • I added validation that GPU resources must be integers if greater than 1, which means it is no longer accurate to request for actor_resource_requests * num_workers anymore, so the actor pool context now asks for them individually.

@github-actions github-actions bot added the enhancement New feature or request label Oct 5, 2024
Copy link

codspeed-hq bot commented Oct 5, 2024

CodSpeed Performance Report

Merging #3002 will not alter performance

Comparing kevin/stateful-udf-rank (0a30f41) with main (e4c6f3f)

Summary

✅ 17 untouched benchmarks

@kevinzwang kevinzwang requested a review from jaychia October 5, 2024 04:21
@kevinzwang kevinzwang marked this pull request as ready for review October 5, 2024 04:26
@jaychia
Copy link
Contributor

jaychia commented Oct 7, 2024

Took a quick first pass, but I'll probably need to give it a much more thorough review again given how much logic is changing in the PyRunner

@kevinzwang kevinzwang requested a review from jaychia October 7, 2024 23:20
Copy link

codecov bot commented Oct 7, 2024

Codecov Report

Attention: Patch coverage is 80.55556% with 35 lines in your changes missing coverage. Please review.

Project coverage is 78.34%. Comparing base (272163f) to head (0a30f41).
Report is 30 commits behind head on main.

Files with missing lines Patch % Lines
src/common/resource-request/src/lib.rs 70.58% 15 Missing ⚠️
daft/runners/pyrunner.py 89.71% 11 Missing ⚠️
daft/internal/gpu.py 57.14% 9 Missing ⚠️
Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##             main    #3002      +/-   ##
==========================================
- Coverage   78.39%   78.34%   -0.05%     
==========================================
  Files         603      611       +8     
  Lines       71443    72515    +1072     
==========================================
+ Hits        56005    56813     +808     
- Misses      15438    15702     +264     
Files with missing lines Coverage Δ
daft/dependencies.py 58.33% <ø> (+0.64%) ⬆️
...al_optimization/rules/split_actor_pool_projects.rs 95.00% <100.00%> (-0.07%) ⬇️
daft/internal/gpu.py 56.00% <57.14%> (-5.54%) ⬇️
daft/runners/pyrunner.py 85.87% <89.71%> (+0.61%) ⬆️
src/common/resource-request/src/lib.rs 65.06% <70.58%> (-2.03%) ⬇️

... and 106 files with indirect coverage changes

@kevinzwang kevinzwang requested a review from jaychia October 9, 2024 19:47
Copy link
Contributor

@jaychia jaychia left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still not 100% sure yet on the PR, saw quite a few gaps still on this pass-through. Let's find some time to do a live review in person?

@kevinzwang kevinzwang requested a review from jaychia October 17, 2024 22:47

future.add_done_callback(lambda _: self._release_resources(resource_request))
future.add_done_callback(create_resource_release_callback(resources))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't quite understand this as to why we need to have this as an inline function.

Could we not do:

def _release_resources_callback(self, resources):
    return lambda _: self._resources.release(resources)

...

    future.add_done_callback(self._release_resources_callback(resources))

@kevinzwang kevinzwang enabled auto-merge (squash) October 18, 2024 21:22
@kevinzwang kevinzwang merged commit 5795adc into main Oct 18, 2024
40 checks passed
@kevinzwang kevinzwang deleted the kevin/stateful-udf-rank branch October 18, 2024 21:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[ActorPoolProject] Correctly allocate GPUs for each running Actor in the PyRunner
2 participants