Skip to content

Conversation

MortalHappiness
Copy link
Member

@MortalHappiness MortalHappiness commented Mar 11, 2025

Why are these changes needed?

Based on #51172 and #49864 to convert HealthzHead to subprocess dashboard module.

  • Integrate SubprocessModule into Dashboard. Now it can load SubprocessModules.
  • The support is only in non-minimal. For a minimal Ray, no SubprocessModule are loaded, but others minimal DashboardHeadModules can still work.
  • Converted HealthzHead to non-minimal. It does not do anything in minimal anyway.
  • Optimized usage_stats.py to use multi_get to save number of rpcs.

Related issue number

Checks

  • I've signed off every commit(by using the -s flag, i.e., git commit -s) in this PR.
  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
    • I've added any new APIs to the API Reference. For example, if I added a
      method in Tune, I've added it in doc/source/tune/api/ under the
      corresponding .rst file.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

@MortalHappiness MortalHappiness added the go add ONLY when ready to merge, run all tests label Mar 11, 2025
@MortalHappiness MortalHappiness force-pushed the feature/healthzhead-dashboard-subprocess branch from 2dbfedf to 1da9451 Compare March 11, 2025 23:17
@MortalHappiness MortalHappiness changed the title Feature/healthzhead dashboard subprocess [Feat][Core/Dashboard] Add SubprocessModules to the Dashboard routes, and convert HealthzHead Mar 11, 2025
@MortalHappiness MortalHappiness force-pushed the feature/healthzhead-dashboard-subprocess branch 4 times, most recently from 30b0061 to 5836482 Compare March 13, 2025 18:19
@MortalHappiness MortalHappiness marked this pull request as ready for review March 14, 2025 02:10
@MortalHappiness MortalHappiness requested a review from a team March 14, 2025 02:11
@MortalHappiness MortalHappiness force-pushed the feature/healthzhead-dashboard-subprocess branch 3 times, most recently from b2f8ac0 to 5c080c5 Compare March 14, 2025 03:25
@MortalHappiness MortalHappiness marked this pull request as draft March 14, 2025 05:42
@MortalHappiness MortalHappiness force-pushed the feature/healthzhead-dashboard-subprocess branch 2 times, most recently from 28bc5a8 to 52c166e Compare March 14, 2025 09:52
@MortalHappiness
Copy link
Member Author

MortalHappiness commented Mar 15, 2025

Added back _parent_process_death_detection_task in the subprocess because I found it hangs when calling run_string_as_driver in tests.

This reverts commit bbed411.

Signed-off-by: Chi-Sheng Liu <[email protected]>
@MortalHappiness MortalHappiness force-pushed the feature/healthzhead-dashboard-subprocess branch from 73f9222 to 1186504 Compare March 15, 2025 02:47
@MortalHappiness MortalHappiness marked this pull request as ready for review March 15, 2025 03:56
raise OSError(
f"AF_UNIX path length cannot exceed {maxlen} bytes: {result!r}"
)
validate_socket_filepath(result.split("://", 1)[-1])
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to keep .encode("utf-8")?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want to use non-ASCII chracters for filename? If we only use ASCII characters for filename, then the length of string and the length of the encoded bytes will be the same.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know the history, I think it's safer to keep it.

Copy link
Member Author

@MortalHappiness MortalHappiness Mar 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added back .encode("utf-8") inside validate_socket_filepath.

Detect parent process death by checking if ppid is still the same.
"""
while True:
ppid = os.getppid()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Return the parent’s process id. When the parent process has exited, on Unix the id returned is the one of the init process (1), on Windows it is still the same id, which may be already reused by another process.

This approach won't work on Windows.

Try multiprocessing.parent_process().is_alive() or pipe.

Copy link
Member Author

@MortalHappiness MortalHappiness Mar 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:param dashboard_head: The DashboardHead instance.
"""
self._config = config
self._parent_process = psutil.Process(parent_process_pid)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if parent is already dead and the pid is reused?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is in the constructor of the child process. So do you mean that the parent process dies immediately even before the child process constructor has run?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've changed the implementation to use multiprocessing.parent_process().is_alive() because I found that the parent_process() call returns a variable maintained by Python, so it should be able to identify the parent process calling multiprocessing.

https://github.com/python/cpython/blob/a936af924efc6e2fb59e27990dcd905b7819470a/Lib/multiprocessing/process.py#L51-L55


async def _detect_parent_process_death(self):
"""
Detect parent process death by checking if ppid is still the same.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Needs to update the comment

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated

@jjyao jjyao merged commit ae7340d into ray-project:master Mar 18, 2025
5 checks passed
ruisearch42 added a commit to ruisearch42/ray that referenced this pull request Mar 19, 2025
… routes, and convert HealthzHead (ray-project#51282)"

This reverts commit ae7340d.

Signed-off-by: Rui Qiao <[email protected]>
jjyao pushed a commit that referenced this pull request Mar 19, 2025
… routes, and convert HealthzHead (#51282)" (#51512)

Signed-off-by: Rui Qiao <[email protected]>
MortalHappiness added a commit to MortalHappiness/ray that referenced this pull request Mar 19, 2025
…d routes, and convert HealthzHead (ray-project#51282)" (ray-project#51512)

This reverts commit 8773682.

Signed-off-by: Chi-Sheng Liu <[email protected]>
jjyao pushed a commit that referenced this pull request Mar 19, 2025
…d routes, and convert HealthzHead (#51282)" (#51512) (#51523)

Signed-off-by: Chi-Sheng Liu <[email protected]>
dhakshin32 pushed a commit to dhakshin32/ray that referenced this pull request Mar 27, 2025
… and convert HealthzHead (ray-project#51282)

Signed-off-by: Chi-Sheng Liu <[email protected]>
Signed-off-by: Dhakshin Suriakannu <[email protected]>
dhakshin32 pushed a commit to dhakshin32/ray that referenced this pull request Mar 27, 2025
… routes, and convert HealthzHead (ray-project#51282)" (ray-project#51512)

Signed-off-by: Rui Qiao <[email protected]>
Signed-off-by: Dhakshin Suriakannu <[email protected]>
dhakshin32 pushed a commit to dhakshin32/ray that referenced this pull request Mar 27, 2025
…d routes, and convert HealthzHead (ray-project#51282)" (ray-project#51512) (ray-project#51523)

Signed-off-by: Chi-Sheng Liu <[email protected]>
Signed-off-by: Dhakshin Suriakannu <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
community-backlog go add ONLY when ready to merge, run all tests
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants