Provide access to resource usage for processes and nodes #521

Open: 1 commit to merge into base branch master

@rhc54
Member

rhc54 commented Jun 7, 2025

Operating systems typically maintain a running measure of resource
utilization by active processes. This includes metrics on CPU
utilization, disk accesses, memory size, and network activity.
Define a set of attributes by which these metrics can be
requested and returned.

Attributes are used as a means of providing for later extension
to include a broader range of metrics.

Replaces #335

@rhc54
Member Author

rhc54 commented Jun 7, 2025

Please use emoji reactions ON THIS COMMENT to indicate your position on this proposal.

  • You do not need to vote on every proposal
  • If you have no opinion, don't vote - that is also useful data
  • If you've already commented on this issue, please still vote so
    we know your current thoughts
  • Not all proposals solve exactly the same problem, so we may end
    up accepting proposals that appear to have some overlap

This is not a binding majority-rule vote, but it will be a very
significant input into the corresponding ASC decision.

Here are the meanings for the emojis:

  • Hooray or Rocket: I support this so strongly that I
    want to be an advocate for it
  • Heart: I think this is an ideal solution
  • Thumbs up: I'd be happy with this solution
  • Confused: I'd rather we not do this, but I can tolerate it
  • Thumbs down: I'd be actively unhappy, and may even consider
    other technologies instead

If you want to explain in more detail, feel free to add another
comment, but please also vote on this comment.

@rhc54 rhc54 self-assigned this Jun 7, 2025
@rhc54
Member Author

rhc54 commented Jun 7, 2025

This PR replaces the referenced one, which was woefully stale. The doc has been reorganized and heavily modified since that change was originally proposed, so the proposal has been updated and reorganized to fit within the current doc and to address the questions that remained open on the prior PR.

@rhc54
Member Author

rhc54 commented Jun 7, 2025

@HawkmoonEternal Does this look okay to you?

@HawkmoonEternal

This looks great!

I could imagine that extensions to the list of sampled stats (e.g., for power and energy measurements) might be necessary in the future. Adding additional struct members later should be straightforward.

@rhc54
Member Author

rhc54 commented Jun 8, 2025

Hmmm...extending the current structs would actually require renaming them to avoid conflicts with prior implementations. I can see two alternatives:

  • eliminate the struct definitions and replace them with attributes. So we would have a PMIX_PROC_STATS attribute that is associated with a pmix_pointer_array_t of pmix_info_t values containing things like PMIX_HOSTNAME for the node the proc is on, PMIX_PROC_PSS for the pss value, etc. This would maximize flexibility as we could add whatever we want down the road.
  • retain the current struct definitions and extend them with a pmix_info_t array. So the structs would contain the standard Linux OS entries, but would have an array we could use to add anything else.
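
To make the two layouts concrete, here is a minimal, language-neutral sketch in Python. It is not PMIx code: `Info` is a hypothetical stand-in for `pmix_info_t`, a plain list stands in for the `pmix_pointer_array_t`, and the key strings simply reuse the macro names mentioned above rather than whatever string values the attributes would actually carry.

```python
from dataclasses import dataclass, field


# Hypothetical stand-in for pmix_info_t: a string key plus a value.
@dataclass
class Info:
    key: str
    value: object


# Alternative 1: no struct at all. The statistics are a list of
# key/value pairs (standing in for an array of pmix_info_t).
def proc_stats_as_attributes(hostname, pss):
    return [Info("PMIX_HOSTNAME", hostname), Info("PMIX_PROC_PSS", pss)]


# Alternative 2: fixed fields for the standard OS values, plus an
# open-ended list for anything added later.
@dataclass
class ProcStats:
    hostname: str
    pss: float
    extra: list = field(default_factory=list)


def lookup(infos, key):
    """Return the value stored under key, or None if it is absent."""
    for info in infos:
        if info.key == key:
            return info.value
    return None


attrs = proc_stats_as_attributes("node01", 12.5)
fixed = ProcStats(hostname="node01", pss=12.5)

print(lookup(attrs, "PMIX_PROC_PSS"))  # attribute form: search by key
print(fixed.pss)                       # struct form: direct field access
```

In the first form every value is found by key and each entry carries its key string; in the second, the OS values are plain fields and only the extras need a key search.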

The second is less cumbersome if all you want is the OS values, but feels somewhat odd as it implies OS values should be treated differently.

The first is aesthetically nicer, but means defining a bunch of attributes - not a big deal, just looks like a bigger change - and the struct winds up using more memory due to all those string keys.

Anyone have any thoughts?

@rhc54
Member Author

rhc54 commented Jun 16, 2025

@HawkmoonEternal I opted to go with the attribute-based approach to accommodate later extensions without having to rename/deprecate structures and their associated utility functions. Seemed like the more forward-looking approach. Please see what you think.
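
As a rough illustration of why the attribute-based form extends cleanly, the sketch below models the returned statistics as a list of key/value pairs in Python. It is an assumption-laden model, not PMIx API: the key strings reuse macro names from this thread, and `EXAMPLE_PROC_ENERGY` is invented purely to stand for some later addition such as an energy metric.

```python
# Hypothetical model: statistics as (key, value) pairs. Only PMIX_HOSTNAME
# and PMIX_PROC_PSS are named in this thread; EXAMPLE_PROC_ENERGY is made up.
def get(stats, key, default=None):
    """Return the value for key, or default if the key was not reported."""
    for k, v in stats:
        if k == key:
            return v
    return default


def report_memory(stats):
    """A consumer written before any extension existed."""
    return f"{get(stats, 'PMIX_HOSTNAME')}: pss={get(stats, 'PMIX_PROC_PSS')}"


old_stats = [("PMIX_HOSTNAME", "node01"), ("PMIX_PROC_PSS", 12.5)]
# A later revision adds a metric by appending one more pair.
new_stats = old_stats + [("EXAMPLE_PROC_ENERGY", 3.2)]

print(report_memory(old_stats))
print(report_memory(new_stats))  # the old consumer ignores the new key
print(get(old_stats, "EXAMPLE_PROC_ENERGY", "not reported"))
```

The older consumer keeps working against the extended data because it only asks for the keys it knows, and a newer consumer can detect that an older provider did not report the added metric.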

Operating systems typically maintain a running measure of resource
utilization by active processes. This includes metrics on CPU
utilization, disk accesses, memory size, and network activity.
Define a set of attributes by which these metrics can be
requested and returned.

Attributes are used as a means of providing for later extension
to include a broader range of metrics.

Signed-off-by: Ralph Castain <[email protected]>
@naughtont3
Contributor

Note from the 25Q3 meeting: we will allow more time for reviews and comments, and bring this to a vote at the next quarterly meeting (25Q4).
