-
Notifications
You must be signed in to change notification settings - Fork 22
Update Group Construct entry #520
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
Please use emoji reactions ON THIS COMMENT to indicate your position on this proposal.
Here are the meanings for the emojis:
|
This is proposed for provisional inclusion in the next release - not sure if that is expected to be a v5.1 or v6.0. |
@sonjahapp Does this look okay to you? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @rhc54 for this effort!! 🙏
I left a few questions and comments that hopefully help to clarify some aspects of the bootstrap method.
\declareAttributeProvisional{PMIX_GROUP_LOCAL_CID}{"pmix.grp.lclid"}{size_t}{ | ||
Local context ID for the participating process member of a group. Provided | ||
in the \refstruct{pmix_info_t} directives when the process calls construct. | ||
These values are to be returned to all participants with the collected data |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How are these values supposed to be returned with the collected data, i.e., what is the name of the attribute for the local group ctx id of a specific member process in the collected data? As I understand, each group member process might add its own PMIX_GROUP_LOCAL_CID
in the info object at construction time, so there should be a uniform way to differentiate the local cids per member process in the collected data.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, there is - the values are treated by PMIx as "qualified" values. In other words, each value is paired with another value specifying the proc that provided it:
#define PMIX_QUALIFIED_VALUE "pmix.qual.val"
// (pmix_data_array_t*) Value being provided consists
// of the primary key-value pair in first position, followed
// by one or more key-value qualifiers to be used when
//subsequently retrieving the primary value
PMIx then stores the qualified value, and the user can retrieve it via a PMIx_Get
for PMIX_GROUP_LOCAL_CID
for that proc.
Note that PMIx handles all this "under the covers" - the participants just include their local CID, and we handle all the plumbing that deals with qualified values (i.e., where multiple participants each provide their own value for the same key). This is why I didn't include all that in the Standard as the user never sees it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the explanation Ralph! I see why this is an implementation specific aspect that does not have to be in the standard. Anyway, for practical use, the explanation is very useful!
Chap_API_Sets_Groups.tex
Outdated
participants don't know the entire list but at least | ||
know how may leaders will be involved. Leaders provide only their | ||
own process ID in the \code{procs} parameter to the construct |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
typo: may
--> many
I wonder if PMIx Process Groups can actually have multiple leaders. In the current standard it reads as if there should be always only one (elected) group leader at most (or none). Here it says there can be multiple leaders for the bootstrap method. I guess I understand what you mean here, but perhaps "leaders" is not a good term to use? Perhaps just calling them the "group constructing processes" would be cleaner?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I should probably update the rest of the language. In the rest of the material, we do speak of a "leader" that, for example, gets notified if a participant fails prior to completing construct. In the bootstrap case, all leaders would receive that notification. So it really is meant to indicate that we do have multiple leaders in the bootstrap instance.
Reason: in the fully described case, the "leader" knows all participants and can therefore tell them (via notification if nothing else) that the group has a problem. In the bootstrap case, there is no one proc that has that knowledge - each leader knows of procs that the other leaders don't know. So they truly have to operate as "leaders" as defined in the doc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The updated text is much clearer with respect to group leaders and their role during group construction. Thanks for updating it!
Chap_API_Sets_Groups.tex
Outdated
API, and are \textit{required} to include the \refattr{PMIX_GROUP_BOOTSTRAP} | ||
attribute in their array of \refstruct{pmix_info_t} directives, with | ||
the value in that attribute set to equal the number | ||
of leaders in the group construct operation. They may also provide the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some questions for understanding:
- If there are N "group constructing processes" then all of them have to put
PMIX_GROUP_BOOTSTRAP=N
as attribute to successfully construct the group. Is this correct? - In addition, each of the N "group constructing processes" may use a
PMIX_GROUP_ADD_MEMBERS
attribute to specify concrete process IDs that have to be member of the final group for successful construction. Can thosePMIX_GROUP_ADD_MEMBERS
be part of the N "group constructing processes" or are these different processes? - How does the value of
PMIX_GROUP_BOOTSTRAP
relate toPMIX_GROUP_ADD_MEMBERS
? Does the number of "group constructing processes" and the overall number of "added members" have to add up to what is provided asPMIX_GROUP_BOOTSTRAP
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some questions for understanding:
If there are N "group constructing processes" then all of
them have to putPMIX_GROUP_BOOTSTRAP=N
as attribute to
successfully construct the group. Is this correct?
Correct
In addition, each of the N "group constructing processes" may use a
PMIX_GROUP_ADD_MEMBERS
attribute to specify concrete process IDs that have to be member of the final group for successful construction. Can thosePMIX_GROUP_ADD_MEMBERS
be part of the N "group constructing processes" or are these different processes?
Different processes - these are procs that a particular "group constructing process" knows about but that the other constructors might not know. The runtime has to count the unique procs (since constructors might have duplicate entries) so it knows when the construction is complete.
How does the value of
PMIX_GROUP_BOOTSTRAP
relate toPMIX_GROUP_ADD_MEMBERS
? Does the number of "group constructing processes" and the overall number of "added members" have to add up to what is provided asPMIX_GROUP_BOOTSTRAP
?
No, the PMIX_GROUP_BOOTSTRAP
number only indicates the number of constructors - it does not include the number of add members. This is why the runtime has to count the total number of unique add members plus constructors when determining that the construct operation has completed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for your responses! All makes sense - in particular with the updated text. 👍
\declareAttributeProvisional{PMIX_GROUP_ADD_MEMBERS}{"pmix.grp.add"}{pmix_data_array_t*}{ | ||
Array of \refstruct{pmix_proc_t} identifying procs that are not | ||
included in the membership specified in the \code{procs} array passed to |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does the order of processes in this array matter somehow? Or this this array treated as a list and the position of each process within the array is meaningless?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Order does not matter.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see, thanks for the clarification! I think it is good that the text now says that the order is not significant.
Chap_API_Sets_Groups.tex
Outdated
\refattr{PMIX_GROUP_BOOTSTRAP} attribute in their array of \refstruct{pmix_info_t} | ||
directives, with the value in that attribute set to equal the number | ||
of leaders in the group construct operation. Each leader may optionally provide the | ||
\refattr{PMIX_GROUP_ADD_MEMBERS} attribute with an array of process IDs that are to |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the following an intended use case for the PMIX_GROUP_ADD_MEMBERS
attribute?
There is a "group constructing process" A that knows about some other process B which has to become a member of the group. Process A could be the parent of the spawned process B, for example. However, process B cannot act as "group constructing process" itself because it does -- for some reason -- only know the name of the group but B does not know the number of "group constructing processes" that it should pass via PMIX_GROUP_BOOTSTRAP
. Hence process A adds B to the group via the PMIX_GROUP_ADD_MEMBERS
attribute.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes - in fact, that is similar to the use-case that stimulated this new behavior. The situation that sparked this was one where two sets of procs want to join into a group, but neither side knows the number of procs in the other group nor their proc IDs. So one proc in each group acts as the "constructor", passing a PMIX_GROUP_BOOTSTRAP
value of 2 plus a PMIX_GROUP_ADD_MEMBERS
attribute listing the rest of the members of its group. This is sufficient to allow the group to be constructed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ahh, I see. If this use case was the motivation for the new bootstrap method would it make sense to add another code example that uses the bootstrap method for group construction in combination with a PMIx_Spawn
?
Update the PMIx_Group_construct entry to include a few missing attributes: * PMIX_GROUP_LOCAL_CID - allow a participant to provide a local context ID to be shared with other participants * PMIX_GROUP_BOOTSTRAP - indicate that the "bootstrap" method of group construction is to be used * PMIX_GROUP_ADD_MEMBERS - specify the additional members that are to be included in the final group * PMIX_GROUP_JOB_INFO - used in the data returned at the end of the group construct operation. Includes all job-level info for participants. Extend the PMIx_Group_construct description to explain the new construct method and how it is used. Signed-off-by: Ralph Castain <[email protected]>
@sonjahapp I have added explanation to the group section to address your questions - please take a look and see if this resolves things for you. |
@sonjahapp I think this is stuck waiting for your comments on the updates. Can you please take a look? |
Note from 25Q3 meeting, will allow more time for reviews/comments and bring for vote at next quarterly (25Q4). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for the long response time.
I appreciate the text update for the group construct methods and in particular the clarification about the role of the group leader(s). Thanks a lot!
the bootstrap method is a somewhat more dynamic form of the construct operation | ||
that assumes each leader only knows the number of group leaders, | ||
but does not know their process IDs. This is commonly the case when | ||
two or mote collections of processes wish to join together (e.g., in an |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Typo: mote
--> more
\declareAttributeProvisional{PMIX_GROUP_LOCAL_CID}{"pmix.grp.lclid"}{size_t}{ | ||
Local context ID for the participating process member of a group. Provided | ||
in the \refstruct{pmix_info_t} directives when the process calls construct. | ||
These values are to be returned to all participants with the collected data |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the explanation Ralph! I see why this is an implementation specific aspect that does not have to be in the standard. Anyway, for practical use, the explanation is very useful!
\declareAttributeProvisional{PMIX_GROUP_ADD_MEMBERS}{"pmix.grp.add"}{pmix_data_array_t*}{ | ||
Array of \refstruct{pmix_proc_t} identifying procs that are not | ||
included in the membership specified in the \code{procs} array passed to |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see, thanks for the clarification! I think it is good that the text now says that the order is not significant.
Chap_API_Sets_Groups.tex
Outdated
participants don't know the entire list but at least | ||
know how may leaders will be involved. Leaders provide only their | ||
own process ID in the \code{procs} parameter to the construct |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The updated text is much clearer with respect to group leaders and their role during group construction. Thanks for updating it!
Chap_API_Sets_Groups.tex
Outdated
API, and are \textit{required} to include the \refattr{PMIX_GROUP_BOOTSTRAP} | ||
attribute in their array of \refstruct{pmix_info_t} directives, with | ||
the value in that attribute set to equal the number | ||
of leaders in the group construct operation. They may also provide the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for your responses! All makes sense - in particular with the updated text. 👍
Chap_API_Sets_Groups.tex
Outdated
\refattr{PMIX_GROUP_BOOTSTRAP} attribute in their array of \refstruct{pmix_info_t} | ||
directives, with the value in that attribute set to equal the number | ||
of leaders in the group construct operation. Each leader may optionally provide the | ||
\refattr{PMIX_GROUP_ADD_MEMBERS} attribute with an array of process IDs that are to |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ahh, I see. If this use case was the motivation for the new bootstrap method would it make sense to add another code example that uses the bootstrap method for group construction in combination with a PMIx_Spawn
?
Update the PMIx_Group_construct entry to include a few missing attributes:
PMIX_GROUP_LOCAL_CID - allow a participant to provide a local context ID to be shared with other participants
PMIX_GROUP_BOOTSTRAP - indicate that the "bootstrap" method of group construction is to be used
PMIX_GROUP_ADD_MEMBERS - specify the additional members that are to be included in the final group
PMIX_GROUP_JOB_INFO - used in the data returned at the end of the group construct operation. Includes all job-level info for participants.
Extend the PMIx_Group_construct description to explain the new construct method and how it is used.