bug: multi-node setup needs unique network names #375
base: master
Conversation
When using Podman (or another runtime that keeps its CNI directory in the user's home) and the runtime generates a CNI file for each node's network, a shared filesystem combined with a single, non-unique network name means each node writes a slightly different address into the same CNI file and clobbers whatever was written before (a race condition). This additional `make multi-node` command renames the default network to be specific to the hostname and avoids this.
Signed-off-by: vsoch <[email protected]>
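The description does not show the target itself, so here is a minimal sketch of what the recipe could boil down to, assuming the compose network is declared literally as `default_network` in `docker-compose.yaml` (as discussed further down in this thread) and that GNU `sed` is available; the actual target in this PR may differ.

```bash
#!/bin/sh
# Hypothetical sketch of the substitution a "make multi-node" recipe could run:
# give the compose network a per-host name so that nodes sharing ~/.config/cni
# do not overwrite each other's generated CNI files.
set -eu

# Sanitize the hostname so it is safe to use inside a compose network name.
HOST_SAFE=$(hostname | tr '.' '-')

# Rewrite the generic network name in place with a host-specific one.
sed -i "s/default_network/${HOST_SAFE}_network/g" docker-compose.yaml
```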
See `make help`.
If you are running a multi-node setup with a shared filesystem and location for your network CNI files, you will want to create a non-shared location for each node's usernetes code (e.g., `/tmp` is usually not shared) and run this additional command for each of the control-plane and worker nodes before `make up`. It will give the network (and corresponding CNI files) unique names in the shared location (usually `~/.config/cni`).
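As a concrete illustration of that paragraph, the sketch below walks through the per-node steps; the checkout location under `/tmp` and the repository URL are assumptions about a typical setup, and `make multi-node` is the target proposed in this PR.

```bash
#!/bin/sh
# Hypothetical per-node workflow on a cluster whose $HOME is shared (e.g. NFS):
# keep the usernetes checkout node-local, rename the network, then bring it up.
set -eu

WORKDIR="/tmp/usernetes-$(hostname)"   # node-local path, not on the shared filesystem

# Adjust the URL/branch to match your checkout; this is only an illustration.
git clone https://github.com/rootless-containers/usernetes.git "${WORKDIR}"
cd "${WORKDIR}"

make multi-node   # proposed target: make the network (and CNI file) names host-specific
make up           # existing target, run on each control-plane and worker node
```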
I don't think that container engines support locating CNI files on a shared filesystem
I suggest just mounting a local filesystem on .config/cni
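For reference, a rough sketch of what that suggestion amounts to on each node; it assumes root (or an fstab/automount entry prepared by the admin) and a node-local directory to mount, which is part of the pushback in the reply below.

```bash
#!/bin/sh
# Sketch of the suggested workaround: keep CNI state on node-local storage by
# mounting it over ~/.config/cni. Requires root on every node, or an equivalent
# fstab/automount entry, and would need to be repeated for each user.
set -eu

USER_NAME=alice                               # hypothetical user
CNI_DIR="/home/${USER_NAME}/.config/cni"
LOCAL_DIR="/var/tmp/${USER_NAME}-cni"         # node-local backing directory

mkdir -p "${CNI_DIR}" "${LOCAL_DIR}"
mount --bind "${LOCAL_DIR}" "${CNI_DIR}"
```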
> I suggest just mounting a local filesystem on .config/cni
You mean on the HPC node? On top of NFS, and for every user? That seems overkill for what comes down to a file naming issue.
The solution here does not change functionality for a user that doesn't need this change, but supports multi-node shared-filesystem setups for users that need it with an isolated `make multi-node` command. If there turns out to be other multi-node functionality that is needed, it could be added to that section.
> I don't think that container engines support locating CNI files on a shared filesystem
In rootless mode, Podman puts the CNI files in the user's home. To be clear, it isn't shared between users, it is shared between nodes (reference).
Same as: I don't think a new Makefile target should be added for this. `docker-compose.yaml` can be modified in `vi` or `yq`.
So for an HPC cluster of hundreds or thousands of nodes, you want the user to manually update the file with vim?
You requested changes on the PR - can you please clarify what I can change? It seems more that you are rejecting any kind of change for this.
CNI files aren't expected to be shared between nodes.
If you aren't allowed to mount local filesystems, as a workaround you can just automate updating YAMLs with yq: https://github.com/mikefarah/yq
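A rough sketch of that workaround using mikefarah yq (v4 syntax); whether usernetes' `docker-compose.yaml` declares an explicit `networks:` section is an assumption here, and Compose's `networks.<key>.name` field is used to pin a host-specific network name.

```bash
#!/bin/sh
# Sketch of the yq-based workaround: pin a per-host name on the default compose
# network instead of adding a Makefile target. Assumes mikefarah yq v4 and that
# the compose file uses (or tolerates adding) a top-level "networks:" section.
set -eu

yq -i ".networks.default.name = \"usernetes_$(hostname)\"" docker-compose.yaml
```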
> CNI files aren't expected to be shared between nodes.
In a rootless environment with Podman, where they are stored under `~/.config` in the user's home (which is mounted and shared across compute nodes), it is not just expected, it is guaranteed.
They are expected/guaranteed to be under the home, but not expected to be under the shared home
> They are expected/guaranteed to be under the home, but not expected to be under the shared home
I have never seen an HPC cluster with a user home that is not a filesystem mapped across nodes, and thus shared. It's usually NFS. It's strategically like that so you can log in to multiple different clusters and see files, and jobs running across compute nodes can see the same space too.
> When using Podman (or another runtime that keeps its CNI directory in the user's home) and the runtime generates a CNI file for each node's network, a shared filesystem combined with a single, non-unique network name means each node writes a slightly different address into the same CNI file and clobbers whatever was written before (a race condition). This additional make multi-node command renames the default network to be specific to the hostname and avoids this.
I renamed the network from `default` to `default_network` so it would be more unique for the sed (`default` is fairly generic). If the user doesn't run this (and they don't need to for most setups without a shared CNI cache), the network will just be called `usernetes_default_network` instead of `usernetes_default`.
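A quick way to check which network name a given node ended up with; the exact names depend on the compose project name and whether the new target was run, so treat this as a sketch.

```bash
#!/bin/sh
# List compose networks on this node; expect "usernetes_default_network" without
# the new target, or a host-specific name after running "make multi-node".
podman network ls | grep usernetes || true
```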