-
Notifications
You must be signed in to change notification settings - Fork 378
Add net device feature #3163
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Add net device feature #3163
Conversation
I have a few questions:
Thank you for your attention to these points. (WIP) In the containerd integration test, I haven’t identified the cause yet, so I’d like to continue investigating it. |
@utam0k |
Yes, but please leave the comment. |
Yes, it is acceptable. |
How about using a system call mock? |
I'll proceed with that approach! BTW, is my understanding correct that this refers to using the pub struct TestHelperSyscall {
id: RefCell<MockId>,
mocks: MockCalls,
}
pub struct MockId {
uid: Uid,
gid: Gid,
euid: Uid,
egid: Gid,
} Furthermore, if we go with this approach, we would need to refactor many parts of the codebase that currently use I'm currently thinking that building a mock like the above would be the right direction. Does this approach sound reasonable to you? ref: 6d2d9a0 |
080cd27
to
6d2d9a0
Compare
Yes.
How large is it? You may use a different PR. |
I'll address this in a separate PR. |
Make sense +1 Thanks. |
I’ve created a separate PR: #3173 . |
ab967f0
to
af229bf
Compare
Signed-off-by: nayuta-ai <[email protected]>
Signed-off-by: nayuta-ai <[email protected]>
Signed-off-by: nayuta-ai <[email protected]>
Signed-off-by: nayuta-ai <[email protected]>
Signed-off-by: nayuta-ai <[email protected]>
Signed-off-by: nayuta-ai <[email protected]>
af229bf
to
431bbda
Compare
I have confirmed that it works locally, and it also succeeded the last time, as shown in the link below. Can we rerun the workflow? I'd appreciate it if you could review it once the job has passed. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Once I've looked overall.
crates/libcontainer/src/network.rs
Outdated
|
||
link_client.set_ns_fd(index, &new_name, netns_file.as_raw_fd())?; | ||
|
||
let thread_handle = std::thread::spawn({ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please let me know why we need to use threads in this process.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Addressed in 66f8b6e.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@nayuta-ai I can understand if you do this in the init process, but if you do it in main or intermediate, I don't see the problem. Does it need to be a thread? Can't it be a process?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given our previous discussion, it seems safer to perform the setns operations from a separate process rather than a thread, as it doesn't strictly need to be in the main process. Furthermore, since these operations don't require the main process itself, it would be beneficial to port them to the init process. Thank you for the confirmation.
Therefore, my proposed approach, which I believe is a suitable solution to the earlier questions, is as follows:
sequenceDiagram
participant Main Process
participant Intermediate Process
participant Init Process
Intermediate Process->>Main Process: Send "ready for intermediate process" and init PID
Init Process->>Main Process: Send seccomp notify FD
Main Process->>Init Process: Send "seccomp notify done"
Note over Main Process: Setup network device and move<br>interface to target netns
Main Process->>Init Process: Send "ready to setup network device"
Note over Init Process: Setup network device for init process
Init Process->>Main Process: Send "ready"
Seccomp is not directly related to the current implementation; it was included to indicate that the network device setup takes place after seccomp.
If the above understanding is correct, I think it should be feasible, so I'd like to try making the changes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@utam0k
Please confirm if the above policy is acceptable. If so, there's one difficult implementation point: how to pass Vec<AddressMessage>
to the init process.
The challenge is that AddressMessage
itself is used for Netlink responses and is not serializable, so it cannot be directly added to the Message
Enum.
I've considered two workarounds:
- Keep the data as
bytes
internally within theaddr_client.get_by_index
function, without parsing it intoVec<AddressMessage>
. - Create a new struct, like
SerializableAddress
, and convert the data into a structure that can be described in theMessage
Enum.- The current undecided policy is described in this Draft PR. Please refer to it for our discussion. If it seems fine, I plan to move it to this PR.
- The relevant commit for the discussion is 6a055ec.
Personally, I think creating SerializableAddress
and performing type-safe operations for each process is the better approach, but I'd like your input on how to implement this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm correcting my previous comment, as there was a misunderstanding.
The prerequisite for executing dev_change_net_namespace is the presence of the CAP_NET_ADMIN
capability.
I didn't understand why creating a thread would grant privilege.
The current implementation runs the thread within the main process
, which has root privileges. Since the thread only joins the network namespace internally, it inherits the main process's privileges, allowing it to perform operations with those elevated permissions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@utam0k
To meet these requirements, we've identified three potential approaches:
- Continue with the current thread-based implementation.
- Move the thread's operations to an intermediate process (this is feasible because network namespaces are inherited by child processes).
- Perform the operations currently handled by the thread within the init process, specifically before capabilities are set (somewhere in
https://github.com/youki-dev/youki/blob/main/crates/libcontainer/src/process/init/process.rs#L71-L299
).
Personally, I'm leaning towards implementing the operations within the init process. My reasoning is that threads can be unsafe, and this approach helps prevent over-centralizing logic in an intermediate process. What are your thoughts on this? If you agree, I'll proceed with implementing this strategy.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are the threads unsafe?
Do you prefer this option?
Perform the operations currently handled by the thread within the init process, specifically before capabilities are set (somewhere in https://github.com/youki-dev/youki/blob/main/crates/libcontainer/src/process/init/process.rs#L71-L299).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are the threads unsafe?
Factors like a file descriptor being closed prematurely can introduce thread safety issues, requiring careful handling (e.g., ensuring file descriptors are not invalidated during thread execution).
Do you prefer this option?
Yes, I prefer this one because of its safe handling.
FYI
Thread Approach
- Pros:
- Shared Memory for Data: Threads operate within the same memory space as the main process, allowing for easy sharing of variables (like address lists).
- Simple Lifecycle Management: The processing is consolidated within the main process, leading to a simpler and more centralized lifecycle management.
- Cons:
- Overhead from Thread Creation: There's an overhead associated with creating and tearing down threads, which scales with the number of devices being processed.
- File Descriptor Safety Concerns: Factors like a file descriptor being closed prematurely can introduce thread safety issues, requiring careful handling (e.g., ensuring file descriptors are not invalidated during thread execution).
- Potential Deviation from `youki`'s Design: If `youki` primarily uses processes rather than threads for operations, this approach might not align with its established architectural patterns.
Init Process Approach
- Pros:
- No Thread Creation Overhead: This method avoids the overhead of creating new threads for each operation.
- Alignment with `youki`'s Design: If `youki`'s core design favors process-based operations, this approach would be more consistent with its architectural philosophy.
- Cons:
- Requires IPC for Data Sharing: Sharing data like address lists would necessitate implementing inter-process communication (IPC) mechanisms, such as socket communication.
- Challenges with Large Data Sharing: Handling large address lists or other significant data sizes through IPC can become complex and less efficient.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My concerns about using threads are performance and the potential for new security vulnerabilities. I don't think there is any other option besides the init process. Let's give it a try. Thank you for participating in the discussion. I feel like we've moved in a better direction.
crates/libcontainer/src/network.rs
Outdated
let new_name = device | ||
.name() | ||
.as_ref() | ||
.filter(|d| !d.is_empty()) | ||
.map_or(name.clone(), |d| d.to_string()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
new_name has to be validated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Apologies for the delayed response. I've already performed the validation here:
youki/crates/libcontainer/src/utils.rs
Lines 361 to 372 in 60768cd
if let Some(devices) = linux.net_devices() { | |
for (name, net_dev) in devices { | |
if !dev_valid_name(name) { | |
return Err(NetDevicesError::InvalidDeviceName(name.to_string())); | |
} | |
if let Some(dev_name) = net_dev.name() { | |
if !dev_valid_name(dev_name) { | |
return Err(NetDevicesError::InvalidDeviceName(dev_name.to_string())); | |
} | |
} | |
} | |
} |
… added comments on thread usage for safe namespace operations Signed-off-by: nayuta-ai <[email protected]>
…nd update dev_change_net_namespace to use a reference for LinuxNetDevice. Signed-off-by: nayuta-ai <[email protected]>
Signed-off-by: nayuta-ai <[email protected]>
…ation and reliability. Signed-off-by: nayuta-ai <[email protected]>
e02c503
to
4b9c09b
Compare
crates/libcontainer/src/network.rs
Outdated
|
||
link_client.set_ns_fd(index, &new_name, netns_file.as_raw_fd())?; | ||
|
||
let thread_handle = std::thread::spawn({ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@nayuta-ai I can understand if you do this in the init process, but if you do it in main or intermediate, I don't see the problem. Does it need to be a thread? Can't it be a process?
crates/libcontainer/src/network.rs
Outdated
// Set the interface link state to DOWN before modifying attributes like namespace or name. | ||
// This prevents potential conflicts or disruptions on the host network during the transition, | ||
// particularly if other host components depend on this specific interface or its properties. | ||
link_client.set_down(index)?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If it fails under this line, does it leave the interface down?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, the interface remains in the down state. This behavior is based on an implementation aligned with how runc
handles it.
ref: opencontainers/runc#4538
To avoid leaving the interface down in cases where the operations before moving the interface into the network namespace (specifically addr_client.get_by_index
and set_ns_fd
) fail, it seems reasonable to add logic that brings the interface back up when such failures occur.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@aojea 👋 Hi, we are implementing nevdev feature. I have a question about it. What if a failure occurs after the link goes down? Should we recover or ignore it?
Probably, we should define this behavior on OCI Runtime Spec.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that is a good question, the main problem with dealing with interfaces is that they impact the entire namespace,so bringing an interface up that has changed or modified any network parameters can be dangerous as it impacts the entire host networking in this case
My opinion in anything networking related is that failing close/safe is always the best option, so leaving it down you ensure that there is no risk associated with the operation despite it failed ... if something in the host was already depending on this interface being up it will have the same impact that the operation succeeded than not succeed but leaving it down
…inct files, update error handling in LibcontainerError, and adjust imports for dev_change_net_namespace. This enhances code organization and clarity in network operations. Signed-off-by: nayuta-ai <[email protected]>
…onality by removing the netlink module, introducing a new client structure, and enhancing the wrapper for better abstraction. This improves code organization and facilitates testing. Signed-off-by: nayuta-ai <[email protected]>
…date imports in builder_impl.rs, and enhance AddressMessageHandler with target index functionality. Signed-off-by: nayuta-ai <[email protected]>
…k client tests for improved reliability and isolation. Update FakeNetlinkClient to handle success and error responses more effectively, ensuring better test coverage and accuracy. Signed-off-by: nayuta-ai <[email protected]>
…handling in network namespaces. Refactor dev_change_net_namespace to utilize this new function, enhancing code clarity and testability. Add tests to verify address processing and filtering based on scope and permanence. Signed-off-by: nayuta-ai <[email protected]>
@utam0k |
Signed-off-by: nayuta-ai <[email protected]>
Signed-off-by: nayuta-ai <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the overall design is good, but please give me a little more time to review the details.
name: String, | ||
netns_path: String, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
&str is enough for these vars. Others are the same.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I took a quick look at everything except the client.
crates/libcontainer/src/utils.rs
Outdated
if linux | ||
.namespaces() | ||
.as_ref() | ||
.unwrap() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this always safety?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fix: bb31374
crates/libcontainer/src/utils.rs
Outdated
} | ||
if let Some(dev_name) = net_dev.name() { | ||
if !dev_valid_name(dev_name) { | ||
return Err(NetDevicesError::InvalidDeviceName(dev_name.to_string())); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Others too.
return Err(NetDevicesError::InvalidDeviceName(dev_name.to_string())); | |
return Err(NetDevicesError::InvalidDeviceName(dev_name.into())); |
crates/libcontainer/src/utils.rs
Outdated
if let Some(devices) = linux.net_devices() { | ||
for (name, net_dev) in devices { | ||
if !dev_valid_name(name) { | ||
return Err(NetDevicesError::InvalidDeviceName(name.to_string())); | ||
} | ||
if let Some(dev_name) = net_dev.name() { | ||
if !dev_valid_name(dev_name) { | ||
return Err(NetDevicesError::InvalidDeviceName(dev_name.to_string())); | ||
} | ||
} | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if let Some(devices) = linux.net_devices() { | |
for (name, net_dev) in devices { | |
if !dev_valid_name(name) { | |
return Err(NetDevicesError::InvalidDeviceName(name.to_string())); | |
} | |
if let Some(dev_name) = net_dev.name() { | |
if !dev_valid_name(dev_name) { | |
return Err(NetDevicesError::InvalidDeviceName(dev_name.to_string())); | |
} | |
} | |
} | |
} | |
if let Some(devices) = linux.net_devices() { | |
devices.iter().try_for_each(|(name, net_dev)| { | |
if !dev_valid_name(name) { | |
return Err(NetDevicesError::InvalidDeviceName(name.to_string())); | |
} | |
if let Some(dev_name) = net_dev.name() { | |
if !dev_valid_name(dev_name) { | |
return Err(NetDevicesError::InvalidDeviceName(dev_name.to_string())); | |
} | |
} | |
Ok(()) | |
})?; | |
} |
@@ -276,6 +277,16 @@ pub fn container_init_process( | |||
InitProcessError::SyscallOther(err) | |||
})?; | |||
|
|||
// Setup some operations in the network namespace. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please make it clearly. What is some operations
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
init_sender: &mut channel::InitSender, | ||
) -> Result<()> { | ||
let mut addrs_map = HashMap::new(); | ||
// host network pods does not move network devices. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What does pods
mean?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fix: 6de834c
|
||
// get the namespace defined by the config and fall back | ||
// to the one created by youki to run the container process. | ||
let fallback_ns_path = PathBuf::from(format!("/proc/{}/ns/net", init_pid.as_raw())); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is fallback
? I guess it may be default
...?
let ns_path = namespaces | ||
.iter() | ||
.find_map(|ns| { | ||
if ns.typ() == LinuxNamespaceType::Network { | ||
ns.path().as_deref() | ||
} else { | ||
None | ||
} | ||
}) | ||
.unwrap_or_else(|| &fallback_ns_path); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You should find ns_path
before L250, then return Ok if not found it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fix: b9d6233
@@ -229,6 +237,62 @@ fn setup_mapping(config: &UserNamespaceConfig, pid: Pid) -> Result<()> { | |||
Ok(()) | |||
} | |||
|
|||
/// setup_network_device sets up and initializes any defined network interface inside the container. | |||
fn setup_network_device( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you please be more specific? As far as I know, this function is responsible for moving network devices.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ref: #3163 (comment)
if let Some(devices) = linux.net_devices() { | ||
main_receiver.wait_for_network_setup_ready()?; | ||
for (name, net_dev) in devices { | ||
let addrs = dev_change_net_namespace( | ||
name.to_string(), | ||
ns_path.to_string_lossy().to_string(), | ||
net_dev, | ||
) | ||
.map_err(|err| { | ||
tracing::error!("failed to dev_change_net_namespace: {}", err); | ||
err | ||
})?; | ||
addrs_map.insert(name.clone(), addrs); | ||
} | ||
init_sender.move_network_device(addrs_map)?; | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about this?
if let Some(devices) = linux.net_devices() { | |
main_receiver.wait_for_network_setup_ready()?; | |
for (name, net_dev) in devices { | |
let addrs = dev_change_net_namespace( | |
name.to_string(), | |
ns_path.to_string_lossy().to_string(), | |
net_dev, | |
) | |
.map_err(|err| { | |
tracing::error!("failed to dev_change_net_namespace: {}", err); | |
err | |
})?; | |
addrs_map.insert(name.clone(), addrs); | |
} | |
init_sender.move_network_device(addrs_map)?; | |
} | |
} | |
let addrs_map = devices | |
.iter() | |
.map(|(name, net_dev)| { | |
let addrs = dev_change_net_namespace( | |
name.to_string(), | |
ns_path.to_string_lossy().to_string(), | |
net_dev, | |
) | |
.map_err(|err| { | |
tracing::error!("failed to dev_change_net_namespace: {}", err); | |
err | |
})?; | |
Ok((name.clone(), addrs)) | |
}) | |
.collect::<Result<HashMap<String, Vec<SerializableAddress>>>>()?; | |
init_sender.move_network_device(addrs_map)?; | |
/// setup_network_device sets up a network device in a new namespace. | ||
/// It moves the device to the new namespace and adds the IP addresses to the device. | ||
/// It also sets the device up. | ||
pub fn setup_network_device( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This function has two roles.
- Assign an address.
- Link up.
Can these parts be separated into two functions?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The implementation wasn't complex enough to warrant a separate function, so I called it from within configure_container_network_devices
.
6de834c
…pose of setup_network_device and SerializableAddress structs, improving code readability and maintainability. Signed-off-by: nayuta-ai <[email protected]>
…ed strings, improving performance and memory efficiency. Update related calls in the main and init processes for consistency. Signed-off-by: nayuta-ai <[email protected]>
…ions. Signed-off-by: nayuta-ai <[email protected]>
…_device, and fix clippy lints Signed-off-by: nayuta-ai <[email protected]>
…ace retrieval and improve error handling in move_network_devices_to_container function. Signed-off-by: nayuta-ai <[email protected]>
5ef9c7a
to
b9d6233
Compare
Description
Add the function to move the network device to a network namespace and the corresponding test.
Type of Change
Testing
youki spec
and fix the netns and netDevice configurations as belowRelated Issues
Fixes #3136
Additional Context
Please review the placement of the
setup_network_device
function, whether the asynchronous logic is implemented correctly, and whether the test case is sufficient.