-
Notifications
You must be signed in to change notification settings - Fork 419
Support k8s.v1.cni.cncf.io/networks
annotation removal or reset to blank to clean up Pod's secondary interfaces
#7119
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support k8s.v1.cni.cncf.io/networks
annotation removal or reset to blank to clean up Pod's secondary interfaces
#7119
Conversation
69d1de2
to
33e9c45
Compare
/test-sriov-secondary-network-e2e |
8b49a8d
to
4153418
Compare
Can one of the admins verify this patch? |
/test-sriov-secondary-network-e2e |
f268415
to
91350bf
Compare
/test-sriov-secondary-network-e2e |
91350bf
to
570bde4
Compare
@wenqiq please update the description about what's kind of update events have been handled in this PR. |
pkg/agent/cniserver/secondary.go
Outdated
func rebindVFToHost(pciAddr string) error { | ||
// echo 0000:00:04.0 > /sys/bus/pci/devices/0000:00:04.0/driver/unbind | ||
unbindPath := filepath.Join("/sys/bus/pci/devices", pciAddr, "driver/unbind") | ||
if err := os.WriteFile(unbindPath, []byte(pciAddr), 0200); err != nil { | ||
return fmt.Errorf("unbind driver failed: %v", err) | ||
} | ||
|
||
bindPath := filepath.Join("/sys/bus/pci/drivers", vfDriver, "bind") | ||
// echo 0000:00:04.0 > /sys/bus/pci/drivers/ixgbevf/bind | ||
if err := os.WriteFile(bindPath, []byte(pciAddr), 0200); err != nil { | ||
if recoverErr := recoverVFDriver(pciAddr); recoverErr != nil { | ||
klog.Errorf("Critical: Failed to recover VF driver: %v", recoverErr) | ||
} | ||
return fmt.Errorf("bind to driver %s failed: %v", vfDriver, err) | ||
} | ||
|
||
klog.Infof("Successfully rebound %s to %s", pciAddr, vfDriver) | ||
return nil | ||
} | ||
|
||
func recoverVFDriver(pciAddr string) error { | ||
bindPath := filepath.Join("/sys/bus/pci/drivers/vfio-pci/bind") | ||
return os.WriteFile(bindPath, []byte(pciAddr), 0200) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe rebound is done by sriov-network-device-plugin, why do we need this? can you explain in the comments?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@tnqn if the sriov devices rebound is managed by sriov-network-device-plugin, do we have a possible way to do the annotation update for sriov interfaces? I am not sure if there is a way to trigger cni cmdDel when the annotation is updated but no Pod deletion event?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The sriov-network-device-plugin
should only take effect during Pod deletion. In this case, merely updating the Pod's annotations did not trigger the cmdDel
operation for Pod cleanup. To remove the interface, we must manually unbind the VFDriver.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cc @tnqn @antoninbas @jianjuns Could you help take a look at this problem?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Without having a great understanding of the issue, this is my hot take:
- we should only "undo" what we did ourselves when removing an interface. I believe both binding and unbind are out-of-scope of both Antrea and
sriov-network-device-plugin
. If we don't bind tovfio-pci
in the first place, why should we be repsonsible for unbinding? - my understanding is that
sriov-network-device-plugin
is acting based on the resource requirements (resources
field) in the Pod container spec. These are out-of-scope of Antrea. Even if the annotation is updated to "remove" an SR-IOV secondary network interface, the resource is still assigned to the Pod. I should also be able to undo my change to the annotation (add back the interface) sucessfully. - isn't
ixgbevf
specific to Intel devices?
More generally, I am not sure what problem we are trying to solve with this code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @wenqiq any code update regarding the annotation update issue? I believe we don't need to do any specific SR-IOV device operation. As @antoninbas suggested, we should "undo" what we did.
@antoninbas do you think we should support deletion only (when an annotation is deleted, delete the corresponding interface and release the IP)? I am thinking NAD annotation contains lots of different fields including NAD name, interface name, mac address etc. I feel it would be hard to do a comprehensive support.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@luolanzone I commented about this earlier in #7065 (comment)
I don't think it is realistic for now to handle updates to the spec.config
of NetworkAttachmentDefinition
.
Edit: I see now that your question is specifically about the Pod annotation. It would be good if we could support dynamically adding / removing Pod interfaces by updating the annotation. However, we can prevent mutation of a given interface config (users should delete then recreate). Let me know if this is a good middle ground.
08c9e56
to
bee7d62
Compare
for _, inf := range storedInterfaces { | ||
exists := false | ||
for _, network := range networkList { | ||
if inf.IFDev == network.InterfaceRequest { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this is correct way to check stale interfaces. Considering a case that the interface name might not be changed when the other fields are updated in one annotation, e.g: NAD name.
I would consider @antoninbas 's suggestion to disallow fields update #7119 (comment), but only supporting deletion and add back.
cc @tnqn
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The problem is InterfaceRequest
is optional. If it is not set for an existing interface, then the interface will always be deleted, even there is no change of interface configuration.
Guess we need to be smarter to correlate a network in the annotation to an existing interface; when InterfaceRequest is not present, we need to check the network name and even configuration?
for _, inf := range storedInterfaces { | ||
exists := false | ||
for _, network := range networkList { | ||
if inf.IFDev == network.InterfaceRequest { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The problem is InterfaceRequest
is optional. If it is not set for an existing interface, then the interface will always be deleted, even there is no change of interface configuration.
Guess we need to be smarter to correlate a network in the annotation to an existing interface; when InterfaceRequest is not present, we need to check the network name and even configuration?
a243a9b
to
cb0cb72
Compare
/test-sriov-secondary-network-e2e |
@wenqiq please resolve code conflicts. |
5f84c5e
to
d92ca8a
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/test-sriov-secondary-network-e2e |
@wenqiq please update the title to ensure it's precise. |
k8s.v1.cni.cncf.io/networks
annotation removal or reset to blank to clean up Pod's secondary interfaces
// So when networkList is empty, it indicates that existing interfaces should be deleted | ||
// Otherwise, it means a network is being added from an empty annotation, and no deletion is needed | ||
// TODO: We can enhance this function to support more complex annotation operations. | ||
func (pc *PodController) filterStaleInterfaces(networkList []*netdefv1.NetworkSelectionElement, storedInterfaces []*interfacestore.InterfaceConfig) (staleInfs []*interfacestore.InterfaceConfig) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should be removed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could it cause any issues in handling it this way?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After reading the code again, I may get what you intended to do (still not very sure). But do you think better to go a more straightforward way, e.g.:
if len(storedSecondaryInterfaces) > 0 {
if len(networkList) > 0 {
// We do not support secondary network update at the moment. Return as long as one
// secondary interface has been created for the Pod.
klog.V(1).InfoS("Secondary network already configured on this Pod. Changes to secondary network configuration are not supported, skipping update",
"Pod", klog.KObj(pod))
return nil
}
if err := pc.removeInterfaces(pc.filterStaleInterfaces(networkList, storedSecondaryInterfaces)); err != nil {
return err
}
}
var netStatus []netdefv1.NetworkStatus
if len(networkList) > 0 {
var err error
netStatus, err = pc.configurePodSecondaryNetwork(pod, networkList, podCNIInfo)
if err != nil {
return err
}
}
return pc.updatePodNetworkStatusAnnotation(netStatus, pod)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
updated
} | ||
|
||
return pc.configurePodSecondaryNetwork(pod, networklist, podCNIInfo) | ||
return pc.configurePodSecondaryNetwork(pod, networkList, podCNIInfo) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel better to move the annotation update code to handleAddUpdatePod
, so it is clearer when we need to update annotation.
f6f5d4f
to
ca0f56e
Compare
177a2c5
to
1979ab6
Compare
} | ||
|
||
// Update the Pod's network status annotation | ||
if err := netdefutils.SetNetworkStatus(pc.kubeClient, pod, netStatus); err != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When no secondary network is configured, should we remove the networkStatus annotation?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
updated
// So when networkList is empty, it indicates that existing interfaces should be deleted | ||
// Otherwise, it means a network is being added from an empty annotation, and no deletion is needed | ||
// TODO: We can enhance this function to support more complex annotation operations. | ||
func (pc *PodController) filterStaleInterfaces(networkList []*netdefv1.NetworkSelectionElement, storedInterfaces []*interfacestore.InterfaceConfig) (staleInfs []*interfacestore.InterfaceConfig) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After reading the code again, I may get what you intended to do (still not very sure). But do you think better to go a more straightforward way, e.g.:
if len(storedSecondaryInterfaces) > 0 {
if len(networkList) > 0 {
// We do not support secondary network update at the moment. Return as long as one
// secondary interface has been created for the Pod.
klog.V(1).InfoS("Secondary network already configured on this Pod. Changes to secondary network configuration are not supported, skipping update",
"Pod", klog.KObj(pod))
return nil
}
if err := pc.removeInterfaces(pc.filterStaleInterfaces(networkList, storedSecondaryInterfaces)); err != nil {
return err
}
}
var netStatus []netdefv1.NetworkStatus
if len(networkList) > 0 {
var err error
netStatus, err = pc.configurePodSecondaryNetwork(pod, networkList, podCNIInfo)
if err != nil {
return err
}
}
return pc.updatePodNetworkStatusAnnotation(netStatus, pod)
} | ||
|
||
return pc.configurePodSecondaryNetwork(pod, networklist, podCNIInfo) | ||
return pc.updatePodNetworkStatusAnnotation(netStatus, pod) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Returning an error seems not help. Probably just ignore it. Maybe good to add comments to explain why.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
updated
1979ab6
to
4abf224
Compare
Signed-off-by: Wenqi Qiu <[email protected]>
4abf224
to
f201839
Compare
/test-sriov-secondary-network-e2e |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Nits.
} | ||
|
||
var netStatus []netdefv1.NetworkStatus | ||
var err error |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: err
can be moved into the if len(networkList) > 0 {
block.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
} | ||
|
||
var ( | ||
netdefutilsSetNetworkStatus = netdefutils.SetNetworkStatus |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably just name the func variable: setNetworkStatus
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
Signed-off-by: Wenqi Qiu <[email protected]>
if netStatus != nil { | ||
// Intentionally ignore errors from updating the Pod's network status annotation here. | ||
// Failure to update the annotation does not affect the actual network setup for the Pod. | ||
// The annotation is mainly used for status reporting and is not critical for Pod networking functionality. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, it is also used for restoring SR-IOV interface information after agent restarts.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you give a more accurate comment here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"The annotation is mainly used for status reporting, as well as restoring SR-IOV interface information after agent restarts."?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated. I think error handling is omitted here primarily due to configurePodSecondaryNetwork
not being idempotent? However, combined with the retry mechanism in the SetNetworkStatus
, the likelihood of updatePodNetworkStatusAnnotation
returning errors is low, and the impact of ignoring them is negligible.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, we do not handle secondary network changes and do not check annotation should be updated either.
I think there can still be failures due to network failure, etc.
Signed-off-by: Wenqi Qiu <[email protected]>
/test-all |
@luolanzone @wenqiq : next time please trim the commit description to be <= 72 chars, and also update the commit message before merging. The commit was merged with the following commit message:
It does not match the change. |
Got it. Will pay attention to editing each commit message before pushing. |
This PR introduces support for adding or deleting secondary network configurations for Pods by dynamically reconciling the
k8s.v1.cni.cncf.io/networks
annotation. Users can now add new secondary network configurations or remove existing ones during a Pod's lifecycle, and the changes will be applied without requiring Pod recreation.Key Changes:
k8s.v1.cni.cncf.io/networks
annotation to a Pod triggers creation of all specified secondary network interfacesImplementation Details:
TestDone:
k8s.v1.cni.cncf.io/networks
annotation addition → interface creationk8s.v1.cni.cncf.io/networks
annotation deletion → interface removalNotes:
Resolves #7065
Depends on #7116