tico88612
Member

What type of PR is this?

/kind feature

What this PR does / why we need it:

Add Rocky Linux 10 support

Which issue(s) this PR fixes:

Fixes #12253

Special notes for your reviewer:

Add image first.

Does this PR introduce a user-facing change?:

Add RockyLinux 10 support

@k8s-ci-robot k8s-ci-robot added the release-note, kind/feature, and cncf-cla: yes labels Jun 29, 2025
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: tico88612

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved label Jun 29, 2025
@k8s-ci-robot k8s-ci-robot requested review from ant31 and yankay June 29, 2025 11:47
@k8s-ci-robot k8s-ci-robot added the size/S label Jun 29, 2025
@tico88612 tico88612 force-pushed the feat/rocky-10-support branch 3 times, most recently from 49e73a1 to baa5fa5 Compare June 30, 2025 01:04
@yankay
Member

yankay commented Jul 7, 2025

@VannTen
Contributor

VannTen commented Jul 7, 2025

/retest-failed

@k8s-ci-robot k8s-ci-robot added the needs-rebase label Jul 7, 2025
@tico88612 tico88612 force-pushed the feat/rocky-10-support branch from baa5fa5 to 60d619a Compare July 7, 2025 14:42
@k8s-ci-robot k8s-ci-robot removed the needs-rebase label Jul 7, 2025
@tico88612 tico88612 force-pushed the feat/rocky-10-support branch 2 times, most recently from 15bdcc1 to 8aa01a1 Compare July 16, 2025 00:26
@k8s-ci-robot k8s-ci-robot added the size/M label and removed the size/S label Jul 16, 2025
@tico88612
Member Author

/retest

@tico88612 tico88612 force-pushed the feat/rocky-10-support branch 3 times, most recently from d0b18fc to d2029c1 Compare July 20, 2025 05:58
@tico88612
Member Author

/label tide/merge-method-merge

@VannTen, could you take a look at the package installation? kernel-modules-extra needs to align with the OS kernel version, and I'm not sure system_package can use a dynamic package name.

@k8s-ci-robot k8s-ci-robot added the tide/merge-method-merge label Jul 20, 2025
@tico88612 tico88612 force-pushed the feat/rocky-10-support branch from d2029c1 to 4592257 Compare August 13, 2025 12:44
@VannTen
Contributor

VannTen commented Aug 18, 2025

> @VannTen, could you take a look at the package installation? kernel-modules-extra needs to align with the OS kernel version, I'm not sure system_package can use a dynamic package name

Hum, isn't there a meta-package or something like that which would let us avoid putting the version in the name? As it stands, it looks like we would keep extraneous packages whenever the kernel version changes, and since ansible_kernel is probably the running kernel and not the installed one (e.g. just after an upgrade but before a reboot), I'm not even sure we would install the correct one...

(If there is not... RHEL is kinda insane ? 🤔 or maybe that's me)
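
For context, a version-pinned install of the kind being questioned might look like the sketch below. This is an illustration, not the PR's actual task: it assumes the package name is built from the `ansible_kernel` fact, which holds the release of the running kernel (what `uname -r` reports). That is the reason it can drift from the installed kernel after an upgrade without a reboot.

```yaml
# Sketch only: assumes the pinned name is derived from the ansible_kernel fact.
# ansible_kernel is the RUNNING kernel release, not necessarily the newest
# installed one, so this can pick a stale version between upgrade and reboot.
- name: Install kernel-modules-extra matching the running kernel
  ansible.builtin.package:
    name: "kernel-modules-extra-{{ ansible_kernel }}"
    state: present
  when:
    - ansible_os_family == "RedHat"
    - ansible_distribution_major_version == "10"
```

Each kernel upgrade would also leave the previously pinned package behind unless something removes it.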

@VannTen
Contributor

VannTen commented Aug 18, 2025

https://docs.redhat.com/en/documentation/red_hat_enterprise_linux/10/html-single/managing_monitoring_and_updating_the_kernel/index

The docs do not seem to expect administrators to specify a version, AFAICT.

@tmurakam
Contributor

I think we don't need the version of kernel-modules-extra. I can install it without version on RockyLinux 10.

$ ansible localhost -m ansible.builtin.package -a "name=kernel-modules-extra" -c local --become
[WARNING]: No inventory was parsed, only implicit localhost is available
[WARNING]: Skipping callback plugin 'ara_default', unable to load
localhost | CHANGED => {
    "changed": true,
    "msg": "",
    "rc": 0,
    "results": [
        "Installed: kernel-modules-extra-6.12.0-55.25.1.el10_0.x86_64"
    ]
}

@tico88612
Member Author

@tmurakam The CI will fail if I remove the kernel-modules-extra installation.
@VannTen I'm not sure about this: if no version is specified, the latest version is installed by default, and CI fails.

@VannTen
Contributor

VannTen commented Aug 19, 2025

Hum. And I guess that if we don't upgrade, the kernel stays at whatever version it was, and thus we get a mismatch. Consider upgrading everything:

package:
   name: '*'
   state: latest

I guess that code could fix the problem, but it's a significant change (even though we could argue it'd be more correct in general, not just for RHEL). 🤔

Relying on ansible_kernel is brittle, considering the possible disconnect between the running and installed kernels, as mentioned above.

@VannTen
Contributor

VannTen commented Aug 19, 2025

I also don't think that something like this is possible (upgrading the existing packages and installing new ones in one task):

package:
   name:
     - package1
     - package2
     - '*'
   state: latest

@tico88612
Member Author

> Relying on ansible_kernel is brittle, considering the possible disconnect between running/installed kernel as mentioned above.

Absolutely agree. I'd also rather avoid this if CI can pass without it.

@tico88612
Member Author

@tico88612 tico88612 force-pushed the feat/rocky-10-support branch from 421c824 to 41c40b9 Compare August 21, 2025 06:57
Some of the kernel modules required by CNI are missing, installing
kernel-modules-extra can solve this problem.

Signed-off-by: ChengHao Yang <[email protected]>
@VannTen
Contributor

VannTen commented Aug 28, 2025

The approach in #12513 (upgrade to latest + reboot on kernel-* upgrade) appears to work.
That's kinda invasive, though. I'm not sure what we should do, because ansible_kernel is a bit hacky and dependent on the runtime version... But this close to release it might be a bad idea to make such a change.

Or maybe limit the 'upgrade all' stuff to RHEL 10 & friends for 2.29, and switch it to default for 2.30?

@tico88612 @yankay @ant31 opinions?

(I'm not big on delaying Rocky / RHEL10 support until next release).
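
A rough sketch of the "upgrade to latest + reboot on kernel-* upgrade" idea, for discussion only. It is not copied from #12513. The register name `upgrade_result` is made up here, and matching on the string `kernel` in the dnf module's `results` list is an assumption about its output format (compare the `Installed: kernel-modules-extra-...` line earlier in this thread).

```yaml
# Sketch only, not the tasks from #12513.
- name: Upgrade all packages to the latest available version
  ansible.builtin.dnf:
    name: "*"
    state: latest
  register: upgrade_result  # hypothetical variable name

# Assumption: every touched package shows up as a string in `results`,
# so any entry mentioning "kernel" means a kernel-* package changed.
- name: Reboot only when a kernel-* package was touched
  ansible.builtin.reboot:
  when: "upgrade_result.results | default([]) | select('search', 'kernel') | list | length > 0"
```

After the reboot the running kernel and the installed kernel-modules-extra come from the same upgrade transaction, which is what removes the mismatch.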

@yankay
Member

yankay commented Aug 29, 2025

> The approach in #12513 (upgrade to latest + reboot on kernel-* upgrade) appears to work. That's kinda invasive, though. I'm not sure what we should do, because ansible_kernel is a bit hacky and dependent on the runtime version... But this close to release it might be a bad idea to make such a change.
>
> Or maybe limit the 'upgrade all' stuff to RHEL 10 & friends for 2.29, and switch it to default for 2.30?
>
> @tico88612 @yankay @ant31 opinions?
>
> (I'm not big on delaying Rocky / RHEL10 support until next release).

The upgrade all and reboot operations are indeed somewhat intrusive. If we absolutely need to support Rocky 10 as soon as possible, we might open a known issue and see if we can improve it later.

I think it's acceptable to limit the 'upgrade all' stuff to RHEL 10 & friends for version 2.29. If we can optimize this in the future, it could be addressed in a patch release.

@VannTen
Contributor

VannTen commented Aug 29, 2025

Maybe something like:

Gate "upgrade all" behind a bool variable, default to "ansible_os_family == RedHat && ansible_distribution_major_version == '10'"

same thing for reboot.

After 2.29, default to true unconditionally ?

I'm kinda torn on this (and two variables is not ideal; lots of distros will have weird stuff after an upgrade without a reboot, I think) 🤔
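
To make the gating idea concrete, here is a minimal sketch. The variable names `system_upgrade` and `system_upgrade_reboot` are hypothetical, not existing Kubespray settings; the default expression is the one proposed above, written as Jinja.

```yaml
# Defaults (hypothetical variable names): on only for RHEL 10 & friends.
system_upgrade: "{{ ansible_os_family == 'RedHat' and ansible_distribution_major_version == '10' }}"
system_upgrade_reboot: "{{ system_upgrade }}"
---
# Tasks: both steps are skipped unless the switches evaluate to true.
- name: Upgrade all packages
  ansible.builtin.package:
    name: "*"
    state: latest
  when: system_upgrade | bool

- name: Reboot after the upgrade
  ansible.builtin.reboot:
  when: system_upgrade_reboot | bool
```

Tying the reboot default to the upgrade switch keeps the two variables from drifting apart unless a user overrides one explicitly; flipping both defaults to `true` later would be a one-line change each.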

@tico88612
Member Author

Restarting after upgrading the package seems fine to me, but I thought of two things that might need attention:

  1. If users are upgrading a cluster, we might need to avoid restarting all machines at the same time.
  2. We should note that RockyLinux 10 will have this behavior, but we don't consider it a problem. In the future, we might need to adjust whether to restart after package upgrades, and we'll need to gather feedback from GitHub Issues and the Slack channel.
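
For the first point, one common Ansible way to avoid restarting every machine at once is a play-level `serial` batch size. The play below is only a sketch: it assumes the usual Kubespray inventory group `k8s_cluster`, and it does not cordon or drain nodes, which an upgrade of a live cluster would still need.

```yaml
# Sketch only: processes one host at a time so nodes never reboot together.
- name: Upgrade and reboot nodes one at a time
  hosts: k8s_cluster   # assumed inventory group
  become: true
  serial: 1            # batch size: one node per pass
  tasks:
    - name: Upgrade all packages
      ansible.builtin.package:
        name: "*"
        state: latest

    - name: Reboot and wait for the node to come back
      ansible.builtin.reboot:
        reboot_timeout: 600
```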

@rptaylor
Contributor

Personally I would say that initially updating and rebooting nodes when building the cluster is a less important subset of the ongoing operational task of keeping nodes up to date over time, which falls under cluster administration and requires extra steps like cordoning and draining nodes.
For that reason, if Kubespray does start to take care of that aspect of cluster operations/administration, it would be better IMHO to do it only when upgrading a cluster because you're already cordoning and draining nodes. (I admit that would be useful even though I'm not 100% convinced Kubespray should take on that role.)

Moreover, starting with up-to-date nodes when building the cluster seems to me like it should instead be the responsibility of the OS image you use when installing nodes or launching VMs (i.e. RHEL 10 images should just work out of the box...)

Anyway that is just my 2 cents.

@VannTen
Contributor

VannTen commented Sep 1, 2025

I'd rather keep install and upgrade as close as possible (and close the gap, in fact).

> For that reason, if Kubespray does start to take care of that aspect of cluster operations/administration, it would be better IMHO to do it only when upgrading a cluster because you're already cordoning and draining nodes. (I admit that would be useful even though I'm not 100% convinced Kubespray should take on that role.)
>
> Moreover, starting with up-to-date nodes when building the cluster seems to me like it should instead be the responsibility of the OS image you use when installing nodes or launching VMs (i.e. RHEL 10 images should just work out of the box...)

It looks like we already have an upgrade/system-upgrade, but only for apt/yum, introduced in #10184.

IMO we should basically replace it with system_packages in the upgrade-cluster playbook (which it does not include for now AFAICT, which could cause problems if a new version required an additional package, for instance).

It needs some adaptation though, and that's outside the scope of this PR.

I don't see a robust way to work with Rocky Linux / RHEL 10 without that, though (if you have ideas! ^^). Maybe we can add a doc disclaimer requiring users to install kernel-modules-extra on RHEL 10 for now? It should be easy enough to check.

@tico88612 wdyt ?
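
If the doc-disclaimer route is taken, the check could be sketched roughly as follows. This is an assumption-laden illustration: the register name `kme_query` is made up, and it relies on the `kernel-modules-extra` version-release.arch string matching `ansible_kernel`, as it does in the Rocky Linux 10 output earlier in this thread.

```yaml
# Sketch only: fail early if kernel-modules-extra for the running kernel is missing.
- name: List installed kernel-modules-extra builds
  ansible.builtin.command:
    argv:
      - rpm
      - -q
      - --qf
      - '%{VERSION}-%{RELEASE}.%{ARCH}\n'
      - kernel-modules-extra
  register: kme_query   # hypothetical variable name
  changed_when: false
  failed_when: false    # rpm exits non-zero when the package is absent

- name: Require kernel-modules-extra for the running kernel
  ansible.builtin.assert:
    that:
      - ansible_kernel in kme_query.stdout_lines
    fail_msg: "Install kernel-modules-extra for kernel {{ ansible_kernel }} first."
```

Comparing against the running kernel rather than just "is the package installed" also catches the upgraded-but-not-rebooted case discussed above.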

@tico88612
Member Author

> I don't see a robust way to work with rocky linux / RHEL 10 without that though (if you have ideas ! ^^). Maybe we can add a doc disclaimer requiring to install the kernel-modules-extra for RHEL 10 for now ? It should be easy enough to check.

Should I remove kernel-modules-extra from this PR first? Removing it might just stall the CI (it looks like the only difference for RockyLinux 10 is kernel-modules-extra).

@VannTen
Contributor

VannTen commented Sep 2, 2025 via email

@ErikJiang
Member

Hi @VannTen @tico88612,

Thank you for working on Rocky Linux 10 support.

I understand the necessity of upgrading packages and installing kernel-modules-extra to ensure proper functionality.

However, these operations involve system-wide package upgrades and may require reboots, which could disrupt service availability in production environments.

Could we consider making these operations optional with configuration switches (defaulting to disabled)? This would allow operators to:

  1. Control when to perform these potentially disruptive operations
  2. Schedule them during maintenance windows
  3. Choose between automated execution or manual steps

Additionally, clear documentation explaining when these operations are needed and their potential impact would be very helpful for operators.

This approach would balance the technical requirements for Rocky Linux 10 support with operational flexibility for production deployments.

What are your thoughts on this approach?

Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files.
cncf-cla: yes Indicates the PR's author has signed the CNCF CLA.
kind/feature Categorizes issue or PR as related to a new feature.
release-note Denotes a PR that will be considered when it comes time to generate release notes.
size/M Denotes a PR that changes 30-99 lines, ignoring generated files.
tide/merge-method-merge Denotes a PR that should use a standard merge by tide when it merges.
Development

Successfully merging this pull request may close these issues.

Add RHEL10 + variants support
7 participants