Kontainer Engine v2: The new way Rancher integrates with cloud providers
KEv2 refers to our operators for managing cloud provider clusters and their integration into Rancher. It is called KEv2 to distinguish it from KEv1, the old way of provisioning cloud provider clusters.
Current operators:
KEv1 used pluggable gRPC servers (drivers) and dynamic configs (really just interface maps in the CRD, with a schema built dynamically from values retrieved from their driver).
KEv2 uses "operators", which are separate projects containing controllers that react to a CRD. The operator project is imported into Rancher as a package, and the spec of the CRD it uses to provision clusters is added to the Rancher Cluster struct as a config.
A KEv2 operator watches a CRD, such as `EKSClusterConfig`. When a CR of that type is created, the operator checks whether the spec indicates the cluster is imported. If it is not imported, the operator attempts to provision a cluster matching the spec. If it is imported, the controller attempts to apply any non-nil fields to the upstream cluster. As the cluster goes through the provisioning lifecycle, the `phase` field on `status` should change. The states for clusters are as follows:
- `""`: lifecycle is just beginning
- `provisioning`: the cluster is being provisioned
- `active`: the cluster is active and no updates are being processed
- `updating`: the cluster is currently processing an update and will return to `active` once finished
The status contains a field called `failureMessage`; if it is not empty, the cluster is failing in whichever state it is in. Updates can still be applied to the spec to try to resolve the failure.
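As a rough illustration, a KEv2 config CRD's spec and status have roughly the shape sketched below. The exact fields vary per operator, and everything here beyond `phase` and `failureMessage` is an assumption for illustration rather than the real definitions:

```go
// Sketch of the shape of a KEv2 operator config CRD. Field names other than
// Phase/FailureMessage are illustrative assumptions, not the real definitions.
type ExampleClusterConfigSpec struct {
	// Imported indicates whether the operator should adopt an existing
	// upstream cluster instead of provisioning a new one.
	Imported bool `json:"imported"`

	// Provider-specific provisioning options (region, node groups, etc.)
	// would live here.
}

type ExampleClusterConfigStatus struct {
	// Phase moves through "", "provisioning", "updating", and "active".
	Phase string `json:"phase"`

	// FailureMessage is non-empty when the cluster is failing in its current
	// phase; resolving the failure is done by updating the spec.
	FailureMessage string `json:"failureMessage"`
}
```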
Each operator project is imported into Rancher as a package via Go modules. Each operator's spec struct used for provisioning is added to the Rancher Cluster struct/CRD as a config field. For example, the Cluster struct contains the field:
EKSConfig *EKSClusterConfigSpec `json:"eksConfig,omitempty" yaml:"eksConfig,omitempty"`
After adding such a field it is necessary to run `go generate`.
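For context, the relevant part of the Cluster spec ends up looking roughly like the sketch below; the surrounding fields and the exact set of operators shown are assumptions for illustration:

```go
// Sketch only: the real v3 ClusterSpec has many more fields, and the set of
// KEv2 config fields depends on the Rancher version.
type ClusterSpec struct {
	DisplayName string `json:"displayName" yaml:"displayName"`

	// One config field per KEv2 operator; at most one is non-nil, and a
	// non-nil field marks the cluster as a KEv2 cluster of that provider.
	EKSConfig *EKSClusterConfigSpec `json:"eksConfig,omitempty" yaml:"eksConfig,omitempty"`
	GKEConfig *GKEClusterConfigSpec `json:"gkeConfig,omitempty" yaml:"gkeConfig,omitempty"`
	AKSConfig *AKSClusterConfigSpec `json:"aksConfig,omitempty" yaml:"aksConfig,omitempty"`
}
```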
Cluster controllers that do not apply to KEv2 clusters short-circuit when they find that a KEv2 config field is not nil. A non-nil config field indicates the cluster is a KEv2 cluster and will be handled by its own controller. Some logic checks the driver instead of the config field, but keep in mind that the driver field is on status and will not be set until the cluster has entered its respective KEv2 controller.
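A minimal sketch of that short-circuit pattern, assuming hypothetical handler and helper names (the real Rancher handlers differ):

```go
// isKEv2Cluster reports whether any KEv2 config field is set on the spec.
// Checking spec config fields (rather than status.Driver) works even before
// the cluster has been picked up by its KEv2 controller.
func isKEv2Cluster(cluster *v3.Cluster) bool {
	return cluster.Spec.EKSConfig != nil ||
		cluster.Spec.GKEConfig != nil ||
		cluster.Spec.AKSConfig != nil
}

// sync is an example non-KEv2 handler that bails out early for KEv2 clusters.
func (h *legacyClusterHandler) sync(key string, cluster *v3.Cluster) (*v3.Cluster, error) {
	if cluster == nil || isKEv2Cluster(cluster) {
		// KEv2 clusters are handled by their own integration controller.
		return cluster, nil
	}
	// ... logic that only applies to non-KEv2 clusters ...
	return cluster, nil
}
```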
The two handlers every KEv2 cluster should hit are the refresh cron job and their respective KEv2 integration controller.
The integration controller manages an upstream cluster for a particular cluster provider based on its config spec. It uses a KEv2 operator by deploying it in the `cattle-system` namespace of the local cluster and managing an underlying instance of the operator's config CRD per cluster. This means that if you have a Rancher cluster of type EKSv2 with the ID `c-asdf`, the integration controller will create an `EKSClusterConfig` named `c-asdf` in the `cattle-global-data` namespace with a spec that matches the Cluster's `EKSClusterConfigSpec` field. The handler will update the `EKSClusterConfig` object whenever the Cluster object's `EKSClusterConfigSpec` field is updated. The `EKSClusterConfig` object is maintained entirely by Rancher; Rancher users are not intended to interact with it directly.
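Conceptually, the create-or-update step looks something like the sketch below. The client interface, struct fields, and import paths are assumptions for illustration and may differ from the real handler code:

```go
package kev2sketch

import (
	"reflect"

	// Import paths are indicative; they may differ between versions.
	eksv1 "github.com/rancher/eks-operator/pkg/apis/eks.cattle.io/v1"
	v3 "github.com/rancher/rancher/pkg/apis/management.cattle.io/v3"
	apierrors "k8s.io/apimachinery/pkg/api/errors"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// eksClusterConfigClient stands in for a wrangler-style generated client for
// EKSClusterConfig objects (a hypothetical interface for this sketch).
type eksClusterConfigClient interface {
	Get(namespace, name string, opts metav1.GetOptions) (*eksv1.EKSClusterConfig, error)
	Create(obj *eksv1.EKSClusterConfig) (*eksv1.EKSClusterConfig, error)
	Update(obj *eksv1.EKSClusterConfig) (*eksv1.EKSClusterConfig, error)
}

type eksOperatorController struct {
	eksConfigClient eksClusterConfigClient
}

// reconcileConfig keeps the backing EKSClusterConfig in cattle-global-data in
// sync with the Cluster's eksConfig field.
func (h *eksOperatorController) reconcileConfig(cluster *v3.Cluster) error {
	ns, name := "cattle-global-data", cluster.Name // e.g. "c-asdf"

	existing, err := h.eksConfigClient.Get(ns, name, metav1.GetOptions{})
	if apierrors.IsNotFound(err) {
		// No backing object yet: create one whose spec mirrors the Cluster.
		cfg := &eksv1.EKSClusterConfig{
			ObjectMeta: metav1.ObjectMeta{Name: name, Namespace: ns},
			Spec:       *cluster.Spec.EKSConfig,
		}
		_, err = h.eksConfigClient.Create(cfg)
		return err
	}
	if err != nil {
		return err
	}

	// Backing object exists: push any drift from the Cluster spec onto it.
	if !reflect.DeepEqual(existing.Spec, *cluster.Spec.EKSConfig) {
		updated := existing.DeepCopy()
		updated.Spec = *cluster.Spec.EKSConfig
		_, err = h.eksConfigClient.Update(updated)
	}
	return err
}
```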
The tasks performed by the integration controller are as follows:
- If the relevant config field is nil, short-circuit.
- Check if the operator app is installed in the `cattle-system` namespace; install it if not, and keep it up to date.
- If the driver field on status is empty, write it to status and update.
- Create an instance of the actual config CRD, such as `EKSClusterConfig`, in the `cattle-global-data` namespace with the same ID as the cluster if one doesn't exist. If it does exist and the config on the Cluster's spec does not match the spec on the associated `EKSClusterConfig`, update the `EKSClusterConfig`.
- Check the phase on the object's status and propagate it as a condition on the Rancher Cluster object (see the sketch after this list). If `failureMessage` is not empty, set the corresponding condition to false.
- If the phase is not active, perform any necessary logic and re-enqueue.
- If the phase is active, check whether `ServiceAccountToken` on the Cluster object's status field is empty. If so, generate it. If the information required to generate it is not available, re-enqueue or wait until it is; for EKS this means waiting until a secret exists that the operator creates containing a CA cert and endpoint URL.
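As referenced in the task list above, the phase/`failureMessage` propagation might look roughly like the sketch below; the condition name and helper calls are assumptions for illustration, not the exact Rancher condition helpers:

```go
// propagatePhase maps the operator config's phase and failureMessage onto a
// condition on the Rancher Cluster object (condition name is illustrative).
func propagatePhase(cluster *v3.Cluster, phase, failureMessage string) {
	switch {
	case failureMessage != "":
		// Failing in whatever phase it is in: mark the condition false and
		// surface the message so the failure is visible on the Cluster.
		v3.ClusterConditionUpdated.False(cluster)
		v3.ClusterConditionUpdated.Message(cluster, failureMessage)
	case phase == "active":
		v3.ClusterConditionUpdated.True(cluster)
		v3.ClusterConditionUpdated.Message(cluster, "")
	default:
		// "", "provisioning", or "updating": still in progress; leave the
		// condition unknown and let the handler re-enqueue.
		v3.ClusterConditionUpdated.Unknown(cluster)
	}
}
```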