Description
After some period, Pods can no longer be created or deleted, failing with this message:
$ kubectl describe pod <name>
error killing pod: failed to "KillPodSandbox" for "9f91266a-70a9-428f-a1d6-a2ae8d5427d1" with KillPodSandboxError: "rpc error: code = Unknown desc = failed to destroy network for sandbox \"4657b77480472f4352e413d52e0c5d5545c675da862cc56c8e6f22d7b0577031\": plugin type=\"calico\" failed (delete): error getting ClusterInformation: connection is unauthorized: Unauthorized"
It seems to be related to the service account token policy change as of Kubernetes v1.26.0:
https://kubernetes.io/docs/reference/access-authn-authz/service-accounts-admin/#manual-secret-management-for-serviceaccounts
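For illustration, under the TokenRequest-based model described there, short-lived tokens are issued on demand instead of via auto-generated long-lived Secrets, e.g. (just an illustration of the mechanism, not part of this setup):
$ kubectl create token calico-node -n kube-system --duration=24h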
Here is a workaround: force calico-node to re-read its service account token by restarting or deleting it:
$ kubectl rollout restart ds -n kube-system calico-node
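Alternatively, delete the calico-node Pods directly so they come back with a fresh token (a sketch, assuming the standard k8s-app=calico-node label from the Calico manifest):
$ kubectl delete pod -n kube-system -l k8s-app=calico-node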
Expected Behavior
kubectl create and kubectl delete work fine.
Current Behavior
Neither works properly; new Pods are stuck in ContainerCreating and deleted Pods are stuck in Terminating:
[root@m-k8s ~]# kubectl get po
NAME                                      READY   STATUS              RESTARTS      AGE
dpy-nginx-6564b9dbcc-d7jj5                0/1     ContainerCreating   0             17m
dpy-nginx-6564b9dbcc-vgjmw                0/1     ContainerCreating   0             17m
dpy-nginx-6564b9dbcc-wbr59                0/1     ContainerCreating   0             17m
nfs-client-provisioner-7596fb9c9c-gmpmn   0/1     Terminating         0             47h
nfs-client-provisioner-7596fb9c9c-jvmnm   1/1     Running             1 (46m ago)   42h
nginx-76d9fbf4fb-7xjgb                    0/1     Terminating         0             42h
nginx-76d9fbf4fb-dv48n                    1/1     Running             0             42h
nginx-76d9fbf4fb-kqp5j                    1/1     Running             0             42h
nginx-76d9fbf4fb-qrl4p                    1/1     Running             0             42h
nginx-76d9fbf4fb-wlpwd                    1/1     Running             0             42h
Possible Solution
Workaround: restart the daemonset or delete the calico-node Pods.
OR
Possible solution: create a long-lived Secret-based token for the service account instead of the projected one, and use that Secret with the calico-node service account (related to #5712 #6421).
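If going the manual-Secret route from the doc linked above, a minimal sketch of such a long-lived token Secret might look like this (the Secret name calico-node-token is a hypothetical example):
$ kubectl apply -f - <<EOF
apiVersion: v1
kind: Secret
metadata:
  name: calico-node-token
  namespace: kube-system
  annotations:
    kubernetes.io/service-account.name: calico-node
type: kubernetes.io/service-account-token
EOF
The calico-node DaemonSet would then need to mount this Secret instead of the projected token.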
Steps to Reproduce (for bugs)
- Deploy native Kubernetes with the vagrant script (link)
- Wait for 1-2 days
- Deploy a new deployment
[root@m-k8s ~]# k create deploy new-nginx --image=nginx --replicas=3
deployment.apps/new-nginx created
- Check deployment status
[root@m-k8s ~]# kubectl get po
NAME                          READY   STATUS              RESTARTS   AGE
new-nginx-6564b9dbcc-<hash>   0/1     ContainerCreating   0          15m
new-nginx-6564b9dbcc-<hash>   0/1     ContainerCreating   0          15m
new-nginx-6564b9dbcc-<hash>   0/1     ContainerCreating   0          15m
Context
The fix from #6218 has already been applied to the code:
node/pkg/cni/token_watch.go
const defaultCNITokenValiditySeconds = 24 * 60 * 60
const minTokenRetryDuration = 5 * time.Second
const defaultRefreshFraction = 4
func NewTokenRefresher(clientset *kubernetes.Clientset, namespace string, serviceAccountName string) *TokenRefresher {
	return NewTokenRefresherWithCustomTiming(clientset, namespace, serviceAccountName, defaultCNITokenValiditySeconds, minTokenRetryDuration, defaultRefreshFraction)
}
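Reading those constants, the refresher presumably requests 24h tokens and refreshes them after a quarter of their validity:
$ echo $(( 24 * 60 * 60 / 4 ))
21600
That is 21600 seconds, i.e. roughly every 6 hours, so the CNI token itself should not be expiring (my reading of the constants; not verified against NewTokenRefresherWithCustomTiming).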
So I decoded the JWT applied on the calico-node. It confirmed the expected 1-year (365d) validity.
JWT
sh-4.4# cat /var/run/secrets/kubernetes.io/serviceaccount/token
eyJhbGciOiJSUzI1NiIsImtpZCI6IjlpTFk5RXlJR29yb01VZjlXOGg0UGhvLWhLRGhtZnNvekdyeU0xdVlFUTAifQ.eyJhdWQiOlsiaHR0cHM6Ly9rdWJlcm5ldGVzLmRlZmF1bHQuc3ZjLmNsdXN0ZXIubG9jYWwiXSwiZXhwIjoxNzA1OTc1ODA5LCJpYXQiOjE2NzQ0Mzk4MDksImlzcyI6Imh0dHBzOi8va3ViZXJuZXRlcy5kZWZhdWx0LnN2Yy5jbHVzdGVyLmxvY2FsIiwia3ViZXJuZXRlcy5pbyI6eyJuYW1lc3BhY2UiOiJrdWJlLXN5c3RlbSIsInBvZCI6eyJuYW1lIjoiY2FsaWNvLW5vZGUtOWRnZzIiLCJ1aWQiOiIxY2UwODRlYS1kNzIzLTQ5MDAtYjI1ZC00YzRhNTVmMmI0OWYifSwic2VydmljZWFjY291bnQiOnsibmFtZSI6ImNhbGljby1ub2RlIiwidWlkIjoiM2RhYmI5MmYtN2UzYy00ZTkyLWI4OTUtZmM3NzczM2RlMTBmIn0sIndhcm5hZnRlciI6MTY3NDQ0MzQxNn0sIm5iZiI6MTY3NDQzOTgwOSwic3ViIjoic3lzdGVtOnNlcnZpY2VhY2NvdW50Omt1YmUtc3lzdGVtOmNhbGljby1ub2RlIn0.SC5WdggKDD-SE2ZnIfNYaMROXNvJVqqdKXdF6SCN_qrLBwmLwXbSHnQA_vkBBFHqi1qsQP2CuBx0beYUzm5VkcBt7LMZeDBHaOfDIfBvwMbzkAAMcSoqd6bnZi1mZa8Mf2ZTVEvhLOJSyb9npGAa0te6xfWAvEbTmGWTOvZaQ59y-RqJ9OfqAiYYWoEDCLpjjjG0F1-ke2_6eRx7m6Ri2Ne47WKGGURfMVvf2GAtV0xrYuI2tvA8UhivzhaPiJx56RfyVmVAnrl8qfBk0rG6J43TkPGA59R52vbvJkI_9k-kPw_OXJv35YDqgExn3i7CswGUZCX9TAGkET5mpm7u4w
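For reference, one way to decode the payload (assumes python3 is available; the padding fix is needed because JWT segments are unpadded base64url):
$ python3 -c 'import base64, json; p = open("/var/run/secrets/kubernetes.io/serviceaccount/token").read().split(".")[1]; print(json.dumps(json.loads(base64.urlsafe_b64decode(p + "=" * (-len(p) % 4))), indent=2))'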
Decoded JWT's Payload
{
  "aud": [
    "https://kubernetes.default.svc.cluster.local"
  ],
  "exp": 1705975809,   <<<< Tue Jan 23 2024 02:10:09 GMT+0000
  "iat": 1674439809,
  "iss": "https://kubernetes.default.svc.cluster.local",
  "kubernetes.io": {
    "namespace": "kube-system",
    "pod": {
      "name": "calico-node-9dgg2",
      "uid": "1ce084ea-d723-4900-b25d-4c4a55f2b49f"
    },
    "serviceaccount": {
      "name": "calico-node",
      "uid": "3dabb92f-7e3c-4e92-b895-fc77733de10f"
    },
    "warnafter": 1674443416
  },
  "nbf": 1674439809,
  "sub": "system:serviceaccount:kube-system:calico-node"
}
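Checking the validity directly from the claims (exp - iat):
$ echo $(( 1705975809 - 1674439809 ))
31536000
31536000 seconds = 365 * 86400, i.e. exactly 365 days.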
Thus the token itself has not expired; Kubernetes appears to be invalidating it through different authorization-verification logic.
/var/log/messages on all nodes shows lines like the following when it happens:
[control-plane node]
Jan 23 09:10:35 m-k8s kubelet: E0123 09:10:35.298683 4180 server.go:299] "Unable to authenticate the request due to an error" err="[invalid bearer token, service account token has been invalidated]"
Jan 23 09:10:50 m-k8s kubelet: E0123 09:10:50.303499 4180 server.go:299] "Unable to authenticate the request due to an error" err="[invalid bearer token, service account token has been invalidated]"
Jan 23 09:11:05 m-k8s kubelet: E0123 09:11:05.308058 4180 server.go:299] "Unable to authenticate the request due to an error" err="[invalid bearer token, service account token has been invalidated]"
Jan 23 09:11:20 m-k8s kubelet: E0123 09:11:20.300704 4180 server.go:299] "Unable to authenticate the request due to an error" err="[invalid bearer token, service account token has been invalidated]"
Jan 23 09:11:35 m-k8s kubelet: E0123 09:11:35.290727 4180 server.go:299] "Unable to authenticate the request due to an error" err="[invalid bearer token, service account token has been invalidated]"
<snipped>
[worker node]
Jan 21 16:44:12 w2-k8s kubelet: E0121 16:44:12.656423 3630 server.go:299] "Unable to authenticate the request due to an error" err="[invalid bearer token, service account token has been invalidated]"
Jan 21 16:44:27 w2-k8s kubelet: E0121 16:44:27.650877 3630 server.go:299] "Unable to authenticate the request due to an error" err="[invalid bearer token, service account token has been invalidated]"
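For reference, such lines can be pulled from each node with:
$ grep 'Unable to authenticate the request' /var/log/messages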
Your Environment
- Calico version: v3.24.5, v3.25.0
- Orchestrator version (e.g. kubernetes, mesos, rkt): native-kubernetes v1.26.0
[root@m-k8s ~]# kubectl get nodes -o wide
NAME     STATUS   ROLES           AGE     VERSION   INTERNAL-IP     EXTERNAL-IP   OS-IMAGE                KERNEL-VERSION                CONTAINER-RUNTIME
m-k8s    Ready    control-plane   2d19h   v1.26.0   192.168.1.10    <none>        CentOS Linux 7 (Core)   3.10.0-1127.19.1.el7.x86_64   containerd://1.6.10
w1-k8s   Ready    <none>          2d19h   v1.26.0   192.168.1.101   <none>        CentOS Linux 7 (Core)   3.10.0-1127.19.1.el7.x86_64   containerd://1.6.10
w2-k8s   Ready    <none>          2d19h   v1.26.0   192.168.1.102   <none>        CentOS Linux 7 (Core)   3.10.0-1127.19.1.el7.x86_64   containerd://1.6.10
w3-k8s   Ready    <none>          2d18h   v1.26.0   192.168.1.103   <none>        CentOS Linux 7 (Core)   3.10.0-1127.19.1.el7.x86_64   containerd://1.6.10
- Operating System and version: CentOS 7.9 (3.10.0-1127.19.1.el7.x86_64)
- Link to your project (optional):