I'm running into an issue where Karpenter wants to disrupt a node with a StatefulSet running on it. Karpenter then terminates all the non-DaemonSet pods on that node. However, when the pod is scheduled to the new node it is unable to start because the volume is still attached to the old node, and Karpenter is not able to terminate that node:
$ kubectl describe pod
Status: Terminating (lasts 3h5m)
...
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Nominated 6m1s (x79 over 164m) karpenter Pod should schedule on: nodeclaim/default-on-demand-p27q8, node/ip-10-221-64-33.ec2.internal
When trying to find the VolumeAttachment and which node it is attached to:
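For reference, a lookup along these lines shows each VolumeAttachment's PV, node, and attach status (the grep pattern is a stand-in for the actual PVC UID):
$ kubectl get volumeattachments \
    -o custom-columns='NAME:.metadata.name,PV:.spec.source.persistentVolumeName,NODE:.spec.nodeName,ATTACHED:.status.attached' \
  | grep <pvc-uid>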
You can see that it is attached to a different node than the one the pod is scheduled on. Looking at the EBS CSI driver attacher, I don't see any mention of that attachment:
$ kubectl logs -n system-storage ebs-csi-driver-controller-659467997f-5rw4s -c csi-attacher | grep csi-c72d43bef46cd68c80357ffa7c5e647f8351bd0b01b2b747cb11f5d702745f7f
<empty> (I confirmed this was the leader.)
Once I run kubectl delete volumeattachment csi-c72d43bef46cd68c80357ffa7c5e647f8351bd0b01b2b747cb11f5d702745f7f the pod that was stuck terminating comes up on the new node.
What could this issue be caused by? I would expect the EBS CSI attacher to detach the volume at some point.
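In case it helps with diagnosis, this is the kind of check that shows whether the external-attacher's finalizer is still on the stuck VolumeAttachment and whether the old node still reports the volume (names are placeholders):
$ kubectl get volumeattachment <volumeattachment-name> \
    -o jsonpath='{.metadata.finalizers}{"\n"}{.metadata.deletionTimestamp}{"\n"}'
$ kubectl get node <old-node-name> \
    -o jsonpath='{.status.volumesAttached}{"\n"}{.status.volumesInUse}{"\n"}'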
We noticed a similar issue, but in our case the pod is scheduled on the same node as the one referenced by the VolumeAttachment.
Even then, the PV is stuck in a terminating state and unable to recover.
The only relevant logs I see are from the external-attacher:
ebs-csi-controller-866fcc7577-vwx5p csi-attacher I0227 14:46:04.338320 1 csi_handler.go:243] "Error processing" VolumeAttachment="csi-5e079e13957bfa0b6e7045d9e544afcdc47beaf74f862c3e321a70a655543f8e" err="failed to attach: PersistentVolume \"pvc-8cce5fbf-43b5-4bcf-bfdc-604a7e5a0ff0\" is marked for deletion"
Details of pod stuck in init phase -
Node: ip-10-141-167-0.sa-east-1.compute.internal/10.141.167.0
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedAttachVolume 23s (x1428 over 2d) attachdetach-controller AttachVolume.Attach failed for volume "pvc-8cce5fbf-43b5-4bcf-bfdc-604a7e5a0ff0": PersistentVolume "pvc-8cce5fbf-43b5-4bcf-bfdc-604a7e5a0ff0" is marked for deletion
Details of the volume-attachment -
NAME ATTACHER PV NODE ATTACHED AGE
csi-5e079e13957bfa0b6e7045d9e544afcdc47beaf74f862c3e321a70a655543f8e ebs.csi.aws.com pvc-8cce5fbf-43b5-4bcf-bfdc-604a7e5a0ff0 ip-10-141-167-0.sa-east-1.compute.internal false 2d
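The "marked for deletion" message can be cross-checked against the PV itself, for example by looking at its deletionTimestamp and finalizers:
$ kubectl get pv pvc-8cce5fbf-43b5-4bcf-bfdc-604a7e5a0ff0 \
    -o jsonpath='{.metadata.deletionTimestamp}{"\n"}{.metadata.finalizers}{"\n"}'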