The EBS CSI driver currently supports removing a startup taint on new nodes once it has started running successfully (implemented via #1581).
What you expected to happen?
I can see the taint being removed once the CSI driver pod is running on the node, but the current code removes it just before the new node service is registered in func newNodeService. As a result, we still sometimes hit the issue described in kubernetes/kubernetes#95911, where pods are assigned to new nodes even though the volume mount limit will exceed the capacity of the new node.
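To make that ordering concrete, here is a minimal, self-contained Go sketch of the sequence described above; apart from the newNodeService and removeTaintInBackground names, everything in it is a placeholder and not the driver's actual code:

```go
package main

import (
	"fmt"
	"time"
)

type nodeService struct{}

// removeTaintInBackground stands in for the driver's background goroutine that
// strips the agent-not-ready taint from the node.
func removeTaintInBackground() {
	fmt.Println("startup taint removed -> scheduler may now place pods on the node")
}

// newNodeService mirrors the ordering the issue describes: taint removal is
// started before the node service is registered, so the node can become
// schedulable while its CSINode allocatable volume count is still unset.
func newNodeService() nodeService {
	go removeTaintInBackground()
	return nodeService{}
}

func main() {
	_ = newNodeService()
	// Only after registration does kubelet publish the CSINode volume limits;
	// pods scheduled in the window before this point can exceed those limits
	// (the race in kubernetes/kubernetes#95911).
	time.Sleep(50 * time.Millisecond)
	fmt.Println("node service registered -> CSINode volume limits published")
}
```

Because the removal goroutine can finish before the registration step, the scheduler briefly sees an untainted node with no attach-limit information.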
How to reproduce it (as minimally and precisely as possible)?
An easy way to reproduce this is to create a pod with 26 EBS volume mounts and try to provision it on a node that supports only 25 (a t3 instance type, for example). When this is tried with Karpenter, depending on the race condition, the pod will eventually get scheduled onto a newly spun up t3 instance node.
Anything else we need to know?:
This could potentially be fixed by introducing an initial sleep in removeTaintInBackground before it proceeds to remove the taint with backoff. This would delay the CSI node driver removing the taint, giving the CSINode volume limits enough time to be properly registered.
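For illustration, a minimal sketch of the proposed change is below. The removeNotReadyTaint helper, the delay value, and the backoff parameters are all assumptions for the sketch, not the driver's exact code:

```go
package main

import (
	"time"

	"k8s.io/apimachinery/pkg/util/wait"
	"k8s.io/klog/v2"
)

// taintRemovalInitialDelay is the proposed extra wait before the first removal
// attempt, giving kubelet time to publish the CSINode allocatable volume count.
const taintRemovalInitialDelay = 30 * time.Second

// Illustrative backoff parameters; the real values would come from the driver.
var taintRemovalBackoff = wait.Backoff{
	Duration: 500 * time.Millisecond,
	Factor:   2,
	Steps:    10,
}

// removeTaintInBackground sketches the suggested behavior: sleep first, then
// retry taint removal with exponential backoff as before.
func removeTaintInBackground(removeNotReadyTaint func() error) {
	time.Sleep(taintRemovalInitialDelay)

	err := wait.ExponentialBackoff(taintRemovalBackoff, func() (bool, error) {
		if tErr := removeNotReadyTaint(); tErr != nil {
			klog.ErrorS(tErr, "Failed to remove agent-not-ready taint, retrying")
			return false, nil
		}
		return true, nil
	})
	if err != nil {
		klog.ErrorS(err, "Giving up removing agent-not-ready taint")
	}
}

func main() {
	// Stub removal function so the sketch runs standalone.
	removeTaintInBackground(func() error {
		klog.InfoS("agent-not-ready taint removed")
		return nil
	})
}
```

The obvious trade-off is that every new node stays tainted for at least the extra delay, so the value would need to be small enough not to hurt pod startup latency.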
Environment
AWS EKS, where Karpenter + the EBS CSI driver is used for a node group
Kubernetes version (use kubectl version):
1.28 with Karpenter 0.31.4
Driver version:
EBS CSI driver 1.24.0
If needed, I can raise a PR for this.
/kind bug