Cluster deletion stuck due to AWSCluster finalizer issue #5107
This issue is currently awaiting triage. If CAPA/CAPI contributors determine this is a relevant issue, they will accept it by applying the `triage/accepted` label.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
@randybias FYI
We also use Helm charts to deploy CAPA resources and we've also run into issues when deleting clusters, but I believe we were making the same mistake that you seem to be making now. Let me try to explain our journey. Maintainers or other folks with more experience can correct me if I'm wrong, because I may be :)

CAPI and CAPA need to take care of deleting a cluster; specifically, they need to follow a certain order in which the different resources are removed. First the worker nodes are removed (i.e. the `Machine`/`AWSMachine` objects), then the control plane, and finally the infrastructure (`AWSCluster`). When using Helm charts, Helm hijacks the whole deletion process and deletes all objects at once, so that ordering is lost. The `helm.sh/resource-policy: keep` annotation can be used so that Helm leaves the objects alone and the teardown is driven by deleting the `Cluster` object instead.

This may not apply to you, but as a final note, we ended up removing the cluster objects from our Helm charts and managing their lifecycle separately. I hope that helps.

So I believe this is working as intended. Maybe we could add documentation about this particular use case.
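A minimal sketch of that flow, assuming the annotation meant above is Helm's `helm.sh/resource-policy: keep`; the template path, release name, and cluster name are illustrative:

```sh
# Sketch only: mark the CAPI/CAPA objects in the chart so "helm uninstall"
# skips deleting them. The annotation goes into the chart templates, e.g. an
# illustrative templates/cluster.yaml:
#
#   metadata:
#     annotations:
#       "helm.sh/resource-policy": keep
#
# Trigger the teardown by deleting only the Cluster object; the controllers
# then remove machines before the AWSCluster infrastructure.
kubectl delete cluster my-cluster --wait=true

# Once the Cluster is fully gone, remove the release itself; annotated
# objects are skipped by Helm.
helm uninstall my-release
```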
Thanks for your answer and suggestions, but I have to disagree. Kubernetes is a declarative system, so when operating on objects I shouldn't have to maintain any order of execution. CAPA in this case fails to maintain its own state and the order of execution it requires: it deletes the finalizer on its own resource while dependent resources are still present, and all of these resources are owned (reconciled) by a CAPA controller. So this is clearly a bug. I'm considering the solution with annotations (thanks for the hint, btw) only as a workaround, not as a proper fix.
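For anyone who wants to confirm they are hitting the same state, a small diagnostic sketch (the namespace is illustrative; the CAPA deployment name assumes a default installation):

```sh
# The AWSCluster is already gone while AWSMachine objects remain terminating:
kubectl get awscluster -n my-namespace
kubectl get awsmachines -n my-namespace \
  -o custom-columns='NAME:.metadata.name,DELETED:.metadata.deletionTimestamp,FINALIZERS:.metadata.finalizers'

# The controller logs should show AWSMachine reconciles failing because the
# owning AWSCluster can no longer be fetched/patched:
kubectl logs -n capa-system deployment/capa-controller-manager | grep -i awsmachine
```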
Fair enough. Let's see if that can be improved, because it would benefit all users, not just the ones using Helm.
The Kubernetes project currently lacks enough contributors to adequately respond to all issues. This bot triages un-triaged issues according to the following rules:

- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

- Mark this issue as fresh with /remove-lifecycle stale
- Close this issue with /close
- Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues. This bot triages un-triaged issues according to the following rules:

- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

- Mark this issue as fresh with /remove-lifecycle rotten
- Close this issue with /close
- Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten
/remove-lifecycle rotten
This is not a CAPA-specific thing but applies to the whole of CAPI. In CAPI the only supported way to delete a cluster is by deleting the `Cluster` object; the controllers then remove the owned resources in the required order.
If you use Helm for the cluster definitions then, as @fiunchinho described, you will need to use the `helm.sh/resource-policy: keep` annotation so that Helm does not delete the objects itself and the teardown is driven by the `Cluster` deletion.
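To make that ordering visible during a deletion, a small sketch (namespace and cluster name are illustrative; `clusterctl` is assumed to be installed):

```sh
# While a deletion triggered via the Cluster object is in progress, the
# Machines/AWSMachines should disappear before the AWSCluster:
kubectl get machines,awsmachines,awscluster -n my-namespace -w

# A higher-level view of the same teardown:
clusterctl describe cluster my-cluster -n my-namespace
```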
/kind bug
What steps did you take and what happened:
We have CAPI objects deployed using a Helm chart, so when `helm uninstall <release>` is executed, all objects get deleted simultaneously. Sometimes the finalizer on the `AWSCluster` object (and the object itself) is removed before all `AWSMachine` resources are removed. This causes the `AWSMachine` objects to be stuck forever, because the awsmachine controller tries to patch the `AWSCluster`. Ultimately this makes the cluster deletion hang indefinitely; the `Cluster` object is then stuck in the `Deleting` state. To fix this, the operator has to manually delete the finalizers on all `AWSMachine` objects which are left in the cluster.

Note that this is an intermittent issue, but it happens pretty often in my tests (~7 times out of 10). I should also note that the AWS resources themselves seem to be properly cleaned up.
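A hedged sketch of that manual unblock (the namespace is illustrative; this bypasses the controller, so only do it after confirming the backing AWS resources are gone):

```sh
# Clear the finalizers on the leftover AWSMachine objects so they can be
# garbage-collected and the Cluster deletion can finish.
for m in $(kubectl get awsmachines -n my-namespace -o name); do
  kubectl patch "$m" -n my-namespace --type merge \
    -p '{"metadata":{"finalizers":null}}'
done
```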
What did you expect to happen:
The `AWSCluster` object finalizer should only get removed when no dependent objects (like `AWSMachine`) are present.

Environment:

- Cluster-api-provider-aws version: v2.6.1
- Kubernetes version (use `kubectl version`): v1.30.2+k0s
- OS (e.g. from `/etc/os-release`): Amazon Linux 2