
Kubernetes nodes cannot be provisioned any more in subnets tagged with sigs.k8s.io/cluster-api-provider-aws/association: secondary #5227

Open
cellux opened this issue Nov 24, 2024 · 3 comments
Labels
kind/bug · lifecycle/stale · needs-priority · needs-triage

Comments


cellux commented Nov 24, 2024

/kind bug

What steps did you take and what happened:

Upgraded the CAPA provider to v2.7.1 and then tried to upgrade one of my AWS clusters to a newer Kubernetes version.

During the rolling update of MachineDeployments, CAPA v2.7.1 rejected creation of new EC2 instances, saying "subnet XXXX belongs to a secondary CIDR block which won't be used to create instances."

What did you expect to happen:

The new EC2 instances should have been provisioned, as they were before the upgrade to v2.7.1.

Anything else you would like to add:

Downgrading CAPA provider to v2.6.1 resolved the issue.

The problem might be around this code block in pkg/cloud/services/ec2/instance.go:

			tags := converters.TagsToMap(subnet.Tags)
			if tags[infrav1.NameAWSSubnetAssociation] == infrav1.SecondarySubnetTagValue {
				errMessage += fmt.Sprintf(" subnet %q belongs to a secondary CIDR block which won't be used to create instances.", *subnet.SubnetId)
				continue
			}
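To make the effect of that check concrete, here is a small self-contained sketch of how subnets tagged this way get skipped (this is not the CAPA code itself; the tag key and value are my assumption of what the infrav1.NameAWSSubnetAssociation and infrav1.SecondarySubnetTagValue constants expand to):

    // Self-contained sketch of the v2.7.x behaviour, not the CAPA code itself.
    // Tag key/value are assumed to match infrav1.NameAWSSubnetAssociation and
    // infrav1.SecondarySubnetTagValue.
    package main

    import "fmt"

    const (
        associationTagKey = "sigs.k8s.io/cluster-api-provider-aws/association"
        secondaryTagValue = "secondary"
    )

    // eligibleSubnets mimics the new filter: any subnet carrying the
    // secondary-association tag is skipped when picking a subnet for an instance.
    func eligibleSubnets(subnetTags map[string]map[string]string) []string {
        var ids []string
        for id, tags := range subnetTags {
            if tags[associationTagKey] == secondaryTagValue {
                continue // rejected: "belongs to a secondary CIDR block"
            }
            ids = append(ids, id)
        }
        return ids
    }

    func main() {
        subnets := map[string]map[string]string{
            "subnet-routable-1": {},                                     // company-network subnet, still eligible
            "subnet-pods-1":     {associationTagKey: secondaryTagValue}, // 100.64.0.0/16 subnet, now rejected
        }
        fmt.Println(eligibleSubnets(subnets)) // only the routable subnet remains
    }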

Environment:

  • Cluster-api-provider-aws version: v2.7.1
  • Kubernetes version (use kubectl version): v1.29.10-eks-7f9249a
  • OS (e.g. from /etc/os-release): Ubuntu 20.04.6 LTS

We use four private subnets in AWS which are pre-provisioned by our IT team:

  • two for Transit and NAT gateways, VPC endpoints, etc. - these are connected to the company network
  • two for Kubernetes nodes and the pod network - nonrouted subnets sliced from 100.64.0.0/16

We followed the docs at https://cluster-api-aws.sigs.k8s.io/topics/eks/pod-networking#unmanaged-static-vpc:

  • custom VPC CNI configuration
  • secondary CIDR subnets tagged with sigs.k8s.io/cluster-api-provider-aws/association=secondary

We do not want to use the first two subnets for Kubernetes nodes as those are pretty small and could be easily exhausted when we scale out.
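For what it's worth, here is roughly how one can list which subnets currently carry that tag (a sketch using the AWS SDK for Go v2 with the default credentials and region; it is not part of our actual tooling):

    // Sketch: list the subnets that carry the secondary-association tag.
    // Assumes default AWS credentials/region; not part of our actual setup scripts.
    package main

    import (
        "context"
        "fmt"
        "log"

        "github.com/aws/aws-sdk-go-v2/aws"
        "github.com/aws/aws-sdk-go-v2/config"
        "github.com/aws/aws-sdk-go-v2/service/ec2"
        "github.com/aws/aws-sdk-go-v2/service/ec2/types"
    )

    func main() {
        cfg, err := config.LoadDefaultConfig(context.TODO())
        if err != nil {
            log.Fatal(err)
        }
        client := ec2.NewFromConfig(cfg)

        // Filter subnets by the tag that CAPA v2.7.x treats as "pod network only".
        out, err := client.DescribeSubnets(context.TODO(), &ec2.DescribeSubnetsInput{
            Filters: []types.Filter{{
                Name:   aws.String("tag:sigs.k8s.io/cluster-api-provider-aws/association"),
                Values: []string{"secondary"},
            }},
        })
        if err != nil {
            log.Fatal(err)
        }
        for _, s := range out.Subnets {
            fmt.Printf("%s %s\n", aws.ToString(s.SubnetId), aws.ToString(s.CidrBlock))
        }
    }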

@k8s-ci-robot added the kind/bug and needs-priority labels on Nov 24, 2024
@k8s-ci-robot (Contributor)

This issue is currently awaiting triage.

If CAPA/CAPI contributors determine this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot added the needs-triage label on Nov 24, 2024
@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot added the lifecycle/stale label on Feb 22, 2025

cellux commented Feb 27, 2025

In the meantime we investigated our setup, and there is a good chance that the error is on our side.

If I understand correctly, subnets tagged with sigs.k8s.io/cluster-api-provider-aws/association: secondary should never be used for EC2 instances, only for the pod network. The new check in v2.7.x just codifies this contract.

Our mistake is most likely that we use the same subnets both for the EC2 instances and as secondary subnets for the pod network.

We should just remove the sigs.k8s.io/cluster-api-provider-aws/association: secondary tag from the EC2/pod subnets and replace it with the kubernetes.io/cluster/<cluster-name> and kubernetes.io/role/internal-elb tags as described here. Then we wouldn't need the custom networking or the ENIConfigs; everything would work with the default setup.
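For reference, the re-tagging we have in mind would look roughly like this (a sketch using the AWS SDK for Go v2; the subnet ID and cluster name are placeholders, and the tag values are the commonly used ones, "shared" and "1"):

    // Sketch of the planned re-tagging: drop the secondary-association tag from
    // the node subnets and add the standard cluster/role tags instead.
    // The subnet ID and cluster name below are placeholders.
    package main

    import (
        "context"
        "log"

        "github.com/aws/aws-sdk-go-v2/aws"
        "github.com/aws/aws-sdk-go-v2/config"
        "github.com/aws/aws-sdk-go-v2/service/ec2"
        "github.com/aws/aws-sdk-go-v2/service/ec2/types"
    )

    func main() {
        ctx := context.TODO()
        cfg, err := config.LoadDefaultConfig(ctx)
        if err != nil {
            log.Fatal(err)
        }
        client := ec2.NewFromConfig(cfg)

        subnets := []string{"subnet-0123456789abcdef0"} // placeholder node/pod subnet

        // 1. Remove the tag that makes CAPA v2.7.x skip the subnet for EC2 instances.
        if _, err := client.DeleteTags(ctx, &ec2.DeleteTagsInput{
            Resources: subnets,
            Tags: []types.Tag{{
                Key:   aws.String("sigs.k8s.io/cluster-api-provider-aws/association"),
                Value: aws.String("secondary"),
            }},
        }); err != nil {
            log.Fatal(err)
        }

        // 2. Add the usual cluster-ownership and internal-ELB role tags instead.
        if _, err := client.CreateTags(ctx, &ec2.CreateTagsInput{
            Resources: subnets,
            Tags: []types.Tag{
                {Key: aws.String("kubernetes.io/cluster/my-cluster"), Value: aws.String("shared")}, // placeholder cluster name
                {Key: aws.String("kubernetes.io/role/internal-elb"), Value: aws.String("1")},
            },
        }); err != nil {
            log.Fatal(err)
        }
    }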

We'll verify these assumptions and report back.
