Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

🐛fix: classic elb fix for TLS issues #5346

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

richardcase
Copy link
Member

@richardcase richardcase commented Feb 11, 2025

What type of PR is this?

/kind bug

What this PR does / why we need it:

There is an issue when creating clusters (or upgrading clusters) with
kubernetes versions v1.30+ and using a classic elb with an SSL health
check (which the default for new clusters).

The problem is that Kubernetes v1.30+ switched to Go 1.22 which removed
the RSA ciphers. This then causes the ELB health check to fail.

There are a number of workarounds including manually specifying the
cipher suites to use for the api server.

This commit does the following:

  • Adds warnings to the AWSCluster webhook to alert users that:
    • their cluster is using a classic elb and this is now deprecated
    • their cluster is using the default health check protocol which was
      previously SSL and that now the default has changed to TCP.
  • Will update the health check to TCP if the load balancer is "classic"
    and the health check protocol is not set.

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):

Fixes #5139
Fixes #5280
Relates #5335

Special notes for your reviewer:

Checklist:

  • squashed commits
  • includes documentation
  • includes emoji in title
  • adds unit tests
  • adds or updates e2e tests

Release note:

action required
From this release onwards we recommend not creating clusters using the classic ELB (which is the default for the API). Classic ELB support is deprected and support will be removed in a future version.
For new & existing clusters that use a classic elb AND do not specify the health check protocol then the protocol will be changed/set to TCP instead of SSL.
If you want to use a classic elb with an SSL healthcheck then you will need to specify the cipher suites to use in the KubeadmControlPlane:

apiVersion: controlplane.cluster.x-k8s.io/v1beta1
kind: KubeadmControlPlane
metadata:
  name: "${CLUSTER_NAME}-control-plane"
spec:
  kubeadmConfigSpec:
    clusterConfiguration:
      apiServer:
        extraArgs:
          cloud-provider: external

          # This is needed for Kubernetes v1.30+ since else it uses the Go defaults which don't
          # work with AWS classic load balancers, see
          # https://github.com/kubernetes-sigs/cluster-api-provider-aws/issues/5139. If you use
          # another load balancer type such as NLB, this is not needed.
          #
          # The list consists of the secure ciphers from Go 1.23.3, plus some less secure
          # RSA ciphers which the AWS classic load balancer instance health check supports.
          tls-cipher-suites: TLS_AES_128_GCM_SHA256,TLS_AES_256_GCM_SHA384,TLS_CHACHA20_POLY1305_SHA256,TLS_ECDHE_ECDSA_WITH_AES_128_CBC_SHA,TLS_ECDHE_ECDSA_WITH_AES_256_CBC_SHA,TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA,TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA,TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384,TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384,TLS_ECDHE_RSA_WITH_CHACHA20_POLY1305_SHA256,TLS_ECDHE_ECDSA_WITH_CHACHA20_POLY1305_SHA256

@k8s-ci-robot k8s-ci-robot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. release-note-action-required Denotes a PR that introduces potentially breaking changes that require user action. kind/bug Categorizes issue or PR as related to a bug. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Feb 11, 2025
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please ask for approval from richardcase. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. label Feb 11, 2025
@richardcase richardcase changed the title WIP: 🐛fix: classic elb fix for TLS issues 🐛fix: classic elb fix for TLS issues Feb 12, 2025
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Feb 12, 2025
@richardcase
Copy link
Member Author

/test ?

@k8s-ci-robot
Copy link
Contributor

@richardcase: The following commands are available to trigger required jobs:

/test pull-cluster-api-provider-aws-build
/test pull-cluster-api-provider-aws-build-docker
/test pull-cluster-api-provider-aws-test
/test pull-cluster-api-provider-aws-verify

The following commands are available to trigger optional jobs:

/test pull-cluster-api-provider-aws-apidiff-main
/test pull-cluster-api-provider-aws-e2e
/test pull-cluster-api-provider-aws-e2e-blocking
/test pull-cluster-api-provider-aws-e2e-clusterclass
/test pull-cluster-api-provider-aws-e2e-conformance
/test pull-cluster-api-provider-aws-e2e-conformance-with-ci-artifacts
/test pull-cluster-api-provider-aws-e2e-eks
/test pull-cluster-api-provider-aws-e2e-eks-gc
/test pull-cluster-api-provider-aws-e2e-eks-testing

Use /test all to run the following jobs that were automatically triggered:

pull-cluster-api-provider-aws-apidiff-main
pull-cluster-api-provider-aws-build
pull-cluster-api-provider-aws-build-docker
pull-cluster-api-provider-aws-test
pull-cluster-api-provider-aws-verify

In response to this:

/test ?

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@richardcase
Copy link
Member Author

/test pull-cluster-api-provider-aws-e2e

@richardcase
Copy link
Member Author

/test pull-cluster-api-provider-aws-e2e

@richardcase
Copy link
Member Author

/test pull-cluster-api-provider-aws-e2e

@richardcase
Copy link
Member Author

Test failures are strange as the linting passes locally :(

@richardcase
Copy link
Member Author

/test pull-cluster-api-provider-aws-e2e

@richardcase
Copy link
Member Author

/test pull-cluster-api-provider-aws-test

@richardcase
Copy link
Member Author

#5352 will fix the issue with the tests. Just need to sort out the e2e.

There is an issue when creating clusters (or upgrading clusters) with
kubernetes versions v1.30+ and using a classic elb with an SSL health
check (which the default for new clusters).

The problem is that Kubernetes v1.30+ switched to Go 1.22 which removed
the RSA ciphers. This then causes the ELB health check to fail.

There are a number of workarounds including manually specifying the
cipher suites to use for the api server.

This commit does the following:

- Adds warnings to the AWSCluster webhook to alert users that:
  - their cluster is using a classic elb and this is now deprecated
  - their cluster is using the default health check protocol which warnings
  previously SSL and that now the default has changed to TCP.
- Will update the health check to TCP if the load balancer is "classic"
  and the health check protocol is not set.

Signed-off-by: Richard Case <[email protected]>
@richardcase
Copy link
Member Author

/test pull-cluster-api-provider-aws-e2e

@richardcase
Copy link
Member Author

@damdo - looks like the e2e is now passing. Worth a discussion on Monday at the office hours.

Copy link
Member

@damdo damdo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @richardcase for your work on this!
The logic for migrating health-check types from SSL->TCP (for classic LBs)
and the testing to verify that then works LGTM

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Feb 24, 2025
@nrb
Copy link
Contributor

nrb commented Feb 26, 2025

For posterity, made a small edit to the release note to add do into this sentence:

For new & existing clusters that use a classic elb AND [do] not specify the health check protocol then the protocol will be changed/set to TCP instead of SSL.

}

if r.Spec.SecondaryControlPlaneLoadBalancer != nil {
if r.Spec.SecondaryControlPlaneLoadBalancer.LoadBalancerType == LoadBalancerTypeClassic {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This case shouldn't even be possible, see the webhook: https://github.com/kubernetes-sigs/cluster-api-provider-aws/blob/main/api/v1beta2/awscluster_webhook.go#L322

That being said, I think we can remove this code in a follow up PR rather than block this further.

@@ -322,6 +367,9 @@ func (r *AWSCluster) validateControlPlaneLBs() field.ErrorList {
if r.Spec.SecondaryControlPlaneLoadBalancer.LoadBalancerType != LoadBalancerTypeNLB {
allErrs = append(allErrs, field.Invalid(field.NewPath("spec", "secondaryControlPlaneLoadBalancer", "loadBalancerType"), r.Spec.SecondaryControlPlaneLoadBalancer.LoadBalancerType, "secondary control plane load balancer must be a Network Load Balancer"))
}
if r.Spec.SecondaryControlPlaneLoadBalancer.LoadBalancerType == LoadBalancerTypeClassic {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this needed if we're also setting an error for this case on L367?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/bug Categorizes issue or PR as related to a bug. lgtm "Looks good to me", indicates that a PR is ready to be merged. needs-priority release-note-action-required Denotes a PR that introduces potentially breaking changes that require user action. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Quick start failing to register instances with API server ELB health check fails with Kubernetes >=v1.30.x
4 participants