Multi-slb related bug fixes #7432

Merged

@nilo19 (Contributor) commented Oct 29, 2024

What type of PR is this?

/kind bug

What this PR does / why we need it:

  1. All EndpointSlices of a local service should be included in the local backend pool updater, instead of only the first EndpointSlice.
  2. In some rare cases, a migration from a NIC-based to an IP-based LB can be left in an intermediate state where the NIC references have been removed but the corresponding IPConfigs in the backend pool have not. In this case, we should explicitly exclude those IPConfigs from the request body.
  3. localServiceOwnsBackendPool should compare the full backend pool name, not just the prefix, because two service names can share the same prefix (for example, default/web and default/web-api).
  4. There is a corner case when the cluster is being updated to multi-slb from a classic NIC-based single LB rather than from an IP-based cluster. In this case, if the service being reconciled is local, the cloud provider tries to convert a NIC-based pool directly into an IP-based pool, which is not allowed. We should skip adding IPs to NIC-based pools in multi-slb mode.
  5. There is a bug in ReconcileBackendPools where we mistakenly parse the LB name from the backend pool ID and use it as the backend pool name.

Which issue(s) this PR fixes:

Fixes #7113
Fixes #7200
Fixes #6980

Special notes for your reviewer:

Does this PR introduce a user-facing change?

Fix several bugs related to the multiple standard load balancers (multi-slb) mode:
1. All EndpointSlices of a local service should be included in the local backend pool updater, instead of only the first EndpointSlice.
2. In some rare cases, a migration from a NIC-based to an IP-based LB can be left in an intermediate state where the NIC references have been removed but the corresponding IPConfigs in the backend pool have not. In this case, we should explicitly exclude those IPConfigs from the request body.
3. localServiceOwnsBackendPool should compare the full backend pool name, not just the prefix, because two service names can share the same prefix.
4. There is a corner case when the cluster is being updated to multi-slb from a classic NIC-based single LB rather than from an IP-based cluster. In this case, if the service being reconciled is local, the cloud provider tries to convert a NIC-based pool directly into an IP-based pool, which is not allowed. We should skip adding IPs to NIC-based pools in multi-slb mode.
5. There is a bug in ReconcileBackendPools where we mistakenly parse the LB name from the backend pool ID and use it as the backend pool name.

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:


@k8s-ci-robot added the kind/bug, do-not-merge/release-note-label-needed, and cncf-cla: yes labels on Oct 29, 2024
@k8s-ci-robot (Contributor) commented:

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: nilo19

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot added the approved, size/L, and release-note labels and removed the do-not-merge/release-note-label-needed label on Oct 29, 2024
activeNodes = bi.getLocalServiceEndpointsNodeNames(service)
}

if isNICPool(backendPool) {
@nilo19 (Contributor Author):

fix no. 4
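Fix no. 4 (skip adding IPs to NIC-based pools in multi-slb mode) can be illustrated with a minimal Go sketch. The types `backendPool` and `backendAddress` and the helper `ipsToAdd` are hypothetical simplifications, not the provider's real types or functions. `isNICPool` follows the empty-IP-address heuristic quoted in the review thread; treating leftover NIC IP configurations as NIC-based is an extra assumption.

```go
package main

import "fmt"

// backendAddress and backendPool are hypothetical, simplified stand-ins
// for the Azure SDK load balancer backend pool types.
type backendAddress struct {
	Name      string
	IPAddress string // empty for NIC-based entries
}

type backendPool struct {
	Name                    string
	Addresses               []backendAddress
	BackendIPConfigurations []string // NIC IP configuration IDs
}

// isNICPool treats a pool as NIC-based if any backend address has an empty
// IP address (the heuristic quoted in the review thread) or, as an extra
// assumption, if it still lists NIC IP configurations.
func isNICPool(bp backendPool) bool {
	if len(bp.BackendIPConfigurations) > 0 {
		return true
	}
	for _, a := range bp.Addresses {
		if a.IPAddress == "" {
			return true
		}
	}
	return false
}

// ipsToAdd returns the node IPs that still need to be added to bp for a
// local service. In multi-slb mode a NIC-based pool is left untouched,
// because converting it in place to an IP-based pool is not allowed.
func ipsToAdd(bp backendPool, multiSLB bool, nodeIPs []string) []string {
	if multiSLB && isNICPool(bp) {
		return nil
	}
	existing := make(map[string]bool, len(bp.Addresses))
	for _, a := range bp.Addresses {
		existing[a.IPAddress] = true
	}
	var out []string
	for _, ip := range nodeIPs {
		if !existing[ip] {
			existing[ip] = true // also de-duplicates repeated input IPs
			out = append(out, ip)
		}
	}
	return out
}

func main() {
	nicPool := backendPool{Name: "kubernetes", BackendIPConfigurations: []string{"nic-1/ipConfigurations/ipconfig1"}}
	ipPool := backendPool{Name: "default-web", Addresses: []backendAddress{{Name: "node-1", IPAddress: "10.0.0.4"}}}

	fmt.Println(ipsToAdd(nicPool, true, []string{"10.0.0.5"}))             // []
	fmt.Println(ipsToAdd(ipPool, true, []string{"10.0.0.4", "10.0.0.5"})) // [10.0.0.5]
}
```

Returning an empty result, rather than an error, keeps the reconcile loop going for the other pools; whether the real provider logs or reports this case is not shown here.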

@@ -886,7 +889,13 @@ func removeNodeIPAddressesFromBackendPool(
if addresses[i].LoadBalancerBackendAddressPropertiesFormat != nil {
ipAddress := ptr.Deref((*backendPool.LoadBalancerBackendAddresses)[i].IPAddress, "")
if ipAddress == "" {
klog.V(4).Infof("removeNodeIPAddressFromBackendPool: LoadBalancerBackendAddress %s is not IP-based, skipping", ptr.Deref(addresses[i].Name, ""))
if isNodeIP {
@nilo19 (Contributor Author):

fix no. 2
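Fix no. 2 (excluding leftover IPConfigs from the request body after a half-finished NIC-to-IP migration) could look roughly like the following Go sketch. The types and the helper `ipBasedRequestBody` are hypothetical; the real provider works on Azure SDK structs, and the exact condition it uses to detect the intermediate state is not shown in this thread.

```go
package main

import "fmt"

// Hypothetical, simplified stand-ins for the Azure SDK backend pool types.
type backendAddress struct {
	Name      string
	IPAddress string // empty for NIC-based entries
}

type backendPool struct {
	Name                    string
	Addresses               []backendAddress
	BackendIPConfigurations []string // NIC IP configuration IDs
}

// ipBasedRequestBody builds the body for an IP-based backend pool update.
// Addresses without an IP are skipped (mirroring the "not IP-based,
// skipping" log in the diff), and any leftover BackendIPConfigurations from
// a half-finished NIC-to-IP migration are left out so they are not sent
// back. The input pool, which may be shared with a cache, is not modified.
func ipBasedRequestBody(bp backendPool) backendPool {
	body := backendPool{Name: bp.Name}
	for _, a := range bp.Addresses {
		if a.IPAddress == "" {
			continue
		}
		body.Addresses = append(body.Addresses, a)
	}
	// body.BackendIPConfigurations is intentionally left nil.
	return body
}

func main() {
	stale := backendPool{
		Name: "kubernetes",
		Addresses: []backendAddress{
			{Name: "node-1", IPAddress: "10.0.0.4"},
			{Name: "node-2-nic", IPAddress: ""},
		},
		BackendIPConfigurations: []string{"node-2-nic/ipConfigurations/ipconfig1"},
	}
	body := ipBasedRequestBody(stale)
	fmt.Println(len(body.Addresses), len(body.BackendIPConfigurations)) // 1 0
	fmt.Println(len(stale.BackendIPConfigurations))                    // 1
}
```

Building a fresh body instead of clearing fields on the fetched pool avoids mutating an object that may still be referenced elsewhere.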

@@ -633,7 +634,7 @@ func (bi *backendPoolTypeNodeIP) ReconcileBackendPools(ctx context.Context, clus
if isMigration && bi.EnableMigrateToIPBasedBackendPoolAPI {
var backendPoolNames []string
for _, id := range lbBackendPoolIDsSlice {
- name, err := getLBNameFromBackendPoolID(id)
+ name, err := getBackendPoolNameFromBackendPoolID(id)
@nilo19 (Contributor Author), Oct 29, 2024:

fix no. 5
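Fix no. 5 swaps which segment of the backend pool ID is parsed. A minimal Go sketch, assuming the standard Azure resource ID layout `.../loadBalancers/{lb}/backendAddressPools/{pool}`: `lbNameFromBackendPoolID` and `backendPoolNameFromBackendPoolID` are hypothetical stand-ins for the two functions in the diff, not their actual implementations, and the example names are illustrative.

```go
package main

import (
	"fmt"
	"strings"
)

// segmentAfter returns the path segment that follows key (matched
// case-insensitively) in an Azure resource ID.
func segmentAfter(id, key string) (string, error) {
	parts := strings.Split(strings.Trim(id, "/"), "/")
	for i := 0; i+1 < len(parts); i++ {
		if strings.EqualFold(parts[i], key) {
			return parts[i+1], nil
		}
	}
	return "", fmt.Errorf("segment %q not found in resource ID %q", key, id)
}

// Hypothetical stand-ins for getLBNameFromBackendPoolID and
// getBackendPoolNameFromBackendPoolID.
func lbNameFromBackendPoolID(id string) (string, error) {
	return segmentAfter(id, "loadBalancers")
}

func backendPoolNameFromBackendPoolID(id string) (string, error) {
	return segmentAfter(id, "backendAddressPools")
}

func main() {
	id := "/subscriptions/sub/resourceGroups/rg/providers/Microsoft.Network" +
		"/loadBalancers/lb-2/backendAddressPools/kubernetes"

	lb, _ := lbNameFromBackendPoolID(id)
	pool, _ := backendPoolNameFromBackendPoolID(id)
	fmt.Println(lb)   // lb-2 (what the buggy code used as the pool name)
	fmt.Println(pool) // kubernetes (the actual backend pool name)
}
```

When the LB and pool happen to share a name, both parsers return the same value, which is one way such a bug can stay hidden until multiple LBs are in use.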

@@ -446,8 +445,10 @@ func (az *Cloud) getLocalServiceBackendPoolID(serviceName string, lbName string,

// localServiceOwnsBackendPool checks if a backend pool is owned by a local service.
func localServiceOwnsBackendPool(serviceName, bpName string) bool {
prefix := strings.Replace(serviceName, "/", "-", -1)
return strings.HasPrefix(strings.ToLower(bpName), strings.ToLower(prefix))
if strings.HasSuffix(strings.ToLower(bpName), consts.IPVersionIPv6StringLower) {
@nilo19 (Contributor Author):

fix no. 3
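The difference behind fix no. 3 (prefix match versus full-name match) can be shown with a small Go sketch. The `-ipv6` suffix is an assumption standing in for the value derived from `consts.IPVersionIPv6StringLower`, and both functions are hypothetical illustrations rather than the provider's actual `localServiceOwnsBackendPool`.

```go
package main

import (
	"fmt"
	"strings"
)

// ipv6Suffix is an assumed stand-in for the suffix derived from
// consts.IPVersionIPv6StringLower; the provider's exact value may differ.
const ipv6Suffix = "-ipv6"

// ownsByPrefix reproduces the old check: any pool whose name starts with
// "<namespace>-<service>" matches, so "default/web" also claims the pool of
// "default/web-api".
func ownsByPrefix(serviceName, bpName string) bool {
	prefix := strings.ReplaceAll(serviceName, "/", "-")
	return strings.HasPrefix(strings.ToLower(bpName), strings.ToLower(prefix))
}

// ownsByFullName compares the complete pool name, accepting either the
// IPv4 name or the same name followed by the IPv6 suffix.
func ownsByFullName(serviceName, bpName string) bool {
	want := strings.ToLower(strings.ReplaceAll(serviceName, "/", "-"))
	got := strings.ToLower(bpName)
	return got == want || got == want+ipv6Suffix
}

func main() {
	fmt.Println(ownsByPrefix("default/web", "default-web-api"))    // true (wrong owner)
	fmt.Println(ownsByFullName("default/web", "default-web-api"))  // false
	fmt.Println(ownsByFullName("default/web", "Default-Web-IPv6")) // true
}
```

Comparing against both candidate names, rather than trimming the suffix from the pool name, keeps a service whose own name ends in `-ipv6` matching its IPv4 pool.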

ep = endpointSlice
foundInCache = true
return false
eps = append(eps, endpointSlice)
@nilo19 (Contributor Author):

fix no. 1

@nilo19 force-pushed the fix/multi-slb/endpointslice branch from 2b510ac to 5084147 on October 29, 2024 04:30
client := fake.NewSimpleClientset(&svc)
// if tc.existingEPS != nil {
// client = fake.NewSimpleClientset(&svc, tc.existingEPS)
// } else {
Reviewer (Member):

nit: delete unused code

@nilo19 (Contributor Author):

removed

bp.LoadBalancerBackendAddresses != nil {
for _, addr := range *bp.LoadBalancerBackendAddresses {
if ptr.Deref(addr.IPAddress, "") == "" {
logger.Info("The load balancer backend address has empty ip address, assuming it is a NIC pool",
Reviewer (Member):

change to V(4)

@nilo19 (Contributor Author):

changed

@nilo19 force-pushed the fix/multi-slb/endpointslice branch from 5084147 to 1731376 on November 11, 2024 00:44
@feiskyer (Member) commented:

/retest

@nilo19 (Contributor Author) commented Nov 11, 2024

/retest

@nilo19 (Contributor Author) commented Nov 12, 2024

/test pull-cloud-provider-azure-e2e-ccm-capz

@nilo19 (Contributor Author) commented Nov 12, 2024

/test pull-cloud-provider-azure-e2e-ccm-vmss-capz

@feiskyer (Member) commented:

Thanks for the fixes
/lgtm

@k8s-ci-robot added the lgtm label on Nov 12, 2024
@nilo19 (Contributor Author) commented Nov 12, 2024

/test pull-cloud-provider-azure-e2e-ccm-capz

@nilo19 (Contributor Author) commented Nov 13, 2024

/retest

@nilo19 (Contributor Author) commented Nov 13, 2024

/retest

@MartinForReal (Contributor) commented:

/retest

@MartinForReal (Contributor) commented:

/retest

@k8s-ci-robot merged commit 8a39b57 into kubernetes-sigs:master on Nov 15, 2024
18 checks passed
@nilo19 deleted the fix/multi-slb/endpointslice branch on November 18, 2024 22:03
@nilo19 (Contributor Author) commented Nov 18, 2024

/cherrypick release-1.31

@nilo19 (Contributor Author) commented Nov 18, 2024

/cherrypick release-1.30

@nilo19 (Contributor Author) commented Nov 18, 2024

/cherrypick release-1.29

@nilo19 (Contributor Author) commented Nov 18, 2024

/cherrypick release-1.28

@k8s-infra-cherrypick-robot

@nilo19: #7432 failed to apply on top of branch "release-1.30":

Applying: fix: Include all endpointslices for local services when using multi-slb
Using index info to reconstruct a base tree...
M	pkg/provider/azure_loadbalancer.go
M	pkg/provider/azure_loadbalancer_backendpool.go
M	pkg/provider/azure_loadbalancer_backendpool_test.go
M	pkg/provider/azure_local_services.go
M	pkg/provider/azure_local_services_test.go
Falling back to patching base and 3-way merge...
Auto-merging pkg/provider/azure_local_services_test.go
Auto-merging pkg/provider/azure_local_services.go
CONFLICT (content): Merge conflict in pkg/provider/azure_local_services.go
Auto-merging pkg/provider/azure_loadbalancer_backendpool_test.go
CONFLICT (content): Merge conflict in pkg/provider/azure_loadbalancer_backendpool_test.go
Auto-merging pkg/provider/azure_loadbalancer_backendpool.go
Auto-merging pkg/provider/azure_loadbalancer.go
error: Failed to merge in the changes.
hint: Use 'git am --show-current-patch=diff' to see the failed patch
hint: When you have resolved this problem, run "git am --continue".
hint: If you prefer to skip this patch, run "git am --skip" instead.
hint: To restore the original branch and stop patching, run "git am --abort".
hint: Disable this message with "git config advice.mergeConflict false"
Patch failed at 0001 fix: Include all endpointslices for local services when using multi-slb

In response to this:

/cherrypick release-1.30

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-infra-cherrypick-robot

@nilo19: new pull request created: #7605

In response to this:

/cherrypick release-1.31

@k8s-infra-cherrypick-robot

@nilo19: #7432 failed to apply on top of branch "release-1.29":

Patch is empty.
hint: When you have resolved this problem, run "git am --continue".
hint: If you prefer to skip this patch, run "git am --skip" instead.
hint: To record the empty patch as an empty commit, run "git am --allow-empty".
hint: To restore the original branch and stop patching, run "git am --abort".
hint: Disable this message with "git config advice.mergeConflict false"

In response to this:

/cherrypick release-1.29

@k8s-infra-cherrypick-robot

@nilo19: #7432 failed to apply on top of branch "release-1.28":

Applying: fix: Include all endpointslices for local services when using multi-slb
Using index info to reconstruct a base tree...
M	pkg/provider/azure_loadbalancer.go
M	pkg/provider/azure_loadbalancer_backendpool.go
M	pkg/provider/azure_loadbalancer_backendpool_test.go
M	pkg/provider/azure_local_services.go
M	pkg/provider/azure_local_services_test.go
Falling back to patching base and 3-way merge...
Auto-merging pkg/provider/azure_local_services_test.go
CONFLICT (content): Merge conflict in pkg/provider/azure_local_services_test.go
Auto-merging pkg/provider/azure_local_services.go
CONFLICT (content): Merge conflict in pkg/provider/azure_local_services.go
Auto-merging pkg/provider/azure_loadbalancer_backendpool_test.go
CONFLICT (content): Merge conflict in pkg/provider/azure_loadbalancer_backendpool_test.go
Auto-merging pkg/provider/azure_loadbalancer_backendpool.go
CONFLICT (content): Merge conflict in pkg/provider/azure_loadbalancer_backendpool.go
Auto-merging pkg/provider/azure_loadbalancer.go
CONFLICT (content): Merge conflict in pkg/provider/azure_loadbalancer.go
error: Failed to merge in the changes.
hint: Use 'git am --show-current-patch=diff' to see the failed patch
hint: When you have resolved this problem, run "git am --continue".
hint: If you prefer to skip this patch, run "git am --skip" instead.
hint: To restore the original branch and stop patching, run "git am --abort".
hint: Disable this message with "git config advice.mergeConflict false"
Patch failed at 0001 fix: Include all endpointslices for local services when using multi-slb

In response to this:

/cherrypick release-1.28

Labels
approved, cncf-cla: yes, kind/bug, lgtm, release-note, size/L
5 participants