feat: Lock updates on azure resources when other component is doing t… #7193

Merged
k8s-ci-robot merged 2 commits into kubernetes-sigs:master from nilo19:feat/multi-slb/lock
Oct 16, 2024

@nilo19 nilo19 commented Oct 3, 2024

…he same thing.

This PR uses a lease in each service reconciliation to prevent race conditions in which the cloud provider and other components update the same Azure resources.

What type of PR is this?

/kind feature

What this PR does / why we need it:

This PR uses a lease in each service reconciliation to prevent race conditions in which the cloud provider and other components update the same Azure resources. If the lease has not yet expired and is held by another component, the service reconciliation is aborted and retried with exponential backoff.
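
A minimal sketch of the decision described above, not the code in this PR: the `lease` struct, `canAcquire`, `backoff`, and the holder names are hypothetical stand-ins chosen for illustration. It assumes a lease is free for the taking once its duration has elapsed, and that retries double their delay up to a cap.

```go
package main

import (
	"fmt"
	"time"
)

// lease is a simplified, hypothetical stand-in for a coordination lease.
// It is not the Kubernetes Lease type and not the type used by the PR.
type lease struct {
	Holder    string        // current holder, "" if the lease was never held
	RenewedAt time.Time     // last time the holder acquired or renewed it
	Duration  time.Duration // validity period counted from RenewedAt
}

// canAcquire reports whether `me` may take (or keep) the lease at time `now`.
func canAcquire(l lease, me string, now time.Time) bool {
	if l.Holder == "" || l.Holder == me {
		return true
	}
	// Held by another component: take over only once it has expired.
	return !now.Before(l.RenewedAt.Add(l.Duration))
}

// backoff returns the delay before retry number `attempt` (0-based),
// doubling from base and capped at max.
func backoff(attempt int, base, max time.Duration) time.Duration {
	d := base
	if d >= max {
		return max
	}
	for i := 0; i < attempt; i++ {
		d *= 2
		if d >= max {
			return max
		}
	}
	return d
}

// demo evaluates three hypothetical scenarios for a component named "ccm":
// a live lease held by another component, an expired one, and its own lease.
func demo() []bool {
	now := time.Date(2024, 10, 3, 0, 0, 0, 0, time.UTC)
	held := lease{Holder: "other", RenewedAt: now.Add(-10 * time.Second), Duration: 30 * time.Second}
	expired := lease{Holder: "other", RenewedAt: now.Add(-60 * time.Second), Duration: 30 * time.Second}
	mine := lease{Holder: "ccm", RenewedAt: now.Add(-10 * time.Second), Duration: 30 * time.Second}
	return []bool{
		canAcquire(held, "ccm", now),
		canAcquire(expired, "ccm", now),
		canAcquire(mine, "ccm", now),
	}
}

func main() {
	fmt.Println(demo())
	for attempt := 0; attempt < 5; attempt++ {
		fmt.Println(attempt, backoff(attempt, time.Second, 8*time.Second))
	}
}
```

In this sketch, a reconciliation that sees `canAcquire` return false would return an error and let the caller requeue it after `backoff(attempt, ...)`; how the real controller schedules retries is outside what the description states.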

Which issue(s) this PR fixes:

Fixes #

Special notes for your reviewer:

Does this PR introduce a user-facing change?

feat: Lock updates on azure resources when other component is doing the same thing.

This PR uses a lease in each service reconciliation to prevent race conditions in which the cloud provider and other components update the same Azure resources.

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:


@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. kind/feature Categorizes issue or PR as related to a new feature. labels Oct 3, 2024
@k8s-ci-robot k8s-ci-robot requested review from feiskyer and jwtty October 3, 2024 07:23
@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. approved Indicates a PR has been approved by an approver from all required OWNERS files. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Oct 3, 2024
@nilo19 nilo19 force-pushed the feat/multi-slb/lock branch from 5995635 to 858a34e Compare October 3, 2024 08:03
@zarvd
Contributor

zarvd commented Oct 4, 2024

I’m curious about which component would access the same Azure resource simultaneously, and exactly which resource is it? It seems like the lock covers the entire EnsureLoadBalancer.

@nilo19
Contributor Author

nilo19 commented Oct 7, 2024

> I’m curious about which component would access the same Azure resource simultaneously, and exactly which resource is it? It seems like the lock covers the entire EnsureLoadBalancer.

The AKS RP may conflict with the cloud provider.

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Oct 9, 2024
@nilo19 nilo19 force-pushed the feat/multi-slb/lock branch from 858a34e to fab4e84 Compare October 9, 2024 03:19
@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Oct 9, 2024
@nilo19 nilo19 force-pushed the feat/multi-slb/lock branch from fab4e84 to 22fc064 Compare October 10, 2024 03:48
lease.Annotations[consts.AzureResourceLockPreviousHolderNameAnnotation],
consts.AzureResourceLockHolderNameCloudControllerManager,
) {
l.Cloud.lbCache, err = l.Cloud.newLBCache()
Contributor Author

Invalidates cache if the previous holder is another component. We can invalidate more caches here if needed.

Member

We'd also need to clear the caches for vmss, vmssvm, and vm.

Contributor Author

Added.
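
The review thread above can be sketched as follows. This is an illustration under stated assumptions, not the merged code: the annotation key, the holder name, the `caches` type, and `invalidateIfForeign` are all hypothetical, and the case-insensitive comparison via `strings.EqualFold` is a guess at the elided call in the diff hunk. The sketch also assumes that a missing annotation counts as a foreign holder.

```go
package main

import (
	"fmt"
	"strings"
)

// Hypothetical annotation key and holder name, for illustration only.
// The PR defines its own constants in pkg/consts.
const (
	previousHolderAnnotation = "example.com/previous-holder"
	holderCCM                = "cloud-controller-manager"
)

// caches is a toy stand-in for the provider's LB, VMSS, VMSS VM, and VM caches.
type caches struct {
	entries map[string]map[string]string
}

// newCaches returns empty caches for the four resource kinds from the thread.
func newCaches() *caches {
	c := &caches{entries: map[string]map[string]string{}}
	for _, name := range []string{"lb", "vmss", "vmssvm", "vm"} {
		c.entries[name] = map[string]string{}
	}
	return c
}

// invalidateIfForeign clears every cache when the lease's previous holder was
// not this component, and reports whether it did so. A missing annotation is
// treated as a foreign holder (an assumption of this sketch).
func (c *caches) invalidateIfForeign(annotations map[string]string) bool {
	if strings.EqualFold(annotations[previousHolderAnnotation], holderCCM) {
		return false
	}
	for name := range c.entries {
		c.entries[name] = map[string]string{}
	}
	return true
}

// demo returns the LB cache size after a lease last held by this component,
// then after a lease last held by another component.
func demo() (afterOwn, afterForeign int) {
	c := newCaches()
	c.entries["lb"]["my-lb"] = "cached"
	c.invalidateIfForeign(map[string]string{previousHolderAnnotation: holderCCM})
	afterOwn = len(c.entries["lb"])
	c.invalidateIfForeign(map[string]string{previousHolderAnnotation: "other-component"})
	afterForeign = len(c.entries["lb"])
	return afterOwn, afterForeign
}

func main() {
	own, foreign := demo()
	fmt.Println("entries after own lease:", own)
	fmt.Println("entries after foreign lease:", foreign)
}
```

Dropping all four caches together is the simplest choice here: once another component may have changed Azure resources, any cached view of them could be stale.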

@nilo19 nilo19 force-pushed the feat/multi-slb/lock branch from 22fc064 to a8a619b Compare October 10, 2024 04:00
@coveralls

Coverage Status

coverage: 76.787% (+0.04%) from 76.75%
when pulling a8a619b on nilo19:feat/multi-slb/lock
into ac0079b on kubernetes-sigs:master.

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Oct 10, 2024
@nilo19 nilo19 force-pushed the feat/multi-slb/lock branch from f7fbaa8 to c949880 Compare October 11, 2024 03:25
@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Oct 11, 2024
@nilo19 nilo19 force-pushed the feat/multi-slb/lock branch from c949880 to cfcd0ef Compare October 11, 2024 03:36
Member

@feiskyer feiskyer left a comment

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Oct 11, 2024
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: feiskyer, nilo19

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@nilo19
Contributor Author

nilo19 commented Oct 11, 2024

/retest

@nilo19
Contributor Author

nilo19 commented Oct 12, 2024

/retest

@nilo19
Contributor Author

nilo19 commented Oct 13, 2024

E1012 01:41:09.094776 1 controllermanager.go:93] Run: failed to configure cloud controller manager: --cloud-config cannot be empty when --enable-dynamic-reloading is not set to true

@nilo19
Contributor Author

nilo19 commented Oct 14, 2024

/retest

@nilo19
Contributor Author

nilo19 commented Oct 14, 2024

/retest

@nilo19
Contributor Author

nilo19 commented Oct 14, 2024

/retest

@nilo19 nilo19 force-pushed the feat/multi-slb/lock branch from cfcd0ef to 30c2743 Compare October 15, 2024 06:41
@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Oct 15, 2024
@feiskyer
Member

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Oct 16, 2024
@k8s-ci-robot k8s-ci-robot merged commit a33f77d into kubernetes-sigs:master Oct 16, 2024
18 checks passed
@nilo19 nilo19 deleted the feat/multi-slb/lock branch October 16, 2024 10:36
@nilo19
Contributor Author

nilo19 commented Oct 16, 2024

/cherrypick release-1.31

@k8s-infra-cherrypick-robot

@nilo19: #7193 failed to apply on top of branch "release-1.31":

Applying: feat: Lock updates on azure resources when other component is doing the same thing.
Using index info to reconstruct a base tree...
M	go.sum
M	pkg/provider/azure.go
M	pkg/provider/azure_loadbalancer.go
Falling back to patching base and 3-way merge...
Auto-merging pkg/provider/azure_loadbalancer.go
Auto-merging pkg/provider/azure.go
Auto-merging go.sum
Applying: invalidate vmset caches when locking
Using index info to reconstruct a base tree...
M	pkg/provider/azure_mock_vmsets.go
Falling back to patching base and 3-way merge...
Auto-merging pkg/provider/azure_mock_vmsets.go
CONFLICT (content): Merge conflict in pkg/provider/azure_mock_vmsets.go
error: Failed to merge in the changes.
hint: Use 'git am --show-current-patch=diff' to see the failed patch
hint: When you have resolved this problem, run "git am --continue".
hint: If you prefer to skip this patch, run "git am --skip" instead.
hint: To restore the original branch and stop patching, run "git am --abort".
hint: Disable this message with "git config advice.mergeConflict false"
Patch failed at 0002 invalidate vmset caches when locking

In response to this:

/cherrypick release-1.31

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@MartinForReal
Contributor

/cherrypick release-1.30

@k8s-infra-cherrypick-robot

@MartinForReal: #7193 failed to apply on top of branch "release-1.30":

Applying: feat: Lock updates on azure resources when other component is doing the same thing.
Using index info to reconstruct a base tree...
M	cmd/cloud-controller-manager/app/controllermanager.go
M	go.sum
M	pkg/consts/consts.go
M	pkg/node/node.go
M	pkg/node/nodearm.go
M	pkg/provider/azure.go
M	pkg/provider/azure_loadbalancer.go
M	pkg/provider/azure_loadbalancer_test.go
Falling back to patching base and 3-way merge...
CONFLICT (add/add): Merge conflict in pkg/provider/azure_lock.go
Auto-merging pkg/provider/azure_lock.go
Auto-merging pkg/provider/azure_loadbalancer_test.go
CONFLICT (content): Merge conflict in pkg/provider/azure_loadbalancer_test.go
Auto-merging pkg/provider/azure_loadbalancer.go
Auto-merging pkg/provider/azure.go
Auto-merging go.sum
Auto-merging cmd/cloud-controller-manager/app/controllermanager.go
error: Failed to merge in the changes.
hint: Use 'git am --show-current-patch=diff' to see the failed patch
hint: When you have resolved this problem, run "git am --continue".
hint: If you prefer to skip this patch, run "git am --skip" instead.
hint: To restore the original branch and stop patching, run "git am --abort".
hint: Disable this message with "git config advice.mergeConflict false"
Patch failed at 0001 feat: Lock updates on azure resources when other component is doing the same thing.

In response to this:

/cherrypick release-1.30


Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files.
cncf-cla: yes Indicates the PR's author has signed the CNCF CLA.
kind/feature Categorizes issue or PR as related to a new feature.
lgtm "Looks good to me", indicates that a PR is ready to be merged.
release-note Denotes a PR that will be considered when it comes time to generate release notes.
size/XL Denotes a PR that changes 500-999 lines, ignoring generated files.
7 participants