Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add leader transfer logic #63

Merged
merged 3 commits into from
Jan 27, 2025
Merged

Conversation

AmirAllahveran
Copy link
Contributor

  • Implemented functionality to check if a member is the leader before defragmentation.
  • Added logic to transfer leadership to a non-leader member if the current member is the leader.
  • Ensured robust error handling during leader transfer to prevent cluster disruption.
  • Integrated the leader transfer step into the defragmentation process with support for --move-leader flag.
  • Enhanced resilience by allowing continuation on error with the --continue-on-error flag.

This change mitigates the risk of outages caused by defragmenting the leader node in an etcd cluster.

Resolves #58

@ahrtr
Copy link
Owner

ahrtr commented Jan 26, 2025

could you please also update the readme?

@@ -39,6 +39,7 @@ It adds the following extra flags,
| `--defrag-rule` | defragmentation rule (etcd-defrag will run defragmentation if the rule is empty or it is evaluated to true), defaults to empty. See more details below. |
| `--dry-run` | evaluate whether or not endpoints require defragmentation, but don't actually perform it, defaults to `false`. |
| `--exclude-localhost` | whether to exclude localhost endpoints, defaults to `false`. |
| `--move-leader` | whether to move the leader to a randomly picked non-leader ID and make it the new leader, defaults to `false`. |
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
| `--move-leader` | whether to move the leader to a randomly picked non-leader ID and make it the new leader, defaults to `false`. |
| `--move-leader` | whether to move the leader to another follower before performing defragment on it to minimize potential performance impact, defaults to `false`. |

@@ -70,6 +71,7 @@ Flags:
--keepalive-time duration keepalive time for client connections (default 2s)
--keepalive-timeout duration keepalive timeout for client connections (default 6s)
--key string identify secure client using this TLS key file
--move-leader whether to move the leader to a randomly picked non-leader ID and make it the new leader
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto, please update the description

@@ -24,6 +24,7 @@ func newDefragCommand() *cobra.Command {
defragCmd.Flags().StringSliceVar(&globalCfg.endpoints, "endpoints", []string{"127.0.0.1:2379"}, "comma separated etcd endpoints")
defragCmd.Flags().BoolVar(&globalCfg.useClusterEndpoints, "cluster", false, "use all endpoints from the cluster member list")
defragCmd.Flags().BoolVar(&globalCfg.excludeLocalhost, "exclude-localhost", false, "whether to exclude localhost endpoints")
defragCmd.Flags().BoolVar(&globalCfg.moveLeader, "move-leader", false, "whether to move the leader to a randomly picked non-leader ID and make it the new leader")
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto, please update the description

Comment on lines +235 to +238
defer c.Close()

ctx, cancel := commandCtx(gcfg.commandTimeout)
defer cancel()
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor comment

Suggested change
defer c.Close()
ctx, cancel := commandCtx(gcfg.commandTimeout)
defer cancel()
ctx, cancel := commandCtx(gcfg.commandTimeout)
defer func() {
c.Close()
cancel()
}()

ctx, cancel := commandCtx(gcfg.commandTimeout)
defer cancel()

fmt.Printf("Requesting leader at %s to transfer leadership to member ID %d...\n", leaderEp, transfereeID)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggest not to print log in this utility function, please print log in main.go only if needed

Suggested change
fmt.Printf("Requesting leader at %s to transfer leadership to member ID %d...\n", leaderEp, transfereeID)

return fmt.Errorf("failed to move leader: %w", err)
}

fmt.Println("successfully transferred leadership from", leaderEp, "to member ID ", transfereeID)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto

Suggested change
fmt.Println("successfully transferred leadership from", leaderEp, "to member ID ", transfereeID)

Copy link
Owner

@ahrtr ahrtr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall looks good. Thank you.

Let me resolve the comments in a separate PR.

@ahrtr ahrtr merged commit 8241946 into ahrtr:main Jan 27, 2025
6 of 7 checks passed
@ahrtr
Copy link
Owner

ahrtr commented Jan 27, 2025

Refer to 5b38140

It also fixed a bug in this PR. When there is only one member in the cluster, we shouldn't move leader at all.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add optional move-leader before defragmenting the leader
2 participants