You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
What happened:
With DNS Controller configured with aws_sd provider, it exits with a fatal error occasionally due to a seeming race condition, eventually entering a crash loop.
{"level":"fatal","msg":"Failed to do run once: operation error ServiceDiscovery: RegisterInstance, https response error StatusCode: 400, RequestID: d270d6a7-a36e-44a7-b88d-72d197c38578, DuplicateRequest: Another operation of type DeregisterInstance and id tr2szldps72jcdtoj2oyb3ckwplbl55r-6buit117 is in progress","time":"2025-02-25T13:51:21Z"}
What you expected to happen:
Expect aws_sd to complete registration and de-registration successfully without panicking.
How to reproduce it (as minimally and precisely as possible):
The error seems to manifest when pods are rescheduled between nodes, usually due to Karpenter rebalancing the cluster. This results in changes of IP addresses, requiring recreation of Route53 records.
A minimal reproduction has not been attempted, as we have a fix.
Anything else we need to know?:
We have been running a patched version of DNS controller for our private namespace for approximately 3 years. It was believed this patch was an optimisation, but it turns out it also fixes this issue with interacting with the AWS API.
What happened:
With DNS Controller configured with aws_sd provider, it exits with a fatal error occasionally due to a seeming race condition, eventually entering a crash loop.
What you expected to happen:
Expect aws_sd to complete registration and de-registration successfully without panicking.
How to reproduce it (as minimally and precisely as possible):
The error seems to manifest when pods are rescheduled between nodes, usually due to Karpenter rebalancing the cluster. This results in changes of IP addresses, requiring recreation of Route53 records.
A minimal reproduction has not been attempted, as we have a fix.
Anything else we need to know?:
We have been running a patched version of DNS controller for our private namespace for approximately 3 years. It was believed this patch was an optimisation, but it turns out it also fixes this issue with interacting with the AWS API.
A PR that fixes this issue is forthcoming.
A PR was filed for this fix before.
#3123
Environment:
External-DNS version (use
external-dns --version
):0.15.1
DNS provider:
AWS Route53
Others:
Deployment configuration:
The text was updated successfully, but these errors were encountered: