Fatal error - Another operation of type DeregisterInstance is in progress #5134

stefaneg · 2025-02-28T09:18:40Z

What happened:
With External-DNS configured with the aws-sd provider, it occasionally exits with a fatal error, apparently due to a race condition, and eventually enters a crash loop.

{"level":"fatal","msg":"Failed to do run once: operation error ServiceDiscovery: RegisterInstance, https response error StatusCode: 400, RequestID: d270d6a7-a36e-44a7-b88d-72d197c38578, DuplicateRequest: Another operation of type DeregisterInstance and id tr2szldps72jcdtoj2oyb3ckwplbl55r-6buit117 is in progress","time":"2025-02-25T13:51:21Z"}

What you expected to happen:
The aws-sd provider should complete registration and de-registration successfully without exiting with a fatal error.

How to reproduce it (as minimally and precisely as possible):
The error seems to manifest when pods are rescheduled between nodes, usually due to Karpenter rebalancing the cluster. This results in changes of IP addresses, requiring recreation of Route53 records.
A minimal reproduction has not been attempted, as we have a fix.

Anything else we need to know?:
We have been running a patched version of External-DNS for our private namespace for approximately 3 years. We believed the patch was only an optimisation, but it turns out it also fixes this issue when interacting with the AWS API.

A PR that fixes this issue is forthcoming.

A PR for this fix was filed before: #3123
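
For illustration, the idea suggested by the title of the linked PR ("Only de-register removed targets") can be sketched as a set difference between the currently registered targets and the desired ones. This is a minimal sketch and not the actual patch: `targetDiff` and `diffTargets` are hypothetical names, and targets are assumed to be plain IP address strings. The point is that a target present in both sets is left alone, so it never receives a DeregisterInstance immediately followed by a RegisterInstance.

```go
package main

import "sort"

// targetDiff holds the changes needed to move from the current set of
// registered targets to the desired set. (Hypothetical type, for illustration.)
type targetDiff struct {
	Register   []string // targets present only in the desired set
	Deregister []string // targets present only in the current set
}

// diffTargets compares current and desired targets and returns only the
// differences. Targets that appear in both slices are left untouched, so no
// deregister/register pair is issued for them. (Hypothetical helper.)
func diffTargets(current, desired []string) targetDiff {
	cur := make(map[string]struct{}, len(current))
	for _, t := range current {
		cur[t] = struct{}{}
	}
	des := make(map[string]struct{}, len(desired))
	for _, t := range desired {
		des[t] = struct{}{}
	}

	var d targetDiff
	for t := range des {
		if _, ok := cur[t]; !ok {
			d.Register = append(d.Register, t)
		}
	}
	for t := range cur {
		if _, ok := des[t]; !ok {
			d.Deregister = append(d.Deregister, t)
		}
	}

	// Sort for deterministic output; map iteration order is random in Go.
	sort.Strings(d.Register)
	sort.Strings(d.Deregister)
	return d
}
```

With current targets `10.0.0.1, 10.0.0.2` and desired targets `10.0.0.2, 10.0.0.3`, only `10.0.0.1` would be de-registered and only `10.0.0.3` registered; `10.0.0.2` is not touched at all.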

Environment:

External-DNS version (use external-dns --version):
0.15.1
DNS provider:
AWS Route53
Others:
Deployment configuration:

    - args:
        - --log-level
        - info
        - --log-format
        - json
        - --provider
        - aws-sd
        - --registry
        - aws-sd
        - --policy
        - sync
        - --interval
        - 10s
        - --source
        - service
        - --aws-api-retries
        - "3"
        - --domain-filter
        - company.local
        - --aws-zone-type
        - private
        - --annotation-filter
        - dns.company.com/type=internal
        - --fqdn-template
        - '{{index .ObjectMeta.Annotations "dns.company.com/label"}}.company.local'

stefaneg added the kind/bug label Feb 28, 2025

stefaneg linked a pull request Feb 28, 2025 that will close this issue

Only de-register removed targets #5135

Open
