Failed to AttachVolume, expected device to be attached but was attaching #1801

Closed
adovy opened this issue Oct 25, 2023 · 14 comments
Labels
kind/bug Categorizes issue or PR as related to a bug. lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed.

Comments

@adovy

adovy commented Oct 25, 2023

/kind bug

What happened?

Getting Pod error FailedAttachVolume: AttachVolume.Attach failed for volume "pvc-XXX" : rpc error: code = Internal desc = Could not attach volume "vol-XXX" to node "i-XXX": attachment of disk "vol-XX" failed, expected device to be attached but was attaching

Error log from ebs-csi-controller:

I1025 11:08:34.489703       1 controller.go:415] "ControllerPublishVolume: attaching" volumeID="vol-XXX" nodeID="i-XXX"
E1025 11:08:49.490233       1 driver.go:124] "GRPC error" err="rpc error: code = Internal desc = Could not attach volume \"vol-XXX\" to node \"i-XXX\": context canceled"
I1025 11:08:49.497484       1 controller.go:415] "ControllerPublishVolume: attaching" volumeID="vol-XXX" nodeID="i-XXX"
E1025 11:08:49.629393       1 driver.go:124] "GRPC error" err="rpc error: code = Internal desc = Could not attach volume \"vol-XXX\" to node \"i-XXX\": attachment of disk \"vol-XXX\" failed, expected device to be attached but was attaching"

What you expected to happen?

EBS Volumes to be attached successfully to a node when a PVC is created.

Anything else we need to know?:

I can see in the AWS console that the volume actually is attached to the node.
P.S. This doesn't happen with all volumes.
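For reference, the attachment state that EC2 itself reports can be checked directly; a minimal sketch with the AWS CLI (vol-XXX is the redacted volume ID from the error above):

# Shows whether EC2 considers the volume attaching, attached, or detaching
aws ec2 describe-volumes --volume-ids vol-XXX \
  --query 'Volumes[0].Attachments[].{Instance:InstanceId,State:State,Device:Device}'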

Environment

  • Kubernetes version: v1.27.4-eks-2d98532
  • Driver version: v1.24.0
@k8s-ci-robot k8s-ci-robot added the kind/bug Categorizes issue or PR as related to a bug. label Oct 25, 2023
@ConnorJC3
Contributor

Do volumes get stuck in this state indefinitely (or for an abnormally long time)? It's normal to see this message when a volume is taking a long time to attach on EBS's side, but it should eventually resolve itself once the volume becomes attached.
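A quick way to check whether an attachment is actually stuck on the Kubernetes side (a sketch, assuming kubectl access to the cluster):

# List all VolumeAttachment objects; ATTACHED=false marks ones still in flight
kubectl get volumeattachment
# Describe one to see the CSI driver's last reported attach error, if any
kubectl describe volumeattachment <name>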

@adovy
Author

adovy commented Oct 30, 2023

In the AWS console I can see that the volume is attached to the node, but the controller keeps trying to attach it.

@adovy
Author

adovy commented Nov 15, 2023

It seems the problem was that we hit the limit of EBS volumes per node.
But even after setting volumeAttachLimit (see the sketch below), pods are still scheduled onto nodes that have already reached the attached-volume limit.
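For context, a minimal sketch of how the limit can be set via the Helm chart (node.volumeAttachLimit passes --volume-attach-limit to the node service; the value 25 is illustrative):

helm upgrade --install aws-ebs-csi-driver aws-ebs-csi-driver/aws-ebs-csi-driver \
  --namespace kube-system \
  --set node.volumeAttachLimit=25

The limit is advertised through the CSINode allocatable count, which the scheduler consults, so the node pods must re-register (restart) before a new value takes effect. Note that only CSI-managed volumes count against it; the root device or ENIs on Nitro instances can still consume attachment slots.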

@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Feb 13, 2024
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle rotten
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Mar 14, 2024
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue with /reopen
  • Mark this issue as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

@k8s-ci-robot k8s-ci-robot closed this as not planned (won't fix, can't repro, duplicate, stale) Apr 13, 2024
@k8s-ci-robot
Contributor

@k8s-triage-robot: Closing this issue, marking it as "Not Planned".

In response to this:

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue with /reopen
  • Mark this issue as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@aws-patrickc

Hello,

Is there any workaround besides manual intervention? Could this be considered a feature request for the EBS CSI controller to throw an error, or better yet, to notify the scheduler to reschedule the pod onto a different worker node?

@divya898

I want to deploy Postgres on a k8s cluster built with kubeadm, and I'm getting this issue:

"E0718 13:31:52.030062 1 batcher.go:161] "execute: error executing batch" err="error listing AWS instances: operation error EC2: DescribeInstances, https response error StatusCode: 0, RequestID: , canceled, context deadline exceeded"
E0718 13:31:52.030114 1 driver.go:107] "GRPC error" err="rpc error: code = Internal desc = Could not attach volume "vol-xxxxxxxxxxxxx" to node "i-xxxxxxxxxxxxx": error listing AWS instances: operation error EC2: DescribeInstances, https response error StatusCode: 0, RequestID: , canceled, context deadline exceeded""

I have also created a PV and a PVC, which are bound with the help of the ebs-csi-driver, and I deployed the CSI driver using Helm.

"ubuntu@k8smaster:/application/postgres$ kubectl get pvc -n app
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
postgres-pvc Bound existing-ebs-pv 60Gi RWO ebs-postgres 115m
ubuntu@k8smaster:
/application/postgres$ kubectl get pv
NAME CAPACITY ACCESS MODES RECLAIM POLICY STATUS CLAIM STORAGECLASS REASON AGE
existing-ebs-pv 60Gi RWO Retain Bound app/postgres-pvc ebs-postgres 115m"

Also, I have configured the IAM role correctly and attached it to all nodes, with this policy:
"
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": [
"ec2:CreateSnapshot",
"ec2:CreateTags",
"ec2:CreateVolume",
"ec2:DeleteSnapshot",
"ec2:DeleteTags",
"ec2:DeleteVolume",
"ec2:DescribeAvailabilityZones",
"ec2:DescribeInstances",
"ec2:DescribeSnapshots",
"ec2:DescribeTags",
"ec2:DescribeVolumes",
"ec2:DescribeVolumesModifications",
"ec2:AttachVolume",
"ec2:DetachVolume"
],
"Resource": "*"
}
]
}
"

Can anyone tell me what the issue is?

@torredil
Member

@divya898 Deploying the driver with SDK debug mode enabled and taking a look at the logs once again will be very helpful for debugging the issue.
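The repeated `context deadline exceeded` on DescribeInstances also suggests the controller may not be reaching the EC2 API at all; a minimal reachability probe from one of the nodes (a sketch — substitute your region for us-east-1):

# A non-2xx status is fine here; what matters is that the endpoint answers at all
curl -sS -m 10 -o /dev/null -w '%{http_code}\n' https://ec2.us-east-1.amazonaws.com/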

@divya898

divya898 commented Jul 18, 2024

@torredil
Yes, I used this command: helm repo add aws-ebs-csi-driver https://kubernetes-sigs.github.io/aws-ebs-csi-driver
and installed the CSI driver.

pod/ebs-csi-controller-86957f8467-785bg 5/5 Running 0 5h54m
pod/ebs-csi-controller-86957f8467-vxcmc 5/5 Running 0 5h54m
pod/ebs-csi-node-4jkrz 3/3 Running 0 5h54m
pod/ebs-csi-node-kk6qw 3/3 Running 0 5h54m
pod/ebs-csi-node-zrljn 3/3 Running 0 5h54m

And the PVC and PV bind successfully, but the issue persists: the pods are stuck in the creating state.

Events:
  Type     Reason              Age               From                     Message
  ----     ------              ----              ----                     -------
  Normal   Scheduled           72s               default-scheduler        Successfully assigned app/postgres-777d8c6765-d2tct to k8sworker2.example.net
  Warning  FailedAttachVolume  4s (x8 over 71s)  attachdetach-controller  AttachVolume.Attach failed for volume "existing-ebs-pv" : volume attachment is being deleted

CSI driver logs.

Controller pod:

E0718 17:06:24.989148 1 batcher.go:161] "execute: error executing batch" err="error listing AWS instances: operation error EC2: DescribeInstances, https response error StatusCode: 0, RequestID: , canceled, context deadline exceeded"
E0718 17:06:24.989220 1 driver.go:107] "GRPC error" err="rpc error: code = Internal desc = Could not detach volume "vol-02a0b78825ece9614" from node "i-0492a4a4bf1856321": error listing AWS instances: operation error EC2: DescribeInstances, https response error StatusCode: 0, RequestID: , canceled, context deadline exceeded"
I0718 17:36:25.048006 1 controller.go:464] "ControllerUnpublishVolume: detaching" volumeID="vol-02a0b78825ece9614" nodeID="i-0492a4a4bf1856321"
E0718 17:36:55.551709 1 batcher.go:161] "execute: error executing batch" err="error listing AWS instances: operation error EC2: DescribeInstances, https response error StatusCode: 0, RequestID: , canceled, context deadline exceeded"
E0718 17:36:55.551774 1 driver.go:107] "GRPC error" err="rpc error: code = Internal desc = Could not detach volume "vol-02a0b78825ece9614" from node "i-0492a4a4bf1856321": error listing AWS instances: operation error EC2: DescribeInstances, https response error StatusCode: 0, RequestID: , canceled, context deadline exceeded"
I0718 17:38:21.934625 1 controller.go:464] "ControllerUnpublishVolume: detaching" volumeID="vol-02a0b78825ece9614" nodeID="i-0492a4a4bf1856321"
E0718 17:38:52.436619 1 batcher.go:161] "execute: error executing batch" err="error listing AWS instances: operation error EC2: DescribeInstances, https response error StatusCode: 0, RequestID: , canceled, context deadline exceeded"
E0718 17:38:52.436696 1 driver.go:107] "GRPC error" err="rpc error: code = Internal desc = Could not detach volume "vol-02a0b78825ece9614" from node "i-0492a4a4bf1856321": error listing AWS instances: operation error EC2: DescribeInstances, https response error StatusCode: 0, RequestID: , canceled, context deadline exceeded"

ubuntu@k8smaster:~/application/postgres$ k logs ebs-csi-controller-86957f8467-vxcmc --tail=100 -n kube-system
Defaulted container "ebs-plugin" out of: ebs-plugin, csi-provisioner, csi-attacher, csi-resizer, liveness-probe
I0718 11:18:13.486934 1 main.go:157] "Initializing metadata"
I0718 11:18:13.509506 1 metadata.go:48] "Retrieved metadata from IMDS"
I0718 11:18:13.510885 1 driver.go:68] "Driver Information" Driver="ebs.csi.aws.com" Version="v1.32.0"
ubuntu@k8smaster:~/application/postgres$

ubuntu@k8smaster:~/application/postgres$ k logs ebs-csi-node-4jkrz --tail=100 -n kube-system
Defaulted container "ebs-plugin" out of: ebs-plugin, node-driver-registrar, liveness-probe
I0718 11:18:12.593894 1 main.go:157] "Initializing metadata"
I0718 11:18:12.633046 1 metadata.go:48] "Retrieved metadata from IMDS"
I0718 11:18:12.633999 1 driver.go:68] "Driver Information" Driver="ebs.csi.aws.com" Version="v1.32.0"
E0718 11:18:13.644744 1 node.go:856] "Unexpected failure when attempting to remove node taint(s)" err="isAllocatableSet: driver not found on node k8smaster.example.net"
E0718 11:18:14.152630 1 node.go:856] "Unexpected failure when attempting to remove node taint(s)" err="isAllocatableSet: driver not found on node k8smaster.example.net"
E0718 11:18:15.159355 1 node.go:856] "Unexpected failure when attempting to remove node taint(s)" err="isAllocatableSet: driver not found on node k8smaster.example.net"
I0718 11:18:17.167929 1 node.go:936] "CSINode Allocatable value is set" nodeName="k8smaster.example.net" count=38

ubuntu@k8smaster:~/application/postgres$ k logs ebs-csi-node-kk6qw --tail=100 -n kube-system
Defaulted container "ebs-plugin" out of: ebs-plugin, node-driver-registrar, liveness-probe
I0718 11:18:12.840835 1 main.go:157] "Initializing metadata"
I0718 11:18:12.876464 1 metadata.go:48] "Retrieved metadata from IMDS"
I0718 11:18:12.878910 1 driver.go:68] "Driver Information" Driver="ebs.csi.aws.com" Version="v1.32.0"
E0718 11:18:13.905032 1 node.go:856] "Unexpected failure when attempting to remove node taint(s)" err="isAllocatableSet: driver not found on node k8sworker2.example.net"
E0718 11:18:14.413693 1 node.go:856] "Unexpected failure when attempting to remove node taint(s)" err="isAllocatableSet: driver not found on node k8sworker2.example.net"
E0718 11:18:15.426821 1 node.go:856] "Unexpected failure when attempting to remove node taint(s)" err="isAllocatableSet: driver not found on node k8sworker2.example.net"
I0718 11:18:17.433850 1 node.go:936] "CSINode Allocatable value is set" nodeName="k8sworker2.example.net" count=26

ubuntu@k8smaster:~/application/postgres$ k logs ebs-csi-node-zrljn --tail=100 -n kube-system
Defaulted container "ebs-plugin" out of: ebs-plugin, node-driver-registrar, liveness-probe
I0718 11:18:11.396228 1 main.go:157] "Initializing metadata"
I0718 11:18:11.405170 1 metadata.go:48] "Retrieved metadata from IMDS"
I0718 11:18:11.405756 1 driver.go:68] "Driver Information" Driver="ebs.csi.aws.com" Version="v1.32.0"
E0718 11:18:12.418598 1 node.go:856] "Unexpected failure when attempting to remove node taint(s)" err="isAllocatableSet: driver not found on node k8sworker1.example.net"
E0718 11:18:12.926204 1 node.go:856] "Unexpected failure when attempting to remove node taint(s)" err="isAllocatableSet: driver not found on node k8sworker1.example.net"
E0718 11:18:13.932893 1 node.go:856] "Unexpected failure when attempting to remove node taint(s)" err="isAllocatableSet: driver not found on node k8sworker1.example.net"
I0718 11:18:15.939380 1 node.go:936] "CSINode Allocatable value is set" nodeName="k8sworker1.example.net" count=38

Can you please tell me what I am missing?
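For reference, the `volume attachment is being deleted` event above means the old VolumeAttachment object is stuck terminating; a minimal way to inspect it (a sketch; `<name>` comes from the first command's output):

# Find the VolumeAttachment that references the stuck PV
kubectl get volumeattachment | grep existing-ebs-pv
# A set deletionTimestamp plus a remaining finalizer means the CSI detach never completed
kubectl get volumeattachment <name> \
  -o jsonpath='{.metadata.deletionTimestamp}{"\n"}{.metadata.finalizers}{"\n"}'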

@torredil
Member

@divya898 It looks like SDK debug logging has not been enabled; you can enable it via the controller.sdkDebugLog=true parameter. Example:

helm upgrade --install aws-ebs-csi-driver --namespace kube-system ./charts/aws-ebs-csi-driver --values ./charts/aws-ebs-csi-driver/values.yaml --set controller.sdkDebugLog=true

You should then be able to see, directly in the CSI controller logs, more detailed information that will help troubleshoot why the HTTP request is timing out:

Action=DescribeInstances&InstanceId.1=i-0c73a509f58829453&Version=2016-11-15
SDK 2024/07/18 17:51:31 DEBUG Response
HTTP/1.1 200 OK
Transfer-Encoding: chunked
Cache-Control: no-cache, no-store
Content-Type: text/xml;charset=UTF-8
Date: Thu, 18 Jul 2024 17:51:31 GMT
Server: AmazonEC2
Strict-Transport-Security: max-age=31536000; includeSubDomains
Vary: accept-encoding
X-Amzn-Requestid: 693436ba-xxxx-xxxx-xxxx-a03a27ea5640
...

@divya898

Hi @torredil,
Thank you for your time. My issue was solved by restarting the csi-driver pods after updating the Helm repo.
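For anyone hitting the same thing, a sketch of that restart (assuming the chart's default workload names in kube-system, which match the pod names listed above):

kubectl rollout restart deployment/ebs-csi-controller -n kube-system
kubectl rollout restart daemonset/ebs-csi-node -n kube-system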

@torredil
Member

@divya898 Glad to hear 👍

Let us know if you run into any other issues and we'll be happy to help.
