
EKS Auto Mode seems to have a bug in its EBS CSI storage provisioner "ebs.csi.eks.amazonaws.com" in how it handles "ReadWriteOnce" #2331

Open
setheliot opened this issue Feb 9, 2025 · 9 comments
Labels
kind/bug Categorizes issue or PR as related to a bug.

Comments

@setheliot

/kind bug

EKS Auto Mode seems to have a bug in its EBS CSI storage provisioner "ebs.csi.eks.amazonaws.com" in how it handles "ReadWriteOnce"

In summary, when I use EKS with Auto Mode enabled and create an EBS PV with access mode "ReadWriteOnce"…

  • I EXPECT: that all pods on the EKS Node with the mounted EBS volume can access the PersistentVolume (PV)
  • But ACTUALLY: only one of the three pods can access the PV
  • Additionally, when NOT using Auto Mode, I observe the expected behavior — all pods can access the PV

In more detail:

To illustrate this problem, I will compare two clusters:
Cluster 1 is an EKS cluster WITHOUT Auto Mode (using "ebs.csi.aws.com")
Cluster 2 is an EKS cluster with Auto Mode ENABLED (using "ebs.csi.eks.amazonaws.com")

In both clusters:

  • I provision a StorageClass for an EBS volume
  • I provision a PVC with access mode "ReadWriteOnce", using the StorageClass (a sketch of both manifests follows this list)
  • There are three pods, all three configured to attach a volume mount using the PVC
  • All three pods are deployed to the EKS Node that has the attached EBS volume (as expected, to satisfy the PV Claim)
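
For reference, a minimal sketch of the two manifests involved; the names, volume type, and size are illustrative rather than copied from the repro repo, and the provisioner is ebs.csi.aws.com on Cluster 1 versus ebs.csi.eks.amazonaws.com on Cluster 2:

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: ebs-sc                           # illustrative name
provisioner: ebs.csi.eks.amazonaws.com   # ebs.csi.aws.com on Cluster 1
volumeBindingMode: WaitForFirstConsumer
parameters:
  type: gp3                              # assumed volume type
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: ebs-claim                        # illustrative name
spec:
  accessModes: [ "ReadWriteOnce" ]
  storageClassName: ebs-sc
  resources:
    requests:
      storage: 4Gi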

The difference is:

  • For Cluster 1, all three pods can successfully access the EBS PV - as expected for "ReadWriteOnce"
  • For Cluster 2, only one of the three pods can access the EBS PV; the other two fail with "access denied"

Observations on Cluster 2

  • When initializing, all three pods claim to successfully mount the volume in their event logs
  • The volume mount path is created on all three pods. But only one pod can successfully access it. The other two see “access denied” when trying to access that path

Repro:
This repo reliably reproduces the issue:
https://github.com/setheliot/eks_auto_mode

  • Once installed, open the app and make several writes to the table; then refresh and observe how the PV read behaves for each pod


@k8s-ci-robot added the kind/bug label on Feb 9, 2025
@AndrewSirenko
Contributor

AndrewSirenko commented Feb 10, 2025

Hi @setheliot, you are correct that multiple pods on the node should be able to access that "ReadWriteOnce" PV. We appreciate your detailed description of the problem!

Thank you for submitting an issue, but this kubernetes-sigs project only handles issues related to the upstream AWS EBS CSI Driver (ebs.csi.aws.com). This looks like an issue specific to EKS Auto Mode and the ebs.csi.eks.amazonaws.com driver.

Could you please file an AWS customer support ticket for this issue? Meanwhile, I will make sure EKS Auto Mode is aware of it.

Thank you.

@setheliot
Author

This is the answer from AWS Support. To me, this does not quite add up. What do you think?

==========

I understand you reached out regarding an EKS Auto Mode issue where the EBS CSI storage provisioner "ebs.csi.eks.amazonaws.com" is not handling the "ReadWriteOnce" access mode as expected. When using EKS with Auto Mode enabled, only one pod can access the EBS Persistent Volume (PV) even though all pods are on the same node, while the same setup works correctly without Auto Mode. Please clarify if I have correctly understood the situation or if there are any additional details you would like to provide.

Based on my analysis of the information you've provided and the reproduction steps in your GitHub repository, I can confirm this is a known behavior difference between the traditional EBS CSI driver and the Auto Mode implementation. Let me explain the situation and provide some solutions.

Root Cause: The key difference lies in how ReadWriteOnce (RWO) access mode is implemented:

Traditional EBS CSI driver (ebs.csi.aws.com) implements RWO at the node level [1]
Auto Mode CSI driver (ebs.csi.eks.amazonaws.com) implements RWO at the pod level [2]

This explains why you're seeing access denied errors for additional pods even though they're on the same node.

Immediate Solutions:

Use StatefulSets instead of Deployments: This is the recommended approach as it ensures each pod gets its own PVC:

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: guestbook
spec:
  serviceName: guestbook
  replicas: 3
  selector:
    matchLabels:
      app: guestbook
  template:
    metadata:
      labels:
        app: guestbook
    spec:
      containers:
      - name: guestbook
        image: <guestbook-image>   # placeholder for the application image
        volumeMounts:
        - name: data
          mountPath: /app/data
  volumeClaimTemplates:
  - metadata:
      name: data
    spec:
      accessModes: [ "ReadWriteOnce" ]
      resources:
        requests:
          storage: 4Gi

Switch to EFS: If you need true shared storage, consider using Amazon EFS which natively supports ReadWriteMany access mode [3]. Here's a sample StorageClass:

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: efs-sc
provisioner: efs.csi.aws.com
parameters:
  provisioningMode: efs-ap
  fileSystemId: fs-xxxxxx
  directoryPerms: "700"
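
A PVC bound to that class would then request the ReadWriteMany access mode, for example (the claim name is illustrative, and fs-xxxxxx above stays a placeholder for your file system ID):

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: efs-claim          # illustrative name
spec:
  accessModes: [ "ReadWriteMany" ]
  storageClassName: efs-sc
  resources:
    requests:
      storage: 5Gi         # EFS is elastic; the value is required but not enforced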

Revert to Manual Mode: If neither of the above solutions work for your use case, you can disable Auto Mode and use the traditional EBS CSI driver.

Questions you might have:

Q: Will switching to StatefulSets require application changes?
A: Yes, you'll need to modify your deployment manifests, but the container configurations can remain the same.

Q: Is there a performance impact using EFS instead of EBS?
A: Yes, EFS and EBS have different performance characteristics. EFS provides shared access but might have higher latency compared to EBS [4].

References:
[1] AWS EBS CSI Driver Documentation: https://docs.aws.amazon.com/eks/latest/userguide/ebs-csi.html
[2] EKS Auto Mode Documentation: https://docs.aws.amazon.com/eks/latest/userguide/eks-add-ons.html
[3] AWS EFS CSI Driver: https://docs.aws.amazon.com/eks/latest/userguide/efs-csi.html
[4] EBS vs EFS Performance Comparison: https://docs.aws.amazon.com/efs/latest/ug/performance.html

@AndrewSirenko
Contributor

AndrewSirenko commented Feb 11, 2025

Thank you for reaching out to support.

Auto Mode CSI driver (ebs.csi.eks.amazonaws.com) implements RWO at the pod level [2]

As you likely already know, "implementing RWO at the pod level" doesn't sound Kubernetes-conformant. As the Kubernetes access modes documentation explains:

ReadWriteOnce
The volume can be mounted as read-write by a single node. ReadWriteOnce access mode still can allow multiple pods to access (read from or write to) that volume when the pods are running on the same node.

ReadWriteOncePod
The volume can be mounted as read-write by a single Pod. Use ReadWriteOncePod access mode if you want to ensure that only one pod across the whole cluster can read that PVC or write to it.

I agree with you; EKS should take another look at this implementation decision. This may take some time. Thank you for your patience.
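
For reference, the distinction is expressed only through the access mode string in the PVC spec (a sketch with illustrative names, not taken from this issue):

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: shared-on-one-node            # illustrative: any pod on the node may use it
spec:
  accessModes: [ "ReadWriteOnce" ]
  resources:
    requests:
      storage: 4Gi
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: single-pod-only               # illustrative: restricted to one pod cluster-wide
spec:
  accessModes: [ "ReadWriteOncePod" ]
  resources:
    requests:
      storage: 4Gi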

@tzneal

tzneal commented Feb 11, 2025

EKS Auto Mode Nodes have an enhanced level of Pod isolation; there are some more details here and here.

To allow multiple Pods to share this volume, you can configure those Pods to share the same SELinux categories. Here the triple c123,c124,c125 is a unique set of categories that won't conflict with any other Pods that are assigned default categories on the node, since those are assigned category pairs. For example:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: sample-deployment
  labels:
    app: sample
spec:
  replicas: 3
  selector:
    matchLabels:
      app: sample
  template:
    metadata:
      labels:
        app: sample
    spec:
      securityContext:
        seLinuxOptions:
          level: "s0:c123,c124,c125"
...

Alternatively, you can allow your Pods access to all categories, but this has the side effect of removing all of the Pod-level isolation, rather than relaxing it only for the specific set of Pods that need to share the volume:

      securityContext:
        seLinuxOptions:
          level: "s0:c0.c1023"

@AndrewSirenko
Contributor

AndrewSirenko commented Feb 11, 2025

Many thanks @tzneal for pointing out that SELinux is at play here.

@setheliot it looks like you would see similar behavior with the ebs.csi.aws.com driver on self-managed nodes running an SELinux-enforcing OS (e.g. self-managed Bottlerocket nodes). TIL that this is a feature, not a bug.

@setheliot
Author

@tzneal called it... here is my latest response from AWS Support:

Q1. Why would Auto Mode CSI driver be designed to operate this way? There already is an access mode called ReadWriteOncePod which would allow access by only one Pod. With ReadWriteOnce there should be access by ALL pods on the same node.

You're absolutely correct in your understanding of how ReadWriteOnce should work. The behavior you're seeing isn't actually a design choice of the Auto Mode CSI driver. Instead, it's related to SELinux policies in Bottlerocket (the default AMI for EKS Auto Mode). This enhanced isolation between pods is a security feature of SELinux-enforcing operating systems, including Bottlerocket and RHEL when SELinux is enabled.

Q2. Could you please provide me with any AWS documentation or artifact that supports that this is the intended behavior with "ReadWriteOnce" in Auto Mode?

You're right to ask for documentation. Currently, our documentation doesn't adequately explain this SELinux-related behavior. We're in the process of reviewing and updating our documentation to better reflect this behavior and the required configuration. I apologize for the confusion this has caused.

@setheliot
Author

Guess I am going to have to learn how to use seLinuxOptions

@tzneal

tzneal commented Feb 13, 2025

The docs are now updated to cover this scenario at https://docs.aws.amazon.com/eks/latest/userguide/auto-troubleshoot.html#auto-troubleshoot-share-pod-volumes

@egachi

egachi commented Feb 25, 2025

Hello, even after applying a securityContext to the pods, there is another pitfall with static provisioning: the EBS volume is attached to a particular node by design, but since EKS Auto Mode will launch new nodes to fulfill new pods, pods sharing the same static volume may be scheduled onto those new nodes and fail with the error:

Warning FailedAttachVolume 19m attachdetach-controller Multi-Attach error for volume "<static_pv>" Volume is already used by pod(s) ebs-app-0.

Solution

  • Use node selectors based on the same availability zone as the static EBS volume and its host instance. You can use the labels added by EKS Auto Mode, topology.ebs.csi.eks.amazonaws.com/zone and kubernetes.io/hostname; this ensures that the pods land on the same host and in the same availability zone.
  • Tag your EBS volume with eks:eks-cluster-name=<clustername>

Manifest sample:

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: ebs-app
spec:
  serviceName: "ebs-app"
  replicas: 40
  selector:
    matchLabels:
      app: ebs-app
  template:
    metadata:
      labels:
        app: ebs-app
    spec:
      nodeSelector:
        topology.ebs.csi.eks.amazonaws.com/zone: "<availability-zone>"
        kubernetes.io/hostname: "<node-hostname>"
      containers:
        - name: app
          image: centos
          command: ["/bin/sh"]
          args:
            [
              "-c",
              "while true; do echo $(date -u) >> /data/out.txt; sleep 2; done",
            ]
          volumeMounts:
            - name: persistent-storage
              mountPath: /data
          securityContext:
            seLinuxOptions:
              level: "s0:c123,c456,c789"
      volumes:
        - name: persistent-storage
          persistentVolumeClaim:
            claimName: ebs-claim
