
EKS Auto Mode seems to have a bug in its EBS CSI storage provisioner "ebs.csi.eks.amazonaws.com" in how it handles "ReadWriteOnce" #2331

Open
setheliot opened this issue Feb 9, 2025 · 9 comments
Labels
kind/bug Categorizes issue or PR as related to a bug.

Comments

@setheliot

/kind bug

EKS Auto Mode seems to have a bug in its EBS CSI storage provisioner "ebs.csi.eks.amazonaws.com" in how it handles "ReadWriteOnce"

In summary, when I use EKS with Auto Mode enabled and create an EBS PV with access mode "ReadWriteOnce"…

  • I EXPECT: that all pods on the EKS Node with the mounted EBS volume can access the PersistentVolume (PV)
  • But ACTUALLY: only one of the three pods can access the PV
  • Additionally, when NOT using Auto Mode, I observe the expected behavior — all pods can access the PV

In more detail:

To illustrate this problem, I will compare two clusters:
Cluster 1 is an EKS cluster WITHOUT Auto Mode (using "ebs.csi.aws.com")
Cluster 2 is an EKS cluster with Auto Mode ENABLED (using "ebs.csi.eks.amazonaws.com")

In both clusters:

  • I provision a StorageClass for an EBS volume
  • I provision a PVC with access mode "ReadWriteOnce", using the StorageClass (a sketch of both manifests follows this list)
  • There are three pods, all three configured to attach a volume mount using the PVC
  • All three pods are deployed to the EKS Node that has the attached EBS volume (as expected, to satisfy the PV Claim)
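
For reference, a minimal sketch of the two manifests involved; the names, volume type, and size are illustrative rather than copied from the repro repo, and the provisioner is ebs.csi.aws.com on Cluster 1 versus ebs.csi.eks.amazonaws.com on Cluster 2:

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: ebs-sc                           # illustrative name
provisioner: ebs.csi.eks.amazonaws.com   # ebs.csi.aws.com on Cluster 1
volumeBindingMode: WaitForFirstConsumer
parameters:
  type: gp3                              # assumed volume type
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: ebs-claim                        # illustrative name
spec:
  accessModes: [ "ReadWriteOnce" ]
  storageClassName: ebs-sc
  resources:
    requests:
      storage: 4Gi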

The difference is:

  • For Cluster 1, all three pods can successfully access the EBS PV - as expected for "ReadWriteOnce"
  • For Cluster 2, only one of the three pods can access the EBS PV; the other two fail with "access denied"

Observations on Cluster 2

  • When initializing, all three pods claim to successfully mount the volume in their event logs
  • The volume mount path is created on all three pods. But only one pod can successfully access it. The other two see “access denied” when trying to access that path

Repro:
This repo reliably reproduces the issue:
https://github.com/setheliot/eks_auto_mode

  • Once installed, open the app and make several writes to the table; then refresh and observe how the PV read behaves for each pod


@k8s-ci-robot added the kind/bug label on Feb 9, 2025
@AndrewSirenko
Contributor

AndrewSirenko commented Feb 10, 2025

Hi @setheliot, you are correct that multiple pods on the node should be able to access that "ReadWriteOnce" PV. We appreciate your detailed description of the problem!

Thank you for submitting an issue, but this kubernetes-sigs project only handles issues related to the upstream AWS EBS CSI Driver (ebs.csi.aws.com). This looks like an issue specific to EKS Auto Mode and the ebs.csi.eks.amazonaws.com driver.

Could you please file an AWS customer support ticket for this issue? Meanwhile, I will make sure EKS Auto Mode is aware of it.

Thank you.

@setheliot
Author

This is the answer from AWS Support. To me, this does not quite add up. What do you think?

==========

I understand you reached out regarding an EKS Auto Mode issue where the EBS CSI storage provisioner "ebs.csi.eks.amazonaws.com" is not handling the "ReadWriteOnce" access mode as expected. When using EKS with Auto Mode enabled, only one pod can access the EBS Persistent Volume (PV) even though all pods are on the same node, while the same setup works correctly without Auto Mode. Please clarify if I have correctly understood the situation or if there are any additional details you would like to provide.

Based on my analysis of the information you've provided and the reproduction steps in your GitHub repository, I can confirm this is a known behavior difference between the traditional EBS CSI driver and the Auto Mode implementation. Let me explain the situation and provide some solutions.

Root Cause: The key difference lies in how ReadWriteOnce (RWO) access mode is implemented:

Traditional EBS CSI driver (ebs.csi.aws.com) implements RWO at the node level [1]
Auto Mode CSI driver (ebs.csi.eks.amazonaws.com) implements RWO at the pod level [2]

This explains why you're seeing access denied errors for additional pods even though they're on the same node.

Immediate Solutions:

Use StatefulSets instead of Deployments: This is the recommended approach as it ensures each pod gets its own PVC:

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: guestbook
spec:
  serviceName: guestbook
  replicas: 3
  selector:
    matchLabels:
      app: guestbook
  template:
    metadata:
      labels:
        app: guestbook
    spec:
      containers:
      - name: guestbook
        image: <guestbook-image>   # placeholder for the application image
        volumeMounts:
        - name: data
          mountPath: /app/data
  volumeClaimTemplates:
  - metadata:
      name: data
    spec:
      accessModes: [ "ReadWriteOnce" ]
      resources:
        requests:
          storage: 4Gi

Switch to EFS: If you need true shared storage, consider using Amazon EFS which natively supports ReadWriteMany access mode [3]. Here's a sample StorageClass:

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: efs-sc
provisioner: efs.csi.aws.com
parameters:
  provisioningMode: efs-ap
  fileSystemId: fs-xxxxxx
  directoryPerms: "700"
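
A PVC bound to that class would then request the ReadWriteMany access mode, for example (the claim name is illustrative, and fs-xxxxxx above stays a placeholder for your file system ID):

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: efs-claim          # illustrative name
spec:
  accessModes: [ "ReadWriteMany" ]
  storageClassName: efs-sc
  resources:
    requests:
      storage: 5Gi         # EFS is elastic; the value is required but not enforced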

Revert to Manual Mode: If neither of the above solutions work for your use case, you can disable Auto Mode and use the traditional EBS CSI driver.

Questions you might have:

Q: Will switching to StatefulSets require application changes?
A: Yes, you'll need to modify your deployment manifests, but the container configurations can remain the same.

Q: Is there a performance impact using EFS instead of EBS?
A: Yes, EFS and EBS have different performance characteristics. EFS provides shared access but might have higher latency compared to EBS [4].

References:
[1] AWS EBS CSI Driver Documentation: https://docs.aws.amazon.com/eks/latest/userguide/ebs-csi.html
[2] EKS Auto Mode Documentation: https://docs.aws.amazon.com/eks/latest/userguide/eks-add-ons.html
[3] AWS EFS CSI Driver: https://docs.aws.amazon.com/eks/latest/userguide/efs-csi.html
[4] EBS vs EFS Performance Comparison: https://docs.aws.amazon.com/efs/latest/ug/performance.html

@AndrewSirenko
Contributor

AndrewSirenko commented Feb 11, 2025

Thank you for reaching out to support.

Auto Mode CSI driver (ebs.csi.eks.amazonaws.com) implements RWO at the pod level [2]

As you likely already know, "implementing RWO at the pod level" doesn't sound Kubernetes-conformant. As the Kubernetes access modes documentation explains:

ReadWriteOnce
The volume can be mounted as read-write by a single node. ReadWriteOnce access mode still can allow multiple pods to access (read from or write to) that volume when the pods are running on the same node.

ReadWriteOncePod
The volume can be mounted as read-write by a single Pod. Use ReadWriteOncePod access mode if you want to ensure that only one pod across the whole cluster can read that PVC or write to it.

I agree with you; EKS should take another look at this implementation decision. This may take some time. Thank you for your patience.
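
For reference, the distinction is expressed only through the access mode string in the PVC spec (a sketch with illustrative names, not taken from this issue):

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: shared-on-one-node            # illustrative: any pod on the node may use it
spec:
  accessModes: [ "ReadWriteOnce" ]
  resources:
    requests:
      storage: 4Gi
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: single-pod-only               # illustrative: restricted to one pod cluster-wide
spec:
  accessModes: [ "ReadWriteOncePod" ]
  resources:
    requests:
      storage: 4Gi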

@tzneal

tzneal commented Feb 11, 2025

EKS Auto Mode Nodes have an enhanced level of Pod isolation; there are some more details here and here.

To allow multiple Pods to share this volume, you can configure those Pods to share the same SELinux categories. Here the triple c123,c124,c125 is a unique set of categories that won't conflict with any other Pods that are assigned default categories on the node, since those are assigned category pairs. For example:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: sample-deployment
  labels:
    app: sample
spec:
  replicas: 3
  selector:
    matchLabels:
      app: sample
  template:
    metadata:
      labels:
        app: sample
    spec:
      securityContext:
        seLinuxOptions:
          level: "s0:c123,c124,c125"
...

Alternatively, you can allow your Pods access to all categories, but this has the side effect of removing all of the Pod-level isolation, rather than relaxing it only for the specific set of Pods that need to share the volume:

      securityContext:
        seLinuxOptions:
          level: "s0:c0.c1023"

@AndrewSirenko
Contributor

AndrewSirenko commented Feb 11, 2025

Many thanks @tzneal for pointing out that SELinux is at play here.

@setheliot it looks like you would see similar behavior with the ebs.csi.aws.com driver on self-managed nodes running an SELinux-enforcing OS (e.g. self-managed Bottlerocket nodes). TIL that this is a feature, not a bug.

@setheliot
Author

@tzneal called it... here is my latest response from AWS Support:

Q1. Why would Auto Mode CSI driver be designed to operate this way? There already is an access mode called ReadWriteOncePod which would allow access by only one Pod. With ReadWriteOnce there should be access by ALL pods on the same node.

You're absolutely correct in your understanding of how ReadWriteOnce should work. The behavior you're seeing isn't actually a design choice of the Auto Mode CSI driver. Instead, it's related to SELinux policies in Bottlerocket (the default AMI for EKS Auto Mode). This enhanced isolation between pods is a security feature of SELinux-enforcing operating systems, including Bottlerocket and RHEL when SELinux is enabled.

Q2. Could you please provide me with any AWS documentation or artifact that supports that this is the intended behavior with "ReadWriteOnce" in Auto Mode?

You're right to ask for documentation. Currently, our documentation doesn't adequately explain this SELinux-related behavior. We're in the process of reviewing and updating our documentation to better reflect this behavior and the required configuration. I apologize for the confusion this has caused.

@setheliot
Author

Guess I am going to have to learn how to use seLinuxOptions

@tzneal

tzneal commented Feb 13, 2025

The docs are now updated to cover this scenario at https://docs.aws.amazon.com/eks/latest/userguide/auto-troubleshoot.html#auto-troubleshoot-share-pod-volumes

@egachi

egachi commented Feb 25, 2025

Hello, even after applying a securityContext to the pods, there is another pitfall with static provisioning: the EBS volume is attached to a particular node by design, but since EKS Auto Mode will launch new nodes to fulfill new pods, pods sharing the same static volume may be scheduled onto those new nodes and fail with the error:

Warning FailedAttachVolume 19m attachdetach-controller Multi-Attach error for volume "<static_pv>" Volume is already used by pod(s) ebs-app-0.

Solution

  • Use node selectors based on the same availability zone as the static EBS volume and its host instance. You can use the labels added by EKS Auto Mode, topology.ebs.csi.eks.amazonaws.com/zone and kubernetes.io/hostname; this ensures that the pods land on the same host and in the same availability zone.
  • Tag your EBS volume with eks:eks-cluster-name=<clustername>

Manifest sample:

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: ebs-app
spec:
  serviceName: "ebs-app"
  replicas: 40
  selector:
    matchLabels:
      app: ebs-app
  template:
    metadata:
      labels:
        app: ebs-app
    spec:
      nodeSelector:
        topology.ebs.csi.eks.amazonaws.com/zone: "<availability-zone>"
        kubernetes.io/hostname: "<node-hostname>"
      containers:
        - name: app
          image: centos
          command: ["/bin/sh"]
          args:
            [
              "-c",
              "while true; do echo $(date -u) >> /data/out.txt; sleep 2; done",
            ]
          volumeMounts:
            - name: persistent-storage
              mountPath: /data
          securityContext:
            seLinuxOptions:
              level: "s0:c123,c456,c789"
      volumes:
        - name: persistent-storage
          persistentVolumeClaim:
            claimName: ebs-claim
