Add support for custom AWS API endpoint certificates #509

Closed
dmc5179 opened this issue May 21, 2020 · 8 comments
Labels
lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed.

Comments

@dmc5179

dmc5179 commented May 21, 2020

Is your feature request related to a problem? Please describe.

In private AWS regions, the CA that signs the AWS API endpoint certificates is not trusted by the driver. The driver needs a way to accept additional certificates for custom endpoints.

Describe the solution you'd like in detail

Add support for either disabling SSL/TLS verification (not the greatest option) or, preferably, supplying custom CA certificates so the driver can talk to custom endpoints.

Describe alternatives you've considered

The nodes in my cluster already have the custom certificates installed. From each node I can run the following command to reach the metadata endpoint (which doesn't need the certificates):

curl -X GET http://169.254.169.254/latest/meta-data/iam/security-credentials

The driver doesn't seem to work on OpenShift by default. I'm actually not entirely sure I understand how the driver works in its current state at all. The metadata endpoint is a link-local address, so without hostNetwork: true set for the ebs-csi-controller it doesn't seem like the EBS CSI driver should ever work with IAM roles. To get it to this point I had to disable the liveness container and probes, then enable hostNetwork: true for the ebs-csi-controller pods, roughly as sketched below.
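
For reference, the hostNetwork change amounts to roughly the following on the controller workload (whether it is a Deployment or StatefulSet depends on the driver version; the names here are assumptions):

# Sketch only: host networking lets the controller pods reach the
# link-local metadata endpoint (169.254.169.254).
apiVersion: apps/v1
kind: StatefulSet            # or Deployment, depending on the manifests in use
metadata:
  name: ebs-csi-controller   # name assumed
  namespace: kube-system
spec:
  template:
    spec:
      hostNetwork: true      # pod shares the node's network namespace
      # ... existing containers and volumes unchanged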

I'm then able to deploy the driver, create a storage class, and create a PVC:
StorageClass:

---
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: ebs-sc
provisioner: ebs.csi.aws.com
volumeBindingMode: WaitForFirstConsumer

PVC:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: ebs-claim
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: ebs-sc
  resources:
    requests:
      storage: 4Gi

At this point the PVC stays pending, waiting for a pod to claim it. When I deploy a pod that claims the PVC, I get:

  Type     Reason                Age              From                                                                              Message
  ----     ------                ----             ----                                                                              -------
  Normal   WaitForFirstConsumer  9s               persistentvolume-controller                                                       waiting for first consumer to be created before binding
  Normal   ExternalProvisioning  3s (x3 over 5s)  persistentvolume-controller                                                       waiting for a volume to be created, either by external provisioner "ebs.csi.aws.com" or manually created by system administrator
  Normal   Provisioning          0s (x3 over 4s)  ebs.csi.aws.com_worker1.domain_3859b774-6b3b-4057-966e-caa41569c6e5  External provisioner is provisioning volume for claim "kube-system/ebs-claim"
  Warning  ProvisioningFailed    0s (x3 over 4s)  ebs.csi.aws.com_worker1.domain_3859b774-6b3b-4057-966e-caa41569c6e5  failed to provision volume with StorageClass "ebs-sc": rpc error: code = Internal desc = RequestError: send request failed
caused by: Post https://ec2.<custom endpoint>/: x509: certificate signed by unknown authority

I'm not assigning that endpoint. I think the driver is looking up the API endpoint via the metadata endpoint. I cannot find a way to add custom certs to the pods. I've tried building on the base images and overwriting the file that SSL_CERT_FILE points to (/etc/ssl/certs/ca-certificates.crt), and I've tried adding my certs to /etc/pki/ca-trust/source/anchors. Most of the images don't have update-ca-trust. Even when I put my certs into the anchors directory, they are not there when the container starts. I've not seen anything in the images that would wipe out the anchors directory, and it doesn't appear to be a volume mount.

What in the world is it doing....
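
One more idea that might work without rebuilding images: the driver is written in Go, and Go's TLS stack honors the SSL_CERT_FILE environment variable, so mounting a custom bundle and pointing that variable at it could be enough. A rough sketch for the ebs-plugin container (the ConfigMap name and mount path are placeholders):

# Container-level additions (sketch only)
        env:
          - name: SSL_CERT_FILE                  # Go reads trusted roots from this file
            value: /etc/custom-ca/ca-certificates.crt
        volumeMounts:
          - name: custom-ca
            mountPath: /etc/custom-ca
            readOnly: true
# Pod-level volume (sketch only)
      volumes:
        - name: custom-ca
          configMap:
            name: custom-ca-bundle               # placeholder name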

@jgallucci32

@dmc5179 I have successfully tested this workaround on Eucalyptus with a custom endpoint and a custom SSL certificate.

NOTE: Pull request #505 is for custom endpoint support in private AWS regions

Create a ConfigMap for the custom CA certificates:

cat /path/to/custom/ca-bundle.crt > ca-certificates.crt
kubectl create configmap -n kube-system --from-file=ca-certificates.crt aws-config
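
For reference, the resulting ConfigMap looks roughly like this (certificate contents elided):

apiVersion: v1
kind: ConfigMap
metadata:
  name: aws-config
  namespace: kube-system
data:
  ca-certificates.crt: |
    -----BEGIN CERTIFICATE-----
    ...
    -----END CERTIFICATE-----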

Modify ebs-csi-controller.yaml to mount the new CA bundle into the CSI controller:

# Add this to the ebs-plugin volumeMounts: section of the first container
#       volumeMounts:
        - name: config-volume
          mountPath: /etc/pki/ca-trust/extracted/pem/tls-ca-bundle.pem
          subPath: ca-certificates.crt
# Add this to the volumes: section toward the bottom
#     volumes:
      - name: config-volume
        configMap:
          name: aws-config

Redeploy the csi-controller pod; the mount overrides the existing CA bundle with the one supplied in the ConfigMap, which should allow you to hit an endpoint that presents a custom SSL certificate.

@jgallucci32

This is also discussed in issue #502

@dmc5179

dmc5179 commented May 23, 2020

@jgallucci32 Thanks! This is great. Note that I didn't have to do anything to get the custom endpoint to work; I think that once the pods could reach the metadata endpoint, they were able to look up the EC2 API endpoint. I don't know that for certain, but the pods figured it out somehow. The custom certificate part is great. My solution of hostPath mounting is probably not ideal; a ConfigMap is a much better idea. Thanks again!
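
For comparison, a hostPath mount would look roughly like this (the node-side path varies by distribution and is only an example), which ties the container to whatever bundle happens to be on the node:

        volumeMounts:
          - name: host-ca
            mountPath: /etc/pki/ca-trust/extracted/pem/tls-ca-bundle.pem
            subPath: tls-ca-bundle.pem
      volumes:
        - name: host-ca
          hostPath:
            path: /etc/pki/ca-trust/extracted/pem   # node path; varies by distro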

@jgallucci32

@dmc5179 Glad I could help, let me know how things go. I have been investigating a sidecar approach built on the same ConfigMap construct. CloudBees has a good example of SSL cert injection with a sidecar, and I think that might ultimately be the best approach here, roughly along the lines sketched below.
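
Roughly, the idea would be an init container that merges the system bundle with the custom CA into a shared emptyDir, which the plugin container then mounts over its cert path. A sketch only, not taken from the CloudBees write-up; the image, names, and bundle path are placeholders/assumptions:

      initContainers:
        - name: ca-merge                                       # placeholder name
          image: registry.access.redhat.com/ubi8/ubi-minimal   # any image with a shell and cat
          command: ["/bin/sh", "-c"]
          args:
            # /etc/pki/tls/certs/ca-bundle.crt: system bundle path in UBI-based images (assumption)
            - cat /etc/pki/tls/certs/ca-bundle.crt /custom/ca-certificates.crt > /merged/tls-ca-bundle.pem
          volumeMounts:
            - name: custom-ca
              mountPath: /custom
            - name: merged-ca
              mountPath: /merged
      containers:
        - name: ebs-plugin
          # ... existing settings unchanged
          volumeMounts:
            - name: merged-ca
              mountPath: /etc/pki/ca-trust/extracted/pem
      volumes:
        - name: custom-ca
          configMap:
            name: aws-config              # the ConfigMap created above
        - name: merged-ca
          emptyDir: {}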

Regarding the custom endpoint, I am extremely intrigued that it works out of the box for you. My first assumption would be some sort of DNS poisoning of amazonaws.com in your private AWS environment, allowing the SDK to resolve <endpoint>.<region>.amazonaws.com, which is hard-coded in almost every AWS SDK. To my surprise, Eucalyptus does this naturally, and I only now discovered it after using it all these years. I confirmed Eucalyptus is doing this with host:

# host ec2.${AWS_DEFAULT_REGION}.amazonaws.com
ec2.region-1.amazonaws.com.ec2.internal has address 192.168.0.100

Since ec2.internal comes first in the DNS search order, the SDK finds the compute endpoint as if it were reaching out to the AWS public endpoints.

I plan to update my SSL cert this week to include the Subject Alternative Name (SAN) for *.region-1.amazonaws.com to see if this will work for our use case without having to modify the endpoints.

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Aug 23, 2020
@fejta-bot

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Sep 22, 2020
@fejta-bot

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

@k8s-ci-robot

@fejta-bot: Closing this issue.

In response to this:

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
