Firewall fails to recognize TLS version and SNI when EBS CSI controller talks to AWS API endpoints #2364
Comments
Hi there, thanks for the detailed bug report. I looked and saw no obvious relevant code changes or updates to how the driver handles connections between 1.34 and 1.35, so the behavior described here is interesting. Can you confirm whether rolling back fixes the networking issue? I was actually thinking that the driver version shouldn't be relevant here, but we need to validate that. I'll run some tests on my end to dig deeper, but in the meantime, could you enable SDK debug logs and share them? That will give us a clearer picture of what's happening with the requests.
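For reference, a rough sketch of turning on SDK debug logging via Helm. The controller.sdkDebugLog value, the --aws-sdk-debug-log flag it maps to, the release name, and the ebs-plugin container name are assumptions from memory; verify them against the linked comment and the chart's values.yaml.

```sh
# Enable AWS SDK debug logging on the controller, then tail its logs.
# Value/flag/container names are assumptions; confirm against the chart.
helm upgrade aws-ebs-csi-driver aws-ebs-csi-driver/aws-ebs-csi-driver \
  --namespace kube-system --reuse-values \
  --set controller.sdkDebugLog=true

kubectl -n kube-system logs deployment/ebs-csi-controller -c ebs-plugin -f
```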
The two relevant configuration options that come to mind here are proxy settings and supplying the controller with the right CA certificate (if you have a custom certificate) via volumeMounts (see example).
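A rough sketch of what those two options might look like in a Helm values override; the controller.env, controller.volumes, and controller.volumeMounts keys mirror the linked values.yaml, while the proxy address, ConfigMap name, and mount path are illustrative placeholders.

```sh
# Write a values override with proxy env vars and a custom CA bundle mount,
# then apply it. All names and paths here are illustrative placeholders.
cat > ebs-csi-overrides.yaml <<'EOF'
controller:
  env:
    - name: HTTP_PROXY
      value: "http://proxy.internal:3128"
    - name: HTTPS_PROXY
      value: "http://proxy.internal:3128"
    - name: NO_PROXY
      value: "169.254.169.254,10.0.0.0/8"
  volumes:
    - name: custom-ca
      configMap:
        name: custom-ca-bundle   # ConfigMap holding your CA as ca.crt
  volumeMounts:
    - name: custom-ca
      mountPath: /etc/ssl/certs/custom-ca.crt
      subPath: ca.crt
      readOnly: true
EOF

helm upgrade --install aws-ebs-csi-driver aws-ebs-csi-driver/aws-ebs-csi-driver \
  --namespace kube-system -f ebs-csi-overrides.yaml
```
|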
Yes, rolling back solves the networking issue. And by "rolling back" I mean running "kubectl edit -n kube-system deployment ebs-csi-controller" and editing the image tag to v1.34.0, making no other changes.
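For reference, a non-interactive equivalent of that rollback might look like the following; the ebs-plugin container name and the public.ecr.aws image path are assumptions, so verify them against your deployment first.

```sh
# Pin the controller's driver container back to v1.34.0 and wait for rollout.
# Container name and image registry are assumptions; check your deployment spec.
kubectl -n kube-system set image deployment/ebs-csi-controller \
  ebs-plugin=public.ecr.aws/ebs-csi-driver/aws-ebs-csi-driver:v1.34.0
kubectl -n kube-system rollout status deployment/ebs-csi-controller
```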
I will try to get AWS SDK debug logs!
|
I followed the linked instructions, and I can confirm the debug log argument is being added to the deployment, but the logs don't have any new information. Now, assuming the SDK logging isn't completely broken, my takeaway from this is that none of the SDK queries are actually completing. |
I apologize for being unclear. When I wrote "However, I don't know if this is a client error (insufficient configuration)", what I meant was: I don't know whether this was an intentional breaking change in the API that got missed in the dependency upgrade, or an actual regression in the API that needs to be fixed by AWS. I just don't have the golang expertise to make that determination. |
Thank you for the updates, @gblues! It's unlikely that this issue is caused by a regression in the SDK itself. We suspect the culprit is upgrading Go from 1.22 to 1.23 (which took place in driver version 1.35). Notably, Go 1.23 includes several updates to the crypto/tls package. Would you be able to check with your network team to ensure the firewall allows connections with a ClientHello fragmented over multiple packets? See golang/go#70139 for more context on this. Assuming the above is true, you should be able to temporarily work around this issue by setting the GODEBUG environment variable to tlskyber=0: https://github.com/kubernetes-sigs/aws-ebs-csi-driver/blob/fd95d0a4b4774e4b4927c37bd504a1dd3be54162/charts/aws-ebs-csi-driver/values.yaml#L220
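A minimal sketch of applying that workaround directly to the deployment; the ebs-plugin container name is an assumption, and if the driver is managed by Helm the variable is better set persistently via controller.env in the values file so it survives upgrades.

```sh
# Disable the Go 1.23 hybrid (Kyber) key exchange so the ClientHello fits in a
# single packet again. Container name is an assumption; check your deployment.
kubectl -n kube-system set env deployment/ebs-csi-controller \
  --containers=ebs-plugin GODEBUG=tlskyber=0
kubectl -n kube-system rollout status deployment/ebs-csi-controller
```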
|
oho, that looks promising. I will try it out tomorrow. Thank you!
|
Update!
|
/kind bug
What happened?
After upgrading the EBS CSI driver from v1.34.0 to a newer release, EBS volume mounts fail because the connection to the AWS API endpoint is blocked by the firewall due to missing/incorrect SNI in the TLS communication (see the capture sketch below).
The exact error varies by version:
v1.35.0
v1.40.0
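A rough way to check whether the firewall only sees the first fragment of the ClientHello (the suspected cause discussed in the comments above) is to capture the handshake from a node running the controller; the interface and output path below are assumptions.

```sh
# Capture TLS traffic from the controller's node while reproducing a mount,
# then inspect the ClientHello in Wireshark/tshark. With Go 1.23's hybrid key
# exchange the hello exceeds a single MSS (~1460 bytes) and is split across
# packets, so a firewall that only inspects the first packet may miss the SNI
# and TLS version.
sudo tcpdump -i any -nn -s 0 -w /tmp/ebs-csi-handshake.pcap 'tcp port 443'
```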
What you expected to happen?
I expected the EBS volume mounts to succeed.
How to reproduce it (as minimally and precisely as possible)?
Anything else we need to know?:
Environment
Kubernetes version (use kubectl version): 1.31