Remove DeleteDisk call in CreateDisk path #2009

ConnorJC3 · 2024-04-12T18:32:42Z

Is this a bug fix or adding new feature?

Bug fix

What is this PR about? / Why do we need it?

Today, when we time out waiting for the volume to create, we delete it. This is broken for several reasons:

- It often doesn't even run, because the sidecar has usually already cancelled the context by then
- Whenever it does run, it triggers #1951 (Possible race condition in EBS volume creation) when the CO re-calls CreateVolume
- It handles behavior that is the CO's responsibility according to the CSI spec:

  //    The CO is responsible for cleaning up volumes it provisioned
  //    that it no longer needs. If the CO is uncertain whether a volume
  //    was provisioned or not when a `CreateVolume` call fails, the CO
  //    MAY call `CreateVolume` again, with the same name, to ensure the
  //    volume exists and to retrieve the volume's `volume_id`

Also takes this opportunity to clean up the extremely unclear error message.

What testing is done?

N/A

Signed-off-by: Connor Catlett <[email protected]>

github-actions · 2024-04-12T18:48:14Z

Code Coverage Diff

This PR does not change the code coverage

torredil

Agree with the changes made in this PR, and specifically with the point that this logic handles behavior that is the CO's responsibility. My understanding is that having a DeleteDisk call in the CreateVolume path is a violation of the CSI spec.

Also worth highlighting that the external provisioner takes care of cleaning up orphaned volumes:

err = cleanupVolume(ctx, p, delReq, provisionerCredentials)
if err != nil {
	capErr = fmt.Errorf("%v. Cleanup of volume %s failed, volume is orphaned: %v", capErr, pvName, err)
}

AndrewSirenko · 2024-04-12T20:32:47Z

> Also worth highlighting that the external provisioner takes care of cleaning up orphaned volumes:
>
> err = cleanupVolume(ctx, p, delReq, provisionerCredentials)
> if err != nil {
> 	capErr = fmt.Errorf("%v. Cleanup of volume %s failed, volume is orphaned: %v", capErr, pvName, err)
> }

I'm not sure if this is true for EBS CSI Driver today, due to Kubernetes/External-Provisioner not knowing the volume-id of the orphaned volume (and therefore not knowing what EBS volume to delete).

Some other CSI drivers store the volume-id in the PVC, because they can pass a volume ID to the CreateVolume equivalent in their backend. That lets Kubernetes know the volume-id, so K8s can be sure to delete the volume if it was orphaned.

Alas, because Kubernetes and EBS CSI Driver only know volume's Idempotency token (hash of PVC name), and we cannot call EC2 DeleteVolume with just an idempotency token today, we do risk leaking/orphaning a volume (as an edge case).
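To make the "hash of PVC name" point concrete, here is a small sketch of deriving a deterministic idempotency token from a volume name. The choice of hex-encoded SHA-256 and the function name `clientToken` are assumptions for illustration and may not match the driver's exact implementation.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
)

// clientToken derives a deterministic idempotency token from a volume name.
// Hypothetical helper: the same name always yields the same token, so a
// retried create request can be recognized by the backend as a repeat.
func clientToken(volumeName string) string {
	sum := sha256.Sum256([]byte(volumeName))
	return hex.EncodeToString(sum[:])
}
```

The token identifies a create request, not a volume: it lets a retry find the volume that was already created, but it is of no use to a delete call that only accepts a volume ID.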

Can draw out a diagram later.

Either way, I agree with the PR, because this edge case mentioned above is NOT solved via the code @ConnorJC3 is deleting anyway, since the context has typically been cancelled by the time we reach this DeleteVolume call. Furthermore, this code can lead to stuck pods today if the sidecar timeout is very large and createVolume succeeds but waitForVolume times out (due to today's 1 min limit). So I agree it is safer to delete this code today and think of something better.

I do agree with the first half of @torredil's statement though.

torredil · 2024-04-13T00:22:08Z

@AndrewSirenko Sorry, that last bit was poorly phrased and confusing. "Also worth highlighting that the external provisioner takes care of cleaning up volumes" is a bit clearer.

> Alas, because Kubernetes and EBS CSI Driver only know volume's Idempotency token (hash of PVC name), and we cannot call EC2 DeleteVolume with just an idempotency token today, we do risk leaking/orphaning a volume (as an edge case).

Can you clarify this point - what is the specific scenario in which the driver needs to call EC2's DeleteVolume in the CreateVolume path (CSI spec violation), where a volume ID is not available? I understand this to mean the RPC is not idempotent.

The correct behavior for cleaning up volumes is as follows:

  //    The CO is responsible for cleaning up volumes it provisioned
  //    that it no longer needs. If the CO is uncertain whether a volume
  //    was provisioned or not when a `CreateVolume` call fails, the CO
  //    MAY call `CreateVolume` again, with the same name, to ensure the
  //    volume exists and to retrieve the volume's `volume_id` (unless
  //    otherwise prohibited by "CreateVolume Errors").

In practice, this means failing the request and letting the provisioner retry as opposed to attempting to delete the volume, which is what this PR steers us towards - I think we all agree on that point.

Once Kubernetes (the provisioner) has retrieved the volume ID from a CreateVolume call, it can proceed with DeleteVolume where the volume ID is passed in via the DeleteVolumeRequest created by the provisioner.

> this edge case mentioned above is NOT solved via the code @ConnorJC3 is deleting anyway

Agree 👍

I would like to learn more about this edge case and look forward to the diagram.

AndrewSirenko · 2024-04-14T23:57:47Z

> "Also worth highlighting that the external provisioner takes care of cleaning up volumes" is a bit clearer.

Yep, I agree with this phrasing.

I only disagreed with the external provisioner cleaning up orphaned volumes via the code from the cleanupVolume snippet you linked to (which doesn't do anything in the aws-ebs-csi-driver case because EBS requires a volume-id today to delete, which external provisioner won't have access to until a successfully returned retry CreateVolume call).

> Can you clarify this point - what is the specific scenario in which the driver needs to call EC2's DeleteVolume in the CreateVolume path (CSI spec violation), where a volume ID is not available?

There is no scenario. We all agree, the driver should not call EC2 DeleteVolume in CreateVolume RPC. My sentence was not arguing against that.

I'm saying that until EBS provides a way to call EC2 DeleteVolume with the idempotency token used for EC2 CreateVolume (instead of the volume-id), there may still be an edge case of volume leaks because the cleanupVolume snippet can't work. This edge case could occur if a cluster operator deletes a PVC object that hadn't yet led to a CreateVolume RPC success (due to a context timeout), but the volume was created in AWS backend.

Sidenote: Looking more closely at cleanupVolume, it only triggers if a volume was created with less than the desired capacity, so I'm not sure it's actually relevant to the discussion even if we had a way of deleting a volume without volume-id.
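The leak scenario described in this comment can be modelled with a toy simulation. Everything here is hypothetical (`backend`, `orchestrator`, and their methods are illustrative names, and the idempotency token is simplified to the claim name); it only shows why a delete API keyed by volume ID cannot clean up a volume whose ID the CO never learned.

```go
package main

import (
	"errors"
	"strconv"
)

var errNotFound = errors.New("volume not found")

// backend models a cloud API: create is idempotent on a token,
// but delete only accepts a volume ID.
type backend struct {
	byToken map[string]string // idempotency token -> volume ID
	volumes map[string]bool   // volume ID -> exists
	next    int               // counter for generating volume IDs
}

func newBackend() *backend {
	return &backend{byToken: map[string]string{}, volumes: map[string]bool{}}
}

func (b *backend) createVolume(token string) string {
	if id, ok := b.byToken[token]; ok {
		return id // repeated request: return the existing volume
	}
	b.next++
	id := "vol-" + strconv.Itoa(b.next)
	b.byToken[token] = id
	b.volumes[id] = true
	return id
}

func (b *backend) deleteVolume(id string) error {
	if !b.volumes[id] {
		return errNotFound
	}
	delete(b.volumes, id)
	return nil
}

// orchestrator models the CO: it records a volume ID only after a
// create call has returned successfully.
type orchestrator struct {
	knownIDs map[string]string // claim name -> volume ID
}

// provision calls the backend; if rpcTimedOut is true, the response is
// treated as lost and the orchestrator never learns the volume ID.
func (o *orchestrator) provision(b *backend, claim string, rpcTimedOut bool) {
	id := b.createVolume(claim) // token simplified to the claim name
	if rpcTimedOut {
		return
	}
	o.knownIDs[claim] = id
}

// deleteClaim deletes the backing volume only if its ID is known.
func (o *orchestrator) deleteClaim(b *backend, claim string) error {
	id, ok := o.knownIDs[claim]
	if !ok {
		return nil // no ID recorded: any volume in the backend is leaked
	}
	delete(o.knownIDs, claim)
	return b.deleteVolume(id)
}
```

If a claim is deleted after only timed-out `provision` calls, the volume remains in the backend. If a retry succeeds first, the ID is recorded and the later delete removes the volume.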

AndrewSirenko · 2024-04-15T00:00:04Z

/approve

k8s-ci-robot · 2024-04-15T00:00:09Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: AndrewSirenko

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

~~OWNERS~~ [AndrewSirenko]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

k8s-ci-robot requested review from AndrewSirenko and torredil April 12, 2024 18:32

k8s-ci-robot added the cncf-cla: yes (the PR's author has signed the CNCF CLA) and size/XS (a PR that changes 0-9 lines, ignoring generated files) labels Apr 12, 2024

Remove DeleteDisk call in CreateDisk path

1129a0c

Signed-off-by: Connor Catlett <[email protected]>

ConnorJC3 force-pushed the remove-broken-deletevolume branch from 1acde90 to 1129a0c Compare April 12, 2024 18:45

k8s-ci-robot added the size/S label (a PR that changes 10-29 lines, ignoring generated files) and removed the size/XS label Apr 12, 2024

torredil approved these changes Apr 12, 2024

k8s-ci-robot assigned torredil Apr 12, 2024

k8s-ci-robot added the lgtm label ("Looks good to me", indicates that a PR is ready to be merged) Apr 12, 2024

AndrewSirenko approved these changes Apr 14, 2024

k8s-ci-robot assigned AndrewSirenko Apr 14, 2024

k8s-ci-robot added the approved label (a PR has been approved by an approver from all required OWNERS files) Apr 15, 2024

k8s-ci-robot merged commit 1b242f8 into kubernetes-sigs:master Apr 15, 2024
19 checks passed
