[Flake] Change image behavior of high-priority-group pod #4438

mszadkow · 2025-02-28T11:48:12Z

What type of PR is this?

/kind flake

What this PR does / why we need it:

Prevent the high-priority pod group from finishing too quickly.
Add control over when the pod group finishes.

Which issue(s) this PR fixes:

Fixes #4434

Special notes for your reviewer:

Does this PR introduce a user-facing change?

NONE

k8s-ci-robot · 2025-02-28T11:48:18Z

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: mszadkow
Once this PR has been reviewed and has the lgtm label, please assign tenzen-y for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

OWNERS

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

netlify · 2025-02-28T11:48:29Z

✅ Deploy Preview for kubernetes-sigs-kueue ready!

Name	Link
🔨 Latest commit	`fdd8fc7`
🔍 Latest deploy log	https://app.netlify.com/sites/kubernetes-sigs-kueue/deploys/67c1a27f2cdd6d0008e6a44e
😎 Deploy Preview	https://deploy-preview-4438--kubernetes-sigs-kueue.netlify.app

To edit notification comments on pull requests, go to your Netlify site configuration.

mimowo · 2025-02-28T11:59:23Z

test/e2e/singlecluster/pod_test.go

@@ -517,6 +517,10 @@ var _ = ginkgo.Describe("Pod groups", func() {
 				}, util.Timeout, util.Interval).Should(gomega.Succeed())
 			})

+			ginkgo.By("Call high priority group pods to complete", func() {
+				util.WaitForActivePodsAndTerminate(ctx, k8sClient, restClient, cfg, ns.Name, 2, 0)


Actually, I don't think we should be terminating them via /exit, because the Pod is already being deleted due to preemption. So we just need to wait for SIGKILL from the kubelet. To make it faster, we can specify spec.terminationGracePeriodSeconds. See #4434 (comment)

Ah, sorry, this is already terminating the high-priority group. Makes sense.

If we do not terminate them, they will never finish and the replacement pods can't be ungated.

yeah, got it.

mimowo · 2025-02-28T12:01:48Z

@mszadkow were you able to repro the issue locally before fix and confirm the code fixes it?

mszadkow · 2025-02-28T12:35:07Z

> @mszadkow were you able to repro the issue locally before fix and confirm the code fixes it?

I repeated it 100 times, but I did not catch it even once.
Hence the idea of changing the approach a little to make sure we have more control over the test.

mimowo · 2025-02-28T12:44:46Z

Ok, but as you use the "BehaviorWaitForDeletion" command, isn't the "Check that the preempted pods are deleted" step now taking long because we need to wait 30s for SIGKILL? If this is the case, we may just limit the graceful termination period to 1s.
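
The timing concern above can be illustrated with a small model. This is only a sketch, not Kueue or kubelet code: `secondsUntilStopped` and its parameters are hypothetical names, and it assumes the container started with "BehaviorWaitForDeletion" does not exit on SIGTERM, as the comment implies. It relies only on the kubelet's documented behavior of sending SIGTERM first and SIGKILL once the pod's `spec.terminationGracePeriodSeconds` (30s by default) has elapsed.

```go
package main

import "fmt"

// secondsUntilStopped is a simplified, hypothetical model of how long a
// container keeps running once its pod starts being deleted. The kubelet
// sends SIGTERM first and, if the process is still alive when the grace
// period runs out, follows up with SIGKILL.
//
// exitsOnSIGTERM and shutdownSeconds describe an imaginary container;
// gracePeriodSeconds stands in for spec.terminationGracePeriodSeconds.
func secondsUntilStopped(exitsOnSIGTERM bool, shutdownSeconds, gracePeriodSeconds int64) int64 {
	if exitsOnSIGTERM && shutdownSeconds < gracePeriodSeconds {
		// The process shut down on its own before the deadline.
		return shutdownSeconds
	}
	// The process ignored SIGTERM (or was too slow) and is killed at the deadline.
	return gracePeriodSeconds
}

func main() {
	const defaultGracePeriod = 30 // Kubernetes default when the field is unset

	fmt.Println(secondsUntilStopped(false, 0, defaultGracePeriod)) // prints 30
	fmt.Println(secondsUntilStopped(false, 0, 1))                  // prints 1
	fmt.Println(secondsUntilStopped(true, 2, defaultGracePeriod))  // prints 2
}
```

Under these assumptions, a preempted pod that never reacts to SIGTERM holds up the "deleted" check for the full grace period, which is why lowering it to 1s would shorten the step.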

mszadkow · 2025-02-28T12:54:19Z

> Ok, but as you use the "BehaviorWaitForDeletion" command, isn't the "Check that the preempted pods are deleted" step now taking long because we need to wait 30s for SIGKILL? If this is the case, we may just limit the graceful termination period to 1s.

It's a different group (default) that I didn't touch.
But since you mention it, I think we can decrease the time.
Instead of deleting the pods, we could use a different behaviour and send exit code 1; that would have the same effect, but faster.

Update:
I am wrong, as the deletion comes from the preemption, right?

Fix flake by changing image behavior of high-priority-group

fdd8fc7

k8s-ci-robot added the release-note-none (Denotes a PR that doesn't merit a release note.) and kind/flake (Categorizes issue or PR as related to a flaky test.) labels Feb 28, 2025

k8s-ci-robot requested review from PBundyra and tenzen-y February 28, 2025 11:48

k8s-ci-robot added the cncf-cla: yes (Indicates the PR's author has signed the CNCF CLA.) and size/XS (Denotes a PR that changes 0-9 lines, ignoring generated files.) labels Feb 28, 2025

mimowo reviewed Feb 28, 2025
