Flaky Test: Pod groups when Single CQ should allow to preempt the lower priority group #4434

Open
tenzen-y opened this issue Feb 27, 2025 · 3 comments · May be fixed by #4438
Assignees
Labels
kind/bug Categorizes issue or PR as related to a bug. kind/flake Categorizes issue or PR as related to a flaky test.

Comments

@tenzen-y
Member

What happened:
The following E2E test failed.

End To End Suite: kindest/node:v1.30.0: [It] Pod groups when Single CQ should allow to preempt the lower priority group
{Timed out after 45.001s.
The function passed to Eventually failed at /home/prow/go/src/kubernetes-sigs/kueue/test/e2e/singlecluster/pod_test.go:483 with:
Expected
    <v1.PodPhase>: Succeeded
to equal
    <v1.PodPhase>: Failed failed [FAILED] Timed out after 45.001s.
The function passed to Eventually failed at /home/prow/go/src/kubernetes-sigs/kueue/test/e2e/singlecluster/pod_test.go:483 with:
Expected
    <v1.PodPhase>: Succeeded
to equal
    <v1.PodPhase>: Failed
In [It] at: /home/prow/go/src/kubernetes-sigs/kueue/test/e2e/singlecluster/pod_test.go:485 @ 02/27/25 01:48:44.842
}

What you expected to happen:
No errors.

How to reproduce it (as minimally and precisely as possible):
https://prow.k8s.io/view/gs/kubernetes-ci-logs/logs/periodic-kueue-test-e2e-release-0-10-1-30/1894925220897099776

Anything else we need to know?:

Environment:

  • Kubernetes version (use kubectl version):
  • Kueue version (use git describe --tags --dirty --always):
  • Cloud provider or hardware configuration:
  • OS (e.g: cat /etc/os-release):
  • Kernel (e.g. uname -a):
  • Install tools:
  • Others:
@tenzen-y tenzen-y added the kind/bug Categorizes issue or PR as related to a bug. label Feb 27, 2025
@tenzen-y
Member Author

/kind flake

@k8s-ci-robot k8s-ci-robot added the kind/flake Categorizes issue or PR as related to a flaky test. label Feb 27, 2025
@mimowo
Contributor

mimowo commented Feb 27, 2025

/assign @mszadkow
I believe this appeared after the recent changes, since we now use BehaviorExitFast. The Pod succeeds if it has enough time to complete, and it fails if the Delete request arrives first. I think we should use WaitForDeletion and just let the Pod be deleted and fail. We may just need to set the Pod's spec.terminationGracePeriodSeconds=1 to make it fast. Alternatively, trigger /exit 1 instead of exit 1 to let it fail. (This alternative will not work because the Pod is deleted due to preemption.)

@mszadkow
Contributor

Got it, I will try the suggested solution.

4 participants