MultiKueue when Creating a multikueue admission check Should run an appwrapper containing a job on worker if admitted #4378
/kind flake
This might be the same flake as #4376
/retitle MultiKueue when Creating a multikueue admission check Should run an appwrapper containing a job on worker if admitted
cc: @dgrove-oss
In both cases, the appwrapper controller on worker1 (where the job is expected to run) hits an odd problem during startup, shown below, while trying to load a ConfigMap to get the operator configuration.
I wonder if we the
@dgrove-oss just a speculation, but could this be due to interference from this test: `test/e2e/multikueue/e2e_test.go`, line 862 (at commit 398c74d)?
This message makes me think: "dial tcp 10.96.0.1:443: connect: network is unreachable". Maybe this somehow makes the AppWrapper controller crash? IIRC the tests don't run in parallel, but even then the previous test could give the AppWrapper controller a hard time.
When the AppWrapper controller is initializing, it is written to exit on errors. I'm open to other ways of structuring AppWrapper's startup code, but it seemed like exiting with an error and letting the pod restart was more robust than trying to handle the failure or masking it with a retry loop. Kueue's main seems to be structured similarly.
What happened:
This e2e multikueue test flaked:
https://prow.k8s.io/view/gs/kubernetes-ci-logs/pr-logs/pull/kubernetes-sigs_kueue/4301/pull-kueue-test-e2e-multikueue-main/1894013562272092160
What you expected to happen:
I expected the test to succeed.
How to reproduce it (as minimally and precisely as possible):
Anything else we need to know?:
Environment:
- Kubernetes version (use `kubectl version`):
- Kueue version (use `git describe --tags --dirty --always`):
- OS (e.g., `cat /etc/os-release`):
- Kernel (e.g., `uname -a`):