
Wait for jobset operator on starting e2e tests. #2456

Merged

mbobrovskyi
Contributor

mbobrovskyi commented Jun 20, 2024

What type of PR is this?

/kind bug
/kind flake

What this PR does / why we need it:

Wait for the JobSet operator to be ready before starting the e2e tests.

Which issue(s) this PR fixes:

Fixes #2428

Special notes for your reviewer:

Does this PR introduce a user-facing change?

NONE

@k8s-ci-robot
Contributor

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

k8s-ci-robot added the do-not-merge/work-in-progress, release-note-none, kind/bug, kind/flake, and cncf-cla: yes labels Jun 20, 2024
k8s-ci-robot requested review from denkensk and mimowo June 20, 2024 12:21

netlify bot commented Jun 20, 2024

Deploy Preview for kubernetes-sigs-kueue ready!

🔨 Latest commit: 47f733c
🔍 Latest deploy log: https://app.netlify.com/sites/kubernetes-sigs-kueue/deploys/66741eab850bb000082057a0
😎 Deploy Preview: https://deploy-preview-2456--kubernetes-sigs-kueue.netlify.app

k8s-ci-robot added the size/M label Jun 20, 2024
@mbobrovskyi
Contributor Author

/test all

@mimowo
Contributor

mimowo commented Jun 20, 2024

I think we might also need to update JobSet so that its readiness probe waits for the webhook server, as we do here:

kueue/cmd/kueue/main.go

Lines 294 to 320 in 5be1c20

func setupProbeEndpoints(mgr ctrl.Manager, certsReady <-chan struct{}) {
	defer setupLog.Info("Probe endpoints are configured on healthz and readyz")
	if err := mgr.AddHealthzCheck("healthz", healthz.Ping); err != nil {
		setupLog.Error(err, "unable to set up health check")
		os.Exit(1)
	}
	// Wait for the webhook server to be listening before advertising the
	// Kueue replica as ready. This allows users to wait with sending the first
	// requests, requiring webhooks, until the Kueue deployment is available, so
	// that the early requests are not rejected during the Kueue's startup.
	// We wrap the call to GetWebhookServer in a closure to delay calling
	// the function, otherwise a not fully-initialized webhook server (without
	// ready certs) fails the start of the manager.
	if err := mgr.AddReadyzCheck("readyz", func(req *http.Request) error {
		select {
		case <-certsReady:
			return mgr.GetWebhookServer().StartedChecker()(req)
		default:
			return errors.New("certificates are not ready")
		}
	}); err != nil {
		setupLog.Error(err, "unable to set up ready check")
		os.Exit(1)
	}
}
Otherwise, JobSet can report itself as available before its webhooks are actually able to serve requests.

@mbobrovskyi
Contributor Author

mbobrovskyi commented Jun 20, 2024

Thanks @mimowo. I've opened an issue for this: kubernetes-sigs/jobset#607.

mbobrovskyi marked this pull request as ready for review June 21, 2024 12:26
k8s-ci-robot removed the do-not-merge/work-in-progress label Jun 21, 2024
@alculquicondor
Contributor

I guess this doesn't fix #2428 by itself, but it gets us closer.

/lgtm
/approve

k8s-ci-robot added the lgtm label Jun 21, 2024
@k8s-ci-robot
Contributor

LGTM label has been added.

Git tree hash: 1fc671aaa089ff878e2337bb762efa2c1a1fab4a

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: alculquicondor, mbobrovskyi

The full list of commands accepted by this bot can be found here.

The pull request process is described here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

k8s-ci-robot added the approved label Jun 21, 2024
@alculquicondor
Contributor

/cherry-pick release-0.7

@k8s-infra-cherrypick-robot
Contributor

@alculquicondor: once the present PR merges, I will cherry-pick it on top of release-0.7 in a new PR and assign it to you.

In response to this:

/cherry-pick release-0.7

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

k8s-ci-robot merged commit f7bbc79 into kubernetes-sigs:main Jun 21, 2024
16 checks passed
k8s-ci-robot added this to the v0.8 milestone Jun 21, 2024
@k8s-infra-cherrypick-robot
Contributor

@alculquicondor: #2456 failed to apply on top of branch "release-0.7":

Applying: Wait for jobset operator on starting e2e tests.
Using index info to reconstruct a base tree...
M	test/e2e/multikueue/suite_test.go
M	test/e2e/singlecluster/suite_test.go
Falling back to patching base and 3-way merge...
Auto-merging test/e2e/singlecluster/suite_test.go
Auto-merging test/e2e/multikueue/suite_test.go
CONFLICT (content): Merge conflict in test/e2e/multikueue/suite_test.go
error: Failed to merge in the changes.
hint: Use 'git am --show-current-patch=diff' to see the failed patch
Patch failed at 0001 Wait for jobset operator on starting e2e tests.
When you have resolved this problem, run "git am --continue".
If you prefer to skip this patch, run "git am --skip" instead.
To restore the original branch and stop patching, run "git am --abort".

In response to this:

/cherry-pick release-0.7


mbobrovskyi deleted the fix/flaky-e2e-test-for-job-set branch June 21, 2024 17:43
@alculquicondor
Contributor

@mbobrovskyi can you prepare a cherry-pick for this?

Labels
approved - Indicates a PR has been approved by an approver from all required OWNERS files.
cncf-cla: yes - Indicates the PR's author has signed the CNCF CLA.
kind/bug - Categorizes issue or PR as related to a bug.
kind/flake - Categorizes issue or PR as related to a flaky test.
lgtm - "Looks good to me", indicates that a PR is ready to be merged.
release-note-none - Denotes a PR that doesn't merit a release note.
size/M - Denotes a PR that changes 30-99 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Flaky e2e test for JobSet
5 participants