Releases: kubernetes-sigs/jobset
v0.8.0
Release v0.8.0
Highlights
Deprecations
Changelog
- Doc updates for 0.7 [main branch] by @danielvegamyhre in #692
- Bump the kubernetes group with 7 updates by @dependabot in #693
- Image support multi-arch by @phuhung273 in #694
- Use meta api native condition status by @tenzen-y in #695
- Introduce Go std slices and maps lib by @tenzen-y in #696
- Remove duplicated condition judgement by @tenzen-y in #697
- Bump github.com/onsi/gomega from 1.34.2 to 1.35.1 by @dependabot in #700
- Bump github.com/onsi/ginkgo/v2 from 2.20.2 to 2.21.0 by @dependabot in #699
- Add Coordinator concept by @avrittrohwer in #702
- Bump github.com/open-policy-agent/cert-controller from 0.11.0 to 0.12.0 by @dependabot in #704
- Propagate schedulingGates set on PodTemplate when resuming JobSet by @mimowo in #705
- Update docs for release v0.7.1 by @ahg-g in #712
- Bump sigs.k8s.io/structured-merge-diff/v4 from 4.4.1 to 4.4.3 in the kubernetes group across 1 directory by @dependabot in #710
- Bump the kubernetes group with 7 updates by @dependabot in #714
- Bump github.com/onsi/ginkgo/v2 from 2.21.0 to 2.22.0 by @dependabot in #716
- fix wget failure to resolve relative paths by @kannon92 in #718
- Bump github.com/stretchr/testify from 1.9.0 to 1.10.0 by @dependabot in #715
- Bump github.com/onsi/gomega from 1.35.1 to 1.36.0 by @dependabot in #717
- Allow for one to install jobset in a different namespace by @kannon92 in #719
- Bump sigs.k8s.io/controller-runtime from 0.19.2 to 0.19.3 in the kubernetes group by @dependabot in #725
- Remove kube-rbac-proxy by @kannon92 in #722
- use component config for default installation by @kannon92 in #724
- Update documents to point to v0.7.2 by @ahg-g in #732
- fix docker warnings by @kannon92 in #727
- upgrade python dependencies for sdk by @kannon92 in #728
- KEP-672: Serial Job Execution with DependsOn API by @andreyvelich in #680
- Bump the kubernetes group with 7 updates by @dependabot in #733
- Bump github.com/onsi/gomega from 1.36.0 to 1.36.1 by @dependabot in #734
- update golang net to 0.33 to satisfy security alerts by @kannon92 in #735
- Bump github.com/onsi/ginkgo/v2 from 2.22.0 to 2.22.1 by @dependabot in #739
- Bump github.com/onsi/gomega from 1.36.1 to 1.36.2 by @dependabot in #741
- Bump github.com/onsi/ginkgo/v2 from 2.22.1 to 2.22.2 by @dependabot in #742
- Bump braces from 3.0.2 to 3.0.3 in /site by @kannon92 in #744
- update k8s to 0.32 apis by @kannon92 in #738
- Propagate job labels and annotations by @imreddy13 in #737
- disable http/2 for metrics server by @kannon92 in #745
- Minimize the number of unnecessary logs that get emitted by @imreddy13 in #746
- inject namespace in case we want to test against non standard deployment by @kannon92 in #749
- fix security warnings in client go code by @kannon92 in #743
- Bump sigs.k8s.io/controller-runtime from 0.19.3 to 0.19.4 in the kubernetes group by @dependabot in #750
- Remove Namespace from the JobSet Config by @andreyvelich in #752
- Turn off internal cert management via config by @ardaguclu in #755
- update gen-sdk.sh to generate sdk using docker container by @epicseven-cup in #681
- Use config metrics binding address if flag is not set by @ardaguclu in #756
- Bump the kubernetes group with 7 updates by @dependabot in #759
- Self nominate Kevin Hannon for approval rights by @kannon92 in #758
- copy all files in project rather than piece by piece by @kannon92 in #765
- add go mod download to Dockerfile by @kannon92 in #769
- update python sdk files with latest changes by @kannon92 in #770
- Bump sigs.k8s.io/controller-runtime from 0.20.0 to 0.20.1 in the kubernetes group by @dependabot in #772
- KEP-672: Implement the DependsOn API by @andreyvelich in #740
- Set user agent for requests coming from the jobset controller to "jobset" by @imreddy13 in #775
- Updating the documentation for Pod DNS and underlying headless service. by @raushan2016 in #779
- Remove the Configuration API as a CRD by @ahg-g in #781
- Set image tag and commit version at build time by @ahg-g in #780
- Increase memory limit and remove cpu limit for the default deployment by @priyanshikhetwani in #783
- Add missing external types to apply configurations by @astefanutti in #782
- Bump the kubernetes group with 7 updates by @dependabot in #784
- feature: add Helm chart for jobset by @ChenYi015 in #785
- helm: disable Prometheus metrics exporting by default by @ChenYi015 in #789
- Bump github.com/google/go-cmp from 0.6.0 to 0.7.0 by @dependabot in #794
- add make file targets for helm by @kannon92 in #792
- Bump github.com/prometheus/client_golang from 1.20.5 to 1.21.0 by @dependabot in #795
- Fix helm chart push but disable it from cloud build for testing by @kannon92 in #798
- enable helm chart push for cloudbuild for postsubmit by @kannon92 in #799
- add gotoolchain and update cloudbuild name by @kannon92 in #800
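KEP-672 (#680, #740) adds a DependsOn API so that replicated jobs can run serially. The Go sketch below models only the gating decision, that is, whether a replicated job may start given the state of the jobs it depends on. The type names, the field names, and the precise meaning given to "Ready" and "Complete" here are assumptions for illustration, not the JobSet controller's code.

```go
package main

import "fmt"

// DependsOnStatus is the state a dependency must reach (hypothetical mirror
// of the "Ready" and "Complete" values described in KEP-672).
type DependsOnStatus string

const (
	StatusReady    DependsOnStatus = "Ready"
	StatusComplete DependsOnStatus = "Complete"
)

// DependsOn names an upstream replicated job and the status it must reach.
type DependsOn struct {
	Name   string
	Status DependsOnStatus
}

// ReplicatedJobStatus is a simplified, hypothetical set of per-replicated-job counters.
type ReplicatedJobStatus struct {
	Replicas  int // desired number of Jobs
	Ready     int // Jobs currently ready
	Succeeded int // Jobs that completed successfully
}

// canStart reports whether every dependency has reached its required status.
// Assumption: "Ready" is satisfied once all Jobs are ready or already
// succeeded, and "Complete" once all Jobs have succeeded.
func canStart(deps []DependsOn, statuses map[string]ReplicatedJobStatus) bool {
	for _, d := range deps {
		s, ok := statuses[d.Name]
		if !ok {
			return false // dependency has not been created yet
		}
		switch d.Status {
		case StatusReady:
			if s.Ready+s.Succeeded < s.Replicas {
				return false
			}
		case StatusComplete:
			if s.Succeeded < s.Replicas {
				return false
			}
		default:
			return false // unknown status: be conservative and wait
		}
	}
	return true
}

func main() {
	statuses := map[string]ReplicatedJobStatus{
		"initializer": {Replicas: 1, Succeeded: 1},
		"launcher":    {Replicas: 1, Ready: 1},
	}
	trainerDeps := []DependsOn{{Name: "initializer", Status: StatusComplete}}
	fmt.Println(canStart(trainerDeps, statuses)) // true
	workerDeps := []DependsOn{{Name: "launcher", Status: StatusComplete}}
	fmt.Println(canStart(workerDeps, statuses)) // false: launcher is only ready
}
```

A replicated job with no dependencies always passes the check, so only the downstream stages of a pipeline are held back.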
New Contributors
- @phuhung273 made their first contribution in #694
- @avrittrohwer made their first contribution in #702
- @andreyvelich made their first contribution in #680
- @imreddy13 made their first contribution in #737
- @ardaguclu made their first contribution in #755
- @epicseven-cup made their first contribution in #681
- @raushan2016 made their first contribution in #779
- @priyanshikhetwani made their first contribution in #783
- @astefanutti made their first contribution in #782
- @ChenYi015 made their first contribution in #785
Full Changelog: v0.7.0...v0.8.0
v0.7.3
Release v0.7.2
What's Changed
- Update docs for v0.7.0 (release branch) by @danielvegamyhre in #691
- Automated cherry pick of #705: Propagate schedulingGates set on PodTemplate when resuming by @mimowo in #706
Full Changelog: v0.7.0...v0.7.2
Release v0.7.1
What's Changed
- Update docs for v0.7.0 (release branch) by @danielvegamyhre in #691
- Automated cherry pick of #705: Propagate schedulingGates set on PodTemplate when resuming by @mimowo in #706
Full Changelog: v0.7.0...v0.7.1
v0.7.0
Highlights
- Add restart strategy by @nstogner in #686
- Priority-based exclusive placement by @ahg-g in #687
- feat: add component config by @rainfd in #609
What's Changed
- fix: delete active jobs right away when job finishes even when TTLSecondsAfterFinished is set by @CecileRobertMichon in #667
- Bump github.com/onsi/ginkgo/v2 from 2.20.0 to 2.20.1 by @dependabot in #663
- Bump github.com/prometheus/client_golang from 1.20.0 to 1.20.2 by @dependabot in #664
- Bump kubernetes dependencies to v0.31.x. by @mbobrovskyi in #670
- Bump github.com/onsi/ginkgo/v2 from 2.20.1 to 2.20.2 by @dependabot in #668
- Bump github.com/onsi/gomega from 1.34.1 to 1.34.2 by @dependabot in #669
- chore: update README.md e2e test version for v1.31.0 by @googs1025 in #671
- Add test-python-sdk on Makefile test. by @mbobrovskyi in #673
- Bump github.com/prometheus/client_golang from 1.20.2 to 1.20.3 by @dependabot in #674
- feat: add component config by @rainfd in #609
- Bump the kubernetes group with 6 updates by @dependabot in #675
- Add global-job-replicas label/annotation by @GiuseppeTT in #677
- Add examples for three existing failure policy actions. by @jedwins1998 in #601
- Bump github.com/prometheus/client_golang from 1.20.3 to 1.20.4 by @dependabot in #679
- chore: use symbolic link instead of directory by @googs1025 in #630
- Priority-based exclusive placement by @ahg-g in #687
- Bump github.com/prometheus/client_golang from 1.20.4 to 1.20.5 by @dependabot in #688
- Add restart strategy by @nstogner in #686
New Contributors
- @CecileRobertMichon made their first contribution in #667
- @rainfd made their first contribution in #609
- @GiuseppeTT made their first contribution in #677
- @nstogner made their first contribution in #686
Full Changelog: v0.7.0-devel...v0.7.0
v0.6.0
Highlights
- New JobSet Failure Policy API - allows users to configure different behavior for different types of errors, enabling them to use compute resources more efficiently and improve ML training goodput.
- Add Coordinator field to JobSet spec, enabling users to define a global coordinator pod for distributed ML/HPC workloads. The stable network endpoint for this pod will be added as a label and annotation to every Job and Pod in the JobSet for easy use in application code. A common use case for this is TPU Multislice training with multiple different Job templates. See linked issue for details.
- Add global Job index label/annotation to every Job and Pod, which is needed to support TPU Multislice training with multiple different Job templates. See linked issue for details.
- Added new metrics
- Improved test coverage
- Bug fixes
- New examples and documentation
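The highlights above describe two derived values: a JobSet-wide (global) Job index and a stable coordinator endpoint. The Go sketch below shows one way such values could be computed. It assumes that the global index is a Job's local index offset by the replica counts of all replicated jobs listed before it, and that pods are reachable through the headless service as <jobset>-<replicatedJob>-<jobIndex>-<podIndex>.<subdomain>. The types and function names are hypothetical.

```go
package main

import "fmt"

// ReplicatedJob is a hypothetical, trimmed-down view of a JobSet replicated job.
type ReplicatedJob struct {
	Name     string
	Replicas int
}

// globalJobIndex offsets jobIdx by the replicas of every replicated job that
// precedes rjobName in the spec, giving a unique index across the JobSet.
// It returns -1 if rjobName is not present.
func globalJobIndex(rjobs []ReplicatedJob, rjobName string, jobIdx int) int {
	offset := 0
	for _, r := range rjobs {
		if r.Name == rjobName {
			return offset + jobIdx
		}
		offset += r.Replicas
	}
	return -1
}

// coordinatorEndpoint builds a DNS name for the coordinator pod, assuming the
// <jobset>-<rjob>-<jobIdx>-<podIdx>.<subdomain> naming pattern.
func coordinatorEndpoint(jobSet, rjob string, jobIdx, podIdx int, subdomain string) string {
	return fmt.Sprintf("%s-%s-%d-%d.%s", jobSet, rjob, jobIdx, podIdx, subdomain)
}

func main() {
	rjobs := []ReplicatedJob{{"slice-a", 2}, {"slice-b", 3}}
	fmt.Println(globalJobIndex(rjobs, "slice-b", 1))                    // 3
	fmt.Println(coordinatorEndpoint("train", "slice-a", 0, 0, "train")) // train-slice-a-0-0.train
}
```

With two Job templates of 2 and 3 replicas, the global indices run 0 to 4 without collisions, which is what multi-template TPU Multislice training needs.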
What's Changed
- feat: add e2e test for ttl seconds after finished in jobset by @dejanzele in #511
- add publish not ready headless service to jobset by @kannon92 in #505
- use kube-openapi rather than code generator openapi-gen by @kannon92 in #522
- Allow passing args to ginkgo for integration tests by @danielvegamyhre in #525
- Refactor create jobs by @danielvegamyhre in #516
- Do not default the managedBy field by @mimowo in #528
- feat: add event recorder event by @googs1025 in #507
- use t.Errorf instead of t.Fatalf by @googs1025 in #532
- Fix path for the error when attempting to mutate managedBy by @mimowo in #527
- Fix bug when checking if a JobSet is active during tests. by @jedwins1998 in #531
- Correct typo in configurable failure policy KEP. by @jedwins1998 in #539
- fix: fix ci error caused by typo by @googs1025 in #544
- Bump the kubernetes group with 4 updates by @dependabot in #542
- Bump github.com/onsi/gomega from 1.32.0 to 1.33.0 by @dependabot in #543
- docs: fix site url not found by @googs1025 in #541
- use hugo param to define variables in md language by @googs1025 in #540
- add unit tests for createHeadlessSvcIfNecessary by @dejanzele in #526
- test: add pod controller unit test by @googs1025 in #490
- Add comment explaining why we don't unconditionally compute firstFailedJob by @danielvegamyhre in #549
- Bump github.com/onsi/ginkgo/v2 from 2.17.1 to 2.17.2 by @dependabot in #552
- Track which features in roadmap have been released by @danielvegamyhre in #554
- docs: using kustomize for adjusting resources by @omerap12 in #558
- Bump github.com/onsi/gomega from 1.33.0 to 1.33.1 by @dependabot in #560
- Don't reconcile JobSets with deletion timestamp set by @danielvegamyhre in #562
- Improve the API generated docs for managedBy by @mimowo in #565
- chore: Upgrade e2e local image by @googs1025 in #567
- Bump github.com/onsi/ginkgo/v2 from 2.17.2 to 2.17.3 by @dependabot in #569
- Add support for feature gates by @googs1025 in #557
- Implement configurable failure policy. by @jedwins1998 in #537
- Update the JobSet version to 0.5.1 for installation by @mimowo in #577
- Bump github.com/onsi/ginkgo/v2 from 2.17.3 to 2.19.0 by @dependabot in #581
- Relax validation on ReplicatedJob PodTemplates of suspended JobSets by @danielvegamyhre in #580
- update makefile kind version to v1.30.0 by @googs1025 in #589
- Propagate Job pod template updates to suspended jobs when resuming by @danielvegamyhre in #590
- docs: update to v0.5.2 by @googs1025 in #593
- fix: fix log to avoid panic by @googs1025 in #595
- avoid log panic by @googs1025 in #598
- Add omitempty to annotation of OnJobFailureReasons. by @jedwins1998 in #596
- update readme docs e2e test version to v1.30 by @googs1025 in #602
- Update _index.md MASTER_ADDR by @song-william in #604
- Add client-go example by @danielvegamyhre in #606
- Wait for the webhook service to be listening before advertising the Jobset replica as ready. by @mbobrovskyi in #608
- docs: add simple example for network field by @googs1025 in #550
- feat: add terminalState to jobset status by @googs1025 in #594
- Integration test improvement: rename "update" to "step" by @danielvegamyhre in #610
- docs: add argo workflow example for jobset by @googs1025 in #612
- docs: add JobSet API reference by @googs1025 in #611
- docs: fix typo, Github -> GitHub by @highpon in #615
- Allow mutating schedulingGates when the Jobset is suspended by @mimowo in #623
- Add Coordinator field to JobSet spec by @danielvegamyhre in #618
- Validation for Coordinator field by @danielvegamyhre in #627
- Add example for coordinator by @danielvegamyhre in #628
- docs: add prometheus-operator example for jobset by @googs1025 in #629
- Bump github.com/onsi/gomega from 1.33.1 to 1.34.0 by @dependabot in #631
- Bump github.com/onsi/ginkgo/v2 from 2.19.0 to 2.19.1 by @dependabot in #632
- feat: add metrics for jobset by @googs1025 in #614
- docs: update metrics info for site by @googs1025 in #633
- chore: add github issue, pr template by @googs1025 in #634
- Bump github.com/onsi/gomega from 1.34.0 to 1.34.1 by @dependabot in #638
- fix error output by @googs1025 in #636
- Bump k8s dependencies to 1.30 dependencies and modify update-codegen.sh to be compatible with new code-generator by @danielvegamyhre in #641
- Fix bug in replicatedJobByName by @danielvegamyhre in #645
- Allow to update JobSets on suspend by @mimowo in #644
- Refactor jobset webhook by @danielvegamyhre in #646
- add the unparam linter to golangci and fix those issues flagged by @kannon92 in #643
- drop job-name from labels as it is not used by @kannon92 in #642
- Bump github.com/onsi/ginkgo/v2 from 2.19.1 to 2.20.0 by @dependabot in #647
- Add new job-id annotation to assign globally unique job index to each job by @danielvegamyhre in #650
- Bump github.com/prometheus/client_golang from 1.19.1 to 1.20.0 by @dependabot in #653
- update to k8s 0.30.4 by @kannon92 in #654
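The configurable failure policy (#537) decides what to do when a Job in the JobSet fails. The Go sketch below models rule matching under stated assumptions: the first matching rule wins, an empty reason or target list matches everything, RestartJobSet turns into a failure once maxRestarts is reached, and with no matching rule the JobSet is restarted until maxRestarts is exhausted. The action names are assumed to mirror the API's FailJobSet, RestartJobSet and RestartJobSetAndIgnoreMaxRestarts; the Go types are hypothetical.

```go
package main

import (
	"fmt"
	"slices" // Go 1.21+
)

// Action is a hypothetical mirror of the failure policy actions.
type Action string

const (
	FailJobSet                        Action = "FailJobSet"
	RestartJobSet                     Action = "RestartJobSet"
	RestartJobSetAndIgnoreMaxRestarts Action = "RestartJobSetAndIgnoreMaxRestarts"
)

// Rule matches a failed Job by its failure reason and its replicated job name.
type Rule struct {
	Action               Action
	OnJobFailureReasons  []string // empty means any reason
	TargetReplicatedJobs []string // empty means any replicated job
}

// matches treats an empty list as a wildcard.
func matches(list []string, v string) bool {
	return len(list) == 0 || slices.Contains(list, v)
}

// decide returns the action for a failed Job given the restart budget.
func decide(rules []Rule, reason, rjob string, restarts, maxRestarts int) Action {
	for _, r := range rules {
		if matches(r.OnJobFailureReasons, reason) && matches(r.TargetReplicatedJobs, rjob) {
			if r.Action == RestartJobSet && restarts >= maxRestarts {
				return FailJobSet // restart budget exhausted
			}
			return r.Action
		}
	}
	// No rule matched: fall back to restarting within the budget.
	if restarts < maxRestarts {
		return RestartJobSet
	}
	return FailJobSet
}

func main() {
	rules := []Rule{
		{Action: FailJobSet, OnJobFailureReasons: []string{"PodFailurePolicy"}},
		{Action: RestartJobSetAndIgnoreMaxRestarts, OnJobFailureReasons: []string{"DeadlineExceeded"}, TargetReplicatedJobs: []string{"workers"}},
	}
	fmt.Println(decide(rules, "PodFailurePolicy", "workers", 0, 3))     // FailJobSet
	fmt.Println(decide(rules, "DeadlineExceeded", "workers", 5, 3))     // RestartJobSetAndIgnoreMaxRestarts
	fmt.Println(decide(rules, "BackoffLimitExceeded", "workers", 1, 3)) // RestartJobSet
}
```

Ordering matters under first-match-wins: placing a broad rule before a narrow one would shadow the narrow rule.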
New Contributors
- @mimowo made their first contribution in #528
- @omerap12 made their first contribution in #558
- @song-william made their first contribution in #604
- @mbobrovskyi made their first contribution in #608
- @highpon made their first contribution in #615
Full Changelog: v0.6.0-devel...v0.6.0
JobSet v0.5.2
What's Changed
- Automated cherry pick of #580: relax validation on replicated jobs by @danielvegamyhre in #584
- Automated cherry pick of #590: propagate job pod template updates to suspended jobs when by @danielvegamyhre in #591
Full Changelog: v0.5.1...v0.5.2
v0.5.1
Highlights
- Fixed a bug that caused the foreground cascading deletion policy to not work properly on JobSets (#562)
- Fixed the field path in the error message returned by validation of the managedBy field (#527)
- Test coverage improvements, refactoring, additional documentation
What's Changed
- Update docs for 0.5.0 by @danielvegamyhre in #517
- [Release-0.5] Do not default the managedBy field by @kannon92 in #533
- Automated cherry pick of #527: Fix path for the error when mutating managedBy by @kannon92 in #534
- Automated cherry pick of #562: don't reconcile jobsets with deletion timestamp set by @danielvegamyhre in #564
Full Changelog: v0.6.0-devel...v0.5.1
v0.5.0
What's Changed
Highlights
- JobSet TTL support added in #443
- Docsite is live at https://jobset.sigs.k8s.io/ with updated documentation and examples.
- Include first failed job name in event emitted when JobSet fails, to speed up the debugging process for large complex workloads #477
- Lower the default resource request for the JobSet controller manager so it fits on default cloud CPU VMs, but keep a high limit to support maximum performance #480
- Perform only 1 JobSet status update per reconcile attempt to reduce pressure on k8s apiserver #494
- Introduced the managedBy field to the JobSet spec to enable MultiKueue support
Detailed release notes
- Add info to landing page by @danielvegamyhre in #435
- Validate follower pod owned by same Job as leader pod by @danielvegamyhre in #433
- Bump github.com/stretchr/testify from 1.8.4 to 1.9.0 by @dependabot in #439
- Add descriptions to ReplicatedJobStatus fields by @danielvegamyhre in #442
- Bump github.com/onsi/ginkgo/v2 from 2.15.0 to 2.16.0 by @dependabot in #444
- Add JobSet diagram and other doc updates by @danielvegamyhre in #446
- Update installation version to latest release in public docs by @danielvegamyhre in #450
- add concept image by @moficodes in #454
- Update tasks documentation by @danielvegamyhre in #453
- Emit Job creation failed event by @danielvegamyhre in #448
- Remove Jobset Docs from root by @moficodes in #455
- Fix 404 error when clicking on driver-worker-success-policy.yaml by @kannon92 in #456
- Rename FAQ to troubleshooting on docsite by @danielvegamyhre in #457
- Bump the kubernetes group with 4 updates by @dependabot in #459
- Add features overview to README by @danielvegamyhre in #452
- Update Makefile rules to use more specific paths by @danielvegamyhre in #470
- Fix typo in readme by @danielvegamyhre in #472
- Add jobset roadmap to README by @danielvegamyhre in #468
- Bump github.com/onsi/gomega from 1.31.1 to 1.32.0 by @dependabot in #475
- Bump github.com/onsi/ginkgo/v2 from 2.16.0 to 2.17.1 by @dependabot in #474
- update golang to 1.22 by @kannon92 in #471
- Lower default resource request for controller manager but keep high limit by @danielvegamyhre in #480
- Include first failed job name in event emitted when JobSet fails, as well as the JobSet failure condition by @danielvegamyhre in #477
- Update README.md to correct concepts link by @jtorrex in #486
- Code cleanup and refactoring by @danielvegamyhre in #484
- Move headless service creation outside of createJobs by @danielvegamyhre in #483
- Remove Duplicate Import by @jedwins1998 in #488
- Introduce managedBy field and Remove managed-by label by @jedwins1998 in #487
- fix some typo error by @googs1025 in #489
- Move JobSet webhook into same webhooks package as pod webhook by @danielvegamyhre in #460
- add unit test for jobset webhook updates by @kannon92 in #464
- feat: add support for ttl cleanup for finished jobsets by @dejanzele in #443
- Add unit tests to jobset success policy functions by @zhifei92 in #501
- fix: add IsNotFoundErr when get headlessSvc by @googs1025 in #503
- Update envtest and add back crd generation when updating the api by @kannon92 in #510
- Call Status.Update once in each reconcile attempt by @danielvegamyhre in #494
- Clean up outdated comments by @danielvegamyhre in #512
- Bump sigs.k8s.io/controller-runtime from 0.17.2 to 0.17.3 in the kubernetes group by @dependabot in #513
- Update docs for 0.5.0 by @danielvegamyhre in #517
New Contributors
- @jtorrex made their first contribution in #486
- @jedwins1998 made their first contribution in #488
- @zhifei92 made their first contribution in #501
Full Changelog: v0.5.0-devel...v0.5.0
v0.4.0
What's Changed
- Update main branch installation docs for release v0.3.0 by @danielvegamyhre in #349
- use kind export logs by @kannon92 in #352
- add suspend to replicated job status by @kannon92 in #250
- Update the installation docs to mention the CPU nodes minimum necessary CPU/memory resources by @danielvegamyhre in #354
- Use jobset-system instead of kind-system for jobset by @kannon92 in #358
- A KEP for StartupPolicy by @kannon92 in #244
- Add patches for Kustomize to add objectSelectors to pod webhook configurations by @danielvegamyhre in #362
- Update installation docs for v0.3.1 [main] by @danielvegamyhre in #368
- Bump k8s.io/apimachinery from 0.28.4 to 0.28.5 by @dependabot in #369
- Bump github.com/open-policy-agent/cert-controller from 0.10.0 to 0.10.1 by @dependabot in #373
- Bump k8s.io/api from 0.28.4 to 0.28.5 by @dependabot in #370
- Bump k8s.io/code-generator from 0.28.3 to 0.28.5 by @dependabot in #371
- Bump k8s.io/client-go from 0.28.4 to 0.28.5 by @dependabot in #372
- Bump github.com/onsi/ginkgo/v2 from 2.13.2 to 2.14.0 by @dependabot in #376
- update kind to 0.20.0 by @kannon92 in #359
- Bump k8s.io/code-generator from 0.28.5 to 0.28.6 by @dependabot in #382
- Bump github.com/onsi/gomega from 1.30.0 to 1.31.1 by @dependabot in #383
- Bump k8s.io/client-go from 0.28.5 to 0.28.6 by @dependabot in #384
- upgrade kubernetes apis to 0.29 by @kannon92 in #387
- Move exclusive placement annotation to ReplicatedJob template by @danielvegamyhre in #389
- add dependabot groups for k8s packages by @kannon92 in #391
- add a message to events by @kannon92 in #390
- Migrate from background to foreground cascading deletion policy by @danielvegamyhre in #393
- Default service name in JobSet controller by @danielvegamyhre in #395
- bumping controller tools to see if this fixes ci by @kannon92 in #403
- add suspend field to printcolumn by @kannon92 in #400
- add jobset docsite by @moficodes in #402
- KEP 262: Configurable Failure Policy API by @danielvegamyhre in #381
- Get subdomain via a func instead of defaulting it on the jobset object by @ahg-g in #404
- Bump the kubernetes group with 1 update by @dependabot in #406
- Startup policy implementation by @kannon92 in #246
- Minor cleanup to ensureConditionOpts by @ahg-g in #410
- Validate longest pod name for jobset will not exceed 63 chars by @danielvegamyhre in #409
- Add managed-by label support. by @trasc in #407
- Improve error messages and logging in webhooks by @danielvegamyhre in #421
- Update installation docs for v0.3.2 by @danielvegamyhre in #424
- typo: Fix some comments by @googs1025 in #426
- Bump the kubernetes group with 5 updates by @dependabot in #431
- Update docsite title and subtitle by @danielvegamyhre in #432
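#409 validates that the longest pod name a JobSet can produce stays within the 63-character DNS label limit. The Go sketch below assumes the name pattern <jobset>-<replicatedJob>-<jobIndex>-<podIndex>, where the largest indices are replicas-1 and parallelism-1. The real webhook may account for additional characters, so treat both helpers as hypothetical.

```go
package main

import "fmt"

// maxDNSLabelLength is the RFC 1123 / DNS label limit that applies to hostnames.
const maxDNSLabelLength = 63

// longestPodName builds the longest name a replicated job could produce under
// the assumed <jobset>-<rjob>-<jobIndex>-<podIndex> pattern.
func longestPodName(jobSet, rjob string, replicas, parallelism int) string {
	return fmt.Sprintf("%s-%s-%d-%d", jobSet, rjob, replicas-1, parallelism-1)
}

// validatePodNameLength rejects specs whose longest pod name would exceed the limit.
func validatePodNameLength(jobSet, rjob string, replicas, parallelism int) error {
	name := longestPodName(jobSet, rjob, replicas, parallelism)
	if len(name) > maxDNSLabelLength {
		return fmt.Errorf("longest pod name %q is %d characters, exceeding %d",
			name, len(name), maxDNSLabelLength)
	}
	return nil
}

func main() {
	fmt.Println(longestPodName("train", "workers", 16, 4))        // train-workers-15-3
	fmt.Println(validatePodNameLength("train", "workers", 16, 4)) // <nil>
}
```

Checking only the largest indices is enough because every other pod name in the replicated job is the same length or shorter.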
New Contributors
- @moficodes made their first contribution in #402
- @trasc made their first contribution in #407
- @googs1025 made their first contribution in #426
Full Changelog: v0.4.0-devel...v0.4.0