KEP-4136: Admission Fair Sharing #4252

Open: wants to merge 1 commit into main

mwielgus

What type of PR is this?

/kind feature
/kind api-change

What this PR does / why we need it:

KEP for a new resource fair sharing method.

Which issue(s) this PR fixes:

Fixes #4136

Special notes for your reviewer:

Does this PR introduce a user-facing change?

NONE

@k8s-ci-robot k8s-ci-robot added release-note-none Denotes a PR that doesn't merit a release note. kind/feature Categorizes issue or PR as related to a new feature. kind/api-change Categorizes issue or PR as related to adding, removing, or otherwise changing an API cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Feb 12, 2025
@k8s-ci-robot

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: mwielgus
Once this PR has been reviewed and has the lgtm label, please assign tenzen-y for approval. For more information see the Code Review Process.


@k8s-ci-robot k8s-ci-robot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Feb 12, 2025

mimowo commented Feb 13, 2025

/assign @PBundyra @gabesaba
To help with the review. I appreciate that this is planned for 0.12, since we already have a plan for 0.11 (#4249) and this seems big.

* Allow to specify the relative importance of LocalQueues targeting the same ClusterQueue.
* Amend the admission mechanism to work on admission scopes instead of only on ClusterQueues.
* Select the appropriate admission candidates for each of the admission scopes and admit them according to the selected queueing policy.
* Make the new mechanism complementary to the existing preemption-based fair sharing
Contributor

How will these mechanisms work together? Can we specify this in detail, or make these features mutually exclusive - only one of them may be enabled?

I have a bias towards the latter approach, at the very least within the same (root)Cohort.

admission logic. If there are two AdmissionScopes on the path from CQ/Cohort to the top of
the hierarchy tree, the higher one is used.

Const (
Contributor

fix formatting


Kueue sorts the workloads by their LQ usage (if mode is not NoFairSharing), priority and
timestamp and tries to admit the first one from the list. If it fails and the second, third
or following is possible then that workload is admitted, under condition that it might get preempted.
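
The ordering described in this excerpt can be sketched as a three-key comparison. This is only an illustration: the `pendingWorkload` type, its fields, and the choice to put the lowest LocalQueue usage first are assumptions made here, not part of the KEP or of Kueue's API.

```go
package main

import (
	"fmt"
	"sort"
	"time"
)

// pendingWorkload is a hypothetical, simplified view of a queued workload.
type pendingWorkload struct {
	Name      string
	LQUsage   float64   // decayed usage of the owning LocalQueue (assumed scalar)
	Priority  int32     // higher value is more important
	CreatedAt time.Time // queueing timestamp
}

// sortForAdmission orders workloads by LQ usage (lowest first, an assumption),
// then priority (highest first), then timestamp (oldest first).
// With fairSharing == false (the NoFairSharing mode) usage is ignored.
func sortForAdmission(ws []pendingWorkload, fairSharing bool) {
	sort.SliceStable(ws, func(i, j int) bool {
		a, b := ws[i], ws[j]
		if fairSharing && a.LQUsage != b.LQUsage {
			return a.LQUsage < b.LQUsage
		}
		if a.Priority != b.Priority {
			return a.Priority > b.Priority
		}
		return a.CreatedAt.Before(b.CreatedAt)
	})
}

func main() {
	t0 := time.Date(2025, 2, 12, 0, 0, 0, 0, time.UTC)
	ws := []pendingWorkload{
		{Name: "a", LQUsage: 10, Priority: 100, CreatedAt: t0},
		{Name: "b", LQUsage: 2, Priority: 1, CreatedAt: t0.Add(time.Minute)},
		{Name: "c", LQUsage: 2, Priority: 5, CreatedAt: t0.Add(2 * time.Minute)},
	}
	sortForAdmission(ws, true)
	for _, w := range ws {
		fmt.Println(w.Name)
	}
}
```

With fair sharing on, the high-priority workload from the heavily used queue ends up last; with it off, priority alone decides the head of the list.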
Contributor

How do we indicate that these workloads are preemptible? Is this usage counted against this workload's fair sharing value?

Contributor

I would argue against supporting this mode. It seems complex, given that the idea of this mode was to have fair sharing without preemption.

Couldn't a user define some best-effort queue which sits outside of the admission scope, which may be preempted?

Contributor Author

In best-effort queues, workloads may hop over the first one, on the condition that the first workload may preempt them if the sum of free resources and those consumed by skip-the-line workloads is greater than the needs of the first workload.

We may not support this in the first iteration, but the KEP should at least briefly explain what it could look like if we decided to complete the picture.
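
The condition in the reply above reduces to a single comparison. A minimal sketch for one resource type follows; the function name and the integer units are hypothetical, and the strict comparison only mirrors the wording of the comment (an implementation might well use `>=`).

```go
package main

import "fmt"

// headMayPreempt reports whether the first workload in a best-effort queue
// could reclaim capacity from workloads that skipped ahead of it.
// All three values refer to the same single resource (an assumption).
func headMayPreempt(free, skipTheLineUsage, headRequest int64) bool {
	// Free capacity plus what the skip-the-line workloads hold must
	// exceed what the head workload needs.
	return free+skipTheLineUsage > headRequest
}

func main() {
	// 2 units free, 6 held by skip-the-line workloads, head needs 7.
	fmt.Println(headMayPreempt(2, 6, 7))
	// 2 units free, 3 held by skip-the-line workloads, head needs 7.
	fmt.Println(headMayPreempt(2, 3, 7))
}
```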

nominal quota, then it is admitted immediately, if not it goes into cohort-level fair sharing.
For Cohort we select all the “sticking out” workloads, and sort them by their CQ usage, priority
and timestamp. Kueue attempts to admit the first workload from the list of sticking-out, just
like if it was one big strict FIFO queue. For multi-level hierarchy under one AdmissionScope
Contributor

What if the first workload fits within the CQ, and after its admission the second workload would require borrowing? Would we consider this second workload at the Cohort level in the same scheduling cycle, or is it always one candidate per level per cycle? I suppose the second model is simpler, so I would favor it.

Contributor Author

Probably the second, but I guess it is a bit of an implementation detail. I would be OK with either.
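
The mixed mode discussed in this thread, under the "one candidate per level per cycle" model, could look roughly like the sketch below. Everything here is hypothetical: the `cqHead` type, the single-resource accounting, and the lowest-usage-first ordering are assumptions for illustration, not the KEP's data model.

```go
package main

import (
	"fmt"
	"sort"
	"time"
)

// cqHead is a hypothetical view of the one candidate that a ClusterQueue
// bubbles up in a scheduling cycle.
type cqHead struct {
	Workload    string
	Request     int64     // requested amount of a single resource
	NominalFree int64     // unused nominal quota of the workload's CQ
	CQUsage     float64   // decayed usage of the CQ (assumed scalar)
	Priority    int32     // higher value is more important
	CreatedAt   time.Time // queueing timestamp
}

// partition separates candidates that fit into their CQ's nominal quota from
// the "sticking out" ones, and sorts the latter by CQ usage (lowest first,
// an assumption), priority (highest first), and timestamp (oldest first).
func partition(heads []cqHead) (fits, stickingOut []cqHead) {
	for _, h := range heads {
		if h.Request <= h.NominalFree {
			fits = append(fits, h)
		} else {
			stickingOut = append(stickingOut, h)
		}
	}
	sort.SliceStable(stickingOut, func(i, j int) bool {
		a, b := stickingOut[i], stickingOut[j]
		if a.CQUsage != b.CQUsage {
			return a.CQUsage < b.CQUsage
		}
		if a.Priority != b.Priority {
			return a.Priority > b.Priority
		}
		return a.CreatedAt.Before(b.CreatedAt)
	})
	return fits, stickingOut
}

// nextCohortCandidate returns the single workload tried at the cohort level,
// treating the sticking-out list as one strict FIFO queue.
func nextCohortCandidate(stickingOut []cqHead) (cqHead, bool) {
	if len(stickingOut) == 0 {
		return cqHead{}, false
	}
	return stickingOut[0], true
}

func main() {
	heads := []cqHead{
		{Workload: "w1", Request: 2, NominalFree: 4, CQUsage: 9},
		{Workload: "w2", Request: 8, NominalFree: 1, CQUsage: 5},
		{Workload: "w3", Request: 6, NominalFree: 0, CQUsage: 3},
	}
	fits, out := partition(heads)
	fmt.Println("admitted from nominal quota:", len(fits))
	if c, ok := nextCohortCandidate(out); ok {
		fmt.Println("cohort-level candidate:", c.Workload)
	}
}
```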


usage_sum = (1-A) * previous_usage_sum + A * current_usage.

The value will be stored in FairSharingStatus for all LQ, CQ, and Cohorts.
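
The formula above is an exponential moving average of usage. A minimal sketch of the update for a single scalar resource is shown below; the function name and the sample values are illustrative assumptions, and `a` stands for the constant A from the excerpt.

```go
package main

import "fmt"

// decayedUsage applies usage_sum = (1-A)*previous_usage_sum + A*current_usage.
// a is expected to lie in [0, 1]; larger values weigh recent usage more.
func decayedUsage(previous, current, a float64) float64 {
	return (1-a)*previous + a*current
}

func main() {
	usage := 100.0
	// The queue goes idle: current usage drops to 0 and the sum decays.
	for step := 1; step <= 3; step++ {
		usage = decayedUsage(usage, 0, 0.5)
		fmt.Printf("step %d: %.2f\n", step, usage)
	}
}
```

With A = 0.5 the stored sum halves on every update while the queue is idle, so past consumption is gradually forgotten rather than dropped at once.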
Contributor

So this isn't only informational, but also the working value we'll use upon Kueue restart? Does it belong in Status?

Contributor Author

Where would you put it otherwise?


* Create a new struct AdmissionScope and make it an optional field for CQ and Cohort Spec. If
not provided, CQ or Cohort is not considered an AdmissionScope and is not a subject for new
admission logic. If there are two AdmissionScopes on the path from CQ/Cohort to the top of
Contributor

Why? Could we make this an invalid state, since the lower scope does nothing?

Contributor Author

Because someone may be changing the scope. Scope change is not atomic and we cannot block the entire hierarchy in the meantime.
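
The "higher one is used" rule from the excerpt can be sketched as a walk up the hierarchy that remembers the last scope seen. The `node` type and its fields are hypothetical stand-ins for a ClusterQueue or Cohort; tolerating several scopes on one path, instead of rejecting them, matches the non-atomic scope change described in the reply above.

```go
package main

import "fmt"

// node is a hypothetical stand-in for a ClusterQueue or Cohort in the tree.
type node struct {
	Name              string
	Parent            *node // nil at the top of the hierarchy
	HasAdmissionScope bool  // true if the optional AdmissionScope field is set
}

// effectiveScope returns the highest node on the path from n to the root
// that defines an AdmissionScope, or nil if none does.
func effectiveScope(n *node) *node {
	var top *node
	for cur := n; cur != nil; cur = cur.Parent {
		if cur.HasAdmissionScope {
			top = cur // keep overwriting so the highest one wins
		}
	}
	return top
}

func main() {
	root := &node{Name: "root-cohort", HasAdmissionScope: true}
	mid := &node{Name: "team-cohort", Parent: root, HasAdmissionScope: true}
	cq := &node{Name: "cq-a", Parent: mid}

	if s := effectiveScope(cq); s != nil {
		fmt.Println("effective scope:", s.Name)
	}
}
```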

or following is possible then that workload is admitted, under condition that it might get preempted.

3. AdmissionScope at Cohort level - Kueue operates in a mixed mode. Inside CQ workloads are
selected according to their AdmissionMode (if specified). If a workload fits entirely into
Contributor

> Inside CQ workloads are selected according to their AdmissionMode (if specified)

Only the highest AdmissionScope is used. Do you mean the queueing policy (FIFO/BestEffort)?

Contributor Author

The candidates inside individual CQs are selected based on the specified logic and bubbled up.

// FairSharing based on usage, with QueuingStrategy as defined in CQ.
UsageBasedFairSharing AdmissionMode = "UsageBasedFairSharing"

NoFairSharing AdmissionMode = "NoFairSharing"
Contributor

Does it differ from excluding a CQ/Cohort from the new logic?

Contributor Author

It is exactly that. However, a CQ may be in a cohort that is an admission scope. Marking the CQ as NoFairSharing will skip any fair sharing among its LQs.

// with decaying function applied.
// The value is populated if usage consumption functionality is enabled in Kueue config.
ConsumedResources map[string]resource.Quantity `json:"consumedResources,omitempty"`

Contributor

Do we want to have a field in status for the result of applying weights to ConsumedResources so it's easier to monitor and debug?

Contributor Author

I don't have a need for that; it would duplicate data. Let's not add it now and wait for an explicit user request.


* Establish a method for how shared resource usage is calculated and recorded and how users can fine tune the mechanism.
* Allow to specify a fair admission scope at either individual Cluster Queue or Cohort scope.
* Allow to specify the relative importance of LocalQueues targeting the same ClusterQueue.
Contributor

IMO this is solved via hierarchical cohorts and fairSharing.weight. I think we shouldn't complicate the LocalQueue API. Maintaining it as a pointer to a ClusterQueue feels the most natural to me, rather than splitting configuration details.

Contributor Author

Yes, we could approximate the solution by making the admission scope available at the cohort level only and using cluster queues as entry points/proxies/local queues. However, I'm not 100% sure that we need this level of complication. The KEP covers it just in case, so that we have the full picture, but suggests starting small. Maybe doing admission fair sharing only at the CQ+LQ level would be enough for most users who need the feature. Implementation-wise, CQ+LQ is much simpler and introduces far fewer changes to the already complex admission algorithm than going fully hierarchical.

Labels
cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/api-change Categorizes issue or PR as related to adding, removing, or otherwise changing an API kind/feature Categorizes issue or PR as related to a new feature. release-note-none Denotes a PR that doesn't merit a release note. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Development

Successfully merging this pull request may close these issues.

Fair sharing mechanism without preemptions
6 participants