KEP-4136: Admission Fair Sharing #4252
base: main
[APPROVALNOTIFIER] This PR is NOT APPROVED. This pull-request has been approved by: mwielgus.
✅ Deploy Preview for kubernetes-sigs-kueue canceled.
* Allow users to specify the relative importance of LocalQueues targeting the same ClusterQueue.
* Amend the admission mechanism to work on admission scopes instead of only on ClusterQueues.
* Select the appropriate admission candidates for each of the admission scopes and admit them according to the selected queueing policy.
* Make the new mechanism complementary to the existing preemption-based fair sharing.
How will these mechanisms work together? Can we specify this in detail, or make these features mutually exclusive, so that only one of them may be enabled?
I have a bias towards the latter approach, at least within the same (root) Cohort.
admission logic. If there are two AdmissionScopes on the path from CQ/Cohort to the top of the hierarchy tree, the higher one is used.
const (
fix formatting
Kueue sorts the workloads by their LQ usage (if the mode is not NoFairSharing), priority, and timestamp, and tries to admit the first one from the list. If that fails but the second, third, or a later workload can be admitted, that workload is admitted on the condition that it might get preempted.
How do we indicate that these workloads are preemptible? Is this usage counted against this workload's fair sharing value?
I would argue against supporting this mode. It seems complex, given that the idea of this mode was to have fair sharing without preemption.
Couldn't a user define a best-effort queue that sits outside of the admission scope and may be preempted?
In best-effort queues, workloads may jump ahead of the first one, on the condition that the first workload may preempt them if the sum of free resources and those consumed by the skip-the-line workloads exceeds the needs of the first workload.
We may not support this in the first iteration, but the KEP should at least briefly explain what it could look like if we decided to complete the picture.
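A minimal Go sketch of the best-effort behavior discussed in this thread, assuming one candidate is picked per cycle and resources are plain integer quantities keyed by resource name. `Workload`, `fits`, `selectCandidate`, and `headCanReclaim` are hypothetical names, not Kueue's scheduler code. A workload that overtakes a blocked head is reported as preemptible, and the head can reclaim capacity only if the free resources plus those held by the skip-the-line workloads cover its requests.

```go
package main

import "fmt"

// Workload is a hypothetical, simplified stand-in for a queued workload:
// a name plus per-resource requests (for example "cpu" in millicores).
type Workload struct {
	Name     string
	Requests map[string]int64
}

// fits reports whether every request of w is covered by free.
func fits(w Workload, free map[string]int64) bool {
	for r, q := range w.Requests {
		if free[r] < q {
			return false
		}
	}
	return true
}

// selectCandidate scans a list that is already sorted (LQ usage, priority,
// timestamp) and returns the index of the first workload that fits.
// If that workload is not the head, it skipped the line and is preemptible.
// It returns -1 when nothing fits.
func selectCandidate(sorted []Workload, free map[string]int64) (idx int, preemptible bool) {
	for i, w := range sorted {
		if fits(w, free) {
			return i, i > 0
		}
	}
	return -1, false
}

// headCanReclaim reports whether the blocked head could be admitted by
// preempting the skip-the-line workloads: for every resource, free capacity
// plus what those workloads consume must cover the head's request.
func headCanReclaim(head Workload, free map[string]int64, skippers []Workload) bool {
	for r, need := range head.Requests {
		avail := free[r]
		for _, s := range skippers {
			avail += s.Requests[r]
		}
		if avail < need {
			return false
		}
	}
	return true
}

func main() {
	queue := []Workload{
		{Name: "head", Requests: map[string]int64{"cpu": 8000}},
		{Name: "small", Requests: map[string]int64{"cpu": 2000}},
	}
	free := map[string]int64{"cpu": 4000}

	idx, preemptible := selectCandidate(queue, free)
	fmt.Println(queue[idx].Name, preemptible) // small true

	// 4000 free + 2000 held by "small" < 8000 requested by "head".
	fmt.Println(headCanReclaim(queue[0], free, queue[1:2])) // false
}
```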
nominal quota, then it is admitted immediately; if not, it goes into cohort-level fair sharing. For a Cohort, we select all the “sticking out” workloads and sort them by their CQ usage, priority, and timestamp. Kueue attempts to admit the first workload from the list of sticking-out workloads, as if it were one big strict FIFO queue. For multi-level hierarchy under one AdmissionScope
What if the first workload fits within the CQ, and after its admission the second workload would require borrowing? Would we consider this second workload at the Cohort level in the same scheduling cycle, or is it always one candidate per level per cycle? I suppose the second model is simpler, so I would favor it.
Probably the second, but I guess it is a bit of an implementation detail. I would be OK with both.
usage_sum = (1 - A) * previous_usage_sum + A * current_usage
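The decay formula above is an exponential moving average. A minimal Go sketch applying it per resource, assuming A lies in [0, 1] and that the update runs once per sampling interval (the interval is not specified here); `decayUsage` is a hypothetical helper, and a resource missing from either map is treated as zero.

```go
package main

import "fmt"

// decayUsage applies usage_sum = (1 - A) * previous_usage_sum + A * current_usage
// independently to every resource. Resources missing from either map count
// as zero. A is assumed to lie in [0, 1]; a larger A weights recent usage more.
func decayUsage(prev, current map[string]float64, a float64) map[string]float64 {
	next := make(map[string]float64, len(prev))
	for r, p := range prev {
		next[r] = (1 - a) * p
	}
	for r, c := range current {
		next[r] += a * c
	}
	return next
}

func main() {
	prev := map[string]float64{"cpu": 10}
	cur := map[string]float64{"cpu": 20, "nvidia.com/gpu": 4}
	fmt.Println(decayUsage(prev, cur, 0.5)) // map[cpu:15 nvidia.com/gpu:2]
}
```

With A = 0.5 and no new usage, the stored value halves every interval, so historical consumption fades gradually instead of being forgotten at once.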
The value will be stored in FairSharingStatus for all LQs, CQs, and Cohorts.
This isn't only informational; isn't it also the working value we'll use after a Kueue restart? Does it belong in Status?
Where would you put it otherwise?
* Create a new struct AdmissionScope and make it an optional field of the CQ and Cohort Spec. If not provided, the CQ or Cohort is not considered an AdmissionScope and is not subject to the new admission logic. If there are two AdmissionScopes on the path from CQ/Cohort to the top of
Why? Should we make this an invalid state, since the lower scope does nothing?
Because someone may be changing the scope. Scope change is not atomic and we cannot block the entire hierarchy in the meantime.
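A hypothetical Go sketch of the rule that the highest AdmissionScope on the path wins, which tolerates a lower scope lingering during a non-atomic scope change instead of rejecting it. `Node` and `effectiveScope` are illustrative names; in the KEP the scope is an optional field on the CQ and Cohort Spec, modelled here as a boolean.

```go
package main

import "fmt"

// Node is a hypothetical node of the Cohort/ClusterQueue hierarchy.
// HasAdmissionScope mirrors whether the optional AdmissionScope field is set.
type Node struct {
	Name              string
	Parent            *Node
	HasAdmissionScope bool
}

// effectiveScope walks from n to the root and returns the highest node that
// has an AdmissionScope, or nil if none does. Lower scopes on the same path
// are tolerated (for example while a scope change is in progress) but ignored.
func effectiveScope(n *Node) *Node {
	var scope *Node
	for cur := n; cur != nil; cur = cur.Parent {
		if cur.HasAdmissionScope {
			scope = cur
		}
	}
	return scope
}

func main() {
	root := &Node{Name: "root-cohort", HasAdmissionScope: true}
	team := &Node{Name: "team-cohort", Parent: root, HasAdmissionScope: true}
	cq := &Node{Name: "team-cq", Parent: team}
	fmt.Println(effectiveScope(cq).Name) // root-cohort
}
```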
3. AdmissionScope at Cohort level: Kueue operates in a mixed mode. Inside a CQ, workloads are selected according to their AdmissionMode (if specified). If a workload fits entirely into
> Inside CQ workloads are selected according to their AdmissionMode (if specified)

> Only highest AdmissionScope is used.

Do you mean Queueing Policy (FIFO/BestEffort)?
The candidates inside individual CQs are selected based on the specified logic and bubbled up.
// Fair sharing based on usage, with the QueueingStrategy as defined in the CQ.
UsageBasedFairSharing AdmissionMode = "UsageBasedFairSharing"
NoFairSharing AdmissionMode = "NoFairSharing"
Does it differ from excluding a CQ/Cohort from the new logic?
It is exactly that. However, a CQ may be in a Cohort that is an admission scope. Marking the CQ as NoFairSharing skips any fair sharing among its LQs.
// with the decay function applied.
// The value is populated if the usage consumption functionality is enabled in the Kueue config.
ConsumedResources map[string]resource.Quantity `json:"consumedResources,omitempty"`
Do we want to have a field in status for the result of applying weights to ConsumedResources, so it's easier to monitor and debug?
I don't have a need for that; it would duplicate data. Let's not add it now and wait for an explicit user request.
* Establish how shared resource usage is calculated and recorded, and how users can fine-tune the mechanism.
* Allow users to specify a fair admission scope at either the individual ClusterQueue or the Cohort level.
* Allow users to specify the relative importance of LocalQueues targeting the same ClusterQueue.
IMO this is solved via hierarchical Cohorts and `fairSharing.weight`.
I think we shouldn't complicate the LocalQueue API. Keeping it as a pointer to a ClusterQueue feels more natural to me than splitting configuration details.
Yes, we could approximate the solution by placing the admission scope at the Cohort level only and using ClusterQueues as entry points/proxies/local queues. However, I'm not 100% sure that we need this level of complexity. The KEP covers it just in case, so we have the full picture, but suggests starting small. Admission fair sharing at the CQ+LQ level alone may be enough for most users who need the feature. Implementation-wise, CQ+LQ is much simpler and introduces far fewer changes to the already complex admission algorithm than going fully hierarchical.
What type of PR is this?
/kind feature
/kind api-change
What this PR does / why we need it:
KEP for a new resource fair sharing method.
Which issue(s) this PR fixes:
Fixes #4136
Special notes for your reviewer:
Does this PR introduce a user-facing change?