Kueue scheduler fragmentation optimization #4329
Comments
You might be interested in Topology Aware Scheduling (TAS): https://kueue.sigs.k8s.io/docs/concepts/topology_aware_scheduling/ Note that this feature is still alpha.
Thanks @tenzen-y! I wasn't aware of this alpha feature; checking it out now.
Does TAS satisfy your request?
Hey @tenzen-y, I read through the docs. It looks like TAS addresses the static cluster topology (racks, blocks, etc.), but the challenge in this issue is mostly about the runtime deployment topology (i.e., how many resources are actually available per node at scheduling time). So I'm afraid TAS alone won't solve this issue, but please correct me if I'm wrong.
In that case, you can use a flat topology (see the sketch below). If you specify "kubernetes.io/hostname" as the only topology level, Kueue traverses every Node's allocatable resources and packs Pods onto nodes as tightly as possible (similar to the kube-scheduler MostAllocated strategy).
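For reference, a minimal sketch of what such a flat topology could look like, based on the Topology Aware Scheduling docs. The object names ("default", "tas-flavor") and the node-group label are placeholders, and the Topology API is still alpha, so fields may change:

```yaml
# Sketch only: a "flat" Topology whose single level is the node hostname,
# so topology-aware placement is computed per node.
apiVersion: kueue.x-k8s.io/v1alpha1
kind: Topology
metadata:
  name: default                 # placeholder name
spec:
  levels:
  - nodeLabel: "kubernetes.io/hostname"
---
# A ResourceFlavor opts into the Topology above via topologyName.
apiVersion: kueue.x-k8s.io/v1beta1
kind: ResourceFlavor
metadata:
  name: tas-flavor              # placeholder name
spec:
  nodeLabels:
    cloud.provider.com/node-group: tas   # placeholder node label
  topologyName: default
```

Workloads then request node-level placement per PodSet through the pod template annotation `kueue.x-k8s.io/podset-preferred-topology: "kubernetes.io/hostname"` (or `kueue.x-k8s.io/podset-required-topology` to make it a hard constraint).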
Thanks, we'll test this out and keep this issue updated.
I would recommend using the main branch to verify all TAS features, since only the main branch is guaranteed to support the "mostAllocated"-style packing described above; the older released versions do not.
What would you like to be added:
In the current Kueue implementation, each queue's resources are simply summed (e.g., a total of 8 GPUs) without awareness of the actual node topology (e.g., 1 × 8 GPUs vs. 2 × 4 GPUs). As a result, Kueue can admit a workload that is "admittable" by quota but cannot actually be scheduled at runtime. Such a wrongly admitted workload stays pending indefinitely until previously admitted workloads free up resources, while blocking new workloads that could have run (e.g., one requesting a single GPU). Overall, this fragmentation leads to a low cluster allocation rate.
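To make the scenario concrete, here is a hypothetical illustration (all object names, images, and GPU counts below are made up for this example): a ClusterQueue whose quota sums to 8 GPUs, backed by two nodes with 4 GPUs each, and a single-pod Job requesting all 8 GPUs. The quota check passes and the workload is admitted, but no single node can fit the Pod, so it stays pending while holding the quota:

```yaml
# Hypothetical setup: quota says "8 GPUs", but the cluster is 2 nodes x 4 GPUs each.
apiVersion: kueue.x-k8s.io/v1beta1
kind: ClusterQueue
metadata:
  name: gpu-cq                        # placeholder name
spec:
  namespaceSelector: {}
  resourceGroups:
  - coveredResources: ["nvidia.com/gpu"]
    flavors:
    - name: default-flavor            # placeholder flavor
      resources:
      - name: "nvidia.com/gpu"
        nominalQuota: 8               # only the total is visible to admission
---
# Single-pod Job asking for all 8 GPUs: admitted by quota, unschedulable on 4-GPU nodes.
apiVersion: batch/v1
kind: Job
metadata:
  name: needs-8-gpus                  # placeholder name
  labels:
    kueue.x-k8s.io/queue-name: gpu-lq # placeholder LocalQueue
spec:
  suspend: true                       # Kueue unsuspends the Job once admitted
  template:
    spec:
      restartPolicy: Never
      containers:
      - name: trainer
        image: registry.example.com/trainer:latest   # placeholder image
        resources:
          requests:
            nvidia.com/gpu: 8
          limits:
            nvidia.com/gpu: 8
```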
The suggested solution would be to re-schedule (re-queue) a workload when this fragmentation issue happens, and to admit future workloads that are immediately schedulable.
Why is this needed:
Further improve the cluster allocation rate.
Completion requirements:
N/A
This enhancement requires the following artifacts:
The artifacts should be linked in subsequent comments.