Add LoadBalancingPolicy field in InferencePool #404

Open
Kuromesi opened this issue Feb 25, 2025 · 0 comments
Kuromesi commented Feb 25, 2025

What would you like to be added:
As discussed in #169, I think we should add a LoadBalancingPolicy field to the InferencePool spec so that users can configure different policies and choose the one that performs best in their environment.

The schema could look something like this:

type LoadBalancingPolicy string

const (
	DefaultLoadBalancingPolicy LoadBalancingPolicy = "default"
)

type LoadBalancing struct {
	// Policy specifies the load balancing policy to use when routing requests to the endpoints of the pool.
	//
	// +kubebuilder:validation:Optional
	// +kubebuilder:default="default"
	Policy LoadBalancingPolicy `json:"policy,omitempty"`

	// Open question: should we also expose some customization of the filters?
	// QueueThresholdCritical specifies the number of requests that can be queued before the proxy starts dropping requests.
	//
	// +kubebuilder:validation:Optional
	QueueThresholdCritical int `json:"queueThresholdCritical,omitempty"`

	// QueueingThresholdLoRA specifies the number of requests that can be queued before the proxy starts dropping requests.
	//
	// +kubebuilder:validation:Optional
	QueueingThresholdLoRA int `json:"queueingThresholdLoRA,omitempty"`
}

// InferencePoolSpec defines the desired state of InferencePool
type InferencePoolSpec struct {
	// Selector defines a map of labels to watch model server pods
	// that should be included in the InferencePool.
	// In some cases, implementations may translate this field to a Service selector, so this matches the simple
	// map used for Service selectors instead of the full Kubernetes LabelSelector type.
	//
	// +kubebuilder:validation:Required
	Selector map[LabelKey]LabelValue `json:"selector"`

	// TargetPortNumber defines the port number to access the selected model servers.
	// The number must be in the range 1 to 65535.
	//
	// +kubebuilder:validation:Minimum=1
	// +kubebuilder:validation:Maximum=65535
	// +kubebuilder:validation:Required
	TargetPortNumber int32 `json:"targetPortNumber"`

	// LoadBalancing provides load balancing options.
	//
	// +kubebuilder:validation:Optional
	LoadBalancing LoadBalancing `json:"loadBalancingPolicy,omitempty"`

	// EndpointPickerConfig specifies the configuration needed by the proxy to discover and connect to the endpoint
	// picker service that picks endpoints for the requests routed to this pool.
	EndpointPickerConfig `json:",inline"`
}
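
To make the proposal concrete, here is a minimal sketch of how a consumer of the spec might fill in unset fields and what the serialized form would look like. It copies only the proposed `LoadBalancing` struct. The `withDefaults` helper and the two fallback constants are hypothetical names and values chosen for illustration; they are not taken from the existing code base.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// LoadBalancingPolicy and LoadBalancing mirror the types proposed above.
type LoadBalancingPolicy string

const DefaultLoadBalancingPolicy LoadBalancingPolicy = "default"

type LoadBalancing struct {
	Policy                 LoadBalancingPolicy `json:"policy,omitempty"`
	QueueThresholdCritical int                 `json:"queueThresholdCritical,omitempty"`
	QueueingThresholdLoRA  int                 `json:"queueingThresholdLoRA,omitempty"`
}

// Hypothetical fallback values, used here only for illustration.
const (
	fallbackQueueThresholdCritical = 5
	fallbackQueueingThresholdLoRA  = 50
)

// withDefaults is a hypothetical helper that returns a copy of lb
// with every zero-valued field replaced by its fallback.
func withDefaults(lb LoadBalancing) LoadBalancing {
	if lb.Policy == "" {
		lb.Policy = DefaultLoadBalancingPolicy
	}
	if lb.QueueThresholdCritical == 0 {
		lb.QueueThresholdCritical = fallbackQueueThresholdCritical
	}
	if lb.QueueingThresholdLoRA == 0 {
		lb.QueueingThresholdLoRA = fallbackQueueingThresholdLoRA
	}
	return lb
}

func main() {
	// The user sets only one threshold and leaves the rest unset.
	user := LoadBalancing{QueueThresholdCritical: 10}

	raw, err := json.Marshal(user)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(raw)) // {"queueThresholdCritical":10}

	// The resolved value has the policy and the LoRA threshold filled in.
	fmt.Printf("%+v\n", withDefaults(user))
}
```

One design point this surfaces: with plain `int` fields and `omitempty`, an explicit `0` cannot be distinguished from an unset field, so a threshold of zero could not be requested. If zero should be a valid setting, pointer fields such as `*int32` would be one option.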

Why is this needed:
