What would you like to be added:
As discussed in #169, I think we should add a LoadBalancingPolicy to the InferencePool spec so that users can configure different policies and choose the one that performs best in their environment.
The schema could look like this:
type LoadBalancingPolicy string

const (
	DefaultLoadBalancingPolicy LoadBalancingPolicy = "default"
)

type LoadBalancing struct {
	// Policy specifies the load balancing policy to use when routing requests to the endpoints of the pool.
	//
	// +kubebuilder:validation:Optional
	// +kubebuilder:default="default"
	Policy LoadBalancingPolicy `json:"policy,omitempty"`

	// Maybe provide some customization about the filter?

	// QueueThresholdCritical specifies the number of requests that can be queued before the proxy starts dropping requests.
	//
	// +kubebuilder:validation:Optional
	QueueThresholdCritical int `json:"queueThresholdCritical,omitempty"`

	// QueueingThresholdLoRA specifies the number of requests that can be queued before the proxy starts dropping requests.
	//
	// +kubebuilder:validation:Optional
	QueueingThresholdLoRA int `json:"queueingThresholdLoRA,omitempty"`
}

// InferencePoolSpec defines the desired state of InferencePool
type InferencePoolSpec struct {
	// Selector defines a map of labels to watch model server pods
	// that should be included in the InferencePool.
	// In some cases, implementations may translate this field to a Service selector, so this matches the simple
	// map used for Service selectors instead of the full Kubernetes LabelSelector type.
	//
	// +kubebuilder:validation:Required
	Selector map[LabelKey]LabelValue `json:"selector"`

	// TargetPortNumber defines the port number to access the selected model servers.
	// The number must be in the range 1 to 65535.
	//
	// +kubebuilder:validation:Minimum=1
	// +kubebuilder:validation:Maximum=65535
	// +kubebuilder:validation:Required
	TargetPortNumber int32 `json:"targetPortNumber"`

	// LoadBalancing provides load balancing options.
	//
	// +kubebuilder:validation:Optional
	LoadBalancing LoadBalancing `json:"loadBalancingPolicy,omitempty"`

	// EndpointPickerConfig specifies the configuration needed by the proxy to discover and connect to the endpoint
	// picker service that picks endpoints for the requests routed to this pool.
	EndpointPickerConfig `json:",inline"`
}
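For illustration, here is a minimal sketch of how the proposed LoadBalancing block might serialize and how an unset policy could fall back to the default. The helpers `effectivePolicy` and `renderLoadBalancing` are hypothetical names made up for this example and are not part of the project. In a real cluster the `+kubebuilder:default` marker would normally let the API server fill in the default, so the fallback here only mimics that behavior in plain Go.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Types copied from the proposed schema, without the kubebuilder markers.
type LoadBalancingPolicy string

const DefaultLoadBalancingPolicy LoadBalancingPolicy = "default"

type LoadBalancing struct {
	Policy                 LoadBalancingPolicy `json:"policy,omitempty"`
	QueueThresholdCritical int                 `json:"queueThresholdCritical,omitempty"`
	QueueingThresholdLoRA  int                 `json:"queueingThresholdLoRA,omitempty"`
}

// effectivePolicy is a hypothetical helper: it returns the configured
// policy, or the default when the field was left empty.
func effectivePolicy(lb LoadBalancing) LoadBalancingPolicy {
	if lb.Policy == "" {
		return DefaultLoadBalancingPolicy
	}
	return lb.Policy
}

// renderLoadBalancing is a hypothetical helper: it applies the default
// policy to a copy of lb and returns the JSON form of that copy.
func renderLoadBalancing(lb LoadBalancing) (string, error) {
	lb.Policy = effectivePolicy(lb)
	b, err := json.Marshal(lb)
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	// Only one threshold is set; the policy is left empty on purpose.
	out, err := renderLoadBalancing(LoadBalancing{QueueThresholdCritical: 5})
	if err != nil {
		panic(err)
	}
	fmt.Println(out) // prints {"policy":"default","queueThresholdCritical":5}
}
```

Because every field uses `omitempty`, a threshold of 0 is dropped from the output and cannot be told apart from an unset one. If 0 should be a valid explicit value, pointer fields (`*int`) might be worth considering.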
Why is this needed: