Slight cleanup of some of our readmes (#221)
* Slight cleanup of some of our readmes

* testing site build issue

* Adding a note that you need envoy gateway to work to use something that depends on envoy gateway

* Feedback fixes

* restructuring and feedback comments

* removing make install
kfswain authored Jan 27, 2025
1 parent fbe77dd commit 86178fb
Showing 11 changed files with 40 additions and 29 deletions.
18 changes: 3 additions & 15 deletions README.md
@@ -8,25 +8,13 @@ This extension is intented to provide value to multiplexed LLM services on a sha

This project is currently in development.

For more rapid testing, our PoC is in the `./examples/` dir.


## Getting Started

**Install the CRDs into the cluster:**

```sh
make install
```

**Delete the APIs(CRDs) from the cluster:**
Follow this [README](./pkg/README.md) to get the inference-extension up and running on your cluster!

```sh
make uninstall
```
## Website

**Deploying the ext-proc image**
Refer to this [README](https://github.com/kubernetes-sigs/gateway-api-inference-extension/blob/main/pkg/README.md) on how to deploy the Ext-Proc image.
Detailed documentation is available on our website: https://gateway-api-inference-extension.sigs.k8s.io/

## Contributing

Empty file added examples/placeholder.md
Empty file.
45 changes: 31 additions & 14 deletions pkg/README.md
@@ -1,7 +1,11 @@
## Quickstart

This quickstart guide is intended for engineers familiar with k8s and model servers (vLLM in this instance). The goal of this guide is to get a first, single InferencePool up and running!

### Requirements
The current manifests rely on Envoy Gateway [v1.2.1](https://gateway.envoyproxy.io/docs/install/install-yaml/#install-with-yaml) or higher.
- Envoy Gateway [v1.2.1](https://gateway.envoyproxy.io/docs/install/install-yaml/#install-with-yaml) or higher
- A cluster that has built-in support for `ServiceType=LoadBalancer`. (This can be validated by ensuring your Envoy Gateway is up and running)
- For example, with Kind, you can follow these steps: https://kind.sigs.k8s.io/docs/user/loadbalancer
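A minimal sketch of how the requirements above might be checked before starting the steps. It assumes Envoy Gateway was installed with its default names (the `envoy-gateway` Deployment in the `envoy-gateway-system` namespace, as used later in this guide); the function name `check_envoy_gateway` and the 120s timeout are illustrative choices, not part of the project.

```shell
#!/usr/bin/env bash
# Hypothetical helper: confirms the Envoy Gateway control plane is up.
# Assumes the default Deployment name and namespace; pass another
# namespace as the first argument if your install differs.
check_envoy_gateway() {
  local ns="${1:-envoy-gateway-system}"

  # Block until the Deployment reports the Available condition (or time out).
  kubectl wait --for=condition=Available deployment/envoy-gateway \
    -n "$ns" --timeout=120s || return 1

  # List the pods so their status can be inspected by eye.
  kubectl get pods -n "$ns"
}
```

Calling `check_envoy_gateway` with no arguments checks the default namespace. A non-zero exit status suggests Envoy Gateway is not ready yet. The sketch does not verify the installed version.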

### Steps

@@ -11,30 +15,40 @@ The current manifests rely on Envoy Gateway [v1.2.1](https://gateway.envoyproxy.
Deploy a sample vLLM deployment with the proper protocol to work with the LLM Instance Gateway.
```bash
kubectl create secret generic hf-token --from-literal=token=$HF_TOKEN # Your Hugging Face Token with access to Llama2
kubectl apply -f ../examples/poc/manifests/vllm/vllm-lora-deployment.yaml
kubectl apply -f ./manifests/vllm/vllm-lora-deployment.yaml
```

1. **Install the CRDs into the cluster:**

```sh
kubectl apply -f config/crd/bases
```

1. **Deploy InferenceModel and InferencePool**

Deploy a sample InferenceModel and InferencePool configuration based on the vLLM deployments mentioned above.
```bash
kubectl apply -f ../examples/poc/manifests/inferencepool-with-model.yaml
kubectl apply -f ./manifests/inferencepool-with-model.yaml
```

1. **Update Envoy Gateway Config to enable Patch Policy**

Our custom LLM Gateway ext-proc is patched into the existing envoy gateway via `EnvoyPatchPolicy`. To enable this feature, we must extend the Envoy Gateway config map. To do this, simply run:
```bash
kubectl apply -f ./manifests/enable_patch_policy.yaml
kubectl apply -f ./manifests/gateway/enable_patch_policy.yaml
kubectl rollout restart deployment envoy-gateway -n envoy-gateway-system
```
Additionally, if you would like to enable the admin interface, you can uncomment the admin lines and run this again.

1. **Deploy Gateway**

```bash
kubectl apply -f ./manifests/gateway.yaml
kubectl apply -f ./manifests/gateway/gateway.yaml
```
> **_NOTE:_** This file couples together the gateway infra and the HTTPRoute infra for a convenient, quick startup. Creating additional/different InferencePools on the same gateway will require an additional set of: `Backend`, `HTTPRoute`, the resources included in the `./manifests/gateway/ext-proc.yaml` file, and an additional `./manifests/gateway/patch_policy.yaml` file. ***Should you choose to experiment, familiarity with xDS and Envoy is very useful.***



1. **Deploy Ext-Proc**

@@ -45,8 +59,17 @@ The current manifests rely on Envoy Gateway [v1.2.1](https://gateway.envoyproxy.
1. **Deploy Envoy Gateway Custom Policies**

```bash
kubectl apply -f ./manifests/extension_policy.yaml
kubectl apply -f ./manifests/patch_policy.yaml
kubectl apply -f ./manifests/gateway/extension_policy.yaml
kubectl apply -f ./manifests/gateway/patch_policy.yaml
```
> **_NOTE:_** This is also per InferencePool, and will need to be configured to support the new pool should you wish to experiment further.
1. **OPTIONALLY**: Apply Traffic Policy

For high-traffic benchmarking you can apply this manifest to avoid any defaults that can cause timeouts/errors.

```bash
kubectl apply -f ./manifests/gateway/traffic_policy.yaml
```

1. **Try it out**
@@ -63,10 +86,4 @@ The current manifests rely on Envoy Gateway [v1.2.1](https://gateway.envoyproxy.
"max_tokens": 100,
"temperature": 0
}'
```

## Scheduling Package in Ext Proc
The scheduling package implements request scheduling algorithms for load balancing requests across backend pods in an inference gateway. The scheduler ensures efficient resource utilization while maintaining low latency and prioritizing critical requests. It applies a series of filters based on metrics and heuristics to select the best pod for a given request.

# Flowchart
<img src="../docs/schedular-flowchart.png" alt="Scheduling Algorithm" width="400" />
```
@@ -5,6 +5,7 @@ metadata:
namespace: envoy-gateway-system
data:
# This manifest's main purpose is to set `enabledEnvoyPatchPolicy` to `true`.
# This only needs to be run once on your cluster (unless you'd like to change anything, e.g. enabling the admin dash)
# Any field under `admin` is optional, and only for enabling the admin endpoints, for debugging.
# Admin Interface: https://www.envoyproxy.io/docs/envoy/latest/operations/admin
# PatchPolicy docs: https://gateway.envoyproxy.io/docs/tasks/extensibility/envoy-patch-policy/#enable-envoypatchpolicy
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
5 changes: 5 additions & 0 deletions pkg/scheduling.md
@@ -0,0 +1,5 @@
## Scheduling Package in Ext Proc
The scheduling package implements request scheduling algorithms for load balancing requests across backend pods in an inference gateway. The scheduler ensures efficient resource utilization while maintaining low latency and prioritizing critical requests. It applies a series of filters based on metrics and heuristics to select the best pod for a given request.

# Flowchart
<img src="../docs/schedular-flowchart.png" alt="Scheduling Algorithm" width="400" />
