Slight cleanup of some of our readmes (#221)

* Slight cleanup of some of our readmes
* testing site build issue
* Adding a note that you need envoy gateway to work to use something that depends on envoy gateway
* Feedback fixes
* restructuring and feedback comments
* removing make install

kubernetes-sigs · Jan 27, 2025 · 86178fb
1 parent fbe77dd
Showing 11 changed files with 40 additions and 29 deletions.
diff --git a/README.md b/README.md
@@ -8,25 +8,13 @@ This extension is intented to provide value to multiplexed LLM services on a sha
 
 This project is currently in development. 
 
-For more rapid testing, our PoC is in the `./examples/` dir.
-
-
 ## Getting Started
 
-**Install the CRDs into the cluster:**
-
-```sh
-make install
-```
-
-**Delete the APIs(CRDs) from the cluster:**
+Follow this [README](./pkg/README.md) to get the inference-extension up and running on your cluster!
 
-```sh
-make uninstall
-```
+## Website
 
-**Deploying the ext-proc image**
-Refer to this [README](https://github.com/kubernetes-sigs/gateway-api-inference-extension/blob/main/pkg/README.md) on how to deploy the Ext-Proc image.
+Detailed documentation is available on our website: https://gateway-api-inference-extension.sigs.k8s.io/
 
 ## Contributing
 

diff --git a/examples/placeholder.md b/examples/placeholder.md
diff --git a/pkg/README.md b/pkg/README.md
@@ -1,7 +1,11 @@
 ## Quickstart
 
+This quickstart guide is intended for engineers familiar with k8s and model servers (vLLM in this instance). The goal of this guide is to get a first, single InferencePool up and running! 
+
 ### Requirements
-The current manifests rely on Envoy Gateway [v1.2.1](https://gateway.envoyproxy.io/docs/install/install-yaml/#install-with-yaml) or higher.
+ - Envoy Gateway [v1.2.1](https://gateway.envoyproxy.io/docs/install/install-yaml/#install-with-yaml) or higher
+ - A cluster that has built-in support for `ServiceType=LoadBalancer`. (This can be validated by ensuring your Envoy Gateway is up and running)
+   - For example, with Kind, you can follow these steps: https://kind.sigs.k8s.io/docs/user/loadbalancer
 
 ### Steps
 
@@ -11,30 +15,40 @@ The current manifests rely on Envoy Gateway [v1.2.1](https://gateway.envoyproxy.
    Deploy a sample vLLM deployment with the proper protocol to work with the LLM Instance Gateway.
    ```bash
    kubectl create secret generic hf-token --from-literal=token=$HF_TOKEN # Your Hugging Face Token with access to Llama2
-   kubectl apply -f ../examples/poc/manifests/vllm/vllm-lora-deployment.yaml
+   kubectl apply -f ./manifests/vllm/vllm-lora-deployment.yaml
+   ```
+
+1. **Install the CRDs into the cluster:**
+
+   ```sh
+   kubectl apply -f config/crd/bases
    ```
 
 1. **Deploy InferenceModel and InferencePool**
 
    Deploy a sample InferenceModel and InferencePool configuration based on the vLLM deployments mentioned above.
    ```bash
-   kubectl apply -f ../examples/poc/manifests/inferencepool-with-model.yaml
+   kubectl apply -f ./manifests/inferencepool-with-model.yaml
    ```
 
 1. **Update Envoy Gateway Config to enable Patch Policy**
 
    Our custom LLM Gateway ext-proc is patched into the existing envoy gateway via `EnvoyPatchPolicy`. To enable this feature, we must extend the Envoy Gateway config map. To do this, simply run:
    ```bash
-   kubectl apply -f ./manifests/enable_patch_policy.yaml
+   kubectl apply -f ./manifests/gateway/enable_patch_policy.yaml
    kubectl rollout restart deployment envoy-gateway -n envoy-gateway-system
    ```
    Additionally, if you would like to enable the admin interface, you can uncomment the admin lines and run this again.
 
 1. **Deploy Gateway**
 
    ```bash
-   kubectl apply -f ./manifests/gateway.yaml
+   kubectl apply -f ./manifests/gateway/gateway.yaml
    ```
+   > **_NOTE:_** This file couples together the gateway infra and the HTTPRoute infra for a convenient, quick startup. Creating additional/different InferencePools on the same gateway will require an additional set of: `Backend`, `HTTPRoute`, the resources included in the `./manifests/gateway/ext-proc.yaml` file, and an additional `./manifests/gateway/patch_policy.yaml` file. ***Should you choose to experiment, familiarity with xDS and Envoy is very useful.***
+   
+
+
 
 1. **Deploy Ext-Proc**
 
@@ -45,8 +59,17 @@ The current manifests rely on Envoy Gateway [v1.2.1](https://gateway.envoyproxy.
 1. **Deploy Envoy Gateway Custom Policies**
 
    ```bash
-   kubectl apply -f ./manifests/extension_policy.yaml
-   kubectl apply -f ./manifests/patch_policy.yaml
+   kubectl apply -f ./manifests/gateway/extension_policy.yaml
+   kubectl apply -f ./manifests/gateway/patch_policy.yaml
+   ```
+   > **_NOTE:_** This is also per InferencePool, and will need to be configured to support the new pool should you wish to experiment further.
+
+1. **OPTIONALLY**: Apply Traffic Policy
+
+   For high-traffic benchmarking you can apply this manifest to avoid any defaults that can cause timeouts/errors.
+
+   ```bash
+   kubectl apply -f ./manifests/gateway/traffic_policy.yaml
    ```
 
 1. **Try it out**
@@ -63,10 +86,4 @@ The current manifests rely on Envoy Gateway [v1.2.1](https://gateway.envoyproxy.
    "max_tokens": 100,
    "temperature": 0
    }'
-   ```
-
-## Scheduling Package in Ext Proc
-The scheduling package implements request scheduling algorithms for load balancing requests across backend pods in an inference gateway. The scheduler ensures efficient resource utilization while maintaining low latency and prioritizing critical requests. It applies a series of filters based on metrics and heuristics to select the best pod for a given request.
-
-# Flowchart
-<img src="../docs/schedular-flowchart.png" alt="Scheduling Algorithm" width="400" />
+   ```
diff --git a/pkg/manifests/enable_patch_policy.yaml b/...anifests/gateway/enable_patch_policy.yaml
@@ -5,6 +5,7 @@ metadata:
   namespace: envoy-gateway-system
 data:
 # This manifest's main purpose is to set `enabledEnvoyPatchPolicy` to `true`.
+# This only needs to be run once on your cluster (unless you'd like to change anything, e.g. enabling the admin dash)
 # Any field under `admin` is optional, and only for enabling the admin endpoints, for debugging.
 # Admin Interface: https://www.envoyproxy.io/docs/envoy/latest/operations/admin
 # PatchPolicy docs: https://gateway.envoyproxy.io/docs/tasks/extensibility/envoy-patch-policy/#enable-envoypatchpolicy 

diff --git a/pkg/manifests/extension_policy.yaml b/pkg/manifests/gateway/extension_policy.yaml
diff --git a/pkg/manifests/gateway.yaml b/pkg/manifests/gateway/gateway.yaml
diff --git a/pkg/manifests/patch_policy.yaml b/pkg/manifests/gateway/patch_policy.yaml
diff --git a/pkg/manifests/traffic_policy.yaml b/pkg/manifests/gateway/traffic_policy.yaml
diff --git a/...c/manifests/inferencepool-with-model.yaml b/pkg/manifests/inferencepool-with-model.yaml
diff --git a/.../manifests/vllm/vllm-lora-deployment.yaml b/pkg/manifests/vllm/vllm-lora-deployment.yaml
diff --git a/pkg/scheduling.md b/pkg/scheduling.md
@@ -0,0 +1,5 @@
+## Scheduling Package in Ext Proc
+The scheduling package implements request scheduling algorithms for load balancing requests across backend pods in an inference gateway. The scheduler ensures efficient resource utilization while maintaining low latency and prioritizing critical requests. It applies a series of filters based on metrics and heuristics to select the best pod for a given request.
+
+# Flowchart
+<img src="../docs/schedular-flowchart.png" alt="Scheduling Algorithm" width="400" />
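
Reviewer note: the new `pkg/scheduling.md` describes the scheduler as applying a series of filters, based on metrics and heuristics, to select the best pod. A minimal Python sketch of that idea follows. It is not the project's implementation: the `Pod` fields (`queue_len`, `kv_cache_usage`), the filter names, the thresholds, and the fall-through behaviour when a filter matches nothing are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Pod:
    """Hypothetical view of a backend pod and two example metrics."""
    name: str
    queue_len: int          # pending requests (assumed metric)
    kv_cache_usage: float   # fraction of KV cache in use, 0.0 to 1.0 (assumed metric)


# A filter narrows a list of candidate pods.
Filter = Callable[[List[Pod]], List[Pod]]


def low_queue(max_queue: int) -> Filter:
    """Keep pods whose queue length is at most max_queue."""
    def apply(pods: List[Pod]) -> List[Pod]:
        return [p for p in pods if p.queue_len <= max_queue]
    return apply


def low_kv_cache(max_usage: float) -> Filter:
    """Keep pods whose KV cache usage is at most max_usage."""
    def apply(pods: List[Pod]) -> List[Pod]:
        return [p for p in pods if p.kv_cache_usage <= max_usage]
    return apply


def select_pod(pods: List[Pod], filters: List[Filter]) -> Pod:
    """Apply each filter in order, then pick the least-loaded survivor.

    If a filter would remove every candidate, its result is ignored so
    that a request can still be routed (an assumed heuristic).
    """
    if not pods:
        raise ValueError("no pods available")
    candidates = list(pods)
    for f in filters:
        narrowed = f(candidates)
        if narrowed:
            candidates = narrowed
    return min(candidates, key=lambda p: (p.queue_len, p.kv_cache_usage))


pods = [
    Pod("pod-a", queue_len=12, kv_cache_usage=0.40),
    Pod("pod-b", queue_len=3, kv_cache_usage=0.95),
    Pod("pod-c", queue_len=4, kv_cache_usage=0.30),
]
chosen = select_pod(pods, [low_queue(5), low_kv_cache(0.8)])
print(chosen.name)  # pod-c
```

Here `pod-a` is dropped by the queue filter and `pod-b` by the cache filter, leaving `pod-c`. Ordering the filters from most to least important is one way to express request priorities in a chain like this.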