Commit e6ffb13

add elastic proxy
Signed-off-by: yuxinshan <[email protected]> Signed-off-by: CalvinXKY <[email protected]>
1 parent 8c65009 commit e6ffb13

2 files changed

+628
-0
lines changed

examples/elastic_scaling/README.md

Lines changed: 119 additions & 0 deletions
@@ -0,0 +1,119 @@
This file provides an elastic proxy demo that supports elastic scaling of P/D (prefill/decode) instances based on a KV pool.

Launch multiple vLLM instances (for example, 2 for prefill and 2 for decode), then
launch this proxy demo with:

```shell
export ADMIN_API_KEY=YOUR_ADMIN_API_KEY
python3 examples/elastic_scaling/elastic_proxy.py \
    --model $model_name \
    --prefill localhost:8100 localhost:8101 \
    --decode localhost:8200 localhost:8201 \
    --port 8000
```

After the proxy is deployed, it logs:
```text
INFO: Started server process [xxxxx]
INFO: Waiting for application startup.
INFO: Application startup complete.
INFO: Uvicorn running on http://0.0.0.0:xxxx
```

### Supported API routes
* `/v1/completions`: returns the response to a completion request.
* `/v1/chat/completions`: returns the response to a chat completion request.
* `/status`: returns the lists of registered prefill nodes and decode nodes.
* `/instances/add`: adds a prefill node or decode node to the list.
* `/instances/remove`: removes a prefill node or decode node from the list.

Examples:
#### get request response
```shell
# /v1/completions
curl -X POST http://0.0.0.0:8000/v1/completions \
    -H "Content-Type: application/json" \
    -d '{"model": "'$model_name'", "max_tokens": 50, "prompt": "hello"}'

# /v1/chat/completions
curl -X POST http://0.0.0.0:8000/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{"model": "'$model_name'", "max_tokens": 50,
        "messages": [{
            "role": "user",
            "content": "hello"
        }]}'
```
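
For callers that prefer Python over `curl`, the two requests above might be built as in the following sketch. It uses only the standard library and mirrors the JSON bodies from the shell examples. The helper names `build_completion_request` and `send` are hypothetical and not part of the proxy, and a non-streaming JSON response is assumed.

```python
import json
import urllib.request


def build_completion_request(base_url: str, model: str, prompt: str,
                             max_tokens: int = 50,
                             chat: bool = False) -> urllib.request.Request:
    """Build (but do not send) a POST request for one of the two routes.

    Hypothetical helper: the payload fields mirror the curl examples only.
    """
    if chat:
        path = "/v1/chat/completions"
        payload = {"model": model, "max_tokens": max_tokens,
                   "messages": [{"role": "user", "content": prompt}]}
    else:
        path = "/v1/completions"
        payload = {"model": model, "max_tokens": max_tokens, "prompt": prompt}
    return urllib.request.Request(
        base_url.rstrip("/") + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 60.0) -> dict:
    """Send the request and decode the body, assuming a JSON response."""
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

With the proxy running, `send(build_completion_request("http://0.0.0.0:8000", model_name, "hello"))` would correspond to the first `curl` call, where `model_name` is whatever model the proxy was launched with.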

#### get server status
```shell
# /status
curl -X POST http://0.0.0.0:8000/status
```
The response:
```text
{"prefill_node_count":x,"decode_node_count":x,"prefill_nodes":[xx.xx.xx.xx:xxxx],"decode_nodes":[xx.xx.xx.xx:xxxx]}
```
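
A script could read the same status programmatically. The sketch below assumes the response is valid JSON with the four fields shown above; the helper names `fetch_status` and `can_serve` are hypothetical. It also assumes that serving a request needs at least one prefill node and one decode node, which the README does not state explicitly.

```python
import json
import urllib.request


def fetch_status(base_url: str, timeout: float = 5.0) -> dict:
    """POST to /status (the method used in the curl example) and decode JSON."""
    req = urllib.request.Request(
        base_url.rstrip("/") + "/status", data=b"", method="POST"
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def can_serve(status: dict) -> bool:
    """True when at least one prefill and one decode node are registered.

    Assumption: field names follow the sample response above.
    """
    return (status.get("prefill_node_count", 0) > 0
            and status.get("decode_node_count", 0) > 0)
```

For example, `can_serve(fetch_status("http://0.0.0.0:8000"))` could gate a load test until both node lists are populated.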

#### add nodes to the server
```shell
# /instances/add
curl -X POST http://0.0.0.0:8000/instances/add \
    -H "Content-Type: application/json" \
    -H "X-Api-Key: YOUR_ADMIN_API_KEY" \
    -d '{"type": "prefill", "instance": "0.0.0.0:8100"}'
```
* Case 1: If the node is not yet available, the proxy server waits for it to become available:
```text
INFO: Verifying xx.xx.xx.xx:xxxx ...
ERROR: Cannot connect to host xx.xx.xx.xx:xxxx ...
INFO: Waiting for prefill_instance xx.xx.xx.xx:xxxx to start.
INFO: Verifying xx.xx.xx.xx:xxxx ...
...
```
The response:
```text
{"message":"Waiting for prefill_instance xx.xx.xx.xx:xxxx to start."}
```
* Case 2: If the node is available, the proxy server adds it to the node list:
```text
INFO: Verifying xx.xx.xx.xx:xxxx ...
INFO: Instance: xx.xx.xx.xx:xxxx could be added.
INFO: Added xx.xx.xx.xx:xxxx to prefill_instances. prefill node counts: x, decode node counts: x
```
If the node has already been added to the proxy server:
```text
INFO: prefill_instance xx.xx.xx.xx:xxxx already exists.
```
The response:
```text
{"message":"Added xx.xx.xx.xx:xxxx to prefill_instances."}
```
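
Because Case 1 returns before the node is actually registered, a caller may want to poll `/status` until the node shows up. The following is a minimal sketch of such a loop, not part of the proxy. `wait_until_registered` is a hypothetical helper, the status lookup is injected as a callable so the loop can be tested without a network, and it assumes the node string in `prefill_nodes` / `decode_nodes` matches the `instance` value that was submitted.

```python
import time
from typing import Callable


def wait_until_registered(fetch_status: Callable[[], dict],
                          node_type: str,
                          instance: str,
                          attempts: int = 10,
                          delay: float = 1.0,
                          sleep: Callable[[float], None] = time.sleep) -> bool:
    """Poll the status callable until `instance` appears in the node list.

    Returns True once the node is listed, False after `attempts` polls.
    Assumption: /status lists nodes under "<node_type>_nodes".
    """
    key = f"{node_type}_nodes"
    for attempt in range(attempts):
        status = fetch_status()
        if instance in status.get(key, []):
            return True
        if attempt < attempts - 1:
            sleep(delay)  # back off before the next poll
    return False
```

The `sleep` parameter exists only so tests can pass a no-op; in normal use the default `time.sleep` applies.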

#### remove nodes from the server
```shell
# /instances/remove
curl -X POST http://0.0.0.0:8000/instances/remove \
    -H "Content-Type: application/json" \
    -H "X-Api-Key: YOUR_ADMIN_API_KEY" \
    -d '{"type": "prefill", "instance": "0.0.0.0:8100"}'
```
After the node is removed:
```text
INFO: Removed xx.xx.xx.xx:xxxx from prefill_instances. prefill node counts: x, decode node counts: x
```
The response:
```text
{"message":"Removed xx.xx.xx.xx:xxxx from prefill_instances."}
```
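
The add and remove calls differ only in the path, so an automation script could share one request builder. This is a sketch under assumptions: `build_instance_request` is a hypothetical name, the `X-Api-Key` header and JSON body are copied from the `curl` examples, and `"decode"` as a `type` value is inferred from the route descriptions rather than shown in an example.

```python
import json
import urllib.request


def build_instance_request(base_url: str, action: str, node_type: str,
                           instance: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a POST to /instances/add or /instances/remove."""
    if action not in ("add", "remove"):
        raise ValueError(f"unknown action: {action!r}")
    if node_type not in ("prefill", "decode"):  # "decode" is an assumed value
        raise ValueError(f"unknown node type: {node_type!r}")
    payload = json.dumps({"type": node_type, "instance": instance})
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/instances/{action}",
        data=payload.encode("utf-8"),
        headers={"Content-Type": "application/json", "X-Api-Key": api_key},
        method="POST",
    )
```

The returned object can be passed to `urllib.request.urlopen`; the key would typically be the same value exported as `ADMIN_API_KEY` when the proxy was launched.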

### Supported functions

* Prefill nodes or decode nodes can be added at any time.
  - If a prefill or decode server is already deployed, the proxy can add it at launch (through `--prefill` / `--decode`).
  - If a prefill or decode server is deployed after the proxy, it can join through the `/instances/add` API. The prefill or decode server signals the proxy server, and the proxy server checks the node's status until the node is available.
* Nodes can be removed in two situations:
  - A node is removed when its prefill or decode server has failed more than a certain number of times.
  - A node can be deleted from the proxy server through the `/instances/remove` API.
* Elastic scaling is supported.
  - When the current node is unavailable, the proxy server schedules the request to the next available node.
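
The behaviours listed above can be illustrated with a small in-memory pool. This is a sketch only and not the code in `elastic_proxy.py`: the class `NodePool`, the round-robin policy, and the `max_failures` default are all assumptions chosen for illustration.

```python
from collections import defaultdict


class NodePool:
    """Hypothetical node list with round-robin selection and failure eviction."""

    def __init__(self, nodes, max_failures: int = 3):
        self.nodes = list(nodes)
        self.max_failures = max_failures  # assumed threshold, not from the source
        self.failures = defaultdict(int)
        self._cursor = 0

    def add(self, node: str) -> bool:
        """Add a node; return False if it already exists."""
        if node in self.nodes:
            return False
        self.nodes.append(node)
        return True

    def remove(self, node: str) -> bool:
        """Remove a node; return False if it was not registered."""
        if node not in self.nodes:
            return False
        self.nodes.remove(node)
        self.failures.pop(node, None)
        return True

    def next_node(self) -> str:
        """Return the next node in round-robin order."""
        if not self.nodes:
            raise RuntimeError("no nodes available")
        node = self.nodes[self._cursor % len(self.nodes)]
        self._cursor = (self._cursor + 1) % len(self.nodes)
        return node

    def report_failure(self, node: str) -> None:
        """Count a failure and evict the node once it exceeds the threshold."""
        if node not in self.nodes:
            return
        self.failures[node] += 1
        if self.failures[node] > self.max_failures:
            self.remove(node)
```

A caller would pick a node with `next_node()`, call `report_failure()` when a request to it fails, and retry on the following node, so traffic keeps flowing while the pool grows or shrinks.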
