Commit 4450cd9

update proxy docs (#3796)
1 parent 54575db commit 4450cd9

2 files changed: +16 -2 lines changed

docs/en/llm/proxy_server.md

Lines changed: 8 additions & 1 deletion
@@ -7,7 +7,7 @@ The request distributor service can parallelize multiple api_server services. Us
 Start the proxy service:

 ```shell
-lmdeploy serve proxy --server-name {server_name} --server-port {server_port} --strategy "min_expected_latency"
+lmdeploy serve proxy --server-name {server_name} --server-port {server_port} --routing-strategy "min_expected_latency" --serving-strategy Hybrid
 ```

 After startup is successful, the URL of the proxy service will also be printed by the script. Access this URL in your browser to open the Swagger UI.
@@ -88,6 +88,13 @@ response = requests.post(url, headers=headers, data='', params=params)
 print(response.text)
 ```

+## Serving Strategy
+
+LMDeploy currently supports two serving strategies:
+
+- Hybrid: Does not distinguish between Prefill and Decoding instances, following the traditional inference deployment mode.
+- DistServe: Separates Prefill and Decoding instances, deploying them on different service nodes to achieve more flexible and efficient resource scheduling and scalability.
+
 ## Dispatch Strategy

 The current distribution strategies of the proxy service are as follows:

docs/zh_cn/llm/proxy_server.md

Lines changed: 8 additions & 1 deletion
@@ -7,7 +7,7 @@
 Start the proxy service:

 ```shell
-lmdeploy serve proxy --server-name {server_name} --server-port {server_port} --strategy "min_expected_latency"
+lmdeploy serve proxy --server-name {server_name} --server-port {server_port} --routing-strategy "min_expected_latency" --serving-strategy Hybrid
 ```

 After a successful startup, the script also prints the URL of the proxy service. Open this URL in a browser to access the Swagger UI.
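@@ -87,6 +87,13 @@ response = requests.post(url, headers=headers, data='', params=params)
 print(response.text)
 ```

+## Serving Strategy
+
+LMDeploy currently supports hybrid deployment serving (Hybrid) and prefill-decode (PD) disaggregated deployment serving (DistServe).
+
+- Hybrid: Does not distinguish between Prefill and Decoding instances; this is the traditional inference deployment mode.
+- DistServe: Separates Prefill and Decoding instances and deploys them on different service nodes to enable more flexible and efficient resource scheduling and scaling.
+
 ## Dispatch Strategy

 The proxy service currently offers the following dispatch strategies: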
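
The flag change in this commit (`--strategy` renamed to `--routing-strategy`, plus the new `--serving-strategy`) can be sketched as a small launcher helper. This is only an illustration, not part of LMDeploy: `build_proxy_command` and `SERVING_STRATEGIES` are hypothetical names, the host and port in the usage line are placeholders, and passing `DistServe` as a literal flag value is an assumption based on the strategy names in the docs (the commit itself only shows `Hybrid` on the command line).

```python
import shlex

# Strategy names as listed in the "Serving Strategy" section of this commit.
# Assumption: the CLI accepts these exact strings as flag values.
SERVING_STRATEGIES = ("Hybrid", "DistServe")


def build_proxy_command(server_name, server_port,
                        routing_strategy="min_expected_latency",
                        serving_strategy="Hybrid"):
    """Build the argv list for `lmdeploy serve proxy` (hypothetical helper)."""
    if serving_strategy not in SERVING_STRATEGIES:
        raise ValueError(f"unknown serving strategy: {serving_strategy!r}")
    return [
        "lmdeploy", "serve", "proxy",
        "--server-name", str(server_name),
        "--server-port", str(server_port),
        # Renamed from --strategy in this commit.
        "--routing-strategy", routing_strategy,
        # Newly documented in this commit.
        "--serving-strategy", serving_strategy,
    ]


if __name__ == "__main__":
    # Placeholder host and port; print the command instead of running it.
    print(shlex.join(build_proxy_command("0.0.0.0", 8000)))
```

The helper only assembles and prints the command; handing the list to `subprocess.run` would start the proxy, provided LMDeploy is installed.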
