Merged
71 changes: 71 additions & 0 deletions docs/best_practices/graceful_shutdown_service.md
@@ -0,0 +1,71 @@
# Graceful Service Node Shutdown Solution

## 1. Core Objective
Achieve graceful shutdown of service nodes, ensuring no in-flight user requests are lost during service termination while maintaining overall cluster availability.

## 2. Solution Overview
This solution combines an **Nginx reverse proxy**, the **Gunicorn server**, the **Uvicorn server**, and **FastAPI**, which work together to achieve this objective.

![graceful_shutdown](images/graceful_shutdown.png)

## 3. Component Introduction

### 1. Nginx: Traffic Entry Point and Load Balancer
- **Functions**:
  - Acts as a reverse proxy: receives all external client requests and distributes them to the upstream Gunicorn nodes according to the load-balancing policy.
  - Monitors backend node health through health checks. (Open-source Nginx provides passive checks via `max_fails` and `fail_timeout`; active probing requires NGINX Plus or a third-party module.)
  - Lets operators remove a problematic node from the service pool quickly through a configuration change, switching traffic to the remaining nodes.
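
As a rough illustration of removing a node through configuration, the sketch below rewrites an upstream block so that the target server carries Nginx's `down` parameter. The upstream name, the addresses, and the `mark_server_down` helper are hypothetical and not part of this PR; after writing the result back to disk, an operator would apply it with `nginx -s reload`.

```python
import re


def mark_server_down(conf_text: str, address: str) -> str:
    """Return conf_text with the `server <address>` line marked `down`.

    Hypothetical helper: assumes one `server` directive per line and
    leaves lines that are already marked `down` unchanged.
    """
    pattern = re.compile(
        r"^(?P<indent>[ \t]*)server\s+"
        + re.escape(address)
        + r"(?=[\s;])(?P<params>[^;]*);",
        re.MULTILINE,
    )

    def add_down(match):
        params = match.group("params")
        if re.search(r"\bdown\b", params):
            return match.group(0)  # already out of rotation
        return f"{match.group('indent')}server {address}{params} down;"

    return pattern.sub(add_down, conf_text)


# Illustrative upstream block; names and addresses are made up.
EXAMPLE_CONF = """upstream fastdeploy_backend {
    server 10.0.0.1:8000;
    server 10.0.0.2:8000 max_fails=3 fail_timeout=10s;
}
"""

updated = mark_server_down(EXAMPLE_CONF, "10.0.0.2:8000")
print(updated)
```

The helper is idempotent, so running it twice against the same node does not append a second `down`.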

### 2. Gunicorn: WSGI HTTP Server (Process Manager)
- **Functions**:
  - Serves as the master process, managing multiple Uvicorn worker child processes.
  - Receives external signals (e.g., `SIGTERM`) and coordinates the graceful shutdown of all child processes.
  - Supervises worker processes and automatically restarts them when they exit abnormally, keeping the service robust.
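
Gunicorn reads its settings from a Python file, so the role described above can be sketched as a small `gunicorn.conf.py`. The setting names (`bind`, `workers`, `worker_class`, `graceful_timeout`, `timeout`) are standard Gunicorn settings; every value below is an illustrative assumption rather than something taken from this PR.

```python
# gunicorn.conf.py: illustrative values only, not taken from this project.
import multiprocessing

# Address the master process listens on; Nginx would proxy to it.
bind = "0.0.0.0:8000"

# Number of worker processes the master forks and supervises.
workers = min(4, multiprocessing.cpu_count())

# Run each worker as a Uvicorn ASGI server (requires the uvicorn package).
worker_class = "uvicorn.workers.UvicornWorker"

# After a stop signal, seconds that workers get to finish in-flight
# requests before the master force-kills them.
graceful_timeout = 30

# Seconds a worker may stay silent before the master kills and restarts it.
timeout = 60
```

Such a file would typically be passed with `gunicorn -c gunicorn.conf.py your_package.api_server:app`, where the application path is a placeholder.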

### 3. Uvicorn: ASGI Server (Worker Process)
- **Functions**:
  - Runs as a Gunicorn-managed worker and handles the actual HTTP requests.
  - Hosts the FastAPI application instance, which executes the business logic.
  - Implements the ASGI protocol, supporting asynchronous request processing for high performance.

---

## Advantages

1. **Nginx**:
   - Quickly isolates faulty nodes, preserving overall service availability.
   - Applies configuration updates without downtime via `nginx -s reload`, so the change is transparent to users.

2. **Gunicorn** (compared to Uvicorn's native multi-worker mode):
   - **Mature Process Management**: Ships with process spawning, recycling, and management logic, so none of it has to be implemented by hand.
   - **Worker Supervision**: The Gunicorn master automatically forks a new worker when one crashes. Uvicorn's `--workers` mode, at least in older releases, does not restart a crashed process and relies on an external supervisor.
   - **Rich Configuration**: Offers many parameters for tuning timeouts, worker counts, restart policies, and more.

3. **Uvicorn**:
   - Extremely fast, built on uvloop and httptools.
   - Natively supports graceful shutdown: upon receiving a shutdown signal, it stops accepting new connections and waits for existing requests to complete before exiting.
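
The "stop accepting, then wait for in-flight work" pattern can be sketched with plain `asyncio`. This is not Uvicorn's implementation; `drain`, `fake_request`, and the timings are hypothetical names and values chosen only to show how a grace period bounds the wait.

```python
import asyncio
from typing import Optional, Set


async def drain(tasks: Set[asyncio.Task], timeout: Optional[float]) -> int:
    """Wait for in-flight tasks; cancel those still running after timeout.

    Returns the number of tasks that had to be cancelled.
    timeout=None waits indefinitely.
    """
    if not tasks:
        return 0
    done, pending = await asyncio.wait(tasks, timeout=timeout)
    for task in pending:
        task.cancel()
    # Let cancelled tasks unwind before reporting.
    await asyncio.gather(*pending, return_exceptions=True)
    return len(pending)


async def fake_request(seconds: float) -> str:
    """Stand-in for a request handler that takes `seconds` to finish."""
    await asyncio.sleep(seconds)
    return "done"


async def demo() -> int:
    in_flight = {
        asyncio.create_task(fake_request(0.05)),  # finishes within the grace period
        asyncio.create_task(fake_request(5.0)),   # exceeds the grace period
    }
    return await drain(in_flight, timeout=0.5)


cancelled = asyncio.run(demo())
print(f"cancelled {cancelled} request(s) after the grace period")
```

The shorter the timeout, the more in-flight requests are cut off, which is why the grace period should be sized to the longest request the service is expected to handle.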

---

## Graceful Shutdown Procedure

When a specific node needs to be taken offline, the steps are as follows:

1. **Nginx Monitors Node Health Status**:
   - Nginx tracks the node's health by periodically sending health check requests to it.

2. **Removal from Load Balancing**:
   - Modify the Nginx configuration to mark the target node as `down`, then reload the configuration.
   - From this point on, no new requests are routed to the target node.

3. **Gunicorn Server**:
   - The Gunicorn master listens for stop signals. When it receives one (e.g., `SIGTERM`), it relays the signal to all Uvicorn child processes.

4. **Sending the Stop Signal**:
   - Send `SIGTERM` to the Uvicorn process on the target node to trigger Uvicorn's graceful shutdown.

5. **Waiting for Request Processing**:
   - Wait slightly longer than `timeout_graceful_shutdown` before forcibly terminating the service, so the node has enough time to finish every request it has already accepted.

6. **Shutdown Completion**:
   - The node has processed all remaining requests and exited safely.
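
The signal-and-wait part of the procedure (steps 4 to 6) can be sketched as follows, assuming a POSIX system. `stop_gracefully`, `MARGIN`, and the stand-in child process are hypothetical; for a server that is already running, the same idea applies with `os.kill(pid, signal.SIGTERM)` instead of a `Popen` handle.

```python
import signal
import subprocess
import sys


def stop_gracefully(proc: subprocess.Popen, grace_seconds: float) -> bool:
    """Send SIGTERM, wait up to grace_seconds, then force-kill if needed.

    Returns True if the process exited on its own within the grace period.
    """
    proc.send_signal(signal.SIGTERM)
    try:
        proc.wait(timeout=grace_seconds)
        return True
    except subprocess.TimeoutExpired:
        proc.kill()  # SIGKILL on POSIX
        proc.wait()
        return False


TIMEOUT_GRACEFUL_SHUTDOWN = 30  # assumed server setting, in seconds
MARGIN = 5  # hypothetical extra slack before force-killing

# Stand-in for the server process: a child that would otherwise run for 60 s.
child = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])
exited_cleanly = stop_gracefully(child, TIMEOUT_GRACEFUL_SHUTDOWN + MARGIN)
print("exited within grace period:", exited_cleanly)
```

Waiting for the server's own timeout plus a small margin means the force-kill only fires when the server failed to shut itself down.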
Binary file added docs/best_practices/images/graceful_shutdown.png
71 changes: 71 additions & 0 deletions docs/zh/best_practices/graceful_shutdown_service.md
@@ -0,0 +1,71 @@
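# Graceful Service Node Shutdown Solution

## 1. Core Objective
Shut down service nodes gracefully, ensuring that no in-flight user requests are lost when the service stops and that the availability of the whole cluster is not affected.

## 2. Implementation Overview
This solution achieves the objective by combining an **Nginx reverse proxy**, the **Gunicorn server**, the **Uvicorn server**, and **FastAPI**.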

![graceful_shutdown](images/graceful_shutdown.png)

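## 3. Component Introduction

### 1. Nginx: Traffic Entry Point and Load Balancer
- **Functions**:
  - Acts as a reverse proxy, receiving all external client requests and distributing them to the upstream Gunicorn worker nodes according to the load-balancing policy.
  - Actively monitors the health of backend nodes through a health check mechanism.
  - Through configuration management, can remove a problematic node from the service pool almost instantly to switch traffic.

### 2. Gunicorn: WSGI HTTP Server (Process Manager)
- **Functions**:
  - Serves as the master process, managing multiple Uvicorn worker child processes.
  - Receives external signals (e.g., `SIGTERM`) and coordinates the graceful shutdown of all child processes.
  - Supervises worker processes and automatically restarts them when they exit abnormally, keeping the service robust.

### 3. Uvicorn: ASGI Server (Worker Process)
- **Functions**:
  - Runs as a Gunicorn-managed worker and handles the actual HTTP requests.
  - Runs the FastAPI application instance and executes the business logic.
  - Implements the ASGI protocol, supporting asynchronous request processing for high performance.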

---

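## Advantages

1. **Nginx**:
   - Quickly isolates faulty nodes, preserving overall service availability.
   - Updates configuration without downtime via `nginx -s reload`, transparently to users.

2. **Gunicorn** (compared to Uvicorn's native multi-worker mode):
   - **Mature Process Management**: Ships with process spawning, recycling, and management logic, so none of it has to be implemented by hand.
   - **Worker Supervision**: The Gunicorn master automatically forks a new worker after one exits abnormally, whereas in Uvicorn's `--workers` mode a crashed process is not brought back up and an external supervisor is required.
   - **Rich Configuration**: Offers many parameters for tuning timeouts, worker counts, restart policies, and more.

3. **Uvicorn**:
   - Extremely fast, built on uvloop and httptools.
   - Natively supports graceful shutdown: after receiving a shutdown signal, it stops accepting new connections and waits for existing requests to complete before exiting.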

---

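## Graceful Shutdown Procedure

When a specific node needs to be taken offline, the steps are as follows:

1. **Nginx Monitors Node Health Status**:
   - Nginx monitors the node's health by periodically sending health requests to it.

2. **Removal from Load Balancing**:
   - Modify the Nginx configuration to mark the node as `down`, then reload the configuration.
   - From this point on, no new requests are sent to the target node.

3. **Gunicorn Server**:
   - Listens for stop signals. When it receives one (e.g., `SIGTERM`), it sends the signal to all Uvicorn child processes.

4. **Sending the Stop Signal**:
   - Send `SIGTERM` to the Uvicorn process on the target node to trigger Uvicorn's graceful shutdown.

5. **Waiting for Request Processing**:
   - Wait slightly longer than `timeout_graceful_shutdown` before forcibly terminating the service, so the node has enough time to finish every request it has already accepted.

6. **Shutdown Completion**:
   - At this point the node has processed all outstanding requests and exited safely.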
9 changes: 9 additions & 0 deletions fastdeploy/entrypoints/openai/api_server.py
@@ -77,9 +77,17 @@
    help="max waiting time for connection, if set value -1 means no waiting time limit",
)
parser.add_argument("--max-concurrency", default=512, type=int, help="max concurrency")

parser.add_argument(
    "--enable-mm-output", action="store_true", help="Enable 'multimodal_content' field in response output. "
)
parser.add_argument(
    "--timeout-graceful-shutdown",
    default=0,
Comment on lines +84 to +86

Copilot AI Sep 4, 2025

[nitpick] The default value of 0 seconds effectively disables graceful shutdown. Consider using a more reasonable default like 30 seconds to provide better out-of-the-box behavior for production deployments.

Suggested change
  parser.add_argument(
      "--timeout-graceful-shutdown",
-     default=0,
+     default=30,

    type=int,
    help="timeout for graceful shutdown in seconds (used by uvicorn)",
)
Comment on lines +84 to +89

Copilot AI Sep 4, 2025

Add input validation to ensure the timeout value is non-negative. Negative values don't make sense for a timeout parameter and could cause unexpected behavior.
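
One possible way to address this comment is a custom `argparse` type that rejects negative values at parse time. The `non_negative_int` helper is hypothetical and not part of this PR; the argument definition otherwise mirrors the one in the diff.

```python
import argparse


def non_negative_int(value: str) -> int:
    """argparse type that accepts only integers >= 0 (hypothetical helper)."""
    number = int(value)  # a ValueError here is reported by argparse as an invalid value
    if number < 0:
        raise argparse.ArgumentTypeError(f"must be >= 0, got {number}")
    return number


parser = argparse.ArgumentParser()
parser.add_argument(
    "--timeout-graceful-shutdown",
    default=0,
    type=non_negative_int,
    help="timeout for graceful shutdown in seconds (used by uvicorn)",
)

args = parser.parse_args(["--timeout-graceful-shutdown", "30"])
print(args.timeout_graceful_shutdown)
```

With this in place, passing `--timeout-graceful-shutdown -1` makes `argparse` print a usage error and exit instead of forwarding a negative timeout to the server.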


parser = EngineArgs.add_cli_args(parser)
args = parser.parse_args()

@@ -431,6 +439,7 @@ def launch_api_server() -> None:
            workers=args.workers,
            log_config=UVICORN_CONFIG,
            log_level="info",
            timeout_graceful_shutdown=args.timeout_graceful_shutdown,
        )  # set log level to error to avoid log
    except Exception as e:
        api_server_logger.error(f"launch sync http server error, {e}, {str(traceback.format_exc())}")