diff --git a/docs/best_practices/graceful_shutdown_service.md b/docs/best_practices/graceful_shutdown_service.md
new file mode 100644
index 00000000000..ea1b13e12ee
--- /dev/null
+++ b/docs/best_practices/graceful_shutdown_service.md
@@ -0,0 +1,71 @@
+# Graceful Service Node Shutdown Solution
+
+## 1. Core Objective
+Achieve graceful shutdown of service nodes, ensuring no in-flight user requests are lost during service termination while maintaining overall cluster availability.
+
+## 2. Solution Overview
+This solution combines **Nginx reverse proxy**, **Gunicorn server**, **Uvicorn server**, and **FastAPI** working in collaboration to achieve the objective.
+
+![graceful_shutdown](images/graceful_shutdown.png)
+
+## 3. Component Introduction
+
+### 1. Nginx: Traffic Entry Point and Load Balancer
+- **Functions**:
+    - Acts as a reverse proxy, receiving all external client requests and distributing them to upstream Gunicorn worker nodes according to load balancing policies.
+    - Actively monitors backend node health status through health check mechanisms.
+    - Enables instantaneous removal of problematic nodes from the service pool through configuration management, achieving traffic switching.
+
+### 2. Gunicorn: WSGI HTTP Server (Process Manager)
+- **Functions**:
+    - Serves as the master process, managing multiple Uvicorn worker child processes.
+    - Receives external signals (e.g., `SIGTERM`) and coordinates the graceful shutdown process for all child processes.
+    - Supervises worker processes and automatically restarts them upon abnormal termination, ensuring service robustness.
+
+### 3. Uvicorn: ASGI Server (Worker Process)
+- **Functions**:
+    - Runs as a Gunicorn-managed worker that handles the actual HTTP requests.
+    - Runs the FastAPI application instance, processing specific business logic.
+    - Implements the ASGI protocol, supporting asynchronous request processing for high performance.
+
+---
+
+## Advantages
+
+1. **Nginx**:
+    - Can quickly isolate faulty nodes, ensuring overall service availability.
+    - Allows configuration updates without downtime using `nginx -s reload`, making it transparent to users.
+
+2. **Gunicorn** (Compared to Uvicorn's native multi-worker mode):
+    - **Mature Process Management**: Built-in comprehensive process spawning, recycling, and management logic, eliminating the need for custom implementation.
+    - **Process Supervision**: The Gunicorn master automatically forks new workers if they crash, whereas in Uvicorn's `--workers` mode, any crashed process is not restarted and requires an external supervisor.
+    - **Rich Configuration**: Offers numerous parameters for adjusting timeouts, number of workers, restart policies, etc.
+
+3. **Uvicorn**:
+    - Extremely fast, built on uvloop and httptools.
+    - Natively supports graceful shutdown: upon receiving a shutdown signal, it stops accepting new connections and waits for existing requests to complete before exiting.
+
+---
+
+## Graceful Shutdown Procedure
+
+When a specific node needs to be taken offline, the steps are as follows:
+
+1. **Nginx Monitors Node Health Status**:
+    - Monitors the node's health status by periodically sending health check requests to it.
+
+2. **Removal from Load Balancing**:
+    - Modify the Nginx configuration to mark the target node as `down` and reload the Nginx configuration.
+    - Subsequently, all new requests will no longer be sent to the target node.
+
+3. **Gunicorn Server**:
+    - Monitors for stop signals. Upon receiving a stop signal (e.g., `SIGTERM`), it relays this signal to all Uvicorn child processes.
+
+4. **Sending the Stop Signal**:
+    - Send a `SIGTERM` signal to the Uvicorn process on the target node, triggering Uvicorn's graceful shutdown process.
+
+5. **Waiting for Request Processing**:
+    - Wait for a period slightly longer than `timeout_graceful_shutdown` before forcefully terminating the service, allowing the node sufficient time to complete processing all received requests.
+
+6. **Shutdown Completion**:
+    - The node has now processed all remaining requests and exited safely.
diff --git a/docs/best_practices/images/graceful_shutdown.png b/docs/best_practices/images/graceful_shutdown.png
new file mode 100644
index 00000000000..b08e56d473e
Binary files /dev/null and b/docs/best_practices/images/graceful_shutdown.png differ
diff --git a/docs/zh/best_practices/graceful_shutdown_service.md b/docs/zh/best_practices/graceful_shutdown_service.md
new file mode 100644
index 00000000000..90f0133aa8c
--- /dev/null
+++ b/docs/zh/best_practices/graceful_shutdown_service.md
@@ -0,0 +1,71 @@
+# 服务节点优雅关闭方案
+
+## 1. 核心目标
+实现服务节点的优雅关闭,确保在停止服务时不丢失任何正在处理的用户请求,同时不影响整个集群的可用性。
+
+## 2. 实现方案说明
+该方案通过结合 **Nginx 反向代理**、**Gunicorn 服务器**、**Uvicorn 服务器** 和 **FastAPI** 协作来实现目标。
+
+![graceful_shutdown](images/graceful_shutdown.png)
+
+## 3. 组件介绍
+
+### 1. Nginx:流量入口与负载均衡器
+- **功能**:
+    - 作为反向代理,接收所有外部客户端请求并按负载均衡策略分发到上游(Upstream)的 Gunicorn 工作节点。
+    - 通过健康检查机制主动监控后端节点的健康状态。
+    - 通过配置管理,能够瞬时地将问题节点从服务池中摘除,实现流量切换。
+
+### 2. Gunicorn:WSGI HTTP 服务器(进程管理器)
+- **功能**:
+    - 作为主进程(Master Process),负责管理多个 Uvicorn 工作子进程(Worker Process)。
+    - 接收外部信号(如 `SIGTERM`),并协调所有子进程的优雅关闭流程。
+    - 守护工作进程,在进程异常退出时自动重启,保证服务健壮性。
+
+### 3. Uvicorn:ASGI 服务器(工作进程)
+- **功能**:
+    - 作为 Gunicorn 管理的 Worker,实际负责处理 HTTP 请求。
+    - 运行 FastAPI 应用实例,处理具体的业务逻辑。
+    - 实现 ASGI 协议,支持异步请求处理,高性能。
+
+---
+
+## 优势
+
+1. **Nginx**:
+    - 能够快速隔离故障节点,保证整体服务的可用性。
+    - 通过 `nginx -s reload` 可不停机更新配置,对用户无感知。
+
+2. **Gunicorn**(相比于 Uvicorn 原生的多 Worker):
+    - **成熟的进程管理**:内置了完善的进程生成、回收、管理逻辑,无需自己实现。
+    - **进程守护能力**:Gunicorn Master 会在 Worker 异常退出后自动 fork 新 Worker,而 Uvicorn `--workers` 模式下任何进程崩溃都不会被重新拉起,需要外部守护进程。
+    - **配置丰富**:提供大量参数用于调整超时、Worker 数量、重启策略等。
+
+3. **Uvicorn**:
+    - 基于 uvloop 和 httptools,速度极快。
+    - 原生支持优雅关闭:在收到关闭信号后,会停止接受新连接,并等待现有请求处理完成后再退出。
+
+---
+
+## 优雅关闭流程
+
+当需要下线某个特定节点时,步骤如下:
+
+1. **Nginx 监控节点状态是否健康**:
+    - 通过向节点定时发送 health 请求,监控节点的健康状态。
+
+2. **从负载均衡中摘除**:
+    - 修改 Nginx 配置,将该节点标记为 `down` 状态,并重载 Nginx 配置。
+    - 此后,所有新请求将不再被发送到目标节点。
+
+3. **Gunicorn 服务器**:
+    - 监控停止信号,收到停止信号(如 `SIGTERM` 信号)时,会把此信号向所有的 Uvicorn 子进程发送。
+
+4. **发送停止信号**:
+    - 向目标节点的 Uvicorn 进程发送 `SIGTERM` 信号,触发 Uvicorn 的优雅关闭流程。
+
+5. **等待请求处理**:
+    - 等待一段稍长于 `timeout_graceful_shutdown` 的时间后强制终止服务,让该节点有充足的时间完成所有已接收请求的处理。
+
+6. **关闭完成**:
+    - 此时,该节点已经处理完所有存量请求并安全退出。
diff --git a/docs/zh/best_practices/images/graceful_shutdown.png b/docs/zh/best_practices/images/graceful_shutdown.png
new file mode 100644
index 00000000000..b08e56d473e
Binary files /dev/null and b/docs/zh/best_practices/images/graceful_shutdown.png differ
diff --git a/fastdeploy/entrypoints/openai/api_server.py b/fastdeploy/entrypoints/openai/api_server.py
index db78ec6c296..b65cae5fac1 100644
--- a/fastdeploy/entrypoints/openai/api_server.py
+++ b/fastdeploy/entrypoints/openai/api_server.py
@@ -77,9 +77,17 @@
     help="max waiting time for connection, if set value -1 means no waiting time limit",
 )
 parser.add_argument("--max-concurrency", default=512, type=int, help="max concurrency")
+
 parser.add_argument(
     "--enable-mm-output", action="store_true", help="Enable 'multimodal_content' field in response output. "
 )
+parser.add_argument(
+    "--timeout-graceful-shutdown",
+    default=0,
+    type=int,
+    help="timeout for graceful shutdown in seconds (used by uvicorn)",
+)
+
 parser = EngineArgs.add_cli_args(parser)
 args = parser.parse_args()
 
@@ -431,6 +439,7 @@ def launch_api_server() -> None:
             workers=args.workers,
             log_config=UVICORN_CONFIG,
             log_level="info",
+            timeout_graceful_shutdown=args.timeout_graceful_shutdown,
         )  # set log level to error to avoid log
     except Exception as e:
         api_server_logger.error(f"launch sync http server error, {e}, {str(traceback.format_exc())}")
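Steps 4 to 6 of the shutdown procedure described in the docs (send `SIGTERM`, wait slightly longer than `timeout_graceful_shutdown`, then force termination) could be scripted roughly as below. This is a minimal sketch and not part of the PR: the helper name `stop_node`, the `margin` parameter, and the sleeping child process that stands in for the server are all hypothetical, and it assumes the operator holds a `subprocess.Popen` handle to the server process.

```python
import signal
import subprocess
import sys


def stop_node(proc: subprocess.Popen, timeout_graceful_shutdown: float, margin: float = 5.0) -> str:
    """Send SIGTERM, wait a little longer than the graceful timeout, then force-kill.

    `stop_node` and `margin` are hypothetical names used only for this sketch.
    Returns "graceful" if the process exited by itself, "forced" if it had to be killed.
    """
    # Step 4: ask the server to begin its graceful shutdown.
    proc.send_signal(signal.SIGTERM)
    try:
        # Step 5: wait slightly longer than the server's own graceful timeout.
        proc.wait(timeout=timeout_graceful_shutdown + margin)
        return "graceful"
    except subprocess.TimeoutExpired:
        # Last resort: the process is still alive after the grace period.
        proc.kill()
        proc.wait()
        return "forced"


if __name__ == "__main__":
    # Stand-in for a real server: a child process that just sleeps.
    child = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])
    print(stop_node(child, timeout_graceful_shutdown=2, margin=1))
```

One point worth checking against the Uvicorn documentation: as far as I can tell, Uvicorn's own default for `timeout_graceful_shutdown` is `None` (wait without a limit), so the `default=0` added in this diff may mean in-flight requests are cancelled as soon as shutdown starts unless the flag is set explicitly.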