diff --git a/docs/best_practices/graceful_shutdown_service.md b/docs/best_practices/graceful_shutdown_service.md
new file mode 100644
index 00000000000..ea1b13e12ee
--- /dev/null
+++ b/docs/best_practices/graceful_shutdown_service.md
@@ -0,0 +1,71 @@
+# Graceful Service Node Shutdown Solution
+
+## 1. Core Objective
+Achieve graceful shutdown of service nodes, ensuring no in-flight user requests are lost during service termination while maintaining overall cluster availability.
+
+## 2. Solution Overview
+This solution combines an **Nginx reverse proxy**, a **Gunicorn server**, a **Uvicorn server**, and **FastAPI**, which work together to achieve this objective.
+
+![graceful_shutdown](images/graceful_shutdown.png)
+
+## 3. Component Introduction
+
+### 1. Nginx: Traffic Entry Point and Load Balancer
+- **Functions**:
+  - Acts as a reverse proxy, receiving all external client requests and distributing them to upstream Gunicorn worker nodes according to load balancing policies.
+  - Actively monitors backend node health status through health check mechanisms.
+  - Allows a problematic node to be removed from the service pool quickly through a configuration change and reload, so that traffic shifts to the remaining nodes.
+
+### 2. Gunicorn: WSGI HTTP Server (Process Manager)
+- **Functions**:
+  - Serves as the master process, managing multiple Uvicorn worker child processes.
+  - Receives external signals (e.g., `SIGTERM`) and coordinates the graceful shutdown process for all child processes.
+  - Supervises worker processes and automatically restarts them upon abnormal termination, ensuring service robustness.
+
+### 3. Uvicorn: ASGI Server (Worker Process)
+- **Functions**:
+  - Runs as a Gunicorn-managed worker and handles the actual HTTP requests.
+  - Runs the FastAPI application instance, processing specific business logic.
+  - Implements the ASGI protocol, supporting asynchronous request processing for high performance.
+
+---
+
+## Advantages
+
+1. **Nginx**:
+   - Can quickly isolate faulty nodes, ensuring overall service availability.
+   - Allows configuration updates without downtime using `nginx -s reload`, making it transparent to users.
+
+2. **Gunicorn** (Compared to Uvicorn's native multi-worker mode):
+   - **Mature Process Management**: Built-in comprehensive process spawning, recycling, and management logic, eliminating the need for custom implementation.
+   - **Process Supervision**: The Gunicorn master automatically forks a new worker when one crashes, whereas Uvicorn's `--workers` mode has historically not restarted crashed workers and needed an external process supervisor (check the behavior of the Uvicorn version in use).
+   - **Rich Configuration**: Offers numerous parameters for adjusting timeouts, number of workers, restart policies, etc.
+
+3. **Uvicorn**:
+   - Fast: it can run on uvloop and httptools when they are installed.
+   - Natively supports graceful shutdown: upon receiving a shutdown signal, it stops accepting new connections and waits for existing requests to complete before exiting.
+
+---
+
+## Graceful Shutdown Procedure
+
+When a specific node needs to be taken offline, the steps are as follows:
+
+1. **Nginx Monitors Node Health Status**:
+   - Nginx periodically sends health check requests to the node to track its health status.
+
+2. **Removal from Load Balancing**:
+   - Modify the Nginx configuration to mark the target node as `down` and reload the Nginx configuration.
+   - After the reload, new requests are no longer routed to the target node.
+
+3. **Gunicorn Server**:
+   - Listens for stop signals. Upon receiving one (e.g., `SIGTERM`), it forwards the signal to all Uvicorn child processes.
+
+4. **Sending the Stop Signal**:
+   - Send a `SIGTERM` signal to the Uvicorn process on the target node, triggering Uvicorn's graceful shutdown process.
+
+5. **Waiting for Request Processing**:
+   - After sending the signal, wait slightly longer than `timeout_graceful_shutdown` before forcefully terminating the service, so the node has enough time to finish every request it has already accepted.
+
+6. **Shutdown Completion**:
+   - The node has now processed all remaining requests and exited safely.
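Steps 4 to 6 of the procedure above can be sketched as a small operator-side helper. This is a minimal, POSIX-only illustration rather than part of the PR: `stop_node`, `pid_exists`, the `margin` parameter, and the `is_alive` hook are hypothetical names introduced here, and the sketch assumes the caller already knows the PID of the server process on the node that has been marked `down`.

```python
import os
import signal
import time
from typing import Callable


def pid_exists(pid: int) -> bool:
    """Return True if a process with this PID exists (POSIX only)."""
    try:
        os.kill(pid, 0)  # signal 0 only checks for existence; nothing is delivered
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but is owned by another user
    return True


def stop_node(
    pid: int,
    timeout_graceful_shutdown: float,
    margin: float = 5.0,
    poll_interval: float = 0.2,
    is_alive: Callable[[int], bool] = pid_exists,
) -> bool:
    """Ask the server process to stop and force it only if it overruns.

    Returns True if the process exited on its own, False if SIGKILL was needed.
    """
    try:
        os.kill(pid, signal.SIGTERM)  # step 4: trigger the graceful shutdown
    except ProcessLookupError:
        return True  # the process was already gone

    # Step 5: wait slightly longer than the server's own graceful timeout.
    deadline = time.monotonic() + timeout_graceful_shutdown + margin
    while time.monotonic() < deadline:
        if not is_alive(pid):
            return True  # step 6: the node drained its requests and exited
        time.sleep(poll_interval)

    if not is_alive(pid):
        return True
    try:
        os.kill(pid, signal.SIGKILL)  # last resort: remaining requests are dropped
    except ProcessLookupError:
        return True  # the process exited between the check and the kill
    return False
```

A call such as `stop_node(pid, timeout_graceful_shutdown=30)` would wait up to 35 seconds before escalating. When the target is a child of the calling process, passing an `is_alive` hook that reaps the child (for example one based on `subprocess.Popen.poll()`) avoids mistaking a zombie for a live process.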
diff --git a/docs/best_practices/images/graceful_shutdown.png b/docs/best_practices/images/graceful_shutdown.png
new file mode 100644
index 00000000000..b08e56d473e
Binary files /dev/null and b/docs/best_practices/images/graceful_shutdown.png differ
diff --git a/docs/zh/best_practices/graceful_shutdown_service.md b/docs/zh/best_practices/graceful_shutdown_service.md
new file mode 100644
index 00000000000..90f0133aa8c
--- /dev/null
+++ b/docs/zh/best_practices/graceful_shutdown_service.md
@@ -0,0 +1,71 @@
+# 服务节点优雅关闭方案
+
+## 1. 核心目标
+实现服务节点的优雅关闭，确保在停止服务时不丢失任何正在处理的用户请求，同时不影响整个集群的可用性。
+
+## 2. 实现方案说明
+该方案通过结合 **Nginx 反向代理**、**Gunicorn 服务器**、**Uvicorn 服务器** 和 **FastAPI** 协作来实现目标。
+
+![graceful_shutdown](images/graceful_shutdown.png)
+
+## 3. 组件介绍
+
+### 1. Nginx：流量入口与负载均衡器
+- **功能**：
+  - 作为反向代理，接收所有外部客户端请求并按负载均衡策略分发到上游（Upstream）的 Gunicorn 工作节点。
+  - 通过健康检查机制主动监控后端节点的健康状态。
+  - 通过配置管理，能够瞬时地将问题节点从服务池中摘除，实现流量切换。
+
+### 2. Gunicorn：WSGI HTTP 服务器（进程管理器）
+- **功能**：
+  - 作为主进程（Master Process），负责管理多个 Uvicorn 工作子进程（Worker Process）。
+  - 接收外部信号（如 `SIGTERM`），并协调所有子进程的优雅关闭流程。
+  - 守护工作进程，在进程异常退出时自动重启，保证服务健壮性。
+
+### 3. Uvicorn：ASGI 服务器（工作进程）
+- **功能**：
+  - 作为 Gunicorn 管理的 Worker，实际负责处理 HTTP 请求。
+  - 运行 FastAPI 应用实例，处理具体的业务逻辑。
+  - 实现 ASGI 协议，支持异步请求处理，高性能。
+
+---
+
+## 优势
+
+1. **Nginx**：
+   - 能够快速隔离故障节点，保证整体服务的可用性。
+   - 通过 `nginx -s reload` 可不停机更新配置，对用户无感知。
+
+2. **Gunicorn**（相比于 Uvicorn 原生的多 Worker）：
+   - **成熟的进程管理**：内置了完善的进程生成、回收、管理逻辑，无需自己实现。
+   - **进程守护能力**：Gunicorn Master 会在 Worker 异常退出后自动 fork 新 Worker，而 Uvicorn `--workers` 模式下任何进程崩溃都不会被重新拉起，需要外部守护进程。
+   - **配置丰富**：提供大量参数用于调整超时、Worker 数量、重启策略等。
+
+3. **Uvicorn**：
+   - 基于 uvloop 和 httptools，速度极快。
+   - 原生支持优雅关闭：在收到关闭信号后，会停止接受新连接，并等待现有请求处理完成后再退出。
+
+---
+
+## 优雅关闭流程
+
+当需要下线某个特定节点时，步骤如下：
+
+1. **Nginx 监控节点状态是否健康**：
+   - 通过向节点定时发送 health 请求，监控节点的健康状态。
+
+2. **从负载均衡中摘除**：
+   - 修改 Nginx 配置，将该节点标记为 `down` 状态，并重载 Nginx 配置。
+   - 此后，所有新请求将不再被发送到目标节点。
+
+3. **Gunicorn 服务器**：
+   - 监控停止信号，收到停止信号（如 `SIGTERM` 信号）时，会把此信号向所有的 Uvicorn 子进程发送。
+
+4. **发送停止信号**：
+   - 向目标节点的 Uvicorn 进程发送 `SIGTERM` 信号，触发 Uvicorn 的优雅关闭流程。
+
+5. **等待请求处理**：
+   - 等待一段稍长于 `timeout_graceful_shutdown` 的时间后强制终止服务，让该节点有充足的时间完成所有已接收请求的处理。
+
+6. **关闭完成**：
+   - 此时，该节点已经处理完所有存量请求并安全退出。
diff --git a/docs/zh/best_practices/images/graceful_shutdown.png b/docs/zh/best_practices/images/graceful_shutdown.png
new file mode 100644
index 00000000000..b08e56d473e
Binary files /dev/null and b/docs/zh/best_practices/images/graceful_shutdown.png differ
diff --git a/fastdeploy/entrypoints/openai/api_server.py b/fastdeploy/entrypoints/openai/api_server.py
index db78ec6c296..b65cae5fac1 100644
--- a/fastdeploy/entrypoints/openai/api_server.py
+++ b/fastdeploy/entrypoints/openai/api_server.py
@@ -77,9 +77,17 @@
     help="max waiting time for connection, if set value -1 means no waiting time limit",
 )
 parser.add_argument("--max-concurrency", default=512, type=int, help="max concurrency")
+
 parser.add_argument(
     "--enable-mm-output", action="store_true", help="Enable 'multimodal_content' field in response output. "
 )
+parser.add_argument(
+    "--timeout-graceful-shutdown",
+    default=0,
+    type=int,
+    help="max seconds to wait for in-flight requests during graceful shutdown (passed to uvicorn)",
+)
+
 parser = EngineArgs.add_cli_args(parser)
 args = parser.parse_args()
 
@@ -431,6 +439,7 @@ def launch_api_server() -> None:
             workers=args.workers,
             log_config=UVICORN_CONFIG,
             log_level="info",
+            timeout_graceful_shutdown=args.timeout_graceful_shutdown,
         )  # set log level to error to avoid log
     except Exception as e:
         api_server_logger.error(f"launch sync http server error, {e}, {str(traceback.format_exc())}")
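The way the new flag travels from the command line to the server call can be sketched with the standard library alone. This is an illustration under stated assumptions, not the code in `api_server.py`: `parse_server_args` and `build_server_kwargs` are hypothetical helpers, the `--workers` default of 1 is assumed, and mapping a non-positive value to `None` (uvicorn's own default for `timeout_graceful_shutdown`) is a design choice of this sketch, whereas the diff forwards the parsed value unchanged.

```python
import argparse
from typing import Any, Dict, List, Optional


def parse_server_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Parse the subset of CLI flags relevant to graceful shutdown."""
    parser = argparse.ArgumentParser()
    # Assumed default; the real parser defines --workers elsewhere.
    parser.add_argument("--workers", default=1, type=int, help="number of worker processes")
    parser.add_argument(
        "--timeout-graceful-shutdown",
        default=0,
        type=int,
        help="timeout for graceful shutdown in seconds (used by uvicorn)",
    )
    return parser.parse_args(argv)


def build_server_kwargs(args: argparse.Namespace) -> Dict[str, Any]:
    """Build keyword arguments that could be passed on to the server launcher."""
    timeout: Optional[int] = args.timeout_graceful_shutdown
    if timeout <= 0:
        # Assumption of this sketch: treat "unset" as "no explicit limit".
        timeout = None
    return {
        "workers": args.workers,
        "log_level": "info",
        "timeout_graceful_shutdown": timeout,
    }
```

For example, `build_server_kwargs(parse_server_args(["--timeout-graceful-shutdown", "30"]))` yields a dictionary whose `timeout_graceful_shutdown` entry is `30`, which matches the waiting period that the shutdown procedure in the documentation relies on.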