Describe the bug
On a RHEL 9.4 Intel Sapphire Rapids server running the container image vllm-cpu-release-repo:v0.12.0 (or vllm-cpu-release-repo:v0.11.2), starting the following GuideLLM workload causes vLLM to shut down during step 1 (sync):
pi-28# guidellm benchmark --target http://localhost:8000 \
  --processor "$PWD/Models/Llama-3.2-1B-Instruct" \
  --rate-type sweep \
  --data "prompt_tokens=32,output_tokens=16"
GuideLLM hangs on the first test (sync) and the vLLM server shuts down with a 500 Internal Server Error on POST /v1/chat/completions (full output under Errors below).
Expected behavior
I don't believe a GuideLLM workload should be able to shut down a vLLM server.
If the workload cannot be completed, it would be acceptable for GuideLLM to time out instead.
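A possible stopgap while the crash is investigated: the GuideLLM README documents a --max-seconds option that caps each benchmark stage, which should at least bound how long the client runs (assuming the option behaves this way in 0.4.0; not verified here):
guidellm benchmark --target http://localhost:8000 \
  --processor "$PWD/Models/Llama-3.2-1B-Instruct" \
  --rate-type sweep \
  --max-seconds 30 \
  --data "prompt_tokens=32,output_tokens=16"
This only limits the client side; it would not prevent the server-side shutdown reported above.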
Environment
- OS: Red Hat Enterprise Linux 9.4 (Plow)
- Kernel: 5.14.0-427.13.1.el9_4.x86_64
- Python version: 3.11.7
- guidellm version: 0.4.0
- Podman version: 4.9.4-rhel
To Reproduce
Exact steps to reproduce the behavior:
console1# podman run --name vllm-cpu --rm --privileged=true --shm-size=4g -p 8000:8000 \
  -e VLLM_CPU_KVCACHE_SPACE=40 -v $PWD/Models:/model \
  public.ecr.aws/q9t5s3a7/vllm-cpu-release-repo:v0.12.0 \
  --model "/model/Llama-3.2-1B-Instruct" --dtype=bfloat16
console2# guidellm benchmark --target http://localhost:8000 \
  --processor "$PWD/Models/Llama-3.2-1B-Instruct" \
  --rate-type sweep \
  --data "prompt_tokens=32,output_tokens=16"
Errors
console1
GuideLLM hangs on the first test (sync) and the vLLM server shuts down:
(APIServer pid=1) INFO: 10.88.0.1:39996 - "POST /v1/chat/completions HTTP/1.1" 500 Internal Server Error
(APIServer pid=1) INFO: Shutting down
console2 just hangs in the GuideLLM output window, stuck on step 1 (sync).
Additional context
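The traceback that precedes the 500 in the server log is probably the interesting part. Because the container is started with --rm, its logs disappear once vLLM exits, so capturing them while the container is still running (plain podman, nothing vLLM-specific) preserves them for triage:
console1# podman logs -f vllm-cpu 2>&1 | tee vllm-cpu.log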