
trl vllm server generating stuck #3467

@AdaChambers

Description

Reproduction

I'm training Qwen3-1.7B with GRPO. The vLLM server runs on GPU 0, and train_grpo.py pins itself to GPU 2:

CUDA_VISIBLE_DEVICES=0 trl vllm-serve --host 127.0.0.1 --port 8014 --max_model_len 512 --model "Qwen/Qwen3-1.7B"
python train_grpo.py
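The server itself comes up normally (the weight-sync logs below confirm it is reachable). A quick standalone check looks like this (a minimal sketch, assuming the /health/ endpoint that trl's vllm-serve exposes; host and port as above):

import requests

# Confirm the vLLM server process answers before training starts.
r = requests.get("http://127.0.0.1:8014/health/", timeout=5)
print(r.status_code)  # expect 200 if the server is up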
train_grpo.py:

import os

from trl import GRPOConfig, GRPOTrainer

os.environ["CUDA_VISIBLE_DEVICES"] = "2"

training_args = GRPOConfig(
    output_dir="Qwen3-1.7B-GRPO",
    logging_steps=20,
    eval_strategy="steps",
    save_strategy="steps",
    save_steps=500,
    num_train_epochs=2000,
    max_completion_length=512,
    report_to=("wandb" if use_wandb else None),
    run_name=("sucai_1:1_all" if use_wandb else None),
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    num_generations=4,
    bf16=True,
    gradient_accumulation_steps=4,
    beta=0.004,
    use_vllm=True,
    vllm_mode="server",
    # vllm_mode="colocate",
    vllm_server_host="127.0.0.1",
    vllm_server_port=8014,
    # vllm_gpu_memory_utilization=0.3,
    # vllm_guided_decoding_regex=r"<think>(.*?)</think><answer>(.*?)</answer>",
)

training_args.model_name = model_name
training_args.train_data_num = len(ds_train)
training_args.test_num = len(ds_test)

trainer = GRPOTrainer(
    model=model,
    reward_funcs=reward_len,
    args=training_args,
    train_dataset=ds_train,
    eval_dataset=ds_test,
)
trainer.train()

(model, model_name, use_wandb, reward_len, ds_train and ds_test are defined elsewhere in the script.)

After initializing the vLLM client:

self.vllm_client = VLLMClient(
    args.vllm_server_host, args.vllm_server_port, connection_timeout=args.vllm_server_timeout
)
self.vllm_client.init_communicator()
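For isolation, the same client can be driven standalone, outside GRPOTrainer (a minimal sketch; it assumes trl's VLLMClient import path and the generate() signature matching the payload shown below):

# Standalone driver for the running server, bypassing GRPOTrainer entirely.
# Host, port and parameters mirror the config above; the timeout is arbitrary.
from trl.extras.vllm_client import VLLMClient

client = VLLMClient("127.0.0.1", 8014, connection_timeout=240)
client.init_communicator()

# If this call also hangs, the problem is in the server/communicator path,
# not in the trainer loop.
completion_ids = client.generate(["Hello"], n=1, max_tokens=16)
print(completion_ids)
client.close_communicator()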

It gets stuck as soon as the generate POST request is sent, and no error is reported on either the client or the server:

url = f"http://{self.host}:{self.server_port}/generate/"
prompts = ["xx", "xx", "xx", "xx"]
n = 4
temperature = 0.9
repetition_penalty = 1.0
top_p = 1.0
top_k = 50
min_p = 0.0
max_tokens = 512
guided_decoding_regex = None
response = self.session.post(
    url,
    json={
        "prompts": prompts,
        "n": n,
        "repetition_penalty": repetition_penalty,
        "temperature": temperature,
        "top_p": top_p,
        "top_k": top_k,
        "min_p": min_p,
        "max_tokens": max_tokens,
        "guided_decoding_regex": guided_decoding_regex,
    },
)
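Sending the same payload by hand with a hard timeout separates "the server never answers" from "the client never sends" (a sketch; the endpoint and JSON body are exactly the ones above, the 60-second timeout is arbitrary):

import requests

# Same request the trainer sends, but with a client-side timeout so a hang
# surfaces as an exception instead of blocking forever.
payload = {
    "prompts": ["xx"],
    "n": 1,
    "repetition_penalty": 1.0,
    "temperature": 0.9,
    "top_p": 1.0,
    "top_k": 50,
    "min_p": 0.0,
    "max_tokens": 512,
    "guided_decoding_regex": None,
}
try:
    r = requests.post("http://127.0.0.1:8014/generate/", json=payload, timeout=60)
    print(r.status_code, r.json())
except requests.exceptions.Timeout:
    print("server accepted the connection but never answered /generate/")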

No error is reported.

Server log:

INFO:     127.0.0.1:35524 - "POST /update_named_param/ HTTP/1.1" 200 OK
INFO:     127.0.0.1:35524 - "POST /update_named_param/ HTTP/1.1" 200 OK
INFO:     127.0.0.1:35524 - "POST /update_named_param/ HTTP/1.1" 200 OK
... (the same line repeats for every /update_named_param/ call) ...
INFO 05-19 16:17:14 [block_pool.py:264] Successfully reset prefix cache
INFO:     127.0.0.1:35524 - "POST /reset_prefix_cache/ HTTP/1.1" 200 OK
<<stuck here>>

Client log:

 0%|                | 0/6152000 [00:00<?, ?it/s]

I've tried other ports, but nothing changed.
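A stack dump on demand would show where the client blocks (a standard-library sketch; add it at the top of train_grpo.py and send SIGUSR1 to the training process):

import faulthandler
import signal

# After this, `kill -USR1 <training pid>` prints every thread's Python
# traceback to stderr, showing exactly which call is blocking.
faulthandler.register(signal.SIGUSR1)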

System Info

  • Platform: Linux-5.4.0-169-generic-x86_64-with-glibc2.31
  • Python version: 3.10.16
  • TRL version: 0.18.0.dev0+4da4dc9
  • PyTorch version: 2.6.0
  • CUDA device(s): NVIDIA H20, NVIDIA H20, NVIDIA H20, NVIDIA H20, NVIDIA H20, NVIDIA H20, NVIDIA H20, NVIDIA H20
  • Transformers version: 4.51.3
  • Accelerate version: 1.6.0
  • Accelerate config: not found
  • Datasets version: 3.2.0
  • HF Hub version: 0.31.1
  • bitsandbytes version: 0.45.5
  • DeepSpeed version: 0.16.7
  • Diffusers version: not installed
  • Liger-Kernel version: not installed
  • LLM-Blender version: not installed
  • OpenAI version: 1.78.1
  • PEFT version: 0.15.2
  • vLLM version: 0.8.5.post1

Checklist

  • I have checked that my issue isn't already filed (see open issues)
  • I have included my system information
  • Any code provided is minimal, complete, and reproducible (more on MREs)
  • Any code provided is properly formatted in code blocks (no screenshots, more on code blocks)
  • Any traceback provided is complete

Labels

🏋 GRPO (Related to GRPO) · 🐛 bug (Something isn't working)
