⚙️ Your current environment
vllm=0.10.2
transformers=4.56.1
torch=2.8.0+cu128
autoawq=0.2.9 (manually installed)
🐛 Describe the bug
Hello vLLM developers,
I am using your vllm/vllm-openai:v0.10.2-x86_64 Docker image, deployed on a Linux server with 6 H800 GPUs. The model I am trying to serve is GLM-4.6-AWQ.
After entering the Docker container, I ran the following command:
vllm serve \
/data \
--served-model-name glm46 \
--enable-auto-tool-choice \
--tool-call-parser glm45 \
--reasoning-parser glm45 \
--swap-space 16 \
--max-num-seqs 32 \
--max-model-len 8192 \
--gpu-memory-utilization 0.9 \
--tensor-parallel-size 4 \
--trust-remote-code \
--host 0.0.0.0 \
--port 8000
The server starts successfully:
(Apologies for the photo format, as our computers are offline.)
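For reference, completions were requested through the server's OpenAI-compatible API roughly as follows; this snippet is only an illustrative sketch and the prompt is a placeholder, not the exact client code I used.
from openai import OpenAI

# Point the standard OpenAI client at the local vLLM server started above.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

# Placeholder prompt; the real requests used different content.
resp = client.chat.completions.create(
    model="glm46",
    messages=[{"role": "user", "content": "Hello, please introduce yourself."}],
)
print(resp.choices[0].message.content)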
However, the output text is garbled:
I also tried loading the model in code using:
from vllm import LLM
model = LLM('/data', tensor_parallel_size=4)
but the output is still garbled:
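A minimal sketch of how text was generated from that LLM instance (the prompt and sampling settings here are placeholders, not the exact values I used):
from vllm import SamplingParams

# `model` is the LLM instance created above.
outputs = model.generate(
    ["Hello, please introduce yourself."],
    SamplingParams(temperature=0.7, max_tokens=128),
)
print(outputs[0].outputs[0].text)  # this prints garbled text in my setup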
Next, I tried loading the model without vLLM, using Transformers and AutoAWQ:
from awq import AutoAWQForCausalLM

model_path = "/data"  # same local model directory as above
model = AutoAWQForCausalLM.from_quantized(
    model_path,
    fuse_layers=True,
    trust_remote_code=True,
    safetensors=True,
    device_map="auto",
)
but it fails with the error: "glm4_moe awq quantization isn't supported yet."
I also tried using AutoModelForCausalLM.from_pretrained, which outputs:
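For completeness, that attempt was roughly the following load call (not its output); the exact arguments may have differed slightly.
from transformers import AutoModelForCausalLM

# Rough reconstruction of the plain-Transformers load attempt; arguments are approximate.
model = AutoModelForCausalLM.from_pretrained(
    "/data",
    trust_remote_code=True,
    torch_dtype="auto",
    device_map="auto",
)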
For reference, the versions listed above (transformers 4.56.1, vllm 0.10.2, torch 2.8.0+cu128) all come from the Docker image; only autoawq 0.2.9 was installed manually.
By the way, I have also tried passing --chat-template, adding --quantization parameters, and so on, but nothing worked. I have confirmed that the model files are not corrupted.
Could you please advise on how to correctly serve this AWQ model with vLLM or Transformers?
Thank you very much!
🛠️ Steps to reproduce
No response