
[Bug]: GLM-4.6-AWQ model outputs garbled text on vllm/vllm-openai:v0.10.2-x86_64 #2089

@zzzyoyo

Description


⚙️ Your current environment

vllm=0.10.2

transformers=4.56.1

torch=2.8.0+cu128

autoawq=0.2.9 (manually installed)

🐛 Describe the bug

Hello vLLM developers,

I am using your vllm/vllm-openai:v0.10.2-x86_64 Docker image, deployed on a Linux server with 6 H800 GPUs. The model I am trying to serve is: GLM-4.6-AWQ

After entering the Docker container, I ran the following command:

vllm serve \
    /data \
    --served-model-name glm46 \
    --enable-auto-tool-choice \
    --tool-call-parser glm45 \
    --reasoning-parser glm45 \
    --swap-space 16 \
    --max-num-seqs 32 \
    --max-model-len 8192 \
    --gpu-memory-utilization 0.9 \
    --tensor-parallel-size 4 \
    --trust-remote-code \
    --host 0.0.0.0 \
    --port 8000

The server starts successfully:

[Photo: server startup logs showing a successful start]

(Apologies for the photo format, as our computers are offline.)

However, the output text is garbled:

[Photo: garbled text in the server's response]
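For reference, I queried the server through its OpenAI-compatible endpoint, along the lines of the sketch below (the prompt is illustrative, and the api_key is just a placeholder since no key is configured):

from openai import OpenAI

# The vllm serve command above listens on port 8000 and registers
# the model under the served name "glm46".
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

resp = client.chat.completions.create(
    model="glm46",
    messages=[{"role": "user", "content": "Hello, who are you?"}],
)
print(resp.choices[0].message.content)  # this is where the garbled text shows up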

I also tried loading the model in code using:

from vllm import LLM
model = LLM('/data', tensor_parallel_size=4)

but the output is still garbled:

[Photo: garbled output from the Python API]
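For completeness, the generation call on that model object was along these lines (a sketch continuing the snippet above; the prompt and sampling parameters are illustrative):

from vllm import SamplingParams

# Illustrative prompt and sampling settings.
params = SamplingParams(temperature=0.7, max_tokens=64)
outputs = model.generate(["Hello, who are you?"], params)
print(outputs[0].outputs[0].text)  # garbled text appears here as well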

Next, I tried loading the model without vLLM, using Transformers and AutoAWQ:

from awq import AutoAWQForCausalLM

model_path = "/data"  # same local checkpoint as above

model = AutoAWQForCausalLM.from_quantized(
    model_path,
    fuse_layers=True,
    trust_remote_code=True,
    safetensors=True,
    device_map="auto",
)

but it fails with: "glm4_moe awq quantization isn't supported yet."

I also tried loading it with plain Transformers via AutoModelForCausalLM.from_pretrained, roughly as sketched below (the exact loading arguments are illustrative):
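from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("/data", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    "/data",
    trust_remote_code=True,
    device_map="auto",  # shard across the available GPUs
)

This outputs: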

[Photo: output from the Transformers run]

For reference, my environment versions are the ones listed at the top: everything comes from the Docker image, with only autoawq=0.2.9 installed manually.

By the way, I have also tried passing --chat-template, adding --quantization parameters, and so on, but nothing works. I have confirmed that the model files are not corrupted.

Could you please advise on how to correctly serve this AWQ model with vLLM or Transformers?

Thank you very much!

🛠️ Steps to reproduce

No response
