[Bug]: QNN AOT Qwen2-series models produce garbled output #665

@TRM-coding

Description

Prerequisites

  • I have searched the existing issues and confirmed this is not a duplicate.
  • I am using the latest version of the MLLM framework.

Bug Description

I built example/qwen2_qnn_aot following the official AOT documentation and ran it on a phone. After making minor modifications to aot_run.cpp, decoding produces garbled output.

Device:
SM8850 chip (V81)
OnePlus 15
Model:
Qwen2.5 1.5B

Is this a bug in aot_run itself, or did I make a mistake while reproducing? Is there a recommended debugging path?

Steps to Reproduce

Modified aot_run.cpp:

#include <iostream>
#include <fmt/core.h>
#include <mllm/mllm.hpp>
#include <string>
#include "mllm/backends/qnn/aot_rt/QnnAOTRuntime.hpp"
#include "mllm/models/qwen3/configuration_qwen3.hpp"
#include "mllm/models/qwen3/tokenization_qwen3.hpp"

using mllm::Argparse;
using namespace mllm::qnn::aot;  // NOLINT

MLLM_MAIN({
  auto& help = Argparse::add<bool>("-h|--help").help("Show help message");
  auto& model_path = Argparse::add<std::string>("-m|--model").help("Model path").def("qwen2_qnn.mllm");
  auto& tokenizer_path = Argparse::add<std::string>("-t|--tokenizer").help("Tokenizer path").def("tokenizer.json");
  auto& config_path = Argparse::add<std::string>("-c|--config").help("Config path").required(true);
  auto& prompt_text = Argparse::add<std::string>("-p|--prompt").help("Prompt text").def("hello");
  auto& ar_len = Argparse::add<int>("--ar_len").help("Autoregressive length (chunk size)").def(128);
  auto& gen_len = Argparse::add<int>("--gen_len").help("Generate token length").def(32);

  Argparse::parse(argc, argv);

  if (help.isSet()) {
    Argparse::printHelp();
    return 0;
  }

  mllm::initQnnBackend(model_path.get());

  auto qwen2_cfg = mllm::models::qwen3::Qwen3Config(config_path.get());

  RunnerConfig config;
  config.num_layers = qwen2_cfg.num_hidden_layers;
  config.num_heads = qwen2_cfg.num_key_value_heads;
  config.head_dim = qwen2_cfg.head_dim;
  config.vocab_size = qwen2_cfg.vocab_size;
  config.context_len = 1024;
  config.ar_len = ar_len.get();

  auto tokenizer = mllm::models::qwen3::Qwen3Tokenizer(tokenizer_path.get());

  // Qwen2.5 chat models expect ChatML-style prompts. Avoid injecting <think> tags here:
  // the tokenizer used with the current qwen2.5 assets does not have them in vocab and
  // would map them to token id 0, which corrupts the prompt and leads to garbled output.
  const std::string prompt =
      "<|im_start|>user\n" + prompt_text.get() + "<|im_end|>\n<|im_start|>assistant\n";
  auto input_tensor = mllm::models::ARGenerationOutputPast{
      {"sequence", tokenizer.convert2Ids(tokenizer.tokenize(prompt))},
  };

  // DBG:
  mllm::print(input_tensor["sequence"].shape());
  mllm::print(input_tensor["sequence"]);

  Runner runner(config, &tokenizer);
  if (!runner.load()) {
    std::cerr << "Failed to load model\n";
    return 1;
  }

  runner.generate(
      input_tensor["sequence"], gen_len.get(), [](const std::string& token) { std::cout << token << std::flush; }, true);
  std::cout << "\n";

  return 0;
});

Expected Behavior

Normal (non-garbled) output

Operating System

Android

Device

OnePlus 15

MLLM Framework Version

latest

Model Information

No response

Additional Context

No response


Labels

bug (Something isn't working)
