[Bug]: QNN AOT Qwen2-series models produce garbled output #665
Open
Labels
bug: Something isn't working
Description
Prerequisites
- I have searched the existing issues and confirmed this is not a duplicate.
- I am using the latest version of the MLLM framework.
Bug Description
Built example/qwen2_qnn_aot following the official AOT documentation and ran it on a phone. After a small modification to aot_run.cpp, decoding produces garbled output.
Device:
SM8850 SoC (V81)
OnePlus 15
Model:
Qwen2.5 1.5B
Is this a bug in aot_run itself, or did I make a mistake while reproducing? Is there a recommended path for debugging this?
Steps to Reproduce
Modified aot_run.cpp as follows:
#include <iostream>
#include <fmt/core.h>
#include <mllm/mllm.hpp>
#include <string>
#include "mllm/backends/qnn/aot_rt/QnnAOTRuntime.hpp"
#include "mllm/models/qwen3/configuration_qwen3.hpp"
#include "mllm/models/qwen3/tokenization_qwen3.hpp"
using mllm::Argparse;
using namespace mllm::qnn::aot; // NOLINT
MLLM_MAIN({
  auto& help = Argparse::add<bool>("-h|--help").help("Show help message");
  auto& model_path = Argparse::add<std::string>("-m|--model").help("Model path").def("qwen2_qnn.mllm");
  auto& tokenizer_path = Argparse::add<std::string>("-t|--tokenizer").help("Tokenizer path").def("tokenizer.json");
  auto& config_path = Argparse::add<std::string>("-c|--config").help("Config path").required(true);
  auto& prompt_text = Argparse::add<std::string>("-p|--prompt").help("Prompt text").def("hello");
  auto& ar_len = Argparse::add<int>("--ar_len").help("Autoregressive length (chunk size)").def(128);
  auto& gen_len = Argparse::add<int>("--gen_len").help("Generate token length").def(32);
  Argparse::parse(argc, argv);
  if (help.isSet()) {
    Argparse::printHelp();
    return 0;
  }
  mllm::initQnnBackend(model_path.get());
  auto qwen2_cfg = mllm::models::qwen3::Qwen3Config(config_path.get());
  RunnerConfig config;
  config.num_layers = qwen2_cfg.num_hidden_layers;
  config.num_heads = qwen2_cfg.num_key_value_heads;
  config.head_dim = qwen2_cfg.head_dim;
  config.vocab_size = qwen2_cfg.vocab_size;
  config.context_len = 1024;
  config.ar_len = ar_len.get();
  auto tokenizer = mllm::models::qwen3::Qwen3Tokenizer(tokenizer_path.get());
  // Qwen2.5 chat models expect ChatML-style prompts. Avoid injecting <think> tags here:
  // the tokenizer used with the current qwen2.5 assets does not have them in the vocab and
  // would map them to token id 0, which corrupts the prompt and leads to garbled output.
  const std::string prompt =
      "<|im_start|>user\n" + prompt_text.get() + "<|im_end|>\n<|im_start|>assistant\n";
  auto input_tensor = mllm::models::ARGenerationOutputPast{
      {"sequence", tokenizer.convert2Ids(tokenizer.tokenize(prompt))},
  };
  // DBG:
  mllm::print(input_tensor["sequence"].shape());
  mllm::print(input_tensor["sequence"]);
  Runner runner(config, &tokenizer);
  if (!runner.load()) {
    std::cerr << "Failed to load model\n";
    return 1;
  }
  runner.generate(
      input_tensor["sequence"], gen_len.get(),
      [](const std::string& token) { std::cout << token << std::flush; }, true);
  std::cout << "\n";
  return 0;
});
Expected Behavior
Normal, non-garbled output.
Operating System
Android
Device
OnePlus 15
MLLM Framework Version
latest
Model Information
No response
Additional Context
No response
Metadata
Assignees