[Bug]: QNN AOT Qwen2-series models produce garbled output #665

@TRM-coding

Description

Prerequisites

  • I have searched the existing issues and confirmed this is not a duplicate.
  • I am using the latest version of the MLLM framework.

Bug Description

I built example/qwen2_qnn_aot following the official AOT documentation and ran it on a phone. After making minor modifications to aot_run.cpp, decoding produces garbled output.

Device:
SM8850 chip (V81)
OnePlus 15
Model:
Qwen2.5 1.5B

Is this a bug in aot_run itself, or did I make a mistake while reproducing? Is there a recommended debugging path?

Steps to Reproduce

Modified aot_run.cpp:

#include <iostream>
#include <fmt/core.h>
#include <mllm/mllm.hpp>
#include <string>
#include "mllm/backends/qnn/aot_rt/QnnAOTRuntime.hpp"
#include "mllm/models/qwen3/configuration_qwen3.hpp"
#include "mllm/models/qwen3/tokenization_qwen3.hpp"

using mllm::Argparse;
using namespace mllm::qnn::aot;  // NOLINT

MLLM_MAIN({
  auto& help = Argparse::add<bool>("-h|--help").help("Show help message");
  auto& model_path = Argparse::add<std::string>("-m|--model").help("Model path").def("qwen2_qnn.mllm");
  auto& tokenizer_path = Argparse::add<std::string>("-t|--tokenizer").help("Tokenizer path").def("tokenizer.json");
  auto& config_path = Argparse::add<std::string>("-c|--config").help("Config path").required(true);
  auto& prompt_text = Argparse::add<std::string>("-p|--prompt").help("Prompt text").def("hello");
  auto& ar_len = Argparse::add<int>("--ar_len").help("Autoregressive length (chunk size)").def(128);
  auto& gen_len = Argparse::add<int>("--gen_len").help("Generate token length").def(32);

  Argparse::parse(argc, argv);

  if (help.isSet()) {
    Argparse::printHelp();
    return 0;
  }

  mllm::initQnnBackend(model_path.get());

  auto qwen2_cfg = mllm::models::qwen3::Qwen3Config(config_path.get());

  RunnerConfig config;
  config.num_layers = qwen2_cfg.num_hidden_layers;
  config.num_heads = qwen2_cfg.num_key_value_heads;
  config.head_dim = qwen2_cfg.head_dim;
  config.vocab_size = qwen2_cfg.vocab_size;
  config.context_len = 1024;
  config.ar_len = ar_len.get();

  auto tokenizer = mllm::models::qwen3::Qwen3Tokenizer(tokenizer_path.get());

  // Qwen2.5 chat models expect ChatML-style prompts. Avoid injecting <think> tags here:
  // the tokenizer used with the current qwen2.5 assets does not have them in vocab and
  // would map them to token id 0, which corrupts the prompt and leads to garbled output.
  const std::string prompt =
      "<|im_start|>user\n" + prompt_text.get() + "<|im_end|>\n<|im_start|>assistant\n";
  auto input_tensor = mllm::models::ARGenerationOutputPast{
      {"sequence", tokenizer.convert2Ids(tokenizer.tokenize(prompt))},
  };

  // DBG:
  mllm::print(input_tensor["sequence"].shape());
  mllm::print(input_tensor["sequence"]);

  Runner runner(config, &tokenizer);
  if (!runner.load()) {
    std::cerr << "Failed to load model\n";
    return 1;
  }

  runner.generate(
      input_tensor["sequence"], gen_len.get(), [](const std::string& token) { std::cout << token << std::flush; }, true);
  std::cout << "\n";

  return 0;
});

Expected Behavior

Normal (non-garbled) output

Operating System

Android

Device

OnePlus 15

MLLM Framework Version

latest

Model Information

No response

Additional Context

No response


Labels

bug (Something isn't working)
