Llama-2-7B-Chat failing on Qualcomm NPU #227

@VaibMittal7

Description

Is your feature request related to a problem? Please describe.
I’m evaluating end-to-end device performance when different model modalities (text-only, vision-language, speech, etc.) run on-device. I need a pure text LLM baseline to compare against other modalities under the same benchmarking suite. There’s currently no turnkey, NPU-optimized package for meta-llama/Llama-2-7b-chat on Qualcomm AI Hub, which blocks performance benchmarking (latency, throughput, memory, power) for text-based workloads.

Details of model being requested
Model name: meta-llama/Llama-2-7b-chat

Source repo link: https://huggingface.co/meta-llama/Llama-2-7b-chat

Research paper link: https://arxiv.org/abs/2307.09288

Model use case:

Baseline text-only reasoning for on-device performance benchmarking.

Natural language understanding, summarization, conversational AI, closed-book Q&A, chain-of-thought reasoning.

Will serve as the text modality reference point in my cross-modality device evaluation project.
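To make the requested benchmarks concrete, here is a minimal, runtime-agnostic sketch of how latency and throughput could be measured once the model package exists. The `fake_generate` stub and the `benchmark_generate` helper are hypothetical stand-ins (not part of Qualcomm AI Hub or any existing API); the real callable would wrap whatever on-device runtime serves the compiled model.

```python
import time
from dataclasses import dataclass

@dataclass
class GenResult:
    tokens: int      # tokens produced in one call
    latency_s: float # wall-clock time for that call

def benchmark_generate(generate, prompt: str, runs: int = 3) -> dict:
    """Time a text-generation callable over several runs.

    `generate` is any callable(prompt) -> token count; it stands in for
    the eventual NPU-backed inference entry point.
    """
    results = []
    for _ in range(runs):
        start = time.perf_counter()
        n_tokens = generate(prompt)
        elapsed = time.perf_counter() - start
        results.append(GenResult(n_tokens, elapsed))
    total_tokens = sum(r.tokens for r in results)
    total_time = sum(r.latency_s for r in results)
    return {
        "mean_latency_s": total_time / runs,
        "tokens_per_s": total_tokens / total_time,
    }

# Hypothetical stub "model": pretends to emit 32 tokens per call.
def fake_generate(prompt: str) -> int:
    time.sleep(0.01)  # simulate on-device inference time
    return 32

metrics = benchmark_generate(fake_generate, "Hello", runs=2)
print(metrics)
```

Memory and power would need platform-specific probes on top of this, but a shared timing harness like the above is what lets the text-only baseline be compared against the other modalities under identical conditions.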
