Llama-2-7B-Chat failing on Qualcomm NPU #227

@VaibMittal7

Description

Is your feature request related to a problem? Please describe.
I’m evaluating end-to-end device performance when different model modalities (text-only, vision-language, speech, etc.) run on-device. I need a pure text LLM baseline to compare against other modalities under the same benchmarking suite. There’s currently no turnkey, NPU-optimized package for meta-llama/Llama-2-7b-chat on Qualcomm AI Hub, which blocks performance benchmarking (latency, throughput, memory, power) for text-based workloads.

Details of model being requested
Model name: meta-llama/Llama-2-7b-chat

Source repo link: https://huggingface.co/meta-llama/Llama-2-7b-chat

Research paper link: https://arxiv.org/abs/2307.09288

Model use case:

Baseline text-only reasoning for on-device performance benchmarking.

Natural language understanding, summarization, conversational AI, closed-book Q&A, chain-of-thought reasoning.

Will serve as the text modality reference point in my cross-modality device evaluation project.
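To make the requested benchmarks concrete, here is a minimal, runtime-agnostic sketch of how latency and throughput could be measured once the model package exists. The `fake_generate` stub and the `benchmark_generate` helper are hypothetical stand-ins (not part of Qualcomm AI Hub or any existing API); the real callable would wrap whatever on-device runtime serves the compiled model.

```python
import time
from dataclasses import dataclass

@dataclass
class GenResult:
    tokens: int      # tokens produced in one call
    latency_s: float # wall-clock time for that call

def benchmark_generate(generate, prompt: str, runs: int = 3) -> dict:
    """Time a text-generation callable over several runs.

    `generate` is any callable(prompt) -> token count; it stands in for
    the eventual NPU-backed inference entry point.
    """
    results = []
    for _ in range(runs):
        start = time.perf_counter()
        n_tokens = generate(prompt)
        elapsed = time.perf_counter() - start
        results.append(GenResult(n_tokens, elapsed))
    total_tokens = sum(r.tokens for r in results)
    total_time = sum(r.latency_s for r in results)
    return {
        "mean_latency_s": total_time / runs,
        "tokens_per_s": total_tokens / total_time,
    }

# Hypothetical stub "model": pretends to emit 32 tokens per call.
def fake_generate(prompt: str) -> int:
    time.sleep(0.01)  # simulate on-device inference time
    return 32

metrics = benchmark_generate(fake_generate, "Hello", runs=2)
print(metrics)
```

Memory and power would need platform-specific probes on top of this, but a shared timing harness like the above is what lets the text-only baseline be compared against the other modalities under identical conditions.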
