Description
Is your feature request related to a problem? Please describe.
I’m evaluating end-to-end on-device performance across model modalities (text-only, vision-language, speech, etc.). I need a pure-text LLM baseline to compare against the other modalities under the same benchmarking suite. There is currently no turnkey, NPU-optimized package for meta-llama/Llama-2-7b-chat on Qualcomm AI Hub, which blocks performance benchmarking (latency, throughput, memory, power) for text-only workloads.
Details of model being requested
Model name: meta-llama/Llama-2-7b-chat
Source repo link: https://huggingface.co/meta-llama/Llama-2-7b-chat
Research paper link: https://arxiv.org/abs/2307.09288
Model use case:
- Baseline text-only reasoning for on-device performance benchmarking.
- Natural language understanding, summarization, conversational AI, closed-book Q&A, and chain-of-thought reasoning.
- Will serve as the text-modality reference point in my cross-modality device evaluation project.
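To illustrate the intended workflow once a package exists, here is a minimal sketch of how I would profile the model through the AI Hub Python client. This is a hypothetical harness, not a confirmed recipe: the model path, device name, and the `summarize_latencies` helper are my own placeholders, and the actual on-device Llama-2-7b deployment flow may differ from the generic `upload_model`/`submit_profile_job` path shown here.

```python
"""Hypothetical benchmarking sketch for the requested Llama-2-7b-chat package.

Assumptions (not from this issue): the qai_hub Python client is installed and
configured with an API token, and a compiled on-device asset already exists at
MODEL_PATH. Device and path names below are placeholders.
"""

MODEL_PATH = "llama_v2_7b_chat.bin"   # placeholder path to a compiled asset
DEVICE_NAME = "Samsung Galaxy S24"    # placeholder AI Hub device name


def summarize_latencies(samples_ms):
    """Reduce raw per-inference latency samples (ms) to the summary metrics
    compared across modalities in the evaluation."""
    s = sorted(samples_ms)
    n = len(s)
    median = s[n // 2] if n % 2 else (s[n // 2 - 1] + s[n // 2]) / 2
    return {
        "min_ms": s[0],
        "median_ms": median,
        "mean_ms": sum(s) / n,
        "max_ms": s[-1],
    }


if __name__ == "__main__":
    # Requires: pip install qai-hub, plus an API token configured via
    # `qai-hub configure`. Skipped under test since it needs network access.
    import qai_hub as hub

    device = hub.Device(DEVICE_NAME)
    model = hub.upload_model(MODEL_PATH)
    job = hub.submit_profile_job(model=model, device=device)
    job.wait()
    profile = job.download_profile()  # on-device timing/memory details
    print(profile)
```

The same harness would then run unchanged against the vision-language and speech packages, so the text baseline is measured under identical conditions.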