eIQ GenAI Flow is a software pipeline for AI-powered experiences on edge devices. Currently, the Flow supports conversational AI on the NXP i.MX 95 and NXP i.MX 8M Plus.
The eIQ GenAI Flow integrates multiple AI technologies to create a seamless HMI experience. The conversational AI flow consists of the following stages:
- Wake-Word Detection: A VIT (Voice Intelligent Technology) Wake-Word triggers the ASR (Automatic Speech Recognition).
- Speech-to-Text (ASR): Converts spoken input into text.
- Retrieval-Augmented Generation (RAG): Enhances the Large Language Model (LLM) with relevant external knowledge.
- Text Generation (LLM): Generates a response based on the retrieved context.
- Text-to-Speech (TTS): Converts the response into speech output.
This demonstrator showcases some of eIQ GenAI Flow's core capabilities. This English-language demo is designed to provide an overview of what the project can achieve and how it works. It is a subset of the full project: eIQ GenAI Flow Pro.
The complete version of the Flow offers more model options, additional features, customization, RAG fine-tuning, and better performance on audio tasks.
For more details, use the NXP Community Forum Generative AI & LLMs.
- Pre-requisites
- Installation
- Getting Started
- Software Components
- Using NPU Acceleration
- Hardware
- Examples
- FAQs
- Support
- Release Notes
This repository uses Git Large File Storage (LFS) to manage large files (e.g., models, datasets, binaries).
Before cloning this repository, ensure that Git LFS is installed and initialized on your machine.
Ubuntu / Debian:
sudo apt update
sudo apt install git-lfs
macOS (Homebrew):
brew install git-lfs
Windows: Download and install Git LFS from https://git-lfs.github.com/
After installing, initialize Git LFS:
git lfs install
git clone https://github.com/nxp-appcodehub/dm-eiq-genai-flow-demonstrator
cd dm-eiq-genai-flow-demonstrator
Git LFS will automatically download all tracked large files during or after the clone. If needed, you can run:
git lfs pull
to manually fetch any missing LFS files.
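To verify that the LFS-tracked files were actually downloaded (and are not just small pointer files), the standard Git LFS listing command can be used:

```bash
# List all LFS-tracked files in the repository;
# each entry should resolve to real content, not a text pointer
git lfs ls-files
```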
This demo requires a Linux BSP available at Embedded Linux for i.MX Applications Processors.
On the i.MX8MP EVK, NPU acceleration is unavailable; the LLM pipeline runs on the CPUs on this platform. Consequently, no custom BSP is provided and the default BSP is the only option.
For i.MX95, although the demo can work on the regular NXP Q1 2025 BSP (L6.12.3-1.0.0), it works best on the BSP customized with the meta-eiq-genai-flow layer included in the package. This meta-layer updates:
- Linux kernel: matmul Neutron C-API
- Device tree: dedicated Contiguous Memory Allocator (CMA) area for Neutron
- ONNX Runtime: adds the Neutron Execution Provider
- Neutron assets: driver and firmware for handling matmul operations
Note: The meta-layer is only for the i.MX95 EVK A1 revision on the Q1 BSP (L6.12.3_1.0.0). It is not necessary for EVKs with the i.MX95 B0 revision or for BSPs newer than Q1 2025.
Note: The package works on the Q2 BSP (L6.12.20_2.0.0) on an EVK with the i.MX95 B0 revision, but Neutron acceleration is not supported there; it will be available in a future package.
Note: The package works on the Q2 BSP (L6.12.20_2.0.0) with Neutron acceleration for LLMs on an EVK with the i.MX95 A1 revision, provided /lib/firmware/NeutronFwllm.elf is replaced by the one included in the meta-layer.
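A minimal sketch of that firmware swap, assuming the meta-layer's firmware file has already been copied onto the target (the source path below is illustrative; see the meta-eiq-genai-flow README for the actual location):

```bash
# Back up the stock Neutron LLM firmware, then install the meta-layer version
cp /lib/firmware/NeutronFwllm.elf /lib/firmware/NeutronFwllm.elf.bak
cp /path/to/meta-eiq-genai-flow/NeutronFwllm.elf /lib/firmware/
# A reboot may be required for the new firmware to be picked up
reboot
```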
See README in meta-eiq-genai-flow for build details.
The benefits of this customization are a significant reduction in CPU load and a faster Time-To-First-Token (TTFT) on LLM operations. See the LLM Benchmarks section for details.
Once the BSP is flashed on the target, the eiq_genai_flow folder from this package must be copied to the Linux home folder.
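One possible way to perform the copy, assuming the EVK is reachable over the network (the IP address and root user below are illustrative):

```bash
# From the host machine: copy the demo folder to the target's home directory
scp -r eiq_genai_flow root@192.168.1.10:~/
```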
To set up the environment, run:
cd eiq_genai_flow
./install.sh
To run the demo, use the following command:
./eiq_genai_flow
Note: The binary file must always be executed from the eiq_genai_flow directory.
Note: The trial period has a timeout of 2 hours.
Note: Caching is currently not enabled on i.MX95 or i.MX8MP, so the warm-up time (less than a minute) is required every time this application is executed.
Run ./eiq_genai_flow --help to see the available options.
The default mode is keyboard-to-speech, meaning the VIT and ASR modules are disabled. To enable the speech-to-speech experience, use the --input-mode vasr argument.
The application supports various input/output options and model selections, which are detailed in the Software Components sections below.
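For reference, a few example invocations combining the options documented in this README:

```bash
# Default mode: keyboard input, spoken (TTS) output
./eiq_genai_flow

# Full speech-to-speech experience: Wake-Word + ASR input
./eiq_genai_flow --input-mode vasr
```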
VIT is NXP’s Voice UI technology that enables always-on Wake-Word detection using deep learning.
VIT is integrated with the "HEY NXP" pre-defined Wake-Word.
✅ Enabling VIT
Use the -i vasr argument to enable ASR after Wake-Word detection.
Additional options include:
- -c (continuous mode): allows continuous conversation without requiring the Wake-Word after each response. See the example below.
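For example, to combine Wake-Word detection with continuous conversation:

```bash
# Say "HEY NXP" once, then keep conversing without repeating the Wake-Word
./eiq_genai_flow -i vasr -c
```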
ASR converts spoken language into text.
The demonstrator uses an int8-quantized Whisper-small.en model (244M parameters) optimized for streaming.
✅ Enabling ASR
Use the --input-mode argument with one of the following values:
- -i vasr: enables ASR after detecting the VIT Wake-Word.
- -i kasr: activates ASR via keyboard input (press "Enter" to start transcription).
- -i keyb: disables ASR, using keyboard input only.
To enable continuous ASR, pass the -c flag. In this mode, ASR remains active until the user says "stop" or a timeout occurs due to inactivity.
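For example, using the push-to-talk style input described above:

```bash
# Press "Enter" to start transcription; no Wake-Word required
./eiq_genai_flow -i kasr
```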
📊 ASR Benchmarks
i.MX95:
Audio Duration | Transcription Time (after end of speech) |
---|---|
3s | 1.4s |
6s | 2.5s |
9s | 3.3s |
i.MX8MP:
Audio Duration | Transcription Time (after end of speech) |
---|---|
3s | 1.9s |
6s | 3.5s |
9s | 5.3s |
In streaming mode on LibriSpeech test-clean, the Word Error Rate (WER) is 4.1.
RAG enhances the LLM’s responses by grounding the input in factual information from a knowledge base. This significantly improves the relevancy of the response to the prompt and reduces LLM hallucinations overall.
The demonstrator uses the all-MiniLM-L6-v2 int8-quantized embedding model (22M parameters).
✅ Enabling RAG
Use the --use-rag argument to activate RAG.
Note: Some words are censored by our RAG, meaning the system will not respond if they appear in the query. The censored word list can be found in the utils.py file.
The pre-generated RAG database is about medical healthcare for patients with diabetes, so questions related to this topic can be asked. This RAG database example was generated using the information in the Medical.pdf file (original file).
To create your own RAG database, please follow the instructions in the Retrieval Database Generator.
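A typical RAG session with the bundled diabetes knowledge base looks as follows (the sample question is only an illustration):

```bash
# Start the demo with retrieval over the pre-generated medical database
./eiq_genai_flow --use-rag
# Then ask a question related to the knowledge base, e.g.:
#   "What should a patient with diabetes monitor daily?"
```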
The LLM is responsible for understanding input and generating relevant text-based responses. It predicts words based on the given input using advanced language modeling techniques.
The demonstrator uses the int8-quantized Danube LLM (500M parameters), derived from the Llama LLM family.
✅ Enabling LLM
The LLM is enabled by default and requires no additional parameters. Answers given by the LLM are limited to a maximum number of words; if this limit is reached, the output ends with "[...]".
📊 LLM Benchmarks
Expected performance of the Danube-INT8 model:
Platform | Accelerator | Time-To-First-Token (TTFT) | Tok/s | Command |
---|---|---|---|---|
i.MX8MP | CPU (4 threads) | 0.94s | 8.66 | ./eiq_genai_flow -b |
i.MX95 | CPU (6 threads) | 0.94s | 9.38 | ./eiq_genai_flow -b |
i.MX95 A1 | NPU (Neutron) | 0.59s | 9.72 | ./eiq_genai_flow -b --use-neutron |
The Wikitext-2 perplexity of this model is 17.69, compared to 14.76 for the float reference.
TTS converts the LLM-generated text responses into speech output.
The demonstrator uses an int8-quantized VITS model with 19.5M parameters.
✅ Enabling TTS
Use the --output-mode tts argument to enable TTS, or --output-mode text to disable it.
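For example, to run with text-only output (useful when no audio hardware is attached):

```bash
# Disable speech synthesis; responses are printed to the terminal instead
./eiq_genai_flow --output-mode text
```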
📊 TTS Benchmarks
Speech type | DNS-MOS |
---|---|
Reference (natural) | 4.39 |
Quantized Vits 16kHz | 4.23 ± 0.24 |
The TTS Real-Time-Factor (RTF) is ~0.24 for the given model, meaning that generating one second of audio takes roughly 0.24 seconds of compute.
On custom BSPs, NPU acceleration can be used for LLM inference. Contact support for details on enabling this feature.
To enable NPU acceleration, pass the --use-neutron flag when running the pipeline on supported BSPs.
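For example, on a supported custom i.MX95 BSP:

```bash
# Offload LLM inference to the Neutron NPU
./eiq_genai_flow --use-neutron
```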
To run the eIQ GenAI Flow, an i.MX95 EVK (either 19x19 or 15x15) or an i.MX8MP EVK is required. The demo's audio setup is based on the onboard WM8962 codec, which manages both input and output through a single 3.5mm CTIA jack connector.
To use the audio functionalities, the following setups are possible:
- 🎧 Headset Mode: Use a headset with an integrated microphone and a 4-pole CTIA connector.
- 🔊 Open Audio Setup: Use a 3.5mm jack audio splitter (4-pole CTIA) along with:
- 🎤 A standalone microphone (3-pole)
- 🔉 A loudspeaker
Setup example:
This ensures proper handling of both input and output audio during the demo.
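To check that the codec is detected before starting the demo, the standard ALSA utilities can be used (assuming alsa-utils is present in the BSP):

```bash
# List playback and capture devices; the WM8962 codec should appear
aplay -l
arecord -l

# Quick loopback test: record 5 seconds from the microphone and play it back
arecord -d 5 -f S16_LE -r 16000 test.wav && aplay test.wav
```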
When the eIQ GenAI Flow Demonstrator starts running, the terminal output corresponds to the selected launch mode:
- Default mode: ./eiq_genai_flow
- With RAG: ./eiq_genai_flow --use-rag
- With NPU acceleration: ./eiq_genai_flow --use-neutron
- With Wake-Word detection: ./eiq_genai_flow -i vasr
How to change the RAG database?
The RAG database can be created from text files. Please check the retrieval-database-generator README.md.
How to run another LLM?
The Danube-500M model is the only LLM enabled in this release, but many other LLMs are supported in the Pro version.
How to change the ASR model?
In the Pro version, more ASR models are supported, including Whisper in different sizes with various languages.
How to change the TTS voice?
In the Pro version, a far broader and richer audio experience with hundreds of voices is offered.
Getting a FileNotFoundError: [Errno 2] No such file or directory: 'asr' error?
The demonstrator binary must be executed from the eiq_genai_flow directory.
For more general technical questions, use the NXP Community Forum Generative AI & LLMs.
Version | Description / Update | Date |
---|---|---|
1.1 | Initial release on Application Code Hub. This is solely for evaluation and development in combination with an NXP Product. | June 20th 2025 |