- Voice-to-Voice AI Assistant
- Hardware Requirements
- Project Structure
- Workflow Overview
- Step-by-Step Setup
- Environment Configuration
- Manual Testing
- Detailed Description of bin/ Scripts
- Run the Assistant
- Screenshots & Block Diagrams
- Last Updated
# Voice-to-Voice AI Assistant

Voice Assistant for Raspberry Pi, integrating:

- Whisper.cpp → Speech-to-Text (STT)
- TinyLLaMA (via llama.cpp) → Language Model for text generation
- Piper TTS → Text-to-Speech (TTS)

Built as a Bachelor's Major Project, it delivers real-time, low-latency voice interaction on constrained hardware.
## Hardware Requirements

- Raspberry Pi 4 (4 GB or 8 GB RAM)
- USB microphone (or onboard mic)
- Speakers (3.5 mm jack or HDMI)
- MicroSD card (32 GB+ recommended)
## Project Structure

```
pill/
├── audio/              # Temporary audio files
│   └── speech.wav
├── bin/                # Executable scripts
│   ├── run.sh          # Main voice-to-voice pipeline
│   ├── speak           # TTS wrapper
│   └── tokens          # LLaMA wrapper
├── stt/
│   ├── bin/            # Whisper binary
│   └── models/         # Whisper model (e.g., ggml-tiny.bin)
├── llm/
│   ├── bin/            # llama.cpp binary and shared libs
│   └── models/         # GGUF LLaMA models
├── tts/
│   ├── piper/          # Piper binary
│   └── voice/          # ONNX voice models
├── requirements.txt
└── README.md
```
## Workflow Overview

The pipeline will:

- Record audio from the microphone
- Transcribe it with Whisper
- Generate a reply with LLaMA
- Speak the reply with Piper

In detail:

- Audio is recorded from the microphone and saved to `audio/speech.wav`.
- The audio is transcribed using Whisper.cpp with a small model such as `ggml-tiny.bin`.
- The transcribed text is passed to TinyLLaMA via llama.cpp. A quantized model (Q4/Q5) keeps inference fast on the Raspberry Pi.
- Piper TTS synthesizes the response using a pre-downloaded ONNX voice model.
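The four stages above can be sketched as one small Python driver. This is a minimal sketch, not the actual `bin/run.sh`: the binary paths and flags mirror the manual-test commands later in this README, `arecord` is assumed for recording (the README does not name a recorder), and piping raw stdout between stages glosses over `whisper-cli`'s real output format.

```python
import subprocess

# Paths assume you run from the repository root.
WHISPER_CMD = ["stt/bin/whisper-cli", "audio/speech.wav",
               "--model", "stt/models/ggml-tiny.bin"]

def llama_cmd(prompt: str, n_tokens: int = 50) -> list[str]:
    """Compose the llama.cpp call that generates the reply."""
    return ["llm/bin/llama-cli", "-m", "llm/models/tinyllama_1b_q4_chat.gguf",
            "-p", prompt, "-n", str(n_tokens)]

def piper_cmd(text: str) -> list[str]:
    """Compose the Piper call that speaks the reply."""
    return ["tts/piper/piper", "--model",
            "tts/voice/libritts_r/en_US-libritts_r-medium.onnx", "--text", text]

def run_pipeline() -> None:
    # record -> transcribe -> generate -> speak
    subprocess.run(["arecord", "-d", "5", "-f", "cd", "audio/speech.wav"], check=True)
    transcript = subprocess.run(WHISPER_CMD, capture_output=True, text=True).stdout
    reply = subprocess.run(llama_cmd(transcript), capture_output=True, text=True).stdout
    subprocess.run(piper_cmd(reply), check=True)

if __name__ == "__main__":
    run_pipeline()
```

In practice each stage's output needs cleanup (timestamps stripped from the transcript, the prompt echoed back by llama.cpp removed) before being handed to the next stage, which is what the `bin/` wrapper scripts take care of.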
## Step-by-Step Setup

Clone the repository and install the Python dependencies:

```
git clone https://github.com/rsvn/pill.git
cd pill
pip install -r requirements.txt
```

Using venv:

```
python -m venv venv
source venv/bin/activate   # On Windows use `venv\Scripts\activate`
pip install -r requirements.txt
./setup.sh
```
Build the three inference binaries:

- Whisper.cpp: build `main` as `whisper-cli`
- llama.cpp: build `main` as `llama-cli`
- Piper: build the binary and its required shared libs

Place them in:

```
stt/bin/whisper-cli
llm/bin/llama-cli
tts/piper/piper
```
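Once the binaries are built, a quick existence check (a hypothetical helper, not part of the repo) catches a misplaced build before you wire up the pipeline:

```python
from pathlib import Path

# Expected binary locations from the layout above.
REQUIRED = ["stt/bin/whisper-cli", "llm/bin/llama-cli", "tts/piper/piper"]

def missing_binaries(root: str = ".") -> list[str]:
    """Return the expected binaries that are not present under `root`."""
    return [p for p in REQUIRED if not (Path(root) / p).is_file()]

if __name__ == "__main__":
    gone = missing_binaries()
    print("All binaries in place." if not gone else f"Missing: {gone}")
```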
Download the Whisper model:

```
wget https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-tiny.bin -P stt/models/
```

Download a LLaMA model. Ensure the format is `.gguf` and quantized (e.g., Q4_K_M):

```
# Example (after conversion if needed)
mv <downloaded>.gguf llm/models/tinyllama_1b_q4_chat.gguf
```

Download the Piper voice model:

```
wget https://huggingface.co/rhasspy/piper-voices/resolve/main/en/en_US-libritts_r-medium.onnx -P tts/voice/libritts_r/
```

## Environment Configuration

Ensure `llm/bin` is in your shared library path:

```
export LD_LIBRARY_PATH=llm/bin:$LD_LIBRARY_PATH
```

## Manual Testing

Test each component on its own from the repository root:

```
./stt/bin/whisper-cli audio/speech.wav --model stt/models/ggml-tiny.bin
./llm/bin/llama-cli -m llm/models/tinyllama_1b_q4_chat.gguf -p "Hello, who are you?" -n 50
```

### Install Local TTS - Piper
A faster and more lightweight alternative to MeloTTS.

Download the Piper binary and a voice from GitHub:

- Use the following link to install the Piper binary for your operating system.
- Use the following link to download Piper voices.

Each voice has two files:

| File | Purpose |
|---|---|
| `.onnx` | Actual voice model |
| `.onnx.json` | Model configuration |

For example:

```
models/en_US-lessac-medium/
├── en_US-lessac-medium.onnx
└── en_US-lessac-medium.onnx.json
```

Test Piper:

```
./tts/piper/piper --model tts/voice/libritts_r/en_US-libritts_r-medium.onnx --text "Hello, I am your Pi assistant."
```
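Since each voice needs its `.onnx` and `.onnx.json` side by side, a small check (a hypothetical helper, not part of the repo) can verify that a voice download is complete:

```python
from pathlib import Path

def voice_is_complete(onnx_path: str) -> bool:
    """True if both the .onnx model and its .onnx.json config exist."""
    model = Path(onnx_path)
    config = Path(str(model) + ".json")  # e.g., en_US-lessac-medium.onnx.json
    return model.is_file() and config.is_file()
```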
---
The `bin/` folder in the PILL project contains the key scripts that power the full voice-to-voice AI pipeline. Each script is responsible for one or more stages of the interaction loop: capturing audio, generating a response, and speaking it back to the user.

## Main Scripts & Modules (bin/ and core components)

This section documents the purpose of each key script and module in the PILL project.

| Script/File | Description |
|---|---|
| `run-server.sh` | Main orchestrator script that handles the complete voice-to-voice assistant pipeline: audio recording → transcription → LLM response → TTS output. |
| `speak` | Wrapper for Piper TTS. Converts text input to spoken audio using the selected voice model. |
| `tokens` | Wrapper for the LLaMA model. Takes user prompts and runs the language model to generate text responses. |

| Module/Folder | Description |
|---|---|
| `core/` | Logic for external data integrations such as news scraping and data preprocessing. Ideal for enriching responses with real-world context such as news, weather, maps, and wiki lookups. |
| `auto-comp/` | Provides auto-completion and predictive typing. Improves the user experience by suggesting or auto-filling text. |
| `get.py` | Retrieves dynamic content such as weather updates, news headlines, or Wikipedia summaries. Acts as a smart data fetcher for the assistant. |
| `run-server/` | Runs all major features or tests together. Useful for demos or integration testing. |
| `simple-cli/` | A basic command-line interface for interacting with the assistant via text. Does not involve voice processing; useful for debugging or quick tests. |
| `spell/` | Returns an LLM-generated response to text input typed by the user. |
| `tokenizer/` | Utility functions for tokenizing input and output text for compatibility with the LLaMA model. Handles text formatting, splitting, and pre-processing. |
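`get.py`'s internals aren't shown in this README; as an illustration of the kind of fetcher it describes, here is a minimal sketch against Wikipedia's public REST summary endpoint (the function names are hypothetical, not taken from the repo):

```python
import json
import urllib.parse
import urllib.request

WIKI_SUMMARY = "https://en.wikipedia.org/api/rest_v1/page/summary/"

def wiki_summary_url(topic: str) -> str:
    """Build the REST URL for a topic's plain-text summary."""
    return WIKI_SUMMARY + urllib.parse.quote(topic.replace(" ", "_"))

def fetch_summary(topic: str) -> str:
    """Fetch the summary extract (network access required)."""
    with urllib.request.urlopen(wiki_summary_url(topic)) as resp:
        return json.load(resp).get("extract", "")
```

Text fetched this way can be prepended to the LLM prompt so the assistant answers with current, real-world context.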
## Run the Assistant

This is the main script that ties the entire system together and initiates the voice assistant workflow. It:

- Records voice input from the microphone and saves it as `audio/speech.wav`
- Transcribes the audio to text using Whisper (`whisper-cli`)
- Feeds the transcribed prompt to the LLM via the `tokens` wrapper script
- Converts the generated text response into speech using the `speak` wrapper
- Plays back the synthesized voice to the user

```
cd bin
./run-server
```

## Last Updated

Sep 6, 2025