risvn/voice-assistant
Voice-to-Voice AI Assistant

📑 Table of Contents

  1. Voice-to-Voice AI Assistant
  2. Hardware Requirements
  3. Project Structure
  4. Workflow Overview
  5. Step-by-Step Setup
  6. Environment Configuration
  7. Manual Testing
  8. Detailed Description of bin/ Scripts
  9. Run the Assistant
  10. Screenshots & Block Diagrams
  11. Last Updated

Voice Assistant for Raspberry Pi, integrating:

  • πŸŽ™οΈ Whisper.cpp β€” Speech-to-Text (STT)
  • 🧠 TinyLLaMA (via llama.cpp) β€” Language Model for text generation
  • πŸ”Š Piper TTS β€” Text-to-Speech (TTS)

Built as a Bachelor's Major Project, it delivers real-time, low-latency voice interactions on constrained hardware.


Hardware Requirements

  • Raspberry Pi 4 (4GB or 8GB)
  • USB Microphone (or onboard mic)
  • Speakers (3.5mm jack or HDMI)
  • MicroSD card (32GB+ recommended)

Project Structure

pill/
├── audio/              # Temporary audio files
│   └── speech.wav
├── bin/                # Executable scripts
│   ├── run.sh          # Main voice-to-voice pipeline
│   ├── speak           # TTS wrapper
│   └── tokens          # LLaMA wrapper
├── stt/
│   ├── bin/            # Whisper binary
│   └── models/         # Whisper model (e.g., ggml-tiny.bin)
├── llm/
│   ├── bin/            # llama.cpp binary and shared libs
│   └── models/         # GGUF LLaMA models
├── tts/
│   ├── piper/          # Piper binary
│   └── voice/          # ONNX voice models
├── requirements.txt
└── README.md


This pipeline will:

  1. Record audio from mic
  2. Transcribe with Whisper
  3. Generate reply with LLaMA
  4. Speak with Piper
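The four steps above can be sketched as a single shell function. This is a simplified, hypothetical version of the pipeline script, not the project's actual code; the recording device, flags, and model paths will vary with your setup:

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the voice-to-voice loop; paths follow the layout above.
voice_loop() {
  # 1. Record 5 seconds of 16 kHz mono audio (the format whisper.cpp expects).
  arecord -f S16_LE -r 16000 -c 1 -d 5 audio/speech.wav

  # 2. Transcribe the recording with whisper.cpp (-nt strips timestamps).
  text=$(./stt/bin/whisper-cli -m stt/models/ggml-tiny.bin -f audio/speech.wav -nt)

  # 3. Generate a reply with TinyLLaMA via llama.cpp.
  reply=$(./llm/bin/llama-cli -m llm/models/tinyllama_1b_q4_chat.gguf -p "$text" -n 50)

  # 4. Synthesize the reply with Piper and play it back.
  echo "$reply" | ./tts/piper/piper \
    --model tts/voice/libritts_r/en_US-libritts_r-medium.onnx \
    --output_file audio/reply.wav
  aplay audio/reply.wav
}

# Run voice_loop once, or wrap it in `while true; do voice_loop; done`
# for continuous interaction.
```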

Workflow Overview

1. User Speaks

  • Audio is recorded with a microphone and saved to audio/speech.wav.

2. Speech-to-Text (STT)

  • Audio is transcribed using Whisper.cpp with a small model like ggml-tiny.bin.

3. Text Generation with LLaMA

  • Transcribed text is passed to TinyLLaMA via llama.cpp.
  • A quantized model (Q4/Q5) ensures fast inference on Raspberry Pi.

4. Text-to-Speech (TTS)

  • Piper TTS synthesizes the response using a pre-downloaded ONNX voice model.

Step-by-Step Setup

Clone the Repository

git clone https://github.com/rsvn/pill.git
cd pill

Install Python Dependencies

pip install -r requirements.txt

Set up a virtual environment

Using venv:

    python -m venv venv
    source venv/bin/activate  # On Windows use `venv\Scripts\activate`

Run setup.sh to install the required models, or build them yourself as described in the sections below:

./setup.sh

Build/Download Binaries

  • Whisper.cpp: build the main binary and rename it whisper-cli
  • llama.cpp: build the main binary and rename it llama-cli
  • Piper: build the piper binary and its required shared libraries

Place them in:

stt/bin/whisper-cli
llm/bin/llama-cli
tts/piper/piper
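A quick check, run from the project root, that all three binaries are in place and executable:

```shell
# Verify the three pipeline binaries exist and are executable.
for bin in stt/bin/whisper-cli llm/bin/llama-cli tts/piper/piper; do
  if [ -x "$bin" ]; then
    echo "OK      $bin"
  else
    echo "MISSING $bin"
  fi
done
```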

Download Models

Whisper Model

wget https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-tiny.bin -P stt/models/

TinyLLaMA (GGUF)

Ensure format is .gguf and quantized (e.g., Q4_K_M):

# Example (after conversion if needed)
mv <downloaded>.gguf llm/models/tinyllama_1b_q4_chat.gguf
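One quick way to sanity-check the downloaded file: a valid GGUF model starts with the four ASCII bytes GGUF.

```shell
# A valid GGUF model begins with the ASCII magic "GGUF".
f=llm/models/tinyllama_1b_q4_chat.gguf
if [ -f "$f" ]; then
  head -c 4 "$f"; echo   # prints: GGUF
else
  echo "model not found: $f"
fi
```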

Piper Voice Model

wget https://huggingface.co/rhasspy/piper-voices/resolve/main/en/en_US-libritts_r-medium.onnx -P tts/voice/libritts_r/
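Piper voices come as a pair of files, the .onnx model and its .onnx.json config, and Piper needs both side by side (see the Piper section below). A sketch that fetches the pair, reusing the URL above; the exact Hugging Face path may differ per voice:

```shell
# Fetch both halves of the voice; the base URL mirrors the command above
# and may need adjusting to the voice's actual path on Hugging Face.
base=https://huggingface.co/rhasspy/piper-voices/resolve/main/en/en_US-libritts_r-medium
mkdir -p tts/voice/libritts_r
wget "${base}.onnx"      -P tts/voice/libritts_r/ || echo "onnx download failed"
wget "${base}.onnx.json" -P tts/voice/libritts_r/ || echo "json download failed"
```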

Environment Configuration

Ensure llm/bin is in your shared library path:

export LD_LIBRARY_PATH=llm/bin:$LD_LIBRARY_PATH
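The relative path above only works while you stay in the project root; an absolute path is more robust. A sketch, run from the project root:

```shell
# Use an absolute path so the setting survives directory changes.
export LD_LIBRARY_PATH="$(pwd)/llm/bin:${LD_LIBRARY_PATH}"

# The llama.cpp shared objects (names vary per build) should be visible here:
ls llm/bin/*.so 2>/dev/null || echo "no shared libs found in llm/bin"
```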

If anything breaks, make sure all three components (STT, LLM, and TTS) work when invoked manually, as described in the next section.

Manual Testing

Whisper (STT)

./stt/bin/whisper-cli audio/speech.wav --model stt/models/ggml-tiny.bin

LLaMA (LLM)

./llm/bin/llama-cli -m llm/models/tinyllama_1b_q4_chat.gguf -p "Hello, who are you?" -n 50

Piper (TTS)

Install Local TTS - Piper

A faster and lightweight alternative to MeloTTS

Download the Piper Binary and the voice from Github

Use the following link to install Piper Binary for your operating system.

Use the following link to download Piper voices. Each voice will have two files:

| File | Purpose |
|---|---|
| .onnx | Actual voice model |
| .onnx.json | Model configuration |

For example:

models/en_US-lessac-medium/
├── en_US-lessac-medium.onnx
└── en_US-lessac-medium.onnx.json

Test from the CLI:

echo "Hello, I am your Pi assistant." | ./tts/piper/piper --model tts/voice/libritts_r/en_US-libritts_r-medium.onnx --output_file hello.wav
aplay hello.wav

---

Detailed Description of bin/ Scripts

The bin/ folder in the PILL project contains the key scripts that power the full voice-to-voice AI pipeline. Each script is responsible for one or more stages of the interaction loop: capturing audio, generating a response, and speaking it back to the user.

📁 Main Scripts & Modules (bin/ and core components)

This section documents the purpose of each key script and module in the PILL project.

bin/ - Executable Scripts

| Script/File | Description |
|---|---|
| run-server.sh | Main orchestrator script that handles the complete voice-to-voice assistant pipeline: audio recording → transcription → LLM response → TTS output. |
| speak | Wrapper for Piper TTS. Converts text input to spoken audio using the selected voice model. |
| tokens | Wrapper for the LLaMA model. Takes user prompts and runs the language model to generate text responses. |

Core Components

| Module/Folder | Description |
|---|---|
| core/ | Logic for external data integrations such as news scraping and data preprocessing. Enriches responses with real-world context such as news, weather, maps, and Wikipedia. |
| auto-comp/ | Auto-completion and predictive typing features that suggest or auto-fill text to improve the user experience. |
| get.py | Retrieves dynamic content such as weather updates, news headlines, or Wikipedia summaries. Acts as a smart data fetcher for the assistant. |
| run-server/ | Runs all major features or tests together. Useful for demos or integration testing. |
| simple-cli/ | A basic command-line interface for interacting with the assistant via text, with no voice processing. Useful for debugging or quick tests. |
| spell/ | Generates LLM responses from text typed by the user. |
| tokenizer/ | Utility functions for tokenizing and pre-processing text for compatibility with the LLaMA model: formatting, splitting, and cleanup. |

run-server.sh – Master Orchestrator

This is the main script that ties the entire system together and initiates the voice assistant workflow.

Responsibilities:

  • Records a voice input from the microphone and saves it as audio/speech.wav
  • Transcribes the audio to text using Whisper (whisper-cli)
  • Feeds the transcribed prompt to the LLM via the tokens wrapper script
  • Converts the generated text response into speech using the speak wrapper
  • Plays back the synthesized voice to the user

Run the Assistant

cd bin
./run-server.sh

Screenshots & Block Diagrams

(Images: workflow/RAG architecture diagram, terminal interface, Raspberry Pi setup.)

Last Updated

Sep 6, 2025

-- rsvn

About

A real-time offline voice-to-voice AI assistant built for Raspberry Pi
