- Voice-to-Voice AI Assistant
- Hardware Requirements
- Project Structure
- Workflow Overview
- Step-by-Step Setup
- Environment Configuration
- Manual Testing
- Detailed Description of bin/ Scripts
- Run the Assistant
- Screenshots & Block Diagrams
- Last Updated
# Voice-to-Voice AI Assistant

Voice Assistant for Raspberry Pi, integrating:

- Whisper.cpp → Speech-to-Text (STT)
- TinyLLaMA (via llama.cpp) → Language Model for text generation
- Piper TTS → Text-to-Speech (TTS)

Built as a Bachelor's Major Project, it delivers real-time, low-latency voice interaction on constrained hardware.
## Hardware Requirements

- Raspberry Pi 4 (4 GB or 8 GB RAM)
- USB microphone (or onboard mic)
- Speakers (3.5 mm jack or HDMI)
- MicroSD card (32 GB+ recommended)
## Project Structure

```
pill/
├── audio/              # Temporary audio files
│   └── speech.wav
├── bin/                # Executable scripts
│   ├── run.sh          # Main voice-to-voice pipeline
│   ├── speak           # TTS wrapper
│   └── tokens          # LLaMA wrapper
├── stt/
│   ├── bin/            # Whisper binary
│   └── models/         # Whisper model (e.g., ggml-tiny.bin)
├── llm/
│   ├── bin/            # llama.cpp binary and shared libs
│   └── models/         # GGUF LLaMA models
├── tts/
│   ├── piper/          # Piper binary
│   └── voice/          # ONNX voice models
├── requirements.txt
└── README.md
```
## Workflow Overview

The pipeline will:

- Record audio from the microphone
- Transcribe it with Whisper
- Generate a reply with LLaMA
- Speak the reply with Piper

In detail:

- Audio is recorded from the microphone and saved to `audio/speech.wav`.
- The audio is transcribed using Whisper.cpp with a small model such as `ggml-tiny.bin`.
- The transcribed text is passed to TinyLLaMA via llama.cpp. A quantized model (Q4/Q5) keeps inference fast on the Raspberry Pi.
- Piper TTS synthesizes the response using a pre-downloaded ONNX voice model.
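The four stages above can be sketched as one small Python driver. This is a minimal sketch, not the actual `bin/run.sh`: the binary paths and flags mirror the manual-test commands later in this README, `arecord` is assumed for recording (the README does not name a recorder), and piping raw stdout between stages glosses over `whisper-cli`'s real output format.

```python
import subprocess

# Paths assume you run from the repository root.
WHISPER_CMD = ["stt/bin/whisper-cli", "audio/speech.wav",
               "--model", "stt/models/ggml-tiny.bin"]

def llama_cmd(prompt: str, n_tokens: int = 50) -> list[str]:
    """Compose the llama.cpp call that generates the reply."""
    return ["llm/bin/llama-cli", "-m", "llm/models/tinyllama_1b_q4_chat.gguf",
            "-p", prompt, "-n", str(n_tokens)]

def piper_cmd(text: str) -> list[str]:
    """Compose the Piper call that speaks the reply."""
    return ["tts/piper/piper", "--model",
            "tts/voice/libritts_r/en_US-libritts_r-medium.onnx", "--text", text]

def run_pipeline() -> None:
    # record -> transcribe -> generate -> speak
    subprocess.run(["arecord", "-d", "5", "-f", "cd", "audio/speech.wav"], check=True)
    transcript = subprocess.run(WHISPER_CMD, capture_output=True, text=True).stdout
    reply = subprocess.run(llama_cmd(transcript), capture_output=True, text=True).stdout
    subprocess.run(piper_cmd(reply), check=True)

if __name__ == "__main__":
    run_pipeline()
```

In practice each stage's output needs cleanup (timestamps stripped from the transcript, the prompt echoed back by llama.cpp removed) before being handed to the next stage, which is what the `bin/` wrapper scripts take care of.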
## Step-by-Step Setup

Clone the repository and install the Python dependencies:

```
git clone https://github.com/rsvn/pill.git
cd pill
pip install -r requirements.txt
```

Using venv:

```
python -m venv venv
source venv/bin/activate   # On Windows use `venv\Scripts\activate`
pip install -r requirements.txt
./setup.sh
```
Build the three inference binaries:

- Whisper.cpp: build `main` as `whisper-cli`
- llama.cpp: build `main` as `llama-cli`
- Piper: build the binary and its required shared libs

Place them in:

```
stt/bin/whisper-cli
llm/bin/llama-cli
tts/piper/piper
```
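Once the binaries are built, a quick existence check (a hypothetical helper, not part of the repo) catches a misplaced build before you wire up the pipeline:

```python
from pathlib import Path

# Expected binary locations from the layout above.
REQUIRED = ["stt/bin/whisper-cli", "llm/bin/llama-cli", "tts/piper/piper"]

def missing_binaries(root: str = ".") -> list[str]:
    """Return the expected binaries that are not present under `root`."""
    return [p for p in REQUIRED if not (Path(root) / p).is_file()]

if __name__ == "__main__":
    gone = missing_binaries()
    print("All binaries in place." if not gone else f"Missing: {gone}")
```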
Download the Whisper model:

```
wget https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-tiny.bin -P stt/models/
```

Download a LLaMA model. Ensure the format is `.gguf` and quantized (e.g., Q4_K_M):

```
# Example (after conversion if needed)
mv <downloaded>.gguf llm/models/tinyllama_1b_q4_chat.gguf
```

Download the Piper voice model:

```
wget https://huggingface.co/rhasspy/piper-voices/resolve/main/en/en_US-libritts_r-medium.onnx -P tts/voice/libritts_r/
```

## Environment Configuration

Ensure `llm/bin` is in your shared library path:

```
export LD_LIBRARY_PATH=llm/bin:$LD_LIBRARY_PATH
```

## Manual Testing

Test each component on its own from the repository root:

```
./stt/bin/whisper-cli audio/speech.wav --model stt/models/ggml-tiny.bin
./llm/bin/llama-cli -m llm/models/tinyllama_1b_q4_chat.gguf -p "Hello, who are you?" -n 50
```

### Install Local TTS - Piper
A faster and more lightweight alternative to MeloTTS.

Download the Piper binary and a voice from GitHub:

- Use the following link to install the Piper binary for your operating system.
- Use the following link to download Piper voices.

Each voice has two files:

| File | Purpose |
|---|---|
| `.onnx` | Actual voice model |
| `.onnx.json` | Model configuration |

For example:

```
models/en_US-lessac-medium/
├── en_US-lessac-medium.onnx
└── en_US-lessac-medium.onnx.json
```

Test Piper:

```
./tts/piper/piper --model tts/voice/libritts_r/en_US-libritts_r-medium.onnx --text "Hello, I am your Pi assistant."
```
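Since each voice needs its `.onnx` and `.onnx.json` side by side, a small check (a hypothetical helper, not part of the repo) can verify that a voice download is complete:

```python
from pathlib import Path

def voice_is_complete(onnx_path: str) -> bool:
    """True if both the .onnx model and its .onnx.json config exist."""
    model = Path(onnx_path)
    config = Path(str(model) + ".json")  # e.g., en_US-lessac-medium.onnx.json
    return model.is_file() and config.is_file()
```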
---
The `bin/` folder in the PILL project contains the key scripts that power the full voice-to-voice AI pipeline. Each script is responsible for one or more stages of the interaction loop: capturing audio, generating a response, and speaking it back to the user.

## Main Scripts & Modules (bin/ and core components)

This section documents the purpose of each key script and module in the PILL project.

| Script/File | Description |
|---|---|
| `run-server.sh` | Main orchestrator script that handles the complete voice-to-voice assistant pipeline: audio recording → transcription → LLM response → TTS output. |
| `speak` | Wrapper for Piper TTS. Converts text input to spoken audio using the selected voice model. |
| `tokens` | Wrapper for the LLaMA model. Takes user prompts and runs the language model to generate text responses. |

| Module/Folder | Description |
|---|---|
| `core/` | Logic for external data integrations such as news scraping and data preprocessing. Ideal for enriching responses with real-world context such as news, weather, maps, and wiki lookups. |
| `auto-comp/` | Provides auto-completion and predictive typing. Improves the user experience by suggesting or auto-filling text. |
| `get.py` | Retrieves dynamic content such as weather updates, news headlines, or Wikipedia summaries. Acts as a smart data fetcher for the assistant. |
| `run-server/` | Runs all major features or tests together. Useful for demos or integration testing. |
| `simple-cli/` | A basic command-line interface for interacting with the assistant via text. Does not involve voice processing; useful for debugging or quick tests. |
| `spell/` | Returns an LLM-generated response to text input typed by the user. |
| `tokenizer/` | Utility functions for tokenizing input and output text for compatibility with the LLaMA model. Handles text formatting, splitting, and pre-processing. |
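`get.py`'s internals aren't shown in this README; as an illustration of the kind of fetcher it describes, here is a minimal sketch against Wikipedia's public REST summary endpoint (the function names are hypothetical, not taken from the repo):

```python
import json
import urllib.parse
import urllib.request

WIKI_SUMMARY = "https://en.wikipedia.org/api/rest_v1/page/summary/"

def wiki_summary_url(topic: str) -> str:
    """Build the REST URL for a topic's plain-text summary."""
    return WIKI_SUMMARY + urllib.parse.quote(topic.replace(" ", "_"))

def fetch_summary(topic: str) -> str:
    """Fetch the summary extract (network access required)."""
    with urllib.request.urlopen(wiki_summary_url(topic)) as resp:
        return json.load(resp).get("extract", "")
```

Text fetched this way can be prepended to the LLM prompt so the assistant answers with current, real-world context.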
## Run the Assistant

This is the main script that ties the entire system together and initiates the voice assistant workflow. It:

- Records voice input from the microphone and saves it as `audio/speech.wav`
- Transcribes the audio to text using Whisper (`whisper-cli`)
- Feeds the transcribed prompt to the LLM via the `tokens` wrapper script
- Converts the generated text response into speech using the `speak` wrapper
- Plays back the synthesized voice to the user

```
cd bin
./run-server
```

## Last Updated

Sep 6, 2025