4 changes: 3 additions & 1 deletion .gitattributes
@@ -1,3 +1,5 @@
components/third_party/* linguist-vendored
components/livekit/protocol/*.c linguist-generated
components/livekit/protocol/*.h linguist-generated
components/livekit/protocol/*.h linguist-generated
# LVGL images
examples/**/main/images/*.c linguist-generated
# LVGL fonts
examples/**/main/fonts/*.c linguist-generated
6 changes: 6 additions & 0 deletions examples/voice_agent_lcd/CMakeLists.txt
@@ -0,0 +1,6 @@
# The following lines of boilerplate have to be in your project's CMakeLists
# in this exact order for cmake to work correctly
cmake_minimum_required(VERSION 3.5)
set(COMPONENTS main) # Trim build: only the main component and its dependencies are built
include($ENV{IDF_PATH}/tools/cmake/project.cmake)
project(voice_agent_lcd)
86 changes: 86 additions & 0 deletions examples/voice_agent_lcd/README.md
@@ -0,0 +1,86 @@
# Voice Agent LCD

Example application combining this SDK with [LiveKit Agents](https://docs.livekit.io/agents/), enabling bidirectional voice communication with an AI agent. The agent can interact with the board's hardware in response to user requests. Below are example exchanges between a user and the agent:

> **User:** What is the current CPU temperature? \
> **Agent:** The CPU temperature is currently 33°C.

> **User:** Turn on the blue LED. \
> **Agent:** *[turns blue LED on]*

> **User:** Turn on the yellow LED. \
> **Agent:** I'm sorry, the board does not have a yellow LED.

## Requirements

- Software:
  - [ESP-IDF](https://docs.espressif.com/projects/esp-idf/en/stable/esp32/get-started/index.html) release v5.4 or later
  - Python 3.9 or later
  - LiveKit Cloud Project
  - Sandbox Token Server (created from your cloud project)
  - API keys for OpenAI, Deepgram, and Cartesia
- Hardware:
  - Dev board: [ESP32-S3-Korvo-2](https://docs.espressif.com/projects/esp-adf/en/latest/design-guide/dev-boards/user-guide-esp32-s3-korvo-2.html)
  - Two micro USB cables: one for power, one for flashing
  - Mono enclosed speaker (for example, [this one from Adafruit](https://www.adafruit.com/product/3351))

## Run example

To run the example on your board, begin by navigating in your terminal to the example's root directory: *[examples/voice_agent_lcd](./)*.

### 1. Configuration

The example requires a network connection and a Sandbox ID from your [LiveKit Cloud Project](https://cloud.livekit.io/projects/p_/sandbox/templates/token-server). To configure these settings from your terminal, launch *menuconfig*:
```sh
idf.py menuconfig
```

With *menuconfig* open, navigate to the *LiveKit Example* menu and configure the following settings:

- Network → Wi-Fi SSID
- Network → Wi-Fi password
- Room connection → Sandbox ID

For more information about available options, please refer to [this guide](../README.md#configuration).

### 2. Build & flash

Begin by connecting your dev board via USB. With the board connected, run the following command to build the example, flash it to your board, and monitor serial output. Replace `{board}` with the identifier of the *sdkconfig* defaults file for your board:

```sh
idf.py -D SDKCONFIG_DEFAULTS=sdkconfig.bsp.{board} flash monitor
```

Once running on the device, the example establishes a network connection and then connects to a LiveKit room. When connected, you will see the following log message:

```sh
I (19508) livekit_example: Room state: connected
```

If you encounter any issues during this process, please refer to the example [troubleshooting guide](../README.md#troubleshooting).

## Run agent

With the example running on your board, the next step is to run the agent so it can join the room.
Begin by navigating to the agent's source directory in your terminal: *[examples/voice_agent_lcd/agent](./agent)*.

In this directory, create a *.env* file containing the required API keys:

```sh
DEEPGRAM_API_KEY=<your Deepgram API Key>
OPENAI_API_KEY=<your OpenAI API Key>
CARTESIA_API_KEY=<your Cartesia API Key>
LIVEKIT_API_KEY=<your API Key>
LIVEKIT_API_SECRET=<your API Secret>
LIVEKIT_URL=<your server URL>
```

With the API keys in place, download the required model files and run the agent in development mode:

```sh
python agent.py download-files
python agent.py dev
```

Once running, the agent will discover and join the room, and you can engage in two-way conversation with it. Try asking some of the questions shown above.
13 changes: 13 additions & 0 deletions examples/voice_agent_lcd/agent/.gitignore
@@ -0,0 +1,13 @@
# Python-generated files
__pycache__/
*.py[oc]
build/
dist/
wheels/
*.egg-info

# Virtual environments
.venv

# Environment variables
.env
1 change: 1 addition & 0 deletions examples/voice_agent_lcd/agent/.python-version
@@ -0,0 +1 @@
3.13
86 changes: 86 additions & 0 deletions examples/voice_agent_lcd/agent/agent.py
@@ -0,0 +1,86 @@
import json
from enum import Enum
from dotenv import load_dotenv
from livekit import agents
from livekit.agents import (
    AgentSession,
    Agent,
    RunContext,
    RoomInputOptions,
    function_tool,
    get_job_context,
    ToolError,
)
from livekit.plugins import (
    openai,
    cartesia,
    deepgram,
    silero,
)
from livekit.plugins.turn_detector.multilingual import MultilingualModel

# When enabled, RPC calls are not performed and mock values are returned.
TEST_MODE = False

load_dotenv()

class LEDColor(str, Enum):
    RED = "red"
    BLUE = "blue"

class Assistant(Agent):
    def __init__(self) -> None:
        super().__init__(
            instructions="""You are a helpful voice AI assistant running on an ESP32 dev board.
            You answer the user's questions about the hardware state.
            """
        )

    async def on_enter(self) -> None:
        await self.session.say(
            "Hi, how can I help you today?",
            allow_interruptions=False
        )

    @function_tool()
    async def get_cpu_temp(self, _: RunContext) -> float:
        """Get the current temperature of the CPU.

        Returns:
            The temperature reading in degrees Celsius.
        """
        if TEST_MODE:
            return 25.0
        try:
            room = get_job_context().room
            participant_identity = next(iter(room.remote_participants))
            # Ask the board for its CPU temperature over LiveKit RPC;
            # the response payload arrives as a string.
            response = await room.local_participant.perform_rpc(
                destination_identity=participant_identity,
                method="get_cpu_temp",
                response_timeout=10,
                payload=""
            )
            try:
                return float(response)
            except ValueError:
                raise ToolError("Received invalid temperature value")
        except ToolError:
            # Re-raise tool errors as-is so their messages reach the LLM.
            raise
        except Exception:
            raise ToolError("Unable to retrieve CPU temperature")

async def entrypoint(ctx: agents.JobContext):
    session = AgentSession(
        stt=deepgram.STT(model="nova-3", language="multi"),
        llm=openai.LLM(model="gpt-4o-mini"),
        tts=cartesia.TTS(model="sonic-2", voice="c99d36f3-5ffd-4253-803a-535c1bc9c306"),
        vad=silero.VAD.load(),
        turn_detection=MultilingualModel(),
    )
    await session.start(
        room=ctx.room,
        agent=Assistant(),
        room_input_options=RoomInputOptions()
    )
    await ctx.connect()

if __name__ == "__main__":
    agents.cli.run_app(agents.WorkerOptions(entrypoint_fnc=entrypoint))
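
One gap worth noting: the diff defines an `LEDColor` enum that no tool in *agent.py* uses, even though the README demonstrates LED control. The sketch below shows what such a tool could look like as an additional method on `Assistant`, following the same `perform_rpc` pattern as `get_cpu_temp`. This is a hypothetical addition, not part of this PR: the `set_led` RPC method name and JSON payload shape are assumptions, and the firmware would need to register a matching handler.

```python
# Hypothetical method on the Assistant class (not part of this PR).
@function_tool()
async def set_led(self, _: RunContext, color: LEDColor, on: bool) -> str:
    """Turn one of the board's LEDs on or off.

    Args:
        color: Which LED to control.
        on: True to turn the LED on, False to turn it off.
    """
    if TEST_MODE:
        return "ok"
    try:
        room = get_job_context().room
        participant_identity = next(iter(room.remote_participants))
        # Assumed device-side RPC method and payload shape; the firmware
        # would need to register a matching "set_led" handler.
        return await room.local_participant.perform_rpc(
            destination_identity=participant_identity,
            method="set_led",
            response_timeout=10,
            payload=json.dumps({"color": color.value, "on": on}),
        )
    except Exception:
        raise ToolError("Unable to change the LED state")
```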
11 changes: 11 additions & 0 deletions examples/voice_agent_lcd/agent/pyproject.toml
@@ -0,0 +1,11 @@
[project]
name = "esp-example-agent"
version = "0.1.0"
description = "Example agent for interacting with the ESP32 voice chat example."
readme = "README.md"
license = {text = "MIT"}
requires-python = ">=3.13"
dependencies = [
"livekit-agents[cartesia,deepgram,openai,silero,turn-detector]~=1.0",
"python-dotenv>=1.1.1"
]
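
A note on setup (not part of this PR): the `requires-python = ">=3.13"` constraint matches the *.python-version* file above. Assuming a standard workflow, these dependencies can be installed from the *agent* directory with `pip install .` (or `uv sync`, if you use uv) before running the `python agent.py` commands shown in the README.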