A voice-powered AI assistant built with the OpenAI Voice Agents SDK and FastAPI. This project demonstrates a real-time speech-to-speech interaction system that helps users break down complex estimation problems, track conversations, and generate structured markdown reports with visual flow diagrams.

ShubhamDalvi1999/Guesstimation-Voice-Agent


Guesstimation Analysis - Voice AI Assistant

MIT License FastAPI NextJS OpenAI API MongoDB

A voice-powered guesstimation analysis system built with OpenAI's Voice Agents SDK, FastAPI, Next.js, and MongoDB. This project demonstrates how to create a speech-to-speech AI assistant for breaking down complex estimation problems and generating structured markdown reports with visual flow diagrams.

🎯 Features

  • Voice-First Interface: Natural voice commands for all operations
  • Guesstimation Analysis: Break down complex estimation problems step by step
  • Conversation Tracking: Automatically track and analyze voice conversations
  • Markdown Export: Generate structured reports with flow diagrams
  • Session Management: Save and retrieve previous analysis sessions
  • Real-time Updates: WebSocket-based communication for instant feedback
  • Visual Flow Diagrams: Mermaid-based diagrams showing estimation logic
  • Multi-tenant Database: Supports multiple users with proper data isolation
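As an illustration of the Visual Flow Diagrams feature, an ordered list of estimation steps can be turned into Mermaid `graph TD` text roughly as follows. This is a minimal sketch: `steps_to_mermaid` and its node-naming scheme are hypothetical and are not taken from the project's code.

```python
from string import ascii_uppercase


def steps_to_mermaid(labels: list[str]) -> str:
    """Render ordered step labels as a linear Mermaid flowchart.

    Hypothetical helper: node ids are A, B, C, ... so at most 26 steps.
    """
    if not labels:
        raise ValueError("at least one step is required")
    if len(labels) > len(ascii_uppercase):
        raise ValueError("this sketch supports at most 26 steps")
    lines = ["graph TD"]
    if len(labels) == 1:
        lines.append(f"    A[{labels[0]}]")
    # Link each step to the next; only the first edge spells out node A's label.
    for i in range(1, len(labels)):
        src = f"A[{labels[0]}]" if i == 1 else ascii_uppercase[i - 1]
        lines.append(f"    {src} --> {ascii_uppercase[i]}[{labels[i]}]")
    return "\n".join(lines)


diagram = steps_to_mermaid(
    ["Total Population: 1.4B", "Urban Population: 35%", "Urban Population: 490M"]
)
print(diagram)
```

The output can be pasted into a `mermaid` code fence inside the exported markdown report.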

🏗️ Architecture

graph TB
    subgraph "Frontend (Next.js)"
        UI[User Interface]
        Audio[Audio Recording/Playback]
        WS[WebSocket Client]
    end
    
    subgraph "Backend (FastAPI)"
        API[FastAPI Server]
        VoicePipeline[Voice Pipeline]
        Workflow[Voice Workflow]
    end
    
    subgraph "OpenAI Voice Agents SDK"
        Agent[AI Agent]
        Tools[Function Tools]
        Runner[Agent Runner]
    end
    
    subgraph "External Services"
        OpenAI[OpenAI API]
        Whisper[Whisper STT]
        TTS[Text-to-Speech]
    end
    
    subgraph "Database"
        MongoDB[(MongoDB)]
        Sessions[Guesstimation Sessions]
        Context[Conversation Context]
    end
    
    UI --> Audio
    Audio --> WS
    WS --> API
    API --> VoicePipeline
    VoicePipeline --> Workflow
    Workflow --> Runner
    Runner --> Agent
    Agent --> Tools
    Tools --> MongoDB
    Agent --> OpenAI
    OpenAI --> Whisper
    OpenAI --> TTS
    
    MongoDB --> Sessions
    MongoDB --> Context
    
    style UI fill:#e1f5fe
    style API fill:#f3e5f5
    style VoicePipeline fill:#fff3e0
    style Agent fill:#ffe0b2
    style Tools fill:#ffcc02
    style Runner fill:#ffb74d
    style OpenAI fill:#e8f5e8
    style Whisper fill:#c8e6c9
    style TTS fill:#a5d6a7
    style MongoDB fill:#fce4ec
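The diagram's central path (speech-to-text, agent, text-to-speech) can be pictured as three stages chained per voice turn. The sketch below is only a conceptual model with stub stages; `VoiceTurnPipeline` and its fields are hypothetical names and do not reflect the actual API of the OpenAI Voice Agents SDK.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class VoiceTurnPipeline:
    """One voice turn: speech-to-text, agent reply, text-to-speech.

    Each stage is an injected callable, so the real services can be
    replaced by stubs in tests. Hypothetical structure, not the SDK's.
    """

    transcribe: Callable[[bytes], str]
    respond: Callable[[str], str]
    synthesize: Callable[[str], bytes]

    def run(self, audio_in: bytes) -> bytes:
        text = self.transcribe(audio_in)  # STT stage
        reply = self.respond(text)        # agent stage
        return self.synthesize(reply)     # TTS stage


# Stub stages standing in for Whisper, the agent runner, and TTS.
pipeline = VoiceTurnPipeline(
    transcribe=lambda audio: audio.decode("utf-8"),
    respond=lambda text: f"Let's break down: {text}",
    synthesize=lambda reply: reply.encode("utf-8"),
)
audio_out = pipeline.run(b"coffee shops in New York")
print(audio_out)
```

Keeping the stages injectable makes it possible to exercise the flow without a microphone or an API key.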

🗣️ Voice Commands

Starting Analysis

  • "I want to estimate how many people go to movies in India"
  • "Let's analyze the market size for electric vehicles"
  • "Help me estimate the number of coffee shops in New York"

Export and Save

  • "Create markdown from this conversation"
  • "Export my analysis"
  • "Save this guesstimation session"
  • "Generate a report"

Session Management

  • "Show my previous analyses"
  • "Clear this session"
  • "What sessions do I have?"
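To make the three command groups concrete, here is a rough keyword-based classifier for transcribed commands. It is an illustration only: the project presumably lets the agent pick function tools itself, and the intent names and keyword lists below are assumptions.

```python
# Hypothetical intent names and keyword lists; not taken from the project.
# Order matters: earlier intents win when several keywords match.
INTENT_KEYWORDS: dict[str, tuple[str, ...]] = {
    "export": ("markdown", "export", "report", "save"),
    "list_sessions": ("previous analyses", "what sessions"),
    "clear_session": ("clear this session",),
    "start_analysis": ("estimate", "analyze"),
}


def classify_command(transcript: str) -> str:
    """Map a transcribed utterance to a coarse intent label."""
    text = transcript.lower()
    for intent, keywords in INTENT_KEYWORDS.items():
        if any(keyword in text for keyword in keywords):
            return intent
    # Anything else is treated as part of the ongoing analysis.
    return "continue_conversation"


for phrase in ("Export my analysis", "What sessions do I have?", "Clear this session"):
    print(phrase, "->", classify_command(phrase))
```

Substring matching like this is brittle with real speech; it is shown only to clarify which phrases belong to which group.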

🛠️ Tech Stack

  • Backend: FastAPI, OpenAI Voice Agents SDK, Motor (MongoDB async driver)
  • Frontend: Next.js 15, TypeScript, Tailwind CSS, wavtools
  • Database: MongoDB (with indexes for performance)
  • AI: GPT-4o-mini, OpenAI Whisper (STT), OpenAI TTS
  • Communication: WebSockets for real-time voice interaction
  • Package Management: uv (Python), npm (Node.js)

📋 Requirements

  • Python 3.11+
  • Node.js 18+
  • MongoDB (local or Atlas)
  • OpenAI API key with voice capabilities
  • Microphone access

🚀 Quick Start

1. Clone and Setup

git clone https://github.com/ShubhamDalvi1999/Guesstimation-Voice-Agent.git
cd Guesstimation-Voice-Agent

2. Environment Configuration

Create a .env file in the root directory:

# OpenAI API Configuration
OPENAI_API_KEY=your_openai_api_key_here

# Database Configuration
MONGODB_URI=mongodb://localhost:27017
DATABASE_NAME=guesstimation_analysis
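On the backend, these three variables might be read along the following lines. This is a sketch, not the project's code: `Settings` and `load_settings` are hypothetical names, the defaults simply mirror the values above, and the variables are assumed to be exported into the process environment already (parsing the `.env` file itself is not shown).

```python
import os
from collections.abc import Mapping
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    openai_api_key: str
    mongodb_uri: str
    database_name: str


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Build settings from environment variables, failing fast on a missing key."""
    api_key = env.get("OPENAI_API_KEY", "")
    if not api_key:
        raise RuntimeError("OPENAI_API_KEY is not set")
    return Settings(
        openai_api_key=api_key,
        # Defaults mirror the example .env values above (an assumption).
        mongodb_uri=env.get("MONGODB_URI", "mongodb://localhost:27017"),
        database_name=env.get("DATABASE_NAME", "guesstimation_analysis"),
    )


settings = load_settings({"OPENAI_API_KEY": "your_openai_api_key_here"})
print(settings.mongodb_uri, settings.database_name)
```

Passing the environment in as a mapping keeps the function easy to test without touching real environment variables.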

3. Install Dependencies

# Backend dependencies
cd server && uv sync

# Frontend dependencies
cd ../frontend && npm install

4. Start Development Servers

# From root directory
make dev

Or run separately:

# Backend (Terminal 1)
cd server && uv run server.py

# Frontend (Terminal 2)
cd frontend && npm run dev

5. Access Application

🗄️ Database Schema

Guesstimation Sessions Collection

{
  "_id": ObjectId,
  "user_id": ObjectId,           // Multi-tenant support
  "scenario_title": "Movie attendance in India",
  "conversation_summary": "Full conversation text...",
  "markdown_content": "Generated markdown with flow diagram...",
  "created_at": ISODate,
  "updated_at": ISODate
}

Conversation Context Collection

{
  "_id": ObjectId,
  "user_id": ObjectId,
  "session_id": ObjectId,
  "context": {
    "current_conversation": [...],
    "scenario_title": "Movie attendance in India",
    "analysis_steps": [...],
    "final_estimate": {...}
  },
  "created_at": ISODate,
  "updated_at": ISODate
}

🔧 Development Commands

# Setup
make setup              # Initial setup with .env creation
make sync               # Install/update dependencies

# Development
make dev                # Start both frontend and backend
make frontend           # Start only frontend
make server             # Start only backend

# Production
make build              # Build frontend for production
make start              # Start production server

# Utilities
make clean              # Clean build artifacts
make help               # Show all commands

🧪 Testing

Database Connection Test

cd server && uv run test_db.py
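If that script fails, a quick way to tell "MongoDB is not reachable" apart from other problems is a plain TCP check against the host and port in `MONGODB_URI`. The sketch below uses only the standard library and makes no claim about what `test_db.py` actually does; the function names are hypothetical, and only single-host `mongodb://` URIs are handled (not `mongodb+srv://`).

```python
import socket
from urllib.parse import urlparse


def mongo_host_port(uri: str) -> tuple[str, int]:
    """Extract host and port from a single-host mongodb:// URI."""
    parsed = urlparse(uri)
    # Fall back to MongoDB's conventional defaults when parts are missing.
    return parsed.hostname or "localhost", parsed.port or 27017


def can_connect(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    host, port = mongo_host_port("mongodb://localhost:27017")
    state = "reachable" if can_connect(host, port) else "not reachable"
    print(f"{host}:{port} is {state}")
```

A successful TCP connection only shows that something is listening on the port; authentication and database access still need the real driver.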

Voice Commands Test

  1. Start the application
  2. Grant microphone permissions
  3. Try voice commands from the examples above
  4. Check database for created sessions

📊 Example Usage Flow

  1. User: "I want to estimate how many people go to movies in India every month"
  2. Assistant: "Great! Let's break this down step by step. What's your starting point?"
  3. User: "Well, India has about 1.4 billion people, and maybe 35% live in cities..."
  4. Assistant: "Good start! Let's think about the age demographics..."
  5. User: "Create markdown from this conversation"
  6. System: Generates structured markdown with flow diagram and saves to database
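The arithmetic behind such a session is a chain of multiplications, one per assumption. The sketch below replays the movie-attendance figures used in this README's example report (35% urban, 60% in the target age group, 50% monthly attendance rate); `run_estimate` is a hypothetical helper, not part of the project.

```python
def run_estimate(base: float, factors: list[tuple[str, float]]) -> list[tuple[str, float]]:
    """Apply each factor in turn and record the running value after every step."""
    results = []
    value = base
    for label, factor in factors:
        value *= factor
        results.append((label, value))
    return results


steps = run_estimate(
    1.4e9,  # total population of India, as stated by the user
    [
        ("Urban population", 0.35),
        ("Target age group", 0.60),
        ("Monthly attendance", 0.50),
        ("Yearly attendance", 12),  # twelve months of monthly attendance
    ],
)
for label, value in steps:
    print(f"{label}: {value / 1e6:,.0f} million")
```

Recording every intermediate value, rather than only the final product, is what lets the report show each step with its own assumption and confidence level.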

📝 Generated Markdown Example

# Guesstimation Analysis: Movie Attendance in India

## Scenario
Estimate how many people go to movies every month and every year in India.

## Conversation Summary
**User**: I want to estimate how many people go to movies in India
**Assistant**: Let's break this down step by step. What's your starting point?
**User**: Well, India has about 1.4 billion people, and maybe 35% live in cities...

## Step-by-Step Analysis

### 1. Total Population
- **Value**: 1.4 billion people
- **Source**: User estimate
- **Confidence**: High

### 2. Urban Population
- **Value**: 35% of total population
- **Calculation**: 1.4B × 0.35 = 490 million
- **Assumption**: Movie theaters primarily in urban areas
- **Confidence**: Medium

## Flow Diagram

```mermaid
graph TD
    A[Total Population: 1.4B] --> B[Urban Population: 35%]
    B --> C[Urban Population: 490M]
    C --> D[Movie-going Age: 15-65]
    D --> E[Target Age Group: 60%]
    E --> F[Potential Audience: 294M]
    F --> G[Monthly Attendance Rate: 50%]
    G --> H[Monthly Attendance: 147M]
    H --> I[Yearly Attendance: 1.76B]
    
    style A fill:#e1f5fe
    style H fill:#c8e6c9
    style I fill:#ffcc02
```

## Final Estimates

- **Monthly Movie Attendance**: 147 million people
- **Yearly Movie Attendance**: 1.76 billion admissions (147 million × 12)
- **Overall Confidence Level**: Medium (60%)

## 🔧 Troubleshooting

### Common Issues

1. **Microphone Access Denied**
   - Check browser permissions
   - Use HTTPS in production
   - Clear browser cache

2. **Database Connection Failed**
   - Verify MongoDB is running
   - Check connection string
   - Ensure database exists

3. **OpenAI API Errors**
   - Verify API key is valid
   - Check API quota
   - Ensure voice capabilities enabled

4. **WebSocket Connection Issues**
   - Check server is running
   - Verify WebSocket URL
   - Check firewall settings

## 🤝 Contributing

1. Fork the repository
2. Create a feature branch
3. Make your changes
4. Test thoroughly
5. Submit a pull request

## 📄 License

This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.

## 🙏 Acknowledgments

- OpenAI for the Voice Agents SDK
- FastAPI for the excellent web framework
- Next.js for the frontend framework
- MongoDB for the database solution

## 📞 Support

For questions or issues:
1. Check the troubleshooting section
2. Review the documentation
3. Open an issue on GitHub

---

**Happy Voice-Enabled Guesstimation Analysis! 🎤📊**
