A voice-powered guesstimation analysis system built with OpenAI's Voice Agents SDK, FastAPI, Next.js, and MongoDB. This project demonstrates how to create a speech-to-speech AI assistant for breaking down complex estimation problems and generating structured markdown reports with visual flow diagrams.
- Voice-First Interface: Natural voice commands for all operations
- Guesstimation Analysis: Break down complex estimation problems step by step
- Conversation Tracking: Automatically track and analyze voice conversations
- Markdown Export: Generate structured reports with flow diagrams
- Session Management: Save and retrieve previous analysis sessions
- Real-time Updates: WebSocket-based communication for instant feedback
- Visual Flow Diagrams: Mermaid-based diagrams showing estimation logic
- Multi-tenant Database: Supports multiple users with proper data isolation
```mermaid
graph TB
    subgraph "Frontend (Next.js)"
        UI[User Interface]
        Audio[Audio Recording/Playback]
        WS[WebSocket Client]
    end

    subgraph "Backend (FastAPI)"
        API[FastAPI Server]
        VoicePipeline[Voice Pipeline]
        Workflow[Voice Workflow]
    end

    subgraph "OpenAI Voice Agents SDK"
        Agent[AI Agent]
        Tools[Function Tools]
        Runner[Agent Runner]
    end

    subgraph "External Services"
        OpenAI[OpenAI API]
        Whisper[Whisper STT]
        TTS[Text-to-Speech]
    end

    subgraph "Database"
        MongoDB[(MongoDB)]
        Sessions[Guesstimation Sessions]
        Context[Conversation Context]
    end

    UI --> Audio
    Audio --> WS
    WS --> API
    API --> VoicePipeline
    VoicePipeline --> Workflow
    Workflow --> Runner
    Runner --> Agent
    Agent --> Tools
    Tools --> MongoDB
    Agent --> OpenAI
    OpenAI --> Whisper
    OpenAI --> TTS
    MongoDB --> Sessions
    MongoDB --> Context

    style UI fill:#e1f5fe
    style API fill:#f3e5f5
    style VoicePipeline fill:#fff3e0
    style Agent fill:#ffe0b2
    style Tools fill:#ffcc02
    style Runner fill:#ffb74d
    style OpenAI fill:#e8f5e8
    style Whisper fill:#c8e6c9
    style TTS fill:#a5d6a7
    style MongoDB fill:#fce4ec
```
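The middle layers of the diagram (agent, function tools, runner, and voice pipeline) come from the OpenAI Agents SDK's voice support. The sketch below shows how those pieces are typically wired together; the agent name, instructions, and the `save_session` stub are illustrative assumptions rather than this repo's exact code.

```python
# Sketch of the agent/voice-pipeline wiring with the OpenAI Agents SDK
# (openai-agents[voice]). Names, instructions, and the tool body are
# illustrative assumptions.
from agents import Agent, function_tool
from agents.voice import AudioInput, SingleAgentVoiceWorkflow, VoicePipeline


@function_tool
def save_session(scenario_title: str, markdown_content: str) -> str:
    """Persist a finished guesstimation session (stubbed here)."""
    return f"Saved session '{scenario_title}'"


agent = Agent(
    name="Guesstimation Assistant",
    instructions="Help the user break estimation problems into explicit, numbered steps.",
    model="gpt-4o-mini",
    tools=[save_session],
)

pipeline = VoicePipeline(workflow=SingleAgentVoiceWorkflow(agent))


async def respond(audio_buffer):
    """Feed raw microphone audio in, stream synthesized speech back out."""
    result = await pipeline.run(AudioInput(buffer=audio_buffer))
    async for event in result.stream():
        if event.type == "voice_stream_event_audio":
            yield event.data  # PCM chunk ready to forward over the WebSocket
```

The pipeline handles speech-to-text and text-to-speech around the agent run, so the backend only has to move audio bytes between the browser and `respond()`.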
- "I want to estimate how many people go to movies in India"
- "Let's analyze the market size for electric vehicles"
- "Help me estimate the number of coffee shops in New York"
- "Create markdown from this conversation"
- "Export my analysis"
- "Save this guesstimation session"
- "Generate a report"
- "Show my previous analyses"
- "Clear this session"
- "What sessions do I have?"
- Backend: FastAPI, OpenAI Voice Agents SDK, Motor (MongoDB async driver)
- Frontend: Next.js 15, TypeScript, Tailwind CSS, wavtools
- Database: MongoDB (with indexes for performance)
- AI: GPT-4o-mini, OpenAI Whisper (STT), OpenAI TTS
- Communication: WebSockets for real-time voice interaction
- Package Management: uv (Python), npm (Node.js)
- Python 3.11+
- Node.js 18+
- MongoDB (local or Atlas)
- OpenAI API key with voice capabilities
- Microphone access
```bash
git clone <your-repo>
cd experiment-openai-speech-to-speech
```
Create a `.env` file in the root directory:
```env
# OpenAI API Configuration
OPENAI_API_KEY=your_openai_api_key_here

# Database Configuration
MONGODB_URI=mongodb://localhost:27017
DATABASE_NAME=guesstimation_analysis
```
```bash
# Backend dependencies
cd server && uv sync

# Frontend dependencies
cd ../frontend && npm install
```
```bash
# From the root directory
make dev
```
Or run separately:
```bash
# Backend (Terminal 1)
cd server && uv run server.py

# Frontend (Terminal 2)
cd frontend && npm run dev
```
- Frontend: http://localhost:3000
- Backend API: http://localhost:8000
- WebSocket: ws://localhost:8000/ws
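The `ws://localhost:8000/ws` endpoint is an ordinary FastAPI WebSocket route that shuttles audio between the browser and the voice pipeline. Here is a minimal sketch, reusing the hypothetical `respond()` helper from the architecture sketch above; the 16-bit PCM framing is an assumption and the actual server may differ.

```python
# Minimal FastAPI WebSocket endpoint: receive raw audio from the browser,
# run it through the voice pipeline, and stream synthesized audio back.
import numpy as np
from fastapi import FastAPI, WebSocket, WebSocketDisconnect

app = FastAPI()


@app.websocket("/ws")
async def voice_ws(websocket: WebSocket):
    await websocket.accept()
    try:
        while True:
            pcm_bytes = await websocket.receive_bytes()
            audio = np.frombuffer(pcm_bytes, dtype=np.int16)
            # respond() is the pipeline helper sketched in the Architecture section
            async for chunk in respond(audio):
                await websocket.send_bytes(chunk.tobytes())
    except WebSocketDisconnect:
        pass
```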
**Guesstimation Sessions**

```javascript
{
  "_id": ObjectId,
  "user_id": ObjectId,  // Multi-tenant support
  "scenario_title": "Movie attendance in India",
  "conversation_summary": "Full conversation text...",
  "markdown_content": "Generated markdown with flow diagram...",
  "created_at": ISODate,
  "updated_at": ISODate
}
```
**Conversation Context**

```javascript
{
  "_id": ObjectId,
  "user_id": ObjectId,
  "session_id": ObjectId,
  "context": {
    "current_conversation": [...],
    "scenario_title": "Movie attendance in India",
    "analysis_steps": [...],
    "final_estimate": {...}
  },
  "created_at": ISODate,
  "updated_at": ISODate
}
```
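With Motor, the two collections above can be created, indexed, and written roughly as follows. The collection names, index choices, and helper signature are assumptions for illustration, not necessarily the repo's actual data layer.

```python
# Sketch of the async MongoDB layer with Motor. Connection settings come from
# the .env values shown above; collection and index names are assumed.
import os
from datetime import datetime, timezone

from motor.motor_asyncio import AsyncIOMotorClient

client = AsyncIOMotorClient(os.environ.get("MONGODB_URI", "mongodb://localhost:27017"))
db = client[os.environ.get("DATABASE_NAME", "guesstimation_analysis")]


async def init_indexes() -> None:
    # Multi-tenant lookups: filter by user, newest sessions first.
    await db.guesstimation_sessions.create_index([("user_id", 1), ("created_at", -1)])
    await db.conversation_context.create_index([("user_id", 1), ("session_id", 1)])


async def save_session(user_id, scenario_title: str, markdown_content: str):
    now = datetime.now(timezone.utc)
    result = await db.guesstimation_sessions.insert_one({
        "user_id": user_id,
        "scenario_title": scenario_title,
        "markdown_content": markdown_content,
        "created_at": now,
        "updated_at": now,
    })
    return result.inserted_id
```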
```bash
# Setup
make setup     # Initial setup with .env creation
make sync      # Install/update dependencies

# Development
make dev       # Start both frontend and backend
make frontend  # Start only frontend
make server    # Start only backend

# Production
make build     # Build frontend for production
make start     # Start production server

# Utilities
make clean     # Clean build artifacts
make help      # Show all commands
```
```bash
cd server && uv run test_db.py
```
- Start the application
- Grant microphone permissions
- Try voice commands from the examples above
- Check database for created sessions
- User: "I want to estimate how many people go to movies in India every month"
- Assistant: "Great! Let's break this down step by step. What's your starting point?"
- User: "Well, India has about 1.4 billion people, and maybe 35% live in cities..."
- Assistant: "Good start! Let's think about the age demographics..."
- User: "Create markdown from this conversation"
- System: Generates structured markdown with flow diagram and saves to database
# Guesstimation Analysis: Movie Attendance in India
## Scenario
Estimate how many people go to movies every month and every year in India.
## Conversation Summary
**User**: I want to estimate how many people go to movies in India
**Assistant**: Let's break this down step by step. What's your starting point?
**User**: Well, India has about 1.4 billion people, and maybe 35% live in cities...
## Step-by-Step Analysis
### 1. Total Population
- **Value**: 1.4 billion people
- **Source**: User estimate
- **Confidence**: High
### 2. Urban Population
- **Value**: 35% of total population
- **Calculation**: 1.4B × 0.35 = 490 million
- **Assumption**: Movie theaters primarily in urban areas
- **Confidence**: Medium
## Flow Diagram
```mermaid
graph TD
A[Total Population: 1.4B] --> B[Urban Population: 35%]
B --> C[Urban Population: 490M]
C --> D[Movie-going Age: 15-65]
D --> E[Target Age Group: 60%]
E --> F[Potential Audience: 294M]
F --> G[Monthly Attendance Rate: 50%]
G --> H[Monthly Attendance: 147M]
H --> I[Yearly Attendance: 1.76B]
style A fill:#e1f5fe
style H fill:#c8e6c9
style I fill:#ffcc02
```

## Final Estimates

- **Monthly Movie Attendance**: 147 million people
- **Yearly Movie Attendance**: 1.76 billion people
- **Overall Confidence Level**: Medium (60%)
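The headline numbers in this example are just the product of the assumptions shown in the flow diagram, which is easy to sanity-check:

```python
# Sanity check of the example estimate chain.
population = 1.4e9                 # total population of India
urban = population * 0.35          # 490M urban residents
movie_age = urban * 0.60           # 294M in the movie-going age band
monthly = movie_age * 0.50         # 147M monthly attendances
yearly = monthly * 12              # ~1.76B yearly attendances
print(f"{monthly / 1e6:.0f}M per month, {yearly / 1e9:.2f}B per year")
# -> 147M per month, 1.76B per year
```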
## Troubleshooting
### Common Issues
1. **Microphone Access Denied**
   - Check browser permissions
   - Use HTTPS in production
   - Clear browser cache

2. **Database Connection Failed**
   - Verify MongoDB is running
   - Check the connection string
   - Ensure the database exists

3. **OpenAI API Errors**
   - Verify the API key is valid
   - Check API quota
   - Ensure voice capabilities are enabled

4. **WebSocket Connection Issues**
   - Check that the server is running
   - Verify the WebSocket URL
   - Check firewall settings
## Contributing
1. Fork the repository
2. Create a feature branch
3. Make your changes
4. Test thoroughly
5. Submit a pull request
## License
This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.
## Acknowledgments
- OpenAI for the Voice Agents SDK
- FastAPI for the excellent web framework
- Next.js for the frontend framework
- MongoDB for the database solution
## Support
For questions or issues:
1. Check the troubleshooting section
2. Review the documentation
3. Open an issue on GitHub
---
**Happy Voice-Enabled Guesstimation Analysis!**