15 changes: 15 additions & 0 deletions .claude/settings.local.json
@@ -0,0 +1,15 @@
{
  "permissions": {
    "allow": [
      "Bash(python -m pip:*)",
      "Bash(where python)",
      "Bash(python:*)",
      "Read(//c/Users/Dan Zambello/.claude/agents/**)",
      "Read(//c/Users/Dan Zambello/.claude/**)",
      "mcp__browsermcp__browser_snapshot",
      "mcp__browsermcp__browser_click"
    ],
    "deny": [],
    "ask": []
  }
}
76 changes: 76 additions & 0 deletions CLAUDE.md
@@ -0,0 +1,76 @@
# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## Project Overview

This is OpenAI's Whisper repository - a general-purpose speech recognition model that performs multilingual speech recognition, speech translation, and language identification. The codebase is built as a Python package with PyTorch.

## Architecture

### Core Components

- **whisper/__init__.py**: Main entry point with model loading (`load_model()`) and available models registry
- **whisper/model.py**: Core Whisper transformer model implementation
- **whisper/transcribe.py**: High-level transcription interface with CLI entry point
- **whisper/decoding.py**: Lower-level decoding logic and options
- **whisper/audio.py**: Audio processing utilities (loading, mel spectrograms, padding)
- **whisper/tokenizer.py**: Text tokenization and language handling
- **whisper/normalizers/**: Text normalization for different languages

### Model Architecture
- Transformer sequence-to-sequence model
- Multiple model sizes: tiny, base, small, medium, large, turbo
- Both English-only (.en) and multilingual variants
- Models downloaded from Azure CDN and cached locally

## Development Commands

### Testing
```bash
pytest # Run all tests
pytest tests/test_audio.py # Run a specific test file
pytest -m requires_cuda # Run CUDA-specific tests
```

### Code Quality
```bash
black . # Format code
isort . # Sort imports
flake8 # Lint code
pre-commit run --all-files # Run all pre-commit hooks
```

### Installation for Development
```bash
pip install -e ".[dev]" # Install in development mode with dev dependencies
```

## Package Structure

- Built using setuptools with pyproject.toml configuration
- Entry point: `whisper` command maps to `whisper.transcribe:cli`
- Dependencies: torch, numpy, tiktoken, tqdm, numba, more-itertools
- Optional triton dependency for Linux x86_64 optimization

## Key APIs

### High-level Usage
```python
import whisper
model = whisper.load_model("turbo")
result = model.transcribe("audio.mp3")
```
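
The dictionary returned by `transcribe()` contains `text`, `language`, and a `segments` list with per-segment `start`, `end`, and `text` fields. As a minimal sketch, the helper below turns those segments into timestamped lines. The `format_segments` name and the sample data are illustrative assumptions, not part of the Whisper package.

```python
def format_segments(result):
    """Return one '[start -> end] text' line per segment of a transcribe() result."""
    lines = []
    for segment in result.get("segments", []):
        start = segment["start"]
        end = segment["end"]
        # Whisper segment text usually carries a leading space, so strip it.
        lines.append(f"[{start:.1f}s -> {end:.1f}s] {segment['text'].strip()}")
    return lines


# Illustrative data shaped like a transcribe() result; the values are made up.
sample_result = {
    "text": " Hello there. General speech recognition.",
    "language": "en",
    "segments": [
        {"id": 0, "start": 0.0, "end": 7.0, "text": " Hello there."},
        {"id": 1, "start": 7.0, "end": 13.0, "text": " General speech recognition."},
    ],
}

for line in format_segments(sample_result):
    print(line)
```

With a real model, `format_segments(model.transcribe("audio.mp3"))` would be used the same way.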

### Lower-level Usage
```python
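# Reuses `whisper` and `model` from the high-level example
audio = whisper.pad_or_trim(whisper.load_audio("audio.mp3"))  # decode() expects a 30-second window
mel = whisper.log_mel_spectrogram(audio, n_mels=model.dims.n_mels).to(model.device)
options = whisper.DecodingOptions()
result = whisper.decode(model, mel, options)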
```

## Testing Notes

- Tests use pytest with custom markers for CUDA requirements
- Random seeds fixed for reproducibility (seed=42)
- Test coverage includes audio processing, normalization, timing, tokenization, and transcription
137 changes: 137 additions & 0 deletions README_SETUP.md
@@ -0,0 +1,137 @@
# Voice to Text Converter - Setup Guide

## Quick Start

### Prerequisites
- Windows 10/11
- Internet connection (for initial setup)
- Microphone access

### Installation Steps

1. **Install Python** (if not already installed)
   - Download from [python.org](https://python.org)
   - **IMPORTANT**: Check "Add Python to PATH" during installation
   - Minimum version: Python 3.8

2. **Run Setup**
   - Double-click `setup.bat`
   - Wait for dependencies to install (may take 5-10 minutes)
   - Setup is complete when you see "Setup Complete!"

3. **Start Using**
   - **GUI Mode**: Double-click `voice_to_text_gui.bat`
   - **Terminal Mode**: Double-click `voice_to_text_terminal.bat`

## Usage Modes

### GUI Mode (Recommended)
- **Launch**: `voice_to_text_gui.bat` (batch window closes, GUI stays open)
- **Features**:
  - Visual interface with buttons
  - F1 global hotkey for recording
  - Always on top option
  - System tray integration
  - Settings dialog
- **Best for**: Daily use and continuous workflow

### Terminal Mode
- **Launch**: `voice_to_text_terminal.bat`
- **Features**:
  - Simple text interface
  - Press Enter to stop recording
  - Lightweight and fast
- **Best for**: Quick one-off recordings

## Creating Desktop Shortcuts

1. **Right-click** on desktop → **New** → **Shortcut**
2. **Browse** to the batch file you want (e.g., `voice_to_text_gui.bat`)
3. **Name** the shortcut (e.g., "Voice to Text")
4. **Optional**: Right-click shortcut → **Properties** → **Change Icon**

## First Run

- **Model Download**: The first run downloads the Whisper model (~150MB)
- **Microphone Permission**: Windows may ask for microphone access
- **Settings**: GUI mode creates `voice_to_text_settings.json` for preferences

## File Structure

```
voice-to-text/
├── voice_to_text.py # Main application
├── setup.bat # One-time setup
├── voice_to_text_gui.bat # GUI launcher
├── voice_to_text_terminal.bat # Terminal launcher
├── requirements.txt # Python dependencies
├── transcripts/ # Saved transcriptions
├── voice_to_text_settings.json # Settings (created after first GUI run)
└── README_SETUP.md # This file
```

## Voice Commands

The system automatically converts natural speech into Claude Code prompts:

| Say This | Gets Converted To |
|----------|-------------------|
| "use agent python pro" | `@agent python-pro` |
| "run tool bash" | `@tool bash` |
| "file package.json" | `@file package.json` |
| "directory source" | `@dir source/` |
| "function get user" | `` `getUser()` function`` |
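
The mappings in the table can be sketched with a small set of regular-expression rules. This is a minimal illustration, not the application's actual logic: the `convert_phrase` name, the rule list, and the whole-phrase matching are assumptions made for the example.

```python
import re


def _camel_case(words):
    """Join words as camelCase, e.g. ['get', 'user'] becomes 'getUser'."""
    head, *tail = words
    return head.lower() + "".join(word.capitalize() for word in tail)


# Each rule pairs a pattern for the spoken phrase with a function that builds
# the prompt text. The rules are hypothetical and mirror only the table rows.
RULES = [
    (re.compile(r"^use agent (.+)$", re.IGNORECASE),
     lambda m: "@agent " + "-".join(m.group(1).lower().split())),
    (re.compile(r"^run tool (\S+)$", re.IGNORECASE),
     lambda m: "@tool " + m.group(1).lower()),
    (re.compile(r"^file (\S+)$", re.IGNORECASE),
     lambda m: "@file " + m.group(1)),
    (re.compile(r"^directory (\S+)$", re.IGNORECASE),
     lambda m: "@dir " + m.group(1).rstrip("/") + "/"),
    (re.compile(r"^function (.+)$", re.IGNORECASE),
     lambda m: "`" + _camel_case(m.group(1).split()) + "()` function"),
]


def convert_phrase(phrase):
    """Convert one spoken phrase; return it unchanged if no rule matches."""
    text = phrase.strip()
    for pattern, build in RULES:
        match = pattern.match(text)
        if match:
            return build(match)
    return text


print(convert_phrase("use agent python pro"))
print(convert_phrase("function get user"))
```

Phrases that match no rule pass through unchanged, so ordinary dictation is not altered.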

## Troubleshooting

### "Python is not installed"
- Install Python from [python.org](https://python.org)
- **Must check "Add Python to PATH"** during installation
- Restart command prompt/computer after installation

### "Failed to install dependencies"
- Check internet connection
- Try running `setup.bat` as administrator
- Manually run: `pip install -r requirements.txt`

### "No audio recorded"
- Check microphone permissions in Windows Settings
- Ensure microphone is not muted
- Try a different microphone

### "Poor transcription accuracy"
- Speak clearly and at normal pace
- Reduce background noise
- Move closer to microphone
- In GUI mode: Settings → Change to larger Whisper model

### GUI Hotkey Not Working
- Check if another application is using F1
- Try running as administrator
- Change hotkey in Settings dialog

### System Tray Issues
- If tray doesn't work, app falls back to normal minimize
- Some Windows configurations don't support system tray
- This doesn't affect core functionality

## Advanced Settings (GUI Mode)

Access via Settings button:
- **Global Hotkey**: Change from F1 to F2-F12
- **Whisper Model**: tiny (fast) to large (accurate)
- **Always on Top**: Keep window visible
- **Auto Copy**: Automatically copy to clipboard
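
As a rough sketch of how these options could be persisted in `voice_to_text_settings.json`, the snippet below merges a saved file with defaults and rejects hotkeys outside F1-F12. The key names, default values, and function names are assumptions; the real settings schema is not documented here.

```python
import json
from pathlib import Path

# Hypothetical keys and defaults; the application's real schema may differ.
DEFAULTS = {
    "hotkey": "F1",
    "whisper_model": "base",
    "always_on_top": False,
    "auto_copy": True,
}
VALID_HOTKEYS = {f"F{n}" for n in range(1, 13)}  # F1 through F12


def merge_settings(loaded):
    """Overlay known keys from `loaded` onto the defaults and validate the hotkey."""
    settings = dict(DEFAULTS)
    for key, value in loaded.items():
        if key in DEFAULTS:
            settings[key] = value
    if settings["hotkey"] not in VALID_HOTKEYS:
        settings["hotkey"] = DEFAULTS["hotkey"]
    return settings


def load_settings(path):
    """Read settings from `path`, falling back to defaults if missing or invalid."""
    try:
        loaded = json.loads(Path(path).read_text(encoding="utf-8"))
    except (FileNotFoundError, json.JSONDecodeError):
        loaded = {}
    if not isinstance(loaded, dict):
        loaded = {}
    return merge_settings(loaded)


def save_settings(path, settings):
    """Write settings as indented JSON so the file stays easy to edit by hand."""
    Path(path).write_text(json.dumps(settings, indent=2), encoding="utf-8")
```

Falling back to defaults on a missing or corrupt file means a damaged settings file never blocks startup.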

## Support

- Check transcripts in the `transcripts/` folder
- Settings saved in `voice_to_text_settings.json`
- For issues, check the console output in terminal mode

## Updates

To update the application:
1. Replace `voice_to_text.py` with new version
2. Update `requirements.txt` if needed
3. Run `setup.bat` again if dependencies changed
153 changes: 153 additions & 0 deletions assessment_analysis.json
@@ -0,0 +1,153 @@
{
"Q1": {
"question": "What is the purpose of determining and documenting requirements for a cabinet installation?",
"transcribed_answer": "To complete it accurately and efficiently, including detailed specification for the cabinets and all components.",
"written_summary": "To have a better understanding of the project for all persons involved and plan in advance for any difficulties and refer to documents as a guide for future projects.",
"assessment": {
"word_count": 15,
"has_substantial_content": true,
"keyword_relevance": 0.2857142857142857,
"transcription_summary_match": 0.15151515151515152
},
"transcription_details": {
"text": "To complete it accurately and efficiently, including detailed specification for the cabinets and all components.",
"language": "en",
"segments": [
{
"id": 0,
"seek": 0,
"start": 0.0,
"end": 8.0,
"text": " To complete it accurately and efficiently, including detailed specification for the cabinets and all components.",
"tokens": [
50364,
1407,
3566,
309,
20095,
293,
19621,
11,
3009,
9942,
31256,
337,
264,
37427,
293,
439,
6677,
13,
50764
],
"temperature": 0.0,
"avg_logprob": -0.3367410898208618,
"compression_ratio": 1.1914893617021276,
"no_speech_prob": 0.07899459451436996
}
]
}
},
"Q2": {
"error": "File not found: 77914809189__571E73A4-D2E8-4B00-934C-5B2E54DE47A4.MOV"
},
"Q3": {
"question": "What information is found in the appliance manuals?",
"transcribed_answer": "It's including product identifications, safety warnings and precautions step by step operating instructions, installation and assembly instructions. My tenants got lined, troubleshooting tips, technical specifications and warranty information.",
"written_summary": "Fitting instructions and requirements",
"assessment": {
"word_count": 28,
"has_substantial_content": true,
"keyword_relevance": 0.25,
"transcription_summary_match": 0.03571428571428571
},
"transcription_details": {
"text": "It's including product identifications, safety warnings and precautions step by step operating instructions, installation and assembly instructions. My tenants got lined, troubleshooting tips, technical specifications and warranty information.",
"language": "en",
"segments": [
{
"id": 0,
"seek": 0,
"start": 0.0,
"end": 7.0,
"text": " It's including product identifications, safety warnings and precautions",
"tokens": [
50364,
467,
311,
3009,
1674,
2473,
7833,
11,
4514,
30009,
293,
34684,
50714
],
"temperature": 0.0,
"avg_logprob": -0.3135071884502064,
"compression_ratio": 1.5576923076923077,
"no_speech_prob": 0.02320166677236557
},
{
"id": 1,
"seek": 0,
"start": 7.0,
"end": 13.0,
"text": " step by step operating instructions, installation and assembly instructions.",
"tokens": [
50714,
1823,
538,
1823,
7447,
9415,
11,
13260,
293,
12103,
9415,
13,
51014
],
"temperature": 0.0,
"avg_logprob": -0.3135071884502064,
"compression_ratio": 1.5576923076923077,
"no_speech_prob": 0.02320166677236557
},
{
"id": 2,
"seek": 0,
"start": 13.0,
"end": 21.0,
"text": " My tenants got lined, troubleshooting tips, technical specifications and warranty information.",
"tokens": [
51014,
1222,
31216,
658,
17189,
11,
15379,
47011,
6082,
11,
6191,
29448,
293,
26852,
1589,
13,
51414
],
"temperature": 0.0,
"avg_logprob": -0.3135071884502064,
"compression_ratio": 1.5576923076923077,
"no_speech_prob": 0.02320166677236557
}
]
}
}
}
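
The script that produced `assessment_analysis.json` is not part of this diff, so the metric definitions are not confirmed. As a hedged sketch: the recorded `word_count` values match a plain whitespace split, and both recorded `transcription_summary_match` values (5/33 for Q1, 1/28 for Q3) are consistent with Jaccard similarity over lowercased, whitespace-split tokens with punctuation left attached. The function names below are assumptions. `keyword_relevance` depends on a keyword list that is not shown, so it is not reproduced.

```python
def word_count(text):
    """Count whitespace-separated tokens."""
    return len(text.split())


def jaccard_match(answer, summary):
    """Jaccard similarity of lowercased whitespace tokens (punctuation kept)."""
    answer_tokens = set(answer.lower().split())
    summary_tokens = set(summary.lower().split())
    union = answer_tokens | summary_tokens
    if not union:
        return 0.0
    return len(answer_tokens & summary_tokens) / len(union)


# Q1 strings copied from the JSON above.
q1_answer = (
    "To complete it accurately and efficiently, including detailed "
    "specification for the cabinets and all components."
)
q1_summary = (
    "To have a better understanding of the project for all persons involved "
    "and plan in advance for any difficulties and refer to documents as a "
    "guide for future projects."
)

print(word_count(q1_answer))
print(jaccard_match(q1_answer, q1_summary))
```

Because punctuation stays attached, `instructions,` and `instructions` count as different tokens, which is why the Q3 match is so low despite the overlapping wording.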