Voice-to-text overlay for Linux/Wayland/GNOME.
**Important: actively evolving.** Voxize is under active development. Pin a commit if you need a stable target. See the implementation journal for progress and design decisions.
- Global hotkey opens a translucent overlay that floats above your workspace. Start speaking immediately.
- Real-time streaming transcription via the OpenAI Realtime API (`gpt-4o-transcribe`). Text appears as you talk.
- AI text cleanup via GPT-5.4 Mini: fixes spelling, punctuation, and filler words while preserving meaning.
- Clipboard output: cleaned text is copied automatically. The raw transcript is preserved in clipboard history as a fallback.
- Crash-safe audio: WAV is streamed to disk from the first sample, so audio survives process crashes.
- Session archive with cost tracking: the last 8 sessions are stored under `~/.local/state/voxize/` with audio, transcripts, and per-session API costs.
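To illustrate what "streamed to disk from the first sample" can look like, here is a minimal sketch using Python's standard `wave` module. `StreamingWavWriter` is a hypothetical name, and the 24 kHz, 16-bit mono format is an assumption rather than Voxize's documented capture format. Because `writeframes()` keeps the header's size fields in step with the data, and each chunk is flushed and fsynced, a crash leaves a WAV file that is valid up to the last chunk written.

```python
import os
import wave


class StreamingWavWriter:
    """Hypothetical helper: append PCM chunks so the on-disk WAV stays
    readable even if the process dies mid-recording."""

    def __init__(self, path, sample_rate=24_000, channels=1, sample_width=2):
        # Assumed format: 24 kHz, mono, 16-bit PCM.
        self._fh = open(path, "wb")
        self._wav = wave.open(self._fh, "wb")
        self._wav.setnchannels(channels)
        self._wav.setsampwidth(sample_width)
        self._wav.setframerate(sample_rate)

    def write(self, pcm: bytes) -> None:
        # After writeframes(), the RIFF and data-chunk sizes in the header
        # match the number of bytes written so far.
        self._wav.writeframes(pcm)
        # Push Python's buffer to the OS, then ask the OS to persist it.
        self._fh.flush()
        os.fsync(self._fh.fileno())

    def close(self) -> None:
        self._wav.close()  # wave does not close a file object it did not open
        self._fh.close()
```

Calling `os.fsync` on every chunk trades some I/O for durability; a real implementation might sync less often.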
TBD
| Requirement | Notes |
|---|---|
| NixOS (tested on 25.11) | NixOS-first: `flake.nix` + `shell.nix` handle all system deps |
| GNOME / Wayland | Mutter compositor; D-Bus used for focused-window detection |
| Python >= 3.11 | Managed by `uv` |
| OpenAI API key | Stored in GNOME Keyring via `secret-tool` |
| wl-clipboard | `wl-copy` for clipboard output |
| PortAudio | Audio capture backend |
| Window Calls (optional) | GNOME Shell extension for focused-window detection. Required for `WHISPER.txt` prompt hints; without it, prompt detection is silently skipped |
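A launcher could check the command-line requirements from the table before starting. The sketch below is hypothetical (`missing_requirements` is not part of Voxize) and only covers what `shutil.which` can see: PortAudio, which is a shared library, and the Window Calls extension are not checked.

```python
import shutil
import sys

# Hypothetical preflight check for the CLI tools listed in the table.
REQUIRED_TOOLS = {
    "wl-copy": "wl-clipboard (clipboard output)",
    "secret-tool": "API key lookup in GNOME Keyring",
}


def missing_requirements(which=shutil.which, version=sys.version_info):
    """Return human-readable descriptions of unmet requirements."""
    problems = []
    if version < (3, 11):
        problems.append("Python >= 3.11 is required")
    for tool, purpose in REQUIRED_TOOLS.items():
        if which(tool) is None:
            problems.append(f"{tool} not found on PATH (needed for {purpose})")
    return problems
```

The `which` and `version` parameters exist only so the check can be exercised without changing the real environment.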
**Warning:** There is no packaged install yet. The steps below are the development workflow, so expect rough edges. Proper packaging will be considered once the project stabilizes.
Enter the dev shell (pulls all system dependencies):

```bash
nix develop
```

Store your OpenAI API key in the GNOME Keyring:

```bash
secret-tool store --label='OpenAI API Key' service openai key api
```

Run:

```bash
uv run python -m voxize
```

To bind Voxize to a global hotkey (e.g., GNOME Settings > Keyboard > Custom Shortcuts), use the full command, which can be invoked from any directory:

```bash
nix develop /path/to/voxize --command bash -c "cd /path/to/voxize && uv run python -m voxize"
```

**Tip:** The first `nix develop` invocation evaluates the full shell derivation, which can take several seconds. To avoid this on every hotkey press, use nix-direnv: it caches the evaluated dev shell so subsequent entries are near-instant. If you manage your environment with Home Manager, enable it with `programs.direnv.nix-direnv.enable = true`. A proper packaging strategy (e.g., a `writeShellScript` wrapper with pinned dependencies) would eliminate the cold-start cost entirely and is left as an exercise for the reader.
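At runtime, a key stored with the `secret-tool store` command can be read back with `secret-tool lookup` using the same attributes. This is a minimal sketch assuming the key is fetched by shelling out to `secret-tool`; Voxize may instead use libsecret bindings, and `read_api_key` is a hypothetical name.

```python
import subprocess

# Must match the attributes passed to `secret-tool store`.
LOOKUP_CMD = ["secret-tool", "lookup", "service", "openai", "key", "api"]


def read_api_key(run=subprocess.run) -> str:
    """Hypothetical helper: fetch the OpenAI key from GNOME Keyring."""
    result = run(LOOKUP_CMD, capture_output=True, text=True, check=False)
    key = result.stdout.strip()
    # secret-tool exits non-zero when no matching item exists.
    if result.returncode != 0 or not key:
        raise RuntimeError("OpenAI API key not found in GNOME Keyring")
    return key
```

The `run` parameter exists only so the subprocess call can be replaced in tests.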
Voxize runs as a single Python process per invocation. A state machine drives each session through `INITIALIZING`, `RECORDING`, `CLEANING`, and `READY`. During recording, microphone audio streams over a WebSocket to the OpenAI Realtime API with semantic VAD (low eagerness, tuned for dictation), and transcription deltas appear live in the overlay. When recording stops, the accumulated transcript is sent to GPT-5.4 Mini for cleanup, which streams corrected text back into the same overlay. The final text is then copied to the clipboard.

See `docs/design.md` for the original specification and `docs/journal.md` for implementation deviations and decisions made during development.
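The session lifecycle can be modeled as a small state machine. This sketch assumes a strictly linear flow with a change callback; the real `state.py` may add error or cancel paths, and `Session` and `advance` are hypothetical names. Since the backend uses GLib for thread-safe callbacks, a GTK frontend would presumably marshal `on_change` onto the main loop (for example with `GLib.idle_add`) rather than touch widgets from a worker thread.

```python
from enum import Enum, auto


class State(Enum):
    INITIALIZING = auto()
    RECORDING = auto()
    CLEANING = auto()
    READY = auto()


# Assumed linear flow; error or cancel transitions are not modeled here.
TRANSITIONS = {
    State.INITIALIZING: {State.RECORDING},
    State.RECORDING: {State.CLEANING},
    State.CLEANING: {State.READY},
    State.READY: set(),
}


class Session:
    def __init__(self, on_change=None):
        self.state = State.INITIALIZING
        self._on_change = on_change  # e.g. a UI callback

    def advance(self, new_state: State) -> None:
        """Move to new_state, rejecting transitions the table does not allow."""
        if new_state not in TRANSITIONS[self.state]:
            raise ValueError(
                f"illegal transition {self.state.name} -> {new_state.name}"
            )
        old, self.state = self.state, new_state
        if self._on_change is not None:
            self._on_change(old, new_state)
```

Keeping the allowed transitions in a table makes illegal jumps (for example straight from `INITIALIZING` to `READY`) fail loudly instead of silently corrupting the overlay's state.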
Backend modules (`state.py`, `audio.py`, `transcribe.py`, `cleanup.py`, `storage.py`, `lock.py`, `clipboard.py`, `prompt.py`) do not import GTK. They use GLib/Gio for thread-safe callbacks and I/O, but no widget code. Only `app.py` and `ui.py` touch GTK4. This separation exists as a seam for a future GNOME Shell extension frontend that would spawn the backend as a subprocess and communicate over stdin/stdout using JSON Lines. See the architecture decision record for context.
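The planned stdin/stdout protocol is only described as JSON Lines, so the message shape below (an `"event"` key plus arbitrary fields) is an assumption, and `emit` and `read_events` are hypothetical names. The sketch shows why the format suits a subprocess pipe: `json.dumps` escapes embedded newlines, so each event occupies exactly one line and the reader can parse incrementally.

```python
import json
from typing import IO, Iterator


def emit(stream: IO[str], event: str, **fields) -> None:
    """Write one backend event as a single JSON object per line."""
    stream.write(json.dumps({"event": event, **fields}) + "\n")
    stream.flush()  # the frontend reads line by line, so don't sit in a buffer


def read_events(stream: IO[str]) -> Iterator[dict]:
    """Parse events from the other end of the pipe, skipping blank lines."""
    for line in stream:
        if line.strip():
            yield json.loads(line)
```

In a real frontend, `stream` would be the backend subprocess's stdout; here any text stream works.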
`WHISPER.txt`: Place a `WHISPER.txt` file in the working directory of the application you are dictating into. On launch, Voxize resolves the focused window's working directory (via the Window Calls extension and `/proc`) and loads the file as transcription context, improving accuracy for domain-specific vocabulary. There is no upward search: the file must be in the exact directory the focused process is running from.
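The lookup, with no upward search, can be sketched as follows. The function name is hypothetical, and the sketch assumes the focused window's PID has already been obtained from the Window Calls extension over D-Bus (not shown). `proc_root` is only there so the function can be tested without a real `/proc`.

```python
import os
from pathlib import Path
from typing import Optional


def load_whisper_prompt(pid: int, proc_root: str = "/proc") -> Optional[str]:
    """Return WHISPER.txt from the process's working directory, or None."""
    try:
        # /proc/<pid>/cwd is a symlink to the process's working directory.
        cwd = os.readlink(os.path.join(proc_root, str(pid), "cwd"))
    except OSError:
        return None  # process gone or not inspectable: skip silently
    candidate = Path(cwd) / "WHISPER.txt"
    # Exact directory only: parent directories are deliberately not searched.
    if not candidate.is_file():
        return None
    return candidate.read_text(encoding="utf-8")
```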
`VOXIZE_AUTOCLOSE`: Seconds before the overlay auto-closes in the `READY` state. Default: `30`. Set to `0` to disable.
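A sketch of how the variable might be parsed, assuming invalid values fall back to the default and negative values disable auto-close like `0` does (neither behavior is documented here). `autoclose_seconds` is a hypothetical name.

```python
import os
from typing import Mapping, Optional

DEFAULT_AUTOCLOSE_SECONDS = 30


def autoclose_seconds(env: Optional[Mapping[str, str]] = None) -> Optional[int]:
    """Return the READY-state timeout in seconds, or None if disabled."""
    env = os.environ if env is None else env
    raw = env.get("VOXIZE_AUTOCLOSE", "")
    try:
        seconds = int(raw) if raw.strip() else DEFAULT_AUTOCLOSE_SECONDS
    except ValueError:
        seconds = DEFAULT_AUTOCLOSE_SECONDS  # assumption: bad values use default
    return None if seconds <= 0 else seconds
```

For example, launching with `VOXIZE_AUTOCLOSE=0 uv run python -m voxize` keeps the overlay open until it is closed manually. A GTK frontend could then schedule the close with `GLib.timeout_add_seconds`.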
Session data: `~/.local/state/voxize/`. Each session directory contains `audio.wav`, `transcription.txt`, `cleaned.txt`, `ws_events.jsonl`, and `debug.log`.
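Keeping only the last 8 sessions amounts to pruning older directories after each run. A minimal sketch, assuming session directories are named with sortable timestamps (the actual naming scheme is not documented here); `prune_sessions` is a hypothetical name.

```python
import shutil
from pathlib import Path

MAX_SESSIONS = 8  # the archive keeps the last 8 sessions


def prune_sessions(root: Path, keep: int = MAX_SESSIONS) -> list[Path]:
    """Delete all but the `keep` most recent session directories."""
    sessions = sorted(
        (p for p in root.iterdir() if p.is_dir()),
        key=lambda p: p.name,  # assumes names sort chronologically
    )
    doomed = sessions[:-keep] if keep > 0 else sessions
    for path in doomed:
        shutil.rmtree(path)
    return doomed
```

Stray files in the state directory are ignored, so only session directories are ever removed.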
