Voxize

Voice-to-text overlay for Linux/Wayland/GNOME.

Important

Actively Evolving.

Voxize is under active development. Pin a commit if you need a stable target. See the implementation journal for progress and design decisions.

Global hotkey opens a translucent overlay that floats above your workspace. Start speaking immediately.
Real-time streaming transcription via the OpenAI Realtime API (gpt-4o-transcribe). Text appears as you talk.
AI text cleanup via GPT-5.4 Mini — fixes spelling, punctuation, and filler words while preserving meaning.
Clipboard output — cleaned text is copied automatically. Raw transcript is preserved in clipboard history as a fallback.
Crash-safe audio — WAV streamed to disk from the first sample. Audio survives process crashes.
Session archive with cost tracking — last 8 sessions stored under ~/.local/state/voxize/ with audio, transcripts, and per-session API costs.

Screenshots

TBD

Requirements

| Requirement | Notes |
| --- | --- |
| NixOS (tested on 25.11) | NixOS-first; `flake.nix` + `shell.nix` handle all system deps |
| GNOME / Wayland | Mutter compositor; D-Bus used for focused-window detection |
| Python >= 3.11 | Managed by `uv` |
| OpenAI API key | Stored in GNOME Keyring via `secret-tool` |
| `wl-clipboard` | `wl-copy` for clipboard output |
| PortAudio | Audio capture backend |
| Window Calls (optional) | GNOME Shell extension for focused-window detection. Required for WHISPER.txt prompt hints; without it, prompt detection is skipped silently |

Installation

Warning

There is no packaged install yet. The steps below are the development workflow — expect rough edges. Proper packaging will be considered once the project stabilizes.

Enter the dev shell (pulls all system dependencies):

nix develop

Store your OpenAI API key in the GNOME Keyring:

secret-tool store --label='OpenAI API Key' service openai key api

Run:

uv run python -m voxize

To bind Voxize to a global hotkey (e.g., via GNOME Settings > Keyboard > Custom Shortcuts), use the full command below, which can be invoked from any directory:

nix develop /path/to/voxize --command bash -c "cd /path/to/voxize && uv run python -m voxize"

Tip

The first nix develop invocation evaluates the full shell derivation, which can take several seconds. To avoid this on every hotkey press, use nix-direnv — it caches the evaluated dev shell so subsequent entries are near-instant. If you manage your environment with Home Manager, enable it with programs.direnv.nix-direnv.enable = true. A proper packaging strategy (e.g., a writeShellScript wrapper with pinned dependencies) would eliminate the cold-start cost entirely and is left as an exercise for the reader.

How it works

Voxize runs as a single Python process per invocation. A state machine drives the session through INITIALIZING, RECORDING, CLEANING, and READY. During recording, microphone audio streams over a WebSocket to the OpenAI Realtime API with semantic VAD (low eagerness, tuned for dictation). Transcription deltas appear live in the overlay. On stop, the accumulated transcript is sent to GPT-5.4 Mini for cleanup, which streams corrected text back into the same overlay. The final text is copied to the clipboard. See docs/design.md for the original specification and docs/journal.md for implementation deviations and decisions made during development.
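The session flow described above can be sketched as a small state machine. This is an illustration only: the state names come from the text, but the strictly linear transition table, the `Session` class, and its `advance` and `on_change` methods are assumptions, and the real `state.py` may allow other transitions (for example, error or cancel paths).

```python
from enum import Enum, auto
from typing import Callable


class State(Enum):
    INITIALIZING = auto()
    RECORDING = auto()
    CLEANING = auto()
    READY = auto()


# Assumed linear flow; the actual implementation may permit more edges.
TRANSITIONS: dict[State, set[State]] = {
    State.INITIALIZING: {State.RECORDING},
    State.RECORDING: {State.CLEANING},
    State.CLEANING: {State.READY},
    State.READY: set(),
}


class Session:
    """Hypothetical session object that enforces the transition table."""

    def __init__(self) -> None:
        self.state = State.INITIALIZING
        self._listeners: list[Callable[[State, State], None]] = []

    def on_change(self, callback: Callable[[State, State], None]) -> None:
        """Register a callback invoked as callback(old_state, new_state)."""
        self._listeners.append(callback)

    def advance(self, new_state: State) -> None:
        """Move to new_state, rejecting transitions not in the table."""
        if new_state not in TRANSITIONS[self.state]:
            raise ValueError(f"illegal transition {self.state.name} -> {new_state.name}")
        old, self.state = self.state, new_state
        for callback in self._listeners:
            callback(old, new_state)
```

A UI layer could register a listener with `on_change` to update the overlay whenever the session moves from RECORDING to CLEANING or from CLEANING to READY.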

Architecture

Backend modules (state.py, audio.py, transcribe.py, cleanup.py, storage.py, lock.py, clipboard.py, prompt.py) do not import GTK. They use GLib/Gio for thread-safe callbacks and I/O, but no widget code. Only app.py and ui.py touch GTK4. This separation exists as a seam for a future GNOME Shell extension frontend that would spawn the backend as a subprocess and communicate via stdin/stdout JSON Lines. See the architecture decision record for context.
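To make the stdin/stdout JSON Lines idea concrete, here is a rough sketch of how a backend could emit events and how a frontend could read them. The protocol does not exist yet according to the text above, so the event shape (`type`, `text`) and the helper names `emit` and `read_events` are purely hypothetical.

```python
import io
import json
import sys
from typing import IO, Iterator


def emit(event: dict, out: IO[str] = sys.stdout) -> None:
    """Write one event as a single JSON line and flush so the reader sees it promptly."""
    out.write(json.dumps(event, ensure_ascii=False) + "\n")
    out.flush()


def read_events(stream: IO[str]) -> Iterator[dict]:
    """Yield one decoded event per non-empty line of the stream."""
    for line in stream:
        line = line.strip()
        if line:
            yield json.loads(line)


# Round trip through an in-memory buffer standing in for a pipe.
buffer = io.StringIO()
emit({"type": "transcript_delta", "text": "hello"}, out=buffer)
emit({"type": "state", "value": "READY"}, out=buffer)
buffer.seek(0)
events = list(read_events(buffer))
```

One event per line keeps framing trivial: the frontend only needs to split on newlines, and a partially written line is never mistaken for a complete message.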

Configuration

WHISPER.txt — Place a WHISPER.txt file in your working directory. On launch, Voxize resolves the focused window's CWD (via the Window Calls extension and /proc) and loads the file as transcription context, improving accuracy for domain-specific vocabulary. There is no upward search — the file must be in the exact directory the focused process is running from.

VOXIZE_AUTOCLOSE — Seconds before the overlay auto-closes in READY state. Default 30. Set to 0 to disable.

Session data — ~/.local/state/voxize/. Each session directory contains audio.wav, transcription.txt, cleaned.txt, ws_events.jsonl, and debug.log.
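The feature list states that only the last 8 sessions are kept. One way such pruning could work is sketched below. It assumes session directory names sort chronologically (for example, timestamp-based names), which this README does not confirm, and `prune_sessions` is a hypothetical helper, not a function from `storage.py`.

```python
import shutil
from pathlib import Path


def prune_sessions(root: Path, keep: int = 8) -> list[Path]:
    """Delete all but the `keep` newest session directories and return the removed paths.

    Assumes directory names sort in chronological order.
    """
    sessions = sorted((p for p in root.iterdir() if p.is_dir()), key=lambda p: p.name)
    stale = sessions[:-keep] if keep > 0 else sessions
    for path in stale:
        shutil.rmtree(path)
    return stale
```

Calling `prune_sessions(Path.home() / ".local/state/voxize")` at startup would then leave at most eight session directories on disk.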

License

AGPL-3.0
