voice transcription #3381
base: main
- Hold Space on empty composer to record; release to transcribe - Block input and show 'Recording' hint while capturing - Send audio to OpenAI Whisper (whisper-1) via reqwest multipart - Resolve API key via codex_login auth (no env var read) - Insert transcription into composer. Add cpal + hound deps for audio capture + WAV encoding.
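The commit relies on `hound` to turn captured samples into a WAV file before the multipart upload. As a dependency-free illustration of what that encoding step produces, here is a sketch using only the standard library; it is not the PR's code, and `encode_wav_mono_i16` is a hypothetical name. It assumes mono 16-bit PCM and writes the canonical 44-byte RIFF/WAVE header followed by little-endian samples.

```rust
/// Hypothetical helper: encode mono 16-bit PCM samples as a WAV file in memory.
/// (The PR uses the `hound` crate; this std-only version just shows the byte layout.)
pub fn encode_wav_mono_i16(samples: &[i16], sample_rate: u32) -> Vec<u8> {
    let channels: u16 = 1;
    let bits_per_sample: u16 = 16;
    let block_align: u16 = channels * bits_per_sample / 8;
    let byte_rate: u32 = sample_rate * block_align as u32;
    let data_len: u32 = (samples.len() * 2) as u32;

    let mut out = Vec::with_capacity(44 + data_len as usize);
    // RIFF chunk descriptor; the size field counts everything after these 8 bytes.
    out.extend_from_slice(b"RIFF");
    out.extend_from_slice(&(36 + data_len).to_le_bytes());
    out.extend_from_slice(b"WAVE");
    // "fmt " sub-chunk: 16 bytes describing integer PCM.
    out.extend_from_slice(b"fmt ");
    out.extend_from_slice(&16u32.to_le_bytes());
    out.extend_from_slice(&1u16.to_le_bytes()); // audio format 1 = PCM
    out.extend_from_slice(&channels.to_le_bytes());
    out.extend_from_slice(&sample_rate.to_le_bytes());
    out.extend_from_slice(&byte_rate.to_le_bytes());
    out.extend_from_slice(&block_align.to_le_bytes());
    out.extend_from_slice(&bits_per_sample.to_le_bytes());
    // "data" sub-chunk: the samples themselves, little-endian.
    out.extend_from_slice(b"data");
    out.extend_from_slice(&data_len.to_le_bytes());
    for s in samples {
        out.extend_from_slice(&s.to_le_bytes());
    }
    out
}
```

The resulting `Vec<u8>` is the kind of in-memory file that could be attached as the audio part of a multipart request.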
- Insert atomic textarea element when transcription starts - Keep textarea fully editable; element moves with edits - Replace element by id when Whisper result returns; fallback insert at cursor - Add element id support to TextArea (named elements + replace by id) - Switch to AppEvent::TranscriptionComplete(id, text)
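The element model described above can be sketched as follows. This is a simplified, hypothetical `TextArea`, not the codex-tui type: each named element is a byte range tagged with an id, ranges shift as text is typed around them, and a cursor inside an element is snapped past it so the element stays atomic. `update_named_element_by_id` keeps the id (as in the later 'transcribing' relabel), while `replace_element_by_id` turns the element into plain text and returns `false` for an unknown id so the caller can fall back to inserting at the cursor. The sketch assumes byte offsets fall on char boundaries, which holds for the ASCII labels used here.

```rust
use std::ops::Range;

/// Hypothetical, simplified textarea with named atomic elements.
#[derive(Default)]
pub struct TextArea {
    pub text: String,
    pub cursor: usize,
    elements: Vec<(u64, Range<usize>)>, // (id, byte range in `text`)
    next_id: u64,
}

impl TextArea {
    /// Insert plain text at the cursor; never split an element, shift later ones right.
    pub fn insert_str(&mut self, s: &str) {
        let mut pos = self.cursor;
        for (_, r) in &self.elements {
            if r.start < pos && pos < r.end {
                pos = r.end; // keep the element atomic
            }
        }
        self.text.insert_str(pos, s);
        self.shift_from(pos, s.len() as isize);
        self.cursor = pos + s.len();
    }

    /// Insert a named element (e.g. a "recording" placeholder) and return its id.
    pub fn insert_named_element(&mut self, label: &str) -> u64 {
        self.insert_str(label);
        let id = self.next_id;
        self.next_id += 1;
        self.elements.push((id, self.cursor - label.len()..self.cursor));
        id
    }

    /// Change an element's text but keep the element and its id.
    pub fn update_named_element_by_id(&mut self, id: u64, text: &str) -> bool {
        self.splice(id, text, true)
    }

    /// Replace an element with plain text; the element and its id go away.
    pub fn replace_element_by_id(&mut self, id: u64, text: &str) -> bool {
        self.splice(id, text, false)
    }

    /// Delete an element and its text.
    pub fn remove_element_by_id(&mut self, id: u64) -> bool {
        self.splice(id, "", false)
    }

    fn splice(&mut self, id: u64, text: &str, keep_element: bool) -> bool {
        let Some(idx) = self.elements.iter().position(|(eid, _)| *eid == id) else {
            return false; // unknown id: caller may insert at the cursor instead
        };
        let (_, range) = self.elements.remove(idx);
        let delta = text.len() as isize - range.len() as isize;
        self.text.replace_range(range.clone(), text);
        self.shift_from(range.end, delta);
        if self.cursor >= range.end {
            self.cursor = (self.cursor as isize + delta) as usize;
        } else if self.cursor > range.start {
            self.cursor = range.start + text.len();
        }
        if keep_element {
            self.elements.push((id, range.start..range.start + text.len()));
        }
        true
    }

    /// Move every element that starts at or after `at` by `delta` bytes.
    fn shift_from(&mut self, at: usize, delta: isize) {
        for (_, r) in &mut self.elements {
            if r.start >= at {
                r.start = (r.start as isize + delta) as usize;
                r.end = (r.end as isize + delta) as usize;
            }
        }
    }
}
```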
- Add AppEvent::TranscriptionFailed { id, error } - On error, delete the placeholder element; leave editor state intact - Fix voice thread to send failure event with correct id - Keep success path replacing placeholder by id
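A minimal sketch of the success and failure paths, under stated assumptions: the event shapes follow the commit messages (`TranscriptionComplete(id, text)` and `TranscriptionFailed { id, error }`), while `PlaceholderEditor`, `handle_transcription_event`, and the `ToyEditor` used to exercise them are hypothetical. Returning the error lets the caller decide how to surface it; draft text outside the placeholder is never touched.

```rust
/// Transcription results sent back to the UI (variant shapes taken from the commits).
#[derive(Debug)]
pub enum AppEvent {
    TranscriptionComplete(String, String), // (placeholder id, transcript)
    TranscriptionFailed { id: String, error: String },
}

/// Editor operations the handler needs (hypothetical trait).
pub trait PlaceholderEditor {
    fn replace_element_by_id(&mut self, id: &str, text: &str) -> bool;
    fn remove_element_by_id(&mut self, id: &str) -> bool;
    fn insert_at_cursor(&mut self, text: &str);
}

/// Apply a transcription event; on failure, hand the error back to the caller.
pub fn handle_transcription_event(
    editor: &mut impl PlaceholderEditor,
    event: AppEvent,
) -> Result<(), String> {
    match event {
        AppEvent::TranscriptionComplete(id, text) => {
            // Preferred path: swap the placeholder for the transcript in place.
            if !editor.replace_element_by_id(&id, &text) {
                // Placeholder is gone (e.g. the user deleted it): insert at the cursor.
                editor.insert_at_cursor(&text);
            }
            Ok(())
        }
        AppEvent::TranscriptionFailed { id, error } => {
            // Remove only the placeholder; leave the rest of the draft intact.
            editor.remove_element_by_id(&id);
            Err(error)
        }
    }
}

/// Toy editor: placeholders are (id, byte offset, byte length); the cursor is the end.
#[derive(Default)]
pub struct ToyEditor {
    pub text: String,
    placeholders: Vec<(String, usize, usize)>,
}

impl ToyEditor {
    pub fn add_placeholder(&mut self, id: &str, label: &str) {
        self.placeholders.push((id.to_string(), self.text.len(), label.len()));
        self.text.push_str(label);
    }

    fn splice(&mut self, id: &str, with: &str) -> bool {
        let Some(i) = self.placeholders.iter().position(|(pid, _, _)| pid == id) else {
            return false;
        };
        let (_, off, len) = self.placeholders.remove(i);
        self.text.replace_range(off..off + len, with);
        for (_, o, _) in &mut self.placeholders {
            if *o > off {
                *o = *o + with.len() - len; // later placeholders shift by the size change
            }
        }
        true
    }
}

impl PlaceholderEditor for ToyEditor {
    fn replace_element_by_id(&mut self, id: &str, text: &str) -> bool {
        self.splice(id, text)
    }
    fn remove_element_by_id(&mut self, id: &str) -> bool {
        self.splice(id, "")
    }
    fn insert_at_cursor(&mut self, text: &str) {
        self.text.push_str(text);
    }
}
```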
…ng' on release - Insert named 'recording' element at start of capture - On stop, change the same element text to 'transcribing' and send audio - Remove footer 'Recording' hint
- Add TextArea::update_named_element_by_id to preserve element id - On PageDown release, update existing element text to 'transcribing' - Final transcription replaces element with plain text; errors delete it - Route keys while recording; stop on Release or next key
- Use webrtc-vad to detect voiced frames (10ms) - Aggressive mode + 200ms padding to avoid clipping - Downmix to mono, resample to supported rates - Trim leading/trailing silence before upload - Skip upload and remove placeholder if no speech - Add webrtc-vad dependency to TUI
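Before a VAD sees the audio, the capture has to be mono at one of the VAD's supported rates. The sketch below shows one straightforward way to do both steps: average interleaved channels, then resample with linear interpolation. The function names are hypothetical (a later commit in this list removes the PR's own helpers along with `webrtc-vad`), and plain linear interpolation without a low-pass filter is a simplification that can alias when downsampling.

```rust
/// Average interleaved multi-channel i16 frames into mono.
pub fn downmix_to_mono(interleaved: &[i16], channels: usize) -> Vec<i16> {
    assert!(channels > 0, "channel count must be positive");
    interleaved
        .chunks_exact(channels)
        .map(|frame| {
            // Sum in i32 so loud frames cannot overflow before dividing.
            let sum: i32 = frame.iter().map(|&s| s as i32).sum();
            (sum / channels as i32) as i16
        })
        .collect()
}

/// Resample `input` from `from_hz` to `to_hz` by linear interpolation.
pub fn resample_linear(input: &[i16], from_hz: u32, to_hz: u32) -> Vec<i16> {
    if input.is_empty() || from_hz == to_hz {
        return input.to_vec();
    }
    let out_len = (input.len() as u64 * to_hz as u64 / from_hz as u64) as usize;
    let step = from_hz as f64 / to_hz as f64; // input samples per output sample
    let last = input.len() - 1;
    (0..out_len)
        .map(|i| {
            let pos = i as f64 * step;
            let idx = pos.floor() as usize;
            let frac = pos - idx as f64;
            let a = input[idx.min(last)] as f64;
            let b = input[(idx + 1).min(last)] as f64;
            (a + (b - a) * frac).round() as i16
        })
        .collect()
}
```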
Fix push-to-talk voice mode where PageDown release didn't trigger transcription because Release events were filtered at the app layer. Now all key events are forwarded, allowing the composer to stop recording on release and send audio for transcription immediately.
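The routing rule this fix enables can be sketched as a pure function over key-event kinds. The enum mirrors the `Press`/`Repeat`/`Release` kinds that crossterm reports; `route_key`, `Outcome`, and the `is_record_key` flag are hypothetical. The key point is that the app layer forwards `Release` untouched and the composer decides: releasing the push-to-talk key stops recording, auto-repeat of that key is ignored, another key press stops recording and is then handled normally, and ordinary editing (including history navigation) reacts only to `Press` and `Repeat`.

```rust
/// Mirrors the event kinds a terminal backend such as crossterm reports.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum KeyEventKind {
    Press,
    Repeat,
    Release,
}

/// What the composer should do with one key event (hypothetical).
#[derive(Debug, PartialEq, Eq)]
pub enum Outcome {
    /// Stop capture and send the clip for transcription.
    StopAndTranscribe,
    /// Stop capture, send the clip, then process this key normally.
    StopAndTranscribeThenHandle,
    /// Normal editing path.
    Handle,
    /// Nothing to do.
    Ignore,
}

/// Route a key event. Assumes the app layer forwards every kind, including Release.
pub fn route_key(recording: bool, is_record_key: bool, kind: KeyEventKind) -> Outcome {
    use KeyEventKind::{Press, Release, Repeat};
    match (recording, is_record_key, kind) {
        (true, true, Release) => Outcome::StopAndTranscribe,
        // Auto-repeat while the push-to-talk key is held: keep recording.
        (true, true, Press | Repeat) => Outcome::Ignore,
        // Any other key ends the capture first (also covers terminals without Release).
        (true, false, Press | Repeat) => Outcome::StopAndTranscribeThenHandle,
        (true, false, Release) => Outcome::Ignore,
        // Editing and history navigation react to Press/Repeat only.
        (false, _, Press | Repeat) => Outcome::Handle,
        (false, _, Release) => Outcome::Ignore,
    }
}
```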
- Short-clip handling: remove placeholder without transcribing when <1s - Hold-to-talk: start immediately on empty textarea; skip space + delay - Disable VAD trimming; always send full clip - Add live recording meter with adaptive gain and compression - Animate via new AppEvent::RecordingMeter and in-place updates - Use atomic peak from audio callback to avoid blocking audio thread - Normalize audio (peak with headroom) before WAV upload - History nav: trigger on Press/Repeat only - Hide cursor while recording - Meter UI: 12-char sparkline, scrolling left, no label
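Three of these items fit in a short sketch: the audio callback publishes its peak through an atomic (`fetch_max`, so it never takes a lock), the UI side swaps the peak back to zero on each tick and scrolls a 12-character sparkline, and the finished clip is peak-normalized with headroom before upload. The decay factor, the square-root compression curve, the eight block glyphs, and every name here are assumptions for illustration, not the PR's values.

```rust
use std::collections::VecDeque;
use std::sync::atomic::{AtomicU16, Ordering};

const BARS: [char; 8] = ['▁', '▂', '▃', '▄', '▅', '▆', '▇', '█'];
const METER_WIDTH: usize = 12;

/// Audio-callback side: lock-free, so it never blocks the audio thread.
pub fn record_peak(peak: &AtomicU16, samples: &[i16]) {
    let local = samples.iter().map(|s| s.unsigned_abs()).max().unwrap_or(0);
    peak.fetch_max(local, Ordering::Relaxed);
}

/// UI side: a scrolling sparkline with a simple adaptive gain.
pub struct Meter {
    history: VecDeque<char>,
    envelope: f32, // slowly decaying reference level (assumed gain scheme)
}

impl Meter {
    pub fn new() -> Self {
        Self { history: VecDeque::with_capacity(METER_WIDTH), envelope: 1.0 }
    }

    /// Read and reset the shared peak, append one bar, return the current sparkline.
    pub fn tick(&mut self, peak: &AtomicU16) -> String {
        let p = peak.swap(0, Ordering::Relaxed) as f32 / i16::MAX as f32;
        // Follow loud peaks immediately; decay slowly so quiet input is scaled up.
        self.envelope = (self.envelope * 0.95).max(p).max(0.01);
        // Square-root compression lifts quiet levels so speech stays visible.
        let level = (p / self.envelope).min(1.0).sqrt();
        let idx = (level * (BARS.len() - 1) as f32).round() as usize;
        if self.history.len() == METER_WIDTH {
            self.history.pop_front(); // scroll left
        }
        self.history.push_back(BARS[idx]);
        self.history.iter().collect()
    }
}

/// Scale the clip so its peak sits at `headroom` (0..=1) of full scale.
pub fn normalize_peak(samples: &mut [i16], headroom: f32) {
    let peak = samples.iter().map(|s| s.unsigned_abs()).max().unwrap_or(0);
    if peak == 0 {
        return; // silence: nothing to scale
    }
    let gain = headroom * i16::MAX as f32 / peak as f32;
    for s in samples.iter_mut() {
        *s = (*s as f32 * gain).round().clamp(i16::MIN as f32, i16::MAX as f32) as i16;
    }
}
```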
- Remove unused functions (to_mono_i16, resample_linear_i16, detect_voiced_bounds_webrtc) - Prune unused imports (std::convert::TryFrom, webrtc-vad types) - Remove webrtc-vad from tui/Cargo.toml - Delete unused local in recording meter task. No behavior change; voice still records and transcribes full clip. Ran fmt/fix and tests for codex-tui.
- Remove AppEvent::SpaceHoldTimeout and app/chatwidget/bottom_pane handlers - Manage 500ms hold via tokio::spawn that flips an atomic flag - Convert to recording on next input event when flag is observed. Behavior: identical in typical terminals; on non-repeat terminals, starts on next key event after timeout.
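A sketch of the timer-plus-flag pattern, with one swap called out: the PR uses `tokio::spawn`, while this version uses a plain `std::thread` so it runs without an async runtime. `SpaceHold` and its methods are hypothetical. The timer only flips an `AtomicBool`; the input path reads it on the next event, which is why terminals that do not auto-repeat keys only convert the hold once another event arrives.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
use std::thread;
use std::time::Duration;

/// Hypothetical hold tracker: a background timer flips a flag, the input path polls it.
pub struct SpaceHold {
    elapsed: Arc<AtomicBool>,
}

impl SpaceHold {
    /// Start the hold timer (std thread standing in for the PR's tokio task).
    pub fn start(hold: Duration) -> Self {
        let elapsed = Arc::new(AtomicBool::new(false));
        let flag = Arc::clone(&elapsed);
        thread::spawn(move || {
            thread::sleep(hold);
            flag.store(true, Ordering::Release);
        });
        Self { elapsed }
    }

    /// Checked on the next input event: true once the hold has lasted long enough,
    /// meaning the pending space should be converted into a recording.
    pub fn should_start_recording(&self) -> bool {
        self.elapsed.load(Ordering::Acquire)
    }
}
```

If the space is released early, dropping the `SpaceHold` is enough; the orphaned thread still flips its flag, but nothing reads it.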
…repeats - Drop id from hold state and conversions - Spawn tokio task that flips atomic flag and schedules a frame - Process conversion in a new pre_draw_tick called before rendering - Pass FrameRequester into ChatComposer; update tests accordingly. No AppEvent used for timeout; behavior now independent of key repeat.
…tick - Remove key-event path for timeout processing; rely on frame scheduled by timer - Keep local tokio task + atomic flag approach; fewer code paths. All tests pass.
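These two commits remove the dependence on a later key event: the timer also asks for a redraw, and the conversion runs in a hook called before every draw. The sketch below models `FrameRequester` as a channel sender and uses a std thread instead of a tokio task; `HoldComposer`, `frame_channel`, and `arm_hold_timer` are hypothetical names, and only `pre_draw_tick` and `FrameRequester` come from the commits.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::mpsc::{channel, Receiver, Sender};
use std::sync::Arc;
use std::thread;
use std::time::Duration;

/// Stand-in for the TUI's FrameRequester: each message asks the UI loop to redraw.
#[derive(Clone)]
pub struct FrameRequester(Sender<()>);

impl FrameRequester {
    pub fn schedule_frame(&self) {
        // Ignore the error if the UI loop has already shut down.
        let _ = self.0.send(());
    }
}

/// Returns a requester plus the receiver the UI loop would wait on.
pub fn frame_channel() -> (FrameRequester, Receiver<()>) {
    let (tx, rx) = channel();
    (FrameRequester(tx), rx)
}

pub struct HoldComposer {
    hold_elapsed: Arc<AtomicBool>,
    pub recording: bool,
}

impl HoldComposer {
    pub fn new() -> Self {
        Self { hold_elapsed: Arc::new(AtomicBool::new(false)), recording: false }
    }

    /// Space pressed on an empty composer: arm a timer that flags the hold and wakes the UI.
    pub fn arm_hold_timer(&self, frames: FrameRequester, hold: Duration) {
        let flag = Arc::clone(&self.hold_elapsed);
        thread::spawn(move || {
            thread::sleep(hold);
            flag.store(true, Ordering::Release);
            frames.schedule_frame(); // a redraw happens even if no key event arrives
        });
    }

    /// Called before every draw: perform the requested conversion exactly once.
    pub fn pre_draw_tick(&mut self) {
        if self.hold_elapsed.swap(false, Ordering::AcqRel) {
            self.recording = true;
        }
    }
}
```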
- Replace static "transcribing" with animated braille spinner frames via RecordingMeter updates - Spinner auto-stops after max duration or when placeholder is replaced/removed. All TUI tests pass.
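Choosing a spinner frame can be a pure function of elapsed time, which makes the two auto-stop rules easy to state. The ten braille frames are a common spinner sequence and may not match the PR's; `spinner_frame`, the frame interval, and the cap are hypothetical parameters.

```rust
use std::time::Duration;

/// A common 10-frame braille spinner (assumed; the PR's frames may differ).
const FRAMES: [&str; 10] = ["⠋", "⠙", "⠹", "⠸", "⠼", "⠴", "⠦", "⠧", "⠇", "⠏"];

/// Frame to show after `elapsed`, or `None` once the spinner should stop.
pub fn spinner_frame(
    elapsed: Duration,
    frame_interval: Duration,
    max: Duration,
    placeholder_present: bool,
) -> Option<&'static str> {
    if !placeholder_present || elapsed >= max {
        return None; // placeholder replaced/removed, or time cap reached
    }
    let n = (elapsed.as_millis() / frame_interval.as_millis().max(1)) as usize;
    Some(FRAMES[n % FRAMES.len()])
}
```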
- Insert a named element containing a space on Space press - On release or cancel, replace the element with a plain space - On timeout, remove the element and begin recording. Keeps behavior while simplifying state (no index math). All tests pass.
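Modeling the textarea as a list of segments shows why a named element removes the index math: resolving the hold is a lookup by id rather than a position calculation. `Segment`, `Buffer`, `HoldEnd`, and `resolve_space_hold` are hypothetical names in this sketch.

```rust
/// A textarea reduced to segments: plain text or a named element.
#[derive(Debug, Clone, PartialEq)]
pub enum Segment {
    Text(String),
    Element { id: u32, text: String },
}

#[derive(Default)]
pub struct Buffer {
    pub segments: Vec<Segment>,
    next_id: u32,
}

impl Buffer {
    pub fn push_text(&mut self, s: &str) {
        self.segments.push(Segment::Text(s.to_string()));
    }

    pub fn push_element(&mut self, text: &str) -> u32 {
        let id = self.next_id;
        self.next_id += 1;
        self.segments.push(Segment::Element { id, text: text.to_string() });
        id
    }

    /// Replace element `id` with plain text (`Some`) or delete it (`None`).
    pub fn replace_element(&mut self, id: u32, with: Option<&str>) -> bool {
        let Some(i) = self
            .segments
            .iter()
            .position(|s| matches!(s, Segment::Element { id: e, .. } if *e == id))
        else {
            return false;
        };
        match with {
            Some(t) => self.segments[i] = Segment::Text(t.to_string()),
            None => {
                self.segments.remove(i);
            }
        }
        true
    }

    pub fn render(&self) -> String {
        self.segments
            .iter()
            .map(|s| match s {
                Segment::Text(t) | Segment::Element { text: t, .. } => t.as_str(),
            })
            .collect()
    }
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum HoldEnd {
    Release,
    Cancel,
    Timeout,
}

/// Resolve a pending space-hold element; returns true when recording should start.
pub fn resolve_space_hold(buf: &mut Buffer, id: u32, end: HoldEnd) -> bool {
    match end {
        // Short tap or cancel: the user just typed a space.
        HoldEnd::Release | HoldEnd::Cancel => {
            buf.replace_element(id, Some(" "));
            false
        }
        // Held long enough: drop the space and switch to recording.
        HoldEnd::Timeout => {
            buf.replace_element(id, None);
            true
        }
    }
}
```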
- Add stop_recording_and_start_transcription() and call from handle_key_event - Keeps behavior; improves readability and testability. All TUI tests pass.
- Add start_recording_with_placeholder() and reuse for empty-text space press and hold-timeout - Keeps behavior; consolidates meter placeholder + spawn logic. All TUI tests pass.
…lean up on drop - Maintain stop flags for spinner tasks; stop on replace/remove or when update fails - Implement Drop for ChatComposer to stop spinners and end capture on teardown - Make RecordingMeter path schedule a frame only when the update is applied. This avoids runaway spinner tasks across UI changes (e.g., NewSession). All tests pass.
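The teardown half of this commit can be sketched as a `Drop` impl that signals per-spinner stop flags and ends capture. `ComposerSketch` and `CaptureHandle` are hypothetical stand-ins that hold only the state cleanup needs; a spinner task would check its flag before each frame and exit once it reads `true`.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;

/// Hypothetical handle to an active capture; a real one would own the audio stream.
pub struct CaptureHandle {
    pub active: Arc<AtomicBool>,
}

impl CaptureHandle {
    pub fn stop(&self) {
        self.active.store(false, Ordering::Release);
    }
}

/// Stripped-down stand-in for ChatComposer with only the state Drop cleans up.
pub struct ComposerSketch {
    spinner_stops: Vec<Arc<AtomicBool>>, // one stop flag per spinner task
    capture: Option<CaptureHandle>,
}

impl ComposerSketch {
    pub fn new(capture: Option<CaptureHandle>) -> Self {
        Self { spinner_stops: Vec::new(), capture }
    }

    /// Hand a stop flag to a new spinner task; the task exits once it reads `true`.
    pub fn register_spinner(&mut self) -> Arc<AtomicBool> {
        let flag = Arc::new(AtomicBool::new(false));
        self.spinner_stops.push(Arc::clone(&flag));
        flag
    }
}

impl Drop for ComposerSketch {
    fn drop(&mut self) {
        // Signal every spinner so no task keeps updating a composer that is gone.
        for flag in &self.spinner_stops {
            flag.store(true, Ordering::Release);
        }
        // End any in-flight capture so the microphone is released.
        if let Some(capture) = self.capture.take() {
            capture.stop();
        }
    }
}
```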
…ance and 60s cap - Remove explicit spinner stop flags and stop calls - Spinner tasks auto-expire after 60s; UI ignores updates once placeholder is gone - Keep Drop minimal: stop capture and clear placeholder. All TUI tests pass.
…isappearance and 60s cap" This reverts commit 5461929.
- Add ChatComposer helpers (ta_* wrappers) that auto-sync popups after text changes - Use wrappers for programmatic edits (placeholders, spinner frames, space-hold element) - Remove scattered manual sync calls accordingly. All TUI tests pass.
…y paths - Revert to direct TextArea calls - Ensure sync_command_popup/sync_file_search_popup are called in event handlers and key paths - Keep on-space-hold timeout and recording flows consistent. All TUI tests pass.
- Centralize sync in handle_key_event end; for early-return branches, perform sync then return - Remove ad-hoc syncs added inside match branches now covered by centralized sync. All TUI tests pass.
- Add ChatComposer::sync_popups() to unify command/file popup updates - Call sync_popups after key events; remove scattered explicit sync calls - BottomPane now triggers sync_popups after events (key, paste, inserts, pre-draw, history, transcription) - Keeps behavior consistent and simplifies control flow; tests and snapshots pass
- ChatComposer now syncs popups after key handling; remove extra syncs in BottomPane - Keep centralized sync on paste/insert/transcription/history/pre-draw only - No behavior change; reduces duplicate work in key path
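The centralized pattern these commits converge on can be sketched as a thin public `handle_key_event` that delegates to an inner handler and then syncs once, so early returns cannot skip the sync. The popup rules here (a leading `/` opens the command popup, a trailing `@token` drives file search) are simplified assumptions, and `PopupComposer`, `handle_key_inner`, and the `syncs` counter are hypothetical; only the `sync_popups` name comes from the commits.

```rust
#[derive(Default)]
pub struct PopupComposer {
    pub text: String,
    pub command_popup_open: bool,
    pub file_popup_query: Option<String>,
    pub syncs: usize, // counts sync calls, for illustration only
}

impl PopupComposer {
    /// Public entry point: every key path ends with exactly one popup sync.
    pub fn handle_key_event(&mut self, c: char) -> bool {
        let needs_redraw = self.handle_key_inner(c);
        self.sync_popups();
        needs_redraw
    }

    fn handle_key_inner(&mut self, c: char) -> bool {
        if c == '\u{8}' {
            // Backspace: early return, still synced by the caller.
            return self.text.pop().is_some();
        }
        self.text.push(c);
        true
    }

    /// Single place that derives both popups' state from the current text.
    pub fn sync_popups(&mut self) {
        self.syncs += 1;
        self.command_popup_open = self.text.starts_with('/');
        self.file_popup_query = self
            .text
            .split_whitespace()
            .last()
            .and_then(|w| w.strip_prefix('@'))
            .map(str::to_string);
    }
}
```

Paste, programmatic inserts, and transcription results bypass the key path, which is why the commits keep an explicit `sync_popups` call for those events.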
Add apt-get update before installing musl tools and ALSA libraries in the CI and release workflows so Ubuntu runners have a fresh package index and dependencies available. Co-Authored-By: Codex <199175422+chatgpt-codex-connector[bot]@users.noreply.github.com>
@codex review
Codex Review: Here are some suggestions.
Adds voice transcription on press-and-hold of the spacebar.
Screen.Recording.2025-09-19.at.12.24.02.PM.mov