voice transcription #3381

nornagon-openai · 2025-09-09T19:07:49Z

Adds voice transcription on press-and-hold of spacebar.

Screen.Recording.2025-09-19.at.12.24.02.PM.mov

- Hold Space on empty composer to record; release to transcribe
- Block input and show 'Recording' hint while capturing
- Send audio to OpenAI Whisper (whisper-1) via reqwest multipart
- Resolve API key via codex_login auth (no env var read)
- Insert transcription into composer

Add cpal + hound deps for audio capture + WAV encoding.

- Insert atomic textarea element when transcription starts
- Keep textarea fully editable; element moves with edits
- Replace element by id when Whisper result returns; fallback insert at cursor
- Add element id support to TextArea (named elements + replace by id)
- Switch to AppEvent::TranscriptionComplete(id, text)

- Add AppEvent::TranscriptionFailed { id, error }
- On error, delete the placeholder element; leave editor state intact
- Fix voice thread to send failure event with correct id
- Keep success path replacing placeholder by id

…ng' on release

- Insert named 'recording' element at start of capture
- On stop, change the same element text to 'transcribing' and send audio
- Remove footer 'Recording' hint

- Add TextArea::update_named_element_by_id to preserve element id
- On PageDown release, update existing element text to 'transcribing'
- Final transcription replaces element with plain text; errors delete it
- Route keys while recording; stop on Release or next key
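The placeholder lifecycle described above (insert a named element, update its text while keeping the id, then replace it with plain text or delete it on error) can be sketched roughly as follows. This is a simplified stand-in, not the PR's TextArea: the `Composer` and `Segment` types and every method except the name `update_named_element_by_id` are hypothetical and exist only for illustration.

```rust
/// One piece of composer content: plain text or an atomic, named element.
#[derive(Debug, Clone, PartialEq)]
enum Segment {
    Text(String),
    Element { id: String, text: String },
}

/// Hypothetical, simplified stand-in for the TUI's TextArea.
#[derive(Debug, Default)]
struct Composer {
    segments: Vec<Segment>,
}

impl Composer {
    fn push_text(&mut self, text: &str) {
        self.segments.push(Segment::Text(text.to_string()));
    }

    /// Insert an atomic element that can later be found by its id.
    fn insert_named_element(&mut self, id: &str, text: &str) {
        self.segments.push(Segment::Element {
            id: id.to_string(),
            text: text.to_string(),
        });
    }

    fn position_of(&self, id: &str) -> Option<usize> {
        self.segments
            .iter()
            .position(|s| matches!(s, Segment::Element { id: i, .. } if i.as_str() == id))
    }

    /// Change the element's visible text but keep its id (e.g. 'recording' -> 'transcribing').
    fn update_named_element_by_id(&mut self, id: &str, text: &str) -> bool {
        match self.position_of(id) {
            Some(i) => {
                self.segments[i] = Segment::Element {
                    id: id.to_string(),
                    text: text.to_string(),
                };
                true
            }
            None => false,
        }
    }

    /// Success path: the element becomes ordinary text.
    fn replace_element_with_text(&mut self, id: &str, text: &str) -> bool {
        match self.position_of(id) {
            Some(i) => {
                self.segments[i] = Segment::Text(text.to_string());
                true
            }
            None => false,
        }
    }

    /// Error path: drop the placeholder and leave everything else intact.
    fn remove_element(&mut self, id: &str) -> bool {
        match self.position_of(id) {
            Some(i) => {
                self.segments.remove(i);
                true
            }
            None => false,
        }
    }

    fn render(&self) -> String {
        self.segments
            .iter()
            .map(|s| match s {
                Segment::Text(t) => t.as_str(),
                Segment::Element { text, .. } => text.as_str(),
            })
            .collect()
    }
}
```

Looking the element up by id rather than by position is what lets the user keep editing around the placeholder while the request is in flight; each operation returns `false` if the placeholder has already gone away.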

- Use webrtc-vad to detect voiced frames (10ms)
- Aggressive mode + 200ms padding to avoid clipping
- Downmix to mono, resample to supported rates
- Trim leading/trailing silence before upload
- Skip upload and remove placeholder if no speech
- Add webrtc-vad dependency to TUI
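To illustrate the trimming step, here is a minimal sketch that finds the voiced region of a mono clip in 10ms frames and pads it on both sides. It deliberately swaps webrtc-vad for a plain amplitude threshold so that it stays dependency-free; the function name `voiced_bounds` and the `threshold` parameter are assumptions and do not come from the PR.

```rust
/// Return the sample range `(start, end)` covering all voiced frames plus padding,
/// or `None` when no frame is voiced (the caller would then skip the upload).
///
/// NOTE: a frame counts as "voiced" here if any sample reaches `threshold`.
/// This is a simple stand-in for a real VAD such as webrtc-vad.
fn voiced_bounds(
    samples: &[i16],
    sample_rate: u32,
    threshold: u16,
    pad_ms: u32,
) -> Option<(usize, usize)> {
    // 10 ms worth of samples per frame.
    let frame = (sample_rate as usize / 100).max(1);
    let voiced = |chunk: &[i16]| chunk.iter().any(|s| s.unsigned_abs() >= threshold);
    let frames: Vec<bool> = samples.chunks(frame).map(voiced).collect();

    let first = frames.iter().position(|v| *v)?;
    let last = frames.iter().rposition(|v| *v)?;

    // Pad both ends so word onsets and tails are not clipped.
    let pad = sample_rate as usize * pad_ms as usize / 1000;
    let start = (first * frame).saturating_sub(pad);
    let end = ((last + 1) * frame + pad).min(samples.len());
    Some((start, end))
}
```

A caller would upload `&samples[start..end]` when a range is returned and remove the placeholder otherwise.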

Fix push-to-talk voice mode where PageDown release didn't trigger transcription because Release events were filtered at the app layer. Now all key events are forwarded, allowing the composer to stop recording on release and send audio for transcription immediately.

- Short-clip handling: remove placeholder without transcribing when <1s
- Hold-to-talk: start immediately on empty textarea; skip space + delay
- Disable VAD trimming; always send full clip
- Add live recording meter with adaptive gain and compression
- Animate via new AppEvent::RecordingMeter and in-place updates
- Use atomic peak from audio callback to avoid blocking audio thread
- Normalize audio (peak with headroom) before WAV upload
- History nav: trigger on Press/Repeat only
- Hide cursor while recording
- Meter UI: 12-char sparkline, scrolling left, no label
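A rough sketch of three of the pieces listed above: publishing a peak from the audio callback through an atomic, rendering a fixed-width sparkline that scrolls left, and peak-normalizing with headroom. All names here (`record_peak`, `take_level`, `Meter`, `normalize_peak`) and the choice of bar glyphs are assumptions; the adaptive gain and compression mentioned in the commit are not modeled.

```rust
use std::collections::VecDeque;
use std::sync::atomic::{AtomicU16, Ordering};

/// Bar glyphs from quietest to loudest (an assumed glyph set).
const BARS: [char; 8] = ['▁', '▂', '▃', '▄', '▅', '▆', '▇', '█'];

/// Audio-callback side: fold this buffer's peak into the shared atomic without locking.
fn record_peak(peak: &AtomicU16, samples: &[i16]) {
    let local = samples.iter().map(|s| s.unsigned_abs()).max().unwrap_or(0);
    peak.fetch_max(local, Ordering::Relaxed);
}

/// UI side: read the peak since the last call as a 0.0..=1.0 level and reset it.
fn take_level(peak: &AtomicU16) -> f32 {
    peak.swap(0, Ordering::Relaxed) as f32 / 32768.0
}

/// Fixed-width sparkline; new levels enter on the right and old ones scroll off the left.
struct Meter {
    cells: VecDeque<char>,
}

impl Meter {
    fn new(width: usize) -> Self {
        Meter {
            cells: std::iter::repeat(BARS[0]).take(width).collect(),
        }
    }

    fn push(&mut self, level: f32) {
        let idx = (level.clamp(0.0, 1.0) * (BARS.len() - 1) as f32).round() as usize;
        self.cells.pop_front();
        self.cells.push_back(BARS[idx]);
    }

    fn render(&self) -> String {
        self.cells.iter().collect()
    }
}

/// Scale the clip so its loudest sample sits at `headroom` (0.0..=1.0) of full scale.
fn normalize_peak(samples: &mut [i16], headroom: f32) {
    let peak = samples.iter().map(|s| s.unsigned_abs()).max().unwrap_or(0);
    if peak == 0 {
        return; // pure silence: nothing to scale
    }
    let gain = (i16::MAX as f32 * headroom) / peak as f32;
    for s in samples.iter_mut() {
        let scaled = (*s as f32 * gain).round();
        *s = scaled.clamp(i16::MIN as f32, i16::MAX as f32) as i16;
    }
}
```

The atomic keeps the audio thread free of locks and allocation: the callback only does a `fetch_max`, and the UI task polls with `swap(0)` at whatever rate it redraws the meter.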

- Remove unused functions (to_mono_i16, resample_linear_i16, detect_voiced_bounds_webrtc)
- Prune unused imports (std::convert::TryFrom, webrtc-vad types)
- Remove webrtc-vad from tui/Cargo.toml
- Delete unused local in recording meter task

No behavior change; voice still records and transcribes full clip. Ran fmt/fix and tests for codex-tui.

- Remove AppEvent::SpaceHoldTimeout and app/chatwidget/bottom_pane handlers
- Manage 500ms hold via tokio::spawn that flips an atomic flag
- Convert to recording on next input event when flag is observed

Behavior: identical in typical terminals; on non-repeat terminals, starts on next key event after timeout.

…repeats

- Drop id from hold state and conversions
- Spawn tokio task that flips atomic flag and schedules a frame
- Process conversion in a new pre_draw_tick called before rendering
- Pass FrameRequester into ChatComposer; update tests accordingly

No AppEvent used for timeout; behavior now independent of key repeat.
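The timer-plus-flag approach above might look something like the sketch below. To keep it dependency-free it uses `std::thread` where the PR uses `tokio::spawn`, and a plain closure stands in for the FrameRequester; `spawn_hold_timer` is a hypothetical name, and this `pre_draw_tick` is reduced to just the flag check.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
use std::thread::{self, JoinHandle};
use std::time::Duration;

/// Start a timer that flips the returned flag after `hold`, then asks for a redraw.
/// `request_frame` is a stand-in for whatever schedules the next frame.
fn spawn_hold_timer<F>(hold: Duration, request_frame: F) -> (Arc<AtomicBool>, JoinHandle<()>)
where
    F: FnOnce() + Send + 'static,
{
    let elapsed = Arc::new(AtomicBool::new(false));
    let flag = Arc::clone(&elapsed);
    let handle = thread::spawn(move || {
        thread::sleep(hold);
        // Set the flag first so the frame we request is guaranteed to observe it.
        flag.store(true, Ordering::SeqCst);
        request_frame();
    });
    (elapsed, handle)
}

/// Called before each render. Returns true exactly once after the timer fires,
/// which is the point where the composer would switch from "space held" to recording.
fn pre_draw_tick(elapsed: &AtomicBool) -> bool {
    elapsed.swap(false, Ordering::SeqCst)
}
```

Because the timer itself schedules the frame, the conversion no longer depends on another key event arriving, which is what makes the behavior independent of terminal key repeat.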

…tick

- Remove key-event path for timeout processing; rely on frame scheduled by timer
- Keep local tokio task + atomic flag approach; fewer code paths

All tests pass.

- Replace static "transcribing" with animated braille spinner frames via RecordingMeter updates
- Spinner auto-stops after max duration or when placeholder is replaced/removed

All TUI tests pass.
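As an illustration of a spinner that stops on its own, the frame to draw can be computed purely from elapsed time. The frame glyphs, the 80ms interval, and the function name are assumptions; the 60s cap is borrowed from a later commit message in this PR and may not match the value used at this point.

```rust
use std::time::Duration;

/// A common braille spinner sequence (assumed; the PR does not list its frames).
const FRAMES: [char; 10] = ['⠋', '⠙', '⠹', '⠸', '⠼', '⠴', '⠦', '⠧', '⠇', '⠏'];
/// Assumed time between frames.
const FRAME_INTERVAL: Duration = Duration::from_millis(80);
/// Assumed maximum spin time before the spinner gives up.
const MAX_SPIN: Duration = Duration::from_secs(60);

/// Frame to show after `elapsed`, or `None` once the spinner should stop.
fn spinner_frame(elapsed: Duration) -> Option<char> {
    if elapsed >= MAX_SPIN {
        return None;
    }
    let idx = (elapsed.as_millis() / FRAME_INTERVAL.as_millis()) as usize % FRAMES.len();
    Some(FRAMES[idx])
}
```

A spinner task would call this on each tick, write the frame into the placeholder by id, and exit when it gets `None` or when the update reports that the placeholder no longer exists.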

- Insert a named element containing a space on Space press
- On release or cancel, replace the element with a plain space
- On timeout, remove the element and begin recording

Keeps behavior while simplifying state (no index math). All tests pass.

- Add stop_recording_and_start_transcription() and call from handle_key_event
- Keeps behavior; improves readability and testability

All TUI tests pass.

- Add start_recording_with_placeholder() and reuse for empty-text space press and hold-timeout
- Keeps behavior; consolidates meter placeholder + spawn logic

All TUI tests pass.

…lean up on drop

- Maintain stop flags for spinner tasks; stop on replace/remove or when update fails
- Implement Drop for ChatComposer to stop spinners and end capture on teardown
- Make RecordingMeter path schedule a frame only when update applied

This avoids runaway spinner tasks across UI changes (e.g., NewSession). All tests pass.

…ance and 60s cap

- Remove explicit spinner stop flags and stop calls
- Spinner tasks auto-expire after 60s; UI ignores updates once placeholder is gone
- Keep Drop minimal: stop capture and clear placeholder

All TUI tests pass.

…isappearance and 60s cap" This reverts commit 5461929.

- Add ChatComposer helpers (ta_* wrappers) that auto-sync popups after text changes
- Use wrappers for programmatic edits (placeholders, spinner frames, space-hold element)
- Remove scattered manual sync calls accordingly

All TUI tests pass.

…y paths

- Revert to direct TextArea calls
- Ensure sync_command_popup/sync_file_search_popup are called in event handlers and key paths
- Keep on-space-hold timeout and recording flows consistent

All TUI tests pass.

- Centralize sync in handle_key_event end; for early-return branches, perform sync then return
- Remove ad-hoc syncs added inside match branches now covered by centralized sync

All TUI tests pass.

- Add ChatComposer::sync_popups() to unify command/file popup updates
- Call sync_popups after key events; remove scattered explicit sync calls
- BottomPane now triggers sync_popups after events (key, paste, inserts, pre-draw, history, transcription)
- Keeps behavior consistent and simplifies control flow; tests and snapshots pass

- ChatComposer now syncs popups after key handling; remove extra syncs in BottomPane
- Keep centralized sync on paste/insert/transcription/history/pre-draw only
- No behavior change; reduces duplicate work in key path

Add apt-get update before installing musl tools and ALSA libraries in the CI and release workflows so Ubuntu runners have a fresh package index and dependencies available.

Co-Authored-By: Codex <199175422+chatgpt-codex-connector[bot]@users.noreply.github.com>

nornagon-openai · 2025-09-19T19:25:25Z

@codex review

chatgpt-codex-connector

Codex Review: Here are some suggestions.

codex-rs/tui/Cargo.toml

ChrisWiles · 2025-10-13T21:18:46Z

Thanks! This is going to be a great feature.

nornagon-openai added 30 commits August 14, 2025 21:28

tui: remove transcribing placeholder on error

b455e9b

- Add AppEvent::TranscriptionFailed { id, error }
- On error, delete the placeholder element; leave editor state intact
- Fix voice thread to send failure event with correct id
- Keep success path replacing placeholder by id

key is pgdn

062e9f0

tui: show in-text placeholder during recording; update to 'transcribi…

7abde77

…ng' on release

- Insert named 'recording' element at start of capture
- On stop, change the same element text to 'transcribing' and send audio
- Remove footer 'Recording' hint

fix lint

e887b06

Merge remote-tracking branch 'origin/main' into nornagon/voice-mode

1dc9065

fix auth

6e892e9

fix rendering

785b5e1

space hold

2accebe

tui(voice): simplify hold logic by handling timeout only in pre-draw …

e68b934

…tick

- Remove key-event path for timeout processing; rely on frame scheduled by timer
- Keep local tokio task + atomic flag approach; fewer code paths

All tests pass.

tui(voice): animate transcribing with braille spinner

f0481fa

- Replace static "transcribing" with animated braille spinner frames via RecordingMeter updates
- Spinner auto-stops after max duration or when placeholder is replaced/removed

All TUI tests pass.

tui(voice): extract end-of-recording logic into helper

88f0145

- Add stop_recording_and_start_transcription() and call from handle_key_event
- Keeps behavior; improves readability and testability

All TUI tests pass.

tui(voice): extract start-recording logic into helper

2c7a8eb

- Add start_recording_with_placeholder() and reuse for empty-text space press and hold-timeout
- Keeps behavior; consolidates meter placeholder + spawn logic

All TUI tests pass.

Revert "tui(voice): simplify spinner lifecycle; rely on placeholder d…

643d707

…isappearance and 60s cap" This reverts commit 5461929.

tui: ensure popup sync runs for all key paths; remove mid-function syncs

456d786

- Centralize sync in handle_key_event end; for early-return branches, perform sync then return
- Remove ad-hoc syncs added inside match branches now covered by centralized sync

All TUI tests pass.

helpers

b01d34f

nornagon-openai and others added 18 commits August 22, 2025 17:13

tui: avoid redundant popup sync on key events

88144a9

- ChatComposer now syncs popups after key handling; remove extra syncs in BottomPane
- Keep centralized sync on paste/insert/transcription/history/pre-draw only
- No behavior change; reduces duplicate work in key path

Merge origin/main into nornagon/voice-mode

61ad57f

fix

50e06d1

Merge remote-tracking branch 'origin/main' into nornagon/voice-mode

ed00930

fix

cd61a85

Merge remote-tracking branch 'origin/main' into nornagon/voice-mode

bb48056

clippy

a80a954

update ci

b89601d

Merge remote-tracking branch 'origin/main' into nornagon/voice-mode

3894273

fix

5504a37

Merge remote-tracking branch 'origin/main' into nornagon/voice-mode

7518edf

fix

4e466e9

guard

2214968

guard2

bdd9c8c

fix

83473b3

more

c4cfd5b

cleanup

1eb57d8

nornagon-openai changed the title ~~add voice mode using gpt-4o-transcribe~~ voice transcription Sep 19, 2025

nornagon-openai added 2 commits September 19, 2025 11:38

reduce size of working indicator; add trace

59f782a

prompt for transcribe

dc029ef

nornagon-openai marked this pull request as ready for review September 19, 2025 19:23

nornagon-openai requested review from aibrahim-oai and easong-openai September 19, 2025 19:24

chatgpt-codex-connector bot reviewed Sep 19, 2025

codex-rs/tui/Cargo.toml (outdated)

nornagon-openai added 2 commits October 17, 2025 12:24

Merge remote-tracking branch 'origin/main' into nornagon/voice-mode

18532ff

Merge remote-tracking branch 'origin/main' into nornagon/voice-mode

41d18ce

etraut-openai added the oai-pr label (PRs posted by Codex team members) Oct 30, 2025
