…kend Generated with [Claude Code](https://claude.ai/code) via [Happy](https://happy.engineering) Co-Authored-By: Claude <noreply@anthropic.com> Co-Authored-By: Happy <yesreply@happy.engineering>
Cool PR! Love the push-to-talk approach over VAD. I'm a big fan of the middleware voice agent approach in general, but I actually use push-to-talk myself when voice coding :D Was just configuring voice a bit more for 11 labs so broke some of the changes here. Have you considered on-device STT instead of OpenAI's API? NVIDIA's Parakeet v3 (0.6B) beats Whisper Large v3 on accuracy and runs fully on-device now. On iOS there's […]. Could start with iOS only and figure out Android later.
I love this idea, although I'm on Android so I wouldn't be able to test it. Perhaps we could start with the menu option for users to select their preferred voice mode, with options for 11labs vs a local Android STT/TTS (plus PTT button), and add iOS/Parakeet after that? I'll see if I can rework this PR accordingly...
Keen for feedback on this approach. This is something I've been developing for my own personal use but I'm hoping that at least some of it could be useful for the Happy community.
The basic idea is to add a settings dialogue to allow the user to select between voice backends, and add a new option to make use of OpenAI STT/TTS services to talk directly to Claude. It also adds a "push to talk" button to make this simpler and less prone to VAD errors / being triggered by background speech etc.
This takes a "talk directly to Claude" approach rather than talking to a voice agent that sits between the user and Claude. I found the voice agent approach to be unworkable with GPT-4o because the model is fine-tuned to respond to questions and would often get into endless conversations with Claude rather than strictly relaying the user's messages. STT/TTS works better in my experience since Claude has all of the project context.
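To make the direct path concrete, here is a minimal sketch of the push-to-talk flow: audio is buffered only while the button is held, transcribed on release, and the transcript is relayed verbatim with no intermediate agent. All names here (`PushToTalkSession`, `SpeechToText`, `sendToClaude`) are hypothetical and for illustration only; they are not taken from the Happy codebase or from this PR's implementation.

```typescript
// Illustrative sketch only. Every name below is an assumption,
// not the actual API of Happy or of any STT provider.

interface SpeechToText {
  // Turns recorded audio into text (e.g. a cloud or on-device STT service).
  transcribe(audio: Uint8Array): Promise<string>;
}

interface PushToTalkDeps {
  stt: SpeechToText;
  // Delivers the transcript straight to the coding session.
  sendToClaude: (text: string) => Promise<void>;
}

// Joins the buffered audio chunks into one contiguous buffer.
function concat(chunks: Uint8Array[]): Uint8Array {
  const total = chunks.reduce((n, c) => n + c.length, 0);
  const out = new Uint8Array(total);
  let offset = 0;
  for (const c of chunks) {
    out.set(c, offset);
    offset += c.length;
  }
  return out;
}

class PushToTalkSession {
  private deps: PushToTalkDeps;
  private chunks: Uint8Array[] = [];
  private recording = false;

  constructor(deps: PushToTalkDeps) {
    this.deps = deps;
  }

  // Button down: start a fresh recording.
  press(): void {
    this.chunks = [];
    this.recording = true;
  }

  // Audio arriving while the button is up is dropped, so background
  // speech cannot trigger a message the way a VAD misfire can.
  onAudioChunk(chunk: Uint8Array): void {
    if (this.recording) this.chunks.push(chunk);
  }

  // Button up: transcribe and relay the text unchanged.
  // Resolves to the relayed text, or null if nothing was sent.
  async release(): Promise<string | null> {
    if (!this.recording) return null;
    this.recording = false;
    const audio = concat(this.chunks);
    if (audio.length === 0) return null;
    const text = (await this.deps.stt.transcribe(audio)).trim();
    if (text.length === 0) return null;
    await this.deps.sendToClaude(text);
    return text;
  }
}
```

Because the transcript goes to Claude as-is, there is no second model that can start its own conversation; the trade-off is that any transcription error is passed through uncorrected.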
Keen to discuss which parts of this, if any, we could merge into Happy, and then we can rework them into something more solid. This was recently rebased onto the latest main, so there may still be some things to fix.