5 changes: 4 additions & 1 deletion .claude/rules/architecture.md
@@ -69,7 +69,8 @@

Additional modules:
screenshot.py ─ Terminal text → PNG rendering (ANSI color, font fallback)
transcribe.py ─ Voice-to-text transcription via OpenAI API (gpt-4o-transcribe)
transcribe.py ─ Voice-to-text: local Whisper (faster-whisper + CTranslate2 + CUDA) + OpenAI API fallback
tts.py ─ Text-to-speech: edge-tts (Microsoft Edge neural voices) → OGG voice messages to Telegram
main.py ─ CLI entry point
utils.py ─ Shared utilities (ccbot_dir, atomic_write_json)

@@ -97,6 +98,8 @@ State files (~/.ccbot/ or $CCBOT_DIR/):
- **Tool use ↔ tool result pairing** — `tool_use_id` tracked across poll cycles; tool result edits the original tool_use Telegram message in-place.
- **MarkdownV2 with fallback** — All messages go through `safe_reply`/`safe_edit`/`safe_send` which convert via `telegramify-markdown` and fall back to plain text on parse failure.
- **No truncation at parse layer** — Full content preserved; splitting at send layer respects Telegram's 4096 char limit with expandable quote atomicity.
- **Local STT with API fallback** — Voice messages transcribed via faster-whisper (CTranslate2 + CUDA, model loaded lazily and resident). Falls back to OpenAI gpt-4o-transcribe API on failure if `OPENAI_API_KEY` is set. Engine selection via `CCBOT_STT_ENGINE` env var.
- **TTS voice responses** — Final assistant messages sent as Telegram voice notes via edge-tts (Microsoft Edge neural voices). Per-user toggle via `/voice` command. Text always sent first; audio appended after. Configurable voice and global auto-enable via `CCBOT_TTS_VOICE` / `CCBOT_TTS_AUTO`.
- Only sessions registered in `session_map.json` (via hook) are monitored.
- Notifications delivered to users via thread bindings (topic → window_id → session).
- **Startup re-resolution** — Window IDs reset on tmux server restart. On startup, `resolve_stale_ids()` matches persisted display names against live windows to re-map IDs. Old state.json files keyed by window name are auto-migrated.
4 changes: 3 additions & 1 deletion CLAUDE.md
@@ -2,7 +2,7 @@

ccmux — Telegram bot that bridges Telegram Forum topics to Claude Code sessions via tmux windows. Each topic is bound to one tmux window running one Claude Code instance.

Tech stack: Python, python-telegram-bot, tmux, uv.
Tech stack: Python, python-telegram-bot, tmux, uv, faster-whisper (CTranslate2 + CUDA), edge-tts (TTS).

## Common Commands

@@ -23,6 +23,8 @@ ccbot hook --install # Auto-install Claude Code SessionStart hook
- **Hook-based session tracking** — `SessionStart` hook writes `session_map.json`; monitor polls it to detect session changes.
- **Message queue per user** — FIFO ordering, message merging (3800 char limit), tool_use/tool_result pairing.
- **Rate limiting** — `AIORateLimiter(max_retries=5)` on the Application (30/s global). On restart, the global bucket is pre-filled to avoid burst against Telegram's server-side counter.
- **Local STT** — Voice messages transcribed via faster-whisper (CTranslate2 + CUDA) by default. OpenAI API as fallback. Model loaded lazily on first voice message, stays resident.
- **TTS** — Responses sent as Telegram voice messages via edge-tts (Microsoft Edge neural voices). Per-user toggle via `/voice` command. Configurable voice and auto-enable via env vars.

## Code Conventions

44 changes: 40 additions & 4 deletions README.md
Expand Up @@ -26,7 +26,7 @@ In fact, CCBot itself was built this way — iterating on itself through Claude
- **Topic-based sessions** — Each Telegram topic maps 1:1 to a tmux window and Claude session
- **Real-time notifications** — Get Telegram messages for assistant responses, thinking content, tool use/result, and local command output
- **Interactive UI** — Navigate AskUserQuestion, ExitPlanMode, and Permission Prompts via inline keyboard
- **Voice messages** — Voice messages are transcribed via OpenAI and forwarded as text
- **Voice messages** — Voice messages are transcribed locally via Whisper (faster-whisper + CUDA) and forwarded as text; the OpenAI API is available as a fallback
- **Send messages** — Forward text to Claude Code via tmux keystrokes
- **Slash command forwarding** — Send any `/command` directly to Claude Code (e.g. `/clear`, `/compact`, `/cost`)
- **Create new sessions** — Start Claude Code sessions from Telegram via directory browser
@@ -95,8 +95,15 @@ ALLOWED_USERS=your_telegram_user_id
| `CLAUDE_COMMAND` | `claude` | Command to run in new windows |
| `MONITOR_POLL_INTERVAL` | `2.0` | Polling interval in seconds |
| `CCBOT_SHOW_HIDDEN_DIRS` | `false` | Show hidden (dot) directories in directory browser |
| `OPENAI_API_KEY` | _(none)_ | OpenAI API key for voice message transcription |
| `CCBOT_STT_ENGINE` | `whisper` | STT engine: `whisper` (local, CUDA) or `openai` (API) |
| `CCBOT_WHISPER_MODEL` | `large-v3` | Whisper model size (`tiny`, `base`, `small`, `medium`, `large-v3`, `large-v3-turbo`) |
| `CCBOT_WHISPER_DEVICE` | `cuda` | Compute device: `cuda` or `cpu` |
| `CCBOT_WHISPER_COMPUTE_TYPE` | `float16` | Compute precision: `float16` (GPU), `int8` (GPU, less VRAM), `int8_float16` (balanced) |
| `OPENAI_API_KEY` | _(none)_ | OpenAI API key (used when `CCBOT_STT_ENGINE=openai`, or as a fallback for local Whisper) |
| `OPENAI_BASE_URL` | `https://api.openai.com/v1` | OpenAI API base URL (for proxies or compatible APIs) |
| `CCBOT_TTS_ENABLED` | `true` | Enable TTS (text-to-speech) voice message responses |
| `CCBOT_TTS_AUTO` | `false` | Auto-enable TTS for all users (per-user toggle via `/voice`) |
| `CCBOT_TTS_VOICE` | `es-ES-ElviraNeural` | Edge TTS voice name (run `edge-tts --list-voices` for options) |

Message formatting is always HTML via `chatgpt-md-converter` (`chatgpt_md_converter` package).
There is no runtime formatter switch to MarkdownV2.
@@ -151,6 +158,8 @@ uv run ccbot
| `/history` | Message history for this topic |
| `/screenshot` | Capture terminal screenshot |
| `/esc` | Send Escape to interrupt Claude |
| `/voice` | Toggle TTS voice message responses |
| `/unbind` | Unbind topic from session (window stays alive) |

**Claude Code commands (forwarded via tmux):**

@@ -178,7 +187,34 @@

**Sending messages:**

Once a topic is bound to a session, just send text or voice messages in that topic — text gets forwarded to Claude Code via tmux keystrokes, and voice messages are automatically transcribed and forwarded as text.
Once a topic is bound to a session, just send text or voice messages in that topic — text gets forwarded to Claude Code via tmux keystrokes, and voice messages are automatically transcribed (locally via Whisper by default) and forwarded as text.
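
The text-forwarding half reduces to plain `tmux send-keys`; a sketch under assumed window targeting (ccmux's real `tmux_manager` may differ):

```python
import subprocess

def send_to_claude(window_id: str, text: str) -> None:
    # -l sends the text literally (no key-name expansion), then a
    # separate Enter keypress submits it to Claude Code.
    subprocess.run(["tmux", "send-keys", "-t", window_id, "-l", text], check=True)
    subprocess.run(["tmux", "send-keys", "-t", window_id, "Enter"], check=True)
```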

### Voice Messages (STT)

CCBot uses [faster-whisper](https://github.com/SYSTRAN/faster-whisper) with CTranslate2 for **local, GPU-accelerated** speech-to-text. No API key is required.

**How it works:**
1. You send a voice message in a Telegram topic
2. The bot downloads the OGG audio into memory (no permanent file is written to disk)
3. faster-whisper transcribes it on the local GPU (CUDA)
4. The transcribed text is forwarded to Claude Code via tmux
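
Steps 2–3 can be sketched with faster-whisper's public API — the `get_model`/`transcribe_ogg` helper names are illustrative, not ccmux's actual `transcribe.py`:

```python
import io
from functools import lru_cache

@lru_cache(maxsize=1)
def get_model(model_size: str = "large-v3",
              device: str = "cuda",
              compute_type: str = "float16"):
    # Lazy import + cache: the bot starts fast, the first voice message
    # pays the model-load cost, and later messages reuse the instance.
    from faster_whisper import WhisperModel
    return WhisperModel(model_size, device=device, compute_type=compute_type)

def transcribe_ogg(ogg_bytes: bytes) -> str:
    # faster-whisper accepts a file-like object, so the OGG audio is
    # decoded straight from memory rather than from a temp file.
    segments, _info = get_model().transcribe(io.BytesIO(ogg_bytes))
    return " ".join(seg.text.strip() for seg in segments)
```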

**Supported models** (set via `CCBOT_WHISPER_MODEL`):

| Model | Params | VRAM (float16) | Speed | Accuracy |
|-------|--------|----------------|-------|----------|
| `tiny` | 39M | ~1 GB | Fastest | Basic |
| `base` | 74M | ~1 GB | Very fast | Good |
| `small` | 244M | ~2 GB | Fast | Good |
| `medium` | 769M | ~5 GB | Moderate | Very good |
| `large-v3` | 1550M | ~10 GB | Moderate | Best |
| `large-v3-turbo` | 809M | ~3 GB | Fast | Near-best |

The default `large-v3` provides the best accuracy; `large-v3-turbo` offers a good speed/accuracy balance with lower VRAM usage. The model is downloaded once from the HuggingFace Hub and cached locally.

**Fallback:** If local Whisper fails and `OPENAI_API_KEY` is set, CCBot automatically falls back to OpenAI's `gpt-4o-transcribe` API.
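
The fallback policy amounts to a small wrapper (illustrative names and control flow; ccmux's implementation may differ):

```python
def transcribe_with_fallback(audio, local_fn, api_fn, api_key: str) -> str:
    # Prefer the local engine; fall back to the API only when it fails
    # AND an API key is configured, otherwise re-raise the local error.
    try:
        return local_fn(audio)
    except Exception:
        if not api_key:
            raise
        return api_fn(audio)
```

With stub engines, a failing local pass falls through to the API path only when a key is present; without a key the local error propagates.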

**VRAM note:** The Whisper model stays loaded in GPU memory after the first voice message, occupying the VRAM shown in the table above for as long as the bot runs. If GPU memory is limited, use a smaller model or an `int8` compute type.

**Killing a session:**

@@ -261,7 +297,7 @@ src/ccbot/
├── terminal_parser.py # Terminal pane parsing (interactive UI + status line)
├── html_converter.py # Markdown → Telegram HTML conversion + HTML-aware splitting
├── screenshot.py # Terminal text → PNG image with ANSI color support
├── transcribe.py # Voice-to-text transcription via OpenAI API
├── transcribe.py # Voice-to-text: local Whisper (CTranslate2+CUDA) + OpenAI fallback
├── utils.py # Shared utilities (atomic JSON writes, JSONL helpers)
├── tmux_manager.py # Tmux window management (list, create, send keys, kill)
├── fonts/ # Bundled fonts for screenshot rendering
2 changes: 2 additions & 0 deletions pyproject.toml
@@ -12,6 +12,8 @@ dependencies = [
"Pillow>=10.0.0",
"aiofiles>=24.0.0",
"telegramify-markdown>=0.5.0,<1.0.0",
"faster-whisper>=1.2.1",
"edge-tts>=7.2.8",
]

[project.scripts]
142 changes: 139 additions & 3 deletions src/ccbot/bot.py
@@ -136,6 +136,7 @@
from .tmux_manager import tmux_manager
from .transcribe import close_client as close_transcribe_client
from .transcribe import transcribe_voice
from .tts import get_voice, is_tts_enabled, set_voice, toggle_tts
from .utils import ccbot_dir

logger = logging.getLogger(__name__)
@@ -277,6 +278,134 @@ async def unbind_command(update: Update, context: ContextTypes.DEFAULT_TYPE) -> None:
)


async def voice_command(update: Update, context: ContextTypes.DEFAULT_TYPE) -> None:
"""Toggle TTS or change voice.

Usage:
/voice — Toggle TTS on/off
/voice <name> — Set voice (e.g. /voice es-AR-ElenaNeural)
"""
user = update.effective_user
if not user or not is_user_allowed(user.id):
return
if not update.message:
return

if not config.tts_enabled:
await safe_reply(update.message, "❌ TTS is disabled globally (CCBOT_TTS_ENABLED=false).")
return

args = context.args if context.args else []

# /voice <name> — set voice (auto-enable TTS)
if args:
voice_name = args[0]
try:
set_voice(user.id, voice_name)
except ValueError as e:
await safe_reply(update.message, f"❌ {e}\nUse /voices to see available voices.")
return
if not is_tts_enabled(user.id):
toggle_tts(user.id)
await safe_reply(
update.message,
f"🔊 Voice set to `{voice_name}` — TTS ON\n"
"Use /voices to see available options.",
)
return

# /voice — toggle
new_state = toggle_tts(user.id)
status = "ON" if new_state else "OFF"
voice_name = get_voice(user.id)
await safe_reply(
update.message,
f"🔊 TTS {status} (voice: {voice_name})\n"
"Use /voice <name> to change voice, /voices to list options.",
)


async def voices_command(update: Update, context: ContextTypes.DEFAULT_TYPE) -> None:
"""List available TTS voices.

Usage:
/voices — Compact index of all locales with voice counts
/voices <locale> — All voices for a locale (e.g. /voices es, /voices en)
"""
user = update.effective_user
if not user or not is_user_allowed(user.id):
return
if not update.message:
return

args = context.args if context.args else []
locale_filter = args[0].lower() if args else ""

try:
import edge_tts

all_voices = await edge_tts.list_voices()

if locale_filter:
            # Detect whether the user typed /voices instead of /voice to set a
            # voice: voice names (e.g. es-AR-ElenaNeural) contain uppercase
            # letters, so check the raw argument — locale_filter is already
            # lowercased and would never match.
            if any(c.isupper() for c in args[0]):
                await safe_reply(
                    update.message,
                    f"💡 Did you mean `/voice {args[0]}`?\n\n"
"/voice — Set a voice (also toggles TTS on)\n"
"/voices — List available voices",
)
return

filtered = [v for v in all_voices if v["Locale"].lower().startswith(locale_filter)]
if not filtered:
await safe_reply(
update.message,
f"❌ No voices found for '{locale_filter}'.\n"
"Use /voices to see available locales.",
)
return
lines = []
current = get_voice(user.id)
for v in sorted(filtered, key=lambda x: (x["Locale"], x["ShortName"])):
gender = "♂" if v["Gender"] == "Male" else "♀"
tag = " ★" if v["ShortName"] == current else ""
lines.append(f"{gender} `{v['ShortName']}` — {v['Locale']}{tag}")
header = f"🗣 {locale_filter} — {len(lines)} voices\n\n"
else:
from collections import Counter

locale_counts = Counter(v["Locale"] for v in all_voices)
locale_flags = {
"ar": "🇸🇦", "bg": "🇧🇬", "cs": "🇨🇿", "da": "🇩🇰", "de": "🇩🇪",
"el": "🇬🇷", "en": "🇬🇧", "es": "🇪🇸", "et": "🇪🇪", "fi": "🇫🇮",
"fr": "🇫🇷", "he": "🇮🇱", "hi": "🇮🇳", "hr": "🇭🇷", "hu": "🇭🇺",
"id": "🇮🇩", "it": "🇮🇹", "ja": "🇯🇵", "ko": "🇰🇷", "lt": "🇱🇹",
"lv": "🇱🇻", "ms": "🇲🇾", "nl": "🇳🇱", "no": "🇳🇴", "pl": "🇵🇱",
"pt": "🇧🇷", "ro": "🇷🇴", "ru": "🇷🇺", "sk": "🇸🇰", "sl": "🇸🇮",
"sv": "🇸🇪", "th": "🇹🇭", "tr": "🇹🇷", "uk": "🇺🇦", "vi": "🇻🇳",
"zh": "🇨🇳",
}
lines = []
for locale, count in sorted(locale_counts.items()):
prefix = locale.split("-")[0]
flag = locale_flags.get(prefix, "🌐")
lines.append(f"{flag} `{locale}` — {count} voices")
header = f"🗣 Available locales ({len(locale_counts)}):\n\n"

await safe_reply(update.message, header + "\n".join(lines))
except Exception as e:
err = str(e)
if "503" in err or "Service Unavailable" in err:
await safe_reply(
update.message,
"⚠ Microsoft TTS service is temporarily unavailable (503).\n"
"Try again in a few seconds.",
)
else:
await safe_reply(update.message, f"❌ Failed to list voices: {e}")


async def esc_command(update: Update, context: ContextTypes.DEFAULT_TYPE) -> None:
"""Send Escape key to interrupt Claude."""
user = update.effective_user
@@ -642,11 +771,14 @@ async def voice_handler(update: Update, context: ContextTypes.DEFAULT_TYPE) -> None:
if not update.message or not update.message.voice:
return

if not config.openai_api_key:
stt_available = (
config.stt_engine == "whisper" or config.openai_api_key
)
if not stt_available:
await safe_reply(
update.message,
"⚠ Voice transcription requires an OpenAI API key.\n"
"Set `OPENAI_API_KEY` in your `.env` file and restart the bot.",
"⚠ No STT backend available.\n"
"Set CCBOT_STT_ENGINE=whisper (local) or OPENAI_API_KEY (API) in .env.",
)
return

@@ -1792,6 +1924,8 @@ async def handle_new_message(msg: NewMessage, bot: Bot) -> None:
text=msg.text,
thread_id=thread_id,
image_data=msg.image_data,
role=msg.role,
is_complete=msg.is_complete,
)

# Update user's read offset to current file position
@@ -1895,6 +2029,8 @@ def create_bot() -> Application:
application.add_handler(CommandHandler("screenshot", screenshot_command))
application.add_handler(CommandHandler("esc", esc_command))
application.add_handler(CommandHandler("unbind", unbind_command))
application.add_handler(CommandHandler("voice", voice_command))
application.add_handler(CommandHandler("voices", voices_command))
application.add_handler(CommandHandler("usage", usage_command))
application.add_handler(CallbackQueryHandler(callback_handler))
# Topic closed event — auto-kill associated window
25 changes: 24 additions & 1 deletion src/ccbot/config.py
@@ -101,12 +101,35 @@ def __init__(self) -> None:
os.getenv("CCBOT_SHOW_HIDDEN_DIRS", "").lower() == "true"
)

# OpenAI API for voice message transcription (optional)
# STT engine: "whisper" (local, default) or "openai" (API)
self.stt_engine: str = os.getenv("CCBOT_STT_ENGINE", "whisper")
# Whisper config (local STT via faster-whisper + CTranslate2 + CUDA)
self.whisper_model: str = os.getenv("CCBOT_WHISPER_MODEL", "large-v3")
self.whisper_device: str = os.getenv("CCBOT_WHISPER_DEVICE", "cuda")
self.whisper_compute_type: str = os.getenv(
"CCBOT_WHISPER_COMPUTE_TYPE", "float16"
)
# OpenAI API for voice transcription (fallback when stt_engine=openai)
self.openai_api_key: str = os.getenv("OPENAI_API_KEY", "")
self.openai_base_url: str = os.getenv(
"OPENAI_BASE_URL", "https://api.openai.com/v1"
)

# TTS (Text-to-Speech) via edge-tts (Microsoft Edge neural voices)
self.tts_enabled: bool = os.getenv("CCBOT_TTS_ENABLED", "true").lower() in (
"true",
"1",
"yes",
)
self.tts_auto: bool = os.getenv("CCBOT_TTS_AUTO", "false").lower() in (
"true",
"1",
"yes",
)
self.tts_voice: str = os.getenv(
"CCBOT_TTS_VOICE", "es-ES-ElviraNeural"
)

# Scrub sensitive vars from os.environ so child processes never inherit them.
# Values are already captured in Config attributes above.
for var in SENSITIVE_ENV_VARS: