added 5 commits
March 16, 2026 08:50
Two-pass pipeline: whisper.cpp transcribes, then sherpa-onnx diarizes the same audio and assigns speaker labels by timestamp overlap.
- Raw FFI bindings to sherpa-onnx offline speaker diarization C API (not yet exposed by the sherpa-onnx Rust crate)
- Dedicated worker thread for diarization (C types are !Send/!Sync)
- CLI: --speakers N --diarize-segmentation-model --diarize-embedding-model
- Env vars: DIARIZE_SEGMENTATION_MODEL, DIARIZE_EMBEDDING_MODEL
- Speaker labels in VTT (<v Speaker 0>), SRT ([Speaker 0]), and manifest JSON
- Segment struct gains optional speaker field
- Gated behind sherpa-onnx feature flag
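The overlap-based label assignment and the VTT/SRT label formats described above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the `Segment` fields other than `speaker`, the `SpeakerTurn` type, the function names, and the tie-breaking rule are all assumptions made for the example.

```rust
use std::collections::BTreeMap;

/// Transcript segment in milliseconds. Only the optional `speaker` field is
/// taken from the commit message; the other fields are hypothetical.
#[derive(Debug, Clone, PartialEq)]
pub struct Segment {
    pub start_ms: u64,
    pub end_ms: u64,
    pub text: String,
    pub speaker: Option<usize>,
}

/// One diarization result: a time span attributed to a speaker (hypothetical type).
#[derive(Debug, Clone, Copy)]
pub struct SpeakerTurn {
    pub start_ms: u64,
    pub end_ms: u64,
    pub speaker: usize,
}

/// Length of the intersection of two half-open intervals, 0 if disjoint.
fn overlap_ms(a_start: u64, a_end: u64, b_start: u64, b_end: u64) -> u64 {
    a_end.min(b_end).saturating_sub(a_start.max(b_start))
}

/// Gives each segment the speaker whose turns overlap it the most.
/// Segments with no overlapping turn keep `speaker == None`.
pub fn assign_speakers(segments: &mut [Segment], turns: &[SpeakerTurn]) {
    for seg in segments.iter_mut() {
        // Total overlap per speaker, since one speaker may have several turns.
        let mut totals: BTreeMap<usize, u64> = BTreeMap::new();
        for t in turns {
            let ov = overlap_ms(seg.start_ms, seg.end_ms, t.start_ms, t.end_ms);
            if ov > 0 {
                *totals.entry(t.speaker).or_insert(0) += ov;
            }
        }
        // `max_by_key` returns the last maximum, so ties go to the higher index.
        seg.speaker = totals.into_iter().max_by_key(|&(_, ov)| ov).map(|(s, _)| s);
    }
}

/// Cue text with a WebVTT voice tag, e.g. `<v Speaker 0>hello`.
pub fn vtt_text(seg: &Segment) -> String {
    match seg.speaker {
        Some(id) => format!("<v Speaker {}>{}", id, seg.text),
        None => seg.text.clone(),
    }
}

/// Cue text with a bracketed SRT prefix, e.g. `[Speaker 0] hello`.
pub fn srt_text(seg: &Segment) -> String {
    match seg.speaker {
        Some(id) => format!("[Speaker {}] {}", id, seg.text),
        None => seg.text.clone(),
    }
}
```

Summing overlap per speaker, rather than taking the single longest turn, is one possible design choice; it favors a speaker who talks in several short bursts within the same segment.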
VAD segmentation via Silero VAD (sherpa-onnx):
- Detects speech boundaries instead of silence dB thresholds
- 250ms padding protects word boundaries from clipping
- Merges chunks separated by <200ms gaps
- Splits long chunks at lowest-energy points (not arbitrary positions)
- Use --vad-model path/to/silero_vad.onnx to enable
- Falls back to FFmpeg silencedetect when no VAD model is provided

Dependency upgrades:
- whisper-rs 0.12 → 0.16 (iterator API, updated log callback)
- reqwest 0.12 → 0.13
- indicatif 0.17 → 0.18
- bzip2 0.5 → 0.6 (pure Rust)

Comprehensive docs update for VAD, diarization, and env vars.
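The padding and gap-merging steps above might look something like the sketch below. The 250ms and 200ms values come from the commit message; the `Chunk` type, the function name, and the choice to pad before merging are assumptions for illustration, and the actual implementation may order these steps differently.

```rust
/// A detected speech region in milliseconds (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Chunk {
    pub start_ms: u64,
    pub end_ms: u64,
}

/// Padding added on each side of a speech region (value from the commit message).
pub const PAD_MS: u64 = 250;
/// Chunks closer together than this are merged (value from the commit message).
pub const MERGE_GAP_MS: u64 = 200;

/// Pads each speech region, then merges neighbors whose gap is under
/// `MERGE_GAP_MS`. Assumes `speech` is sorted by start time.
pub fn pad_and_merge(speech: &[Chunk], audio_len_ms: u64) -> Vec<Chunk> {
    let mut out: Vec<Chunk> = Vec::new();
    for c in speech {
        // Clamp the padded region to the bounds of the audio.
        let padded = Chunk {
            start_ms: c.start_ms.saturating_sub(PAD_MS),
            end_ms: (c.end_ms + PAD_MS).min(audio_len_ms),
        };
        if let Some(prev) = out.last_mut() {
            // An overlap counts as a gap of zero, so overlapping chunks merge too.
            if padded.start_ms.saturating_sub(prev.end_ms) < MERGE_GAP_MS {
                prev.end_ms = prev.end_ms.max(padded.end_ms);
                continue;
            }
        }
        out.push(padded);
    }
    out
}
```

Padding first means the merge pass also absorbs chunks that only overlap because of the padding, so the output never contains overlapping chunks.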
transcribeit setup downloads all components for full functionality:
- models: default GGML base model from HuggingFace
- vad: Silero VAD model (~628KB) for speech-aware segmentation
- diarize: pyannote segmentation + wespeaker embedding models
- sherpa-libs: platform-specific sherpa-onnx shared libraries (auto-detects macOS/Linux x64/ARM64)

Selective install: transcribeit setup -c vad
Extended download-model: --vad and --diarize flags
Prints env var summary at the end showing what to add to .env.
All downloads are idempotent (skip if already present).
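The "skip if already present" behavior could be sketched as below. This is only an illustration of the idea using the standard library: the function name, the `fetch` closure standing in for the real HTTP download, and the temporary `.part` file are assumptions, not details taken from this PR.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Ensures `dest` exists, calling `fetch` only when it is missing.
/// Returns `Ok(true)` if the file was fetched, `Ok(false)` if it was skipped.
/// `fetch` is a stand-in for the actual download step (hypothetical).
pub fn ensure_downloaded<F>(dest: &Path, fetch: F) -> io::Result<bool>
where
    F: FnOnce() -> io::Result<Vec<u8>>,
{
    // Idempotence: an existing file is treated as already downloaded.
    if dest.exists() {
        return Ok(false);
    }
    if let Some(parent) = dest.parent() {
        fs::create_dir_all(parent)?;
    }
    let bytes = fetch()?;
    // Write to a sibling temp file, then rename, so an interrupted run
    // does not leave a partial file that a later run would skip.
    let tmp = dest.with_extension("part");
    fs::write(&tmp, &bytes)?;
    fs::rename(&tmp, dest)?;
    Ok(true)
}
```

A caller would run this once per component and could use the returned flag to report whether each file was downloaded or already present.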
Business Source License 1.1:
- Free for non-commercial and evaluation use
- Commercial/production use requires a separate license
- Converts to Apache 2.0 on 2030-03-16

All dependencies verified compatible (MIT, Apache-2.0, BSD, ISC, Unlicense; no GPL/copyleft).
Summary
Major feature release bringing speaker diarization, VAD-based segmentation, self-bootstrapping setup, and dependency upgrades.
New features
- --speakers N labels transcript segments with speaker identity in VTT, SRT, and manifest.
- VAD-based segmentation replaces silencedetect when --vad-model is set.
- transcribeit setup command: self-bootstrapping CLI that downloads all components (models, VAD, diarization models, sherpa-onnx shared libraries) with platform auto-detection.
Improvements
- Fixed set_detect_language(true) bug causing empty transcripts
- sherpa-onnx feature enabled by default (--no-default-features to exclude)
- download-model extended with --vad and --diarize flags
- -f short flag
Test results
Test plan
- cargo fmt -- --check passes
- cargo clippy -- -W clippy::all passes
- cargo test: 28 tests pass
- cargo build --no-default-features builds
- transcribeit setup tested (all components)