v1.2.0 — Speaker diarization, VAD segmentation, setup command by skitsanos · Pull Request #1 · transcriptintel/transcribeit

skitsanos · 2026-03-16T12:29:15Z

Summary

Major feature release bringing speaker diarization, VAD-based segmentation, self-bootstrapping setup, and dependency upgrades.

New features

Speaker diarization via sherpa-onnx C API (pyannote segmentation + speaker embeddings). --speakers N labels transcript segments with speaker identity in VTT, SRT, and manifest.
VAD-based segmentation using Silero VAD — speech-aware chunking that avoids mid-word cuts. Replaces FFmpeg silencedetect when --vad-model is set.
transcribeit setup command — self-bootstrapping CLI that downloads all components (models, VAD, diarization models, sherpa-onnx shared libraries) with platform auto-detection.
Auto-detect model architectures — sherpa-onnx engine detects Whisper, Moonshine, and SenseVoice from model directory contents.
BSL 1.1 license — free for non-commercial/evaluation use, commercial license required for production.

Improvements

Dependency upgrades: whisper-rs 0.16, reqwest 0.13, indicatif 0.18, bzip2 0.6
Fixed whisper-rs set_detect_language(true) bug causing empty transcripts
sherpa-onnx is now an optional feature flag (--no-default-features to exclude)
C++ stderr suppression for sherpa-onnx warnings
Code review fixes: dedup retry loops, static regex, negative timestamp guard
download-model extended with --vad and --diarize flags
Default output format changed to VTT, added -f short flag
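
The "negative timestamp guard" in the code review fixes is not spelled out in this PR. One plausible shape for such a guard is sketched below, assuming timestamps arrive as floating-point seconds and that negative or non-finite values should be clamped to zero before formatting. The function name `format_vtt_timestamp` is hypothetical and not taken from the codebase.

```rust
/// Format a time offset in seconds as a WebVTT timestamp (HH:MM:SS.mmm).
/// Hypothetical helper: the clamping rule is an assumption, not the PR's code.
pub fn format_vtt_timestamp(seconds: f64) -> String {
    // Guard: negative, NaN, and infinite inputs all collapse to zero so the
    // output is always a well-formed timestamp.
    let clamped = if seconds.is_finite() && seconds > 0.0 {
        seconds
    } else {
        0.0
    };

    // Work in whole milliseconds to avoid repeated floating-point rounding.
    let total_ms = (clamped * 1000.0).round() as u64;
    let ms = total_ms % 1000;
    let s = (total_ms / 1000) % 60;
    let m = (total_ms / 60_000) % 60;
    let h = total_ms / 3_600_000;

    format!("{:02}:{:02}:{:02}.{:03}", h, m, s, ms)
}
```

Clamping rather than returning an error keeps subtitle output usable when an upstream component reports a slightly negative start time, at the cost of hiding the anomaly.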

Test results

28 tests passing
Zero clippy warnings
Both feature configurations build clean
Full 31-minute medical interview transcribed successfully (7.5x realtime with large-v3-turbo)

Test plan

cargo fmt -- --check passes
cargo clippy -- -W clippy::all passes
cargo test — 28 tests pass
cargo build --no-default-features builds
Tested on 5min, 10min, and 31min audio samples
Diarization tested with 2-speaker interview
VAD segmentation tested vs FFmpeg silencedetect
transcribeit setup tested (all components)

Two-pass pipeline: whisper.cpp transcribes, then sherpa-onnx diarizes the same audio and assigns speaker labels by timestamp overlap.
- Raw FFI bindings to sherpa-onnx offline speaker diarization C API (not yet exposed by the sherpa-onnx Rust crate)
- Dedicated worker thread for diarization (C types are !Send/!Sync)
- CLI: --speakers N --diarize-segmentation-model --diarize-embedding-model
- Env vars: DIARIZE_SEGMENTATION_MODEL, DIARIZE_EMBEDDING_MODEL
- Speaker labels in VTT (<v Speaker 0>), SRT ([Speaker 0]), and manifest JSON
- Segment struct gains optional speaker field
- Gated behind sherpa-onnx feature flag
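
The second pass of the pipeline, assigning speaker labels by timestamp overlap, can be sketched roughly as follows. This is an illustration under stated assumptions, not the PR's implementation: the `SpeakerTurn` type, the field names, the millisecond units, and the tie-breaking rule are all hypothetical. Only the optional `speaker` field on `Segment` and the `<v Speaker 0>` / `[Speaker 0]` label shapes come from the description above, and the exact spacing around the labels is a guess.

```rust
use std::cmp::Reverse;
use std::collections::BTreeMap;

/// A transcript segment; `speaker` mirrors the optional field mentioned above.
#[derive(Debug, Clone, PartialEq)]
pub struct Segment {
    pub start_ms: i64,
    pub end_ms: i64,
    pub text: String,
    pub speaker: Option<u32>,
}

/// One diarization result: a speaker talking over a time span (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct SpeakerTurn {
    pub start_ms: i64,
    pub end_ms: i64,
    pub speaker: u32,
}

/// Length of the intersection of two intervals in milliseconds (0 if disjoint).
fn overlap_ms(a_start: i64, a_end: i64, b_start: i64, b_end: i64) -> i64 {
    (a_end.min(b_end) - a_start.max(b_start)).max(0)
}

/// Label each segment with the speaker whose turns overlap it the most.
/// Segments that overlap no turn keep `speaker = None`.
pub fn assign_speakers(segments: &mut [Segment], turns: &[SpeakerTurn]) {
    for seg in segments.iter_mut() {
        // Sum the overlap per speaker, since one speaker may have several turns.
        let mut totals: BTreeMap<u32, i64> = BTreeMap::new();
        for t in turns {
            let o = overlap_ms(seg.start_ms, seg.end_ms, t.start_ms, t.end_ms);
            if o > 0 {
                *totals.entry(t.speaker).or_insert(0) += o;
            }
        }
        // Largest total wins; on a tie the lowest speaker id is chosen (assumption).
        seg.speaker = totals
            .into_iter()
            .max_by_key(|&(id, total)| (total, Reverse(id)))
            .map(|(id, _)| id);
    }
}

/// Cue text for VTT, using a voice tag when a speaker is known.
pub fn vtt_cue_text(seg: &Segment) -> String {
    match seg.speaker {
        Some(id) => format!("<v Speaker {}>{}", id, seg.text),
        None => seg.text.clone(),
    }
}

/// Cue text for SRT, using a bracketed prefix when a speaker is known.
pub fn srt_cue_text(seg: &Segment) -> String {
    match seg.speaker {
        Some(id) => format!("[Speaker {}] {}", id, seg.text),
        None => seg.text.clone(),
    }
}
```

Summing overlap per speaker, rather than picking the single longest overlapping turn, keeps the label stable when the diarizer splits one speaker's speech into several short turns inside a single transcript segment.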

VAD segmentation via Silero VAD (sherpa-onnx):
- Detects speech boundaries instead of silence dB thresholds
- 250ms padding protects word boundaries from clipping
- Merges chunks separated by <200ms gaps
- Splits long chunks at lowest-energy points (not arbitrary positions)
- Use --vad-model path/to/silero_vad.onnx to enable
- Falls back to FFmpeg silencedetect when no VAD model

Dependency upgrades:
- whisper-rs 0.12 → 0.16 (iterator API, updated log callback)
- reqwest 0.12 → 0.13
- indicatif 0.17 → 0.18
- bzip2 0.5 → 0.6 (pure Rust)

Comprehensive docs update for VAD, diarization, and env vars.
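
The padding and merging rules in the VAD notes above can be illustrated with a small sketch. It assumes the detected speech regions arrive sorted by start time, that padding is applied before gaps are measured, and that times are in milliseconds. The `Chunk` type and `pad_and_merge` function are hypothetical names, and the order of operations in the actual code may differ. Splitting long chunks at low-energy points is not covered here.

```rust
/// A span of audio in milliseconds (hypothetical type for illustration).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Chunk {
    pub start_ms: u64,
    pub end_ms: u64,
}

/// Padding added on each side of a detected speech region (from the notes above).
pub const PAD_MS: u64 = 250;
/// Chunks separated by less than this gap are merged (from the notes above).
pub const MERGE_GAP_MS: u64 = 200;

/// Pad each speech region, clamp it to the audio bounds, and merge neighbours
/// whose remaining gap is below `MERGE_GAP_MS`. Input must be sorted by start.
pub fn pad_and_merge(speech: &[Chunk], audio_len_ms: u64) -> Vec<Chunk> {
    let mut out: Vec<Chunk> = Vec::new();
    for c in speech {
        let padded = Chunk {
            // saturating_sub keeps the start at 0 instead of underflowing.
            start_ms: c.start_ms.saturating_sub(PAD_MS),
            end_ms: (c.end_ms + PAD_MS).min(audio_len_ms),
        };
        if let Some(prev) = out.last_mut() {
            // Covers both overlapping chunks and chunks with a short gap.
            if padded.start_ms < prev.end_ms + MERGE_GAP_MS {
                prev.end_ms = prev.end_ms.max(padded.end_ms);
                continue;
            }
        }
        out.push(padded);
    }
    out
}
```

For example, speech regions at 1000 to 2000 ms and 2300 to 3000 ms overlap once padded and collapse into a single chunk from 750 to 3250 ms, while a region starting at 5000 ms stays separate.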

transcribeit setup — downloads all components for full functionality:
- models: default GGML base model from HuggingFace
- vad: Silero VAD model (~628KB) for speech-aware segmentation
- diarize: pyannote segmentation + wespeaker embedding models
- sherpa-libs: platform-specific sherpa-onnx shared libraries (auto-detects macOS/Linux x64/ARM64)

Selective install: transcribeit setup -c vad
Extended download-model: --vad and --diarize flags
Prints env var summary at the end showing what to add to .env.
All downloads are idempotent (skip if already present).
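
Platform auto-detection and the skip-if-present behaviour described above could be approached along these lines. This sketch relies only on the standard library's `std::env::consts::OS` and `std::env::consts::ARCH` constants. The `Platform` enum, the function names, and the assumption that all four OS/architecture combinations are supported are illustrative rather than taken from the PR.

```rust
use std::path::Path;

/// Platforms for which a sherpa-onnx shared library might be fetched
/// (hypothetical enum; the real set of supported targets may differ).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Platform {
    MacosX64,
    MacosArm64,
    LinuxX64,
    LinuxArm64,
}

/// Map an OS/architecture pair, spelled as in `std::env::consts`, to a platform.
/// Taking strings as parameters keeps the mapping testable on any host.
pub fn detect_platform(os: &str, arch: &str) -> Option<Platform> {
    match (os, arch) {
        ("macos", "x86_64") => Some(Platform::MacosX64),
        ("macos", "aarch64") => Some(Platform::MacosArm64),
        ("linux", "x86_64") => Some(Platform::LinuxX64),
        ("linux", "aarch64") => Some(Platform::LinuxArm64),
        _ => None,
    }
}

/// Detect the platform the binary was compiled for.
pub fn current_platform() -> Option<Platform> {
    detect_platform(std::env::consts::OS, std::env::consts::ARCH)
}

/// Idempotency check: a component is downloaded only if its file is missing.
pub fn needs_download(dest: &Path) -> bool {
    !dest.is_file()
}
```

A bare existence check is the simplest form of idempotency. It would not notice a truncated file left behind by an interrupted download, so a size or checksum comparison may be worth adding if that case matters.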

Business Source License 1.1:
- Free for non-commercial and evaluation use
- Commercial/production use requires a separate license
- Converts to Apache 2.0 on 2030-03-16

All dependencies verified compatible (MIT, Apache-2.0, BSD, ISC, Unlicense — no GPL/copyleft).

skitsanos added 5 commits March 16, 2026 08:50

chore: Bump version to 1.2.0, fix cargo license warning, whisper-rs 0.16 API

9857591

skitsanos merged commit c6d60d9 into main Mar 16, 2026
6 checks passed

skitsanos merged 5 commits into main from develop
