-
Notifications
You must be signed in to change notification settings - Fork 527
Open
Description
In a French audio file transcription, WhisperX engine with the large-v3 model is only returning a partial transcription.
The beginning and the end is missing. When using OpenAI Whisper engine, the problem is not visible.
I've found out a similar issue on WhisperX repository: m-bain/whisperX#764
It seems related to the VAD feature of the WhisperX engine which is stripping off some parts of the audio.
In mbain_whisperx_engine.py file with the following change seems to fix part of the issue :
asr_options = {"without_timestamps": False}
vad_options = {"vad_onset": 0.1, "vad_offset": 0.1}
self.model['whisperx'] = whisperx.load_model(
CONFIG.MODEL_NAME,
device=CONFIG.DEVICE,
compute_type=CONFIG.MODEL_QUANTIZATION,
asr_options=asr_options,
vad_options=vad_options
)
Metadata
Metadata
Assignees
Labels
No labels