Incomplete transcription of non-English audio (WhisperX) #340

When transcribing a French audio file, the WhisperX engine with the large-v3 model returns only a partial transcription: the beginning and the end are missing.
With the OpenAI Whisper engine, the problem does not occur.
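
To pin down which parts are missing, the segment timestamps can be compared against the audio duration. Below is a minimal sketch, assuming the result exposes segments as dicts with `start` and `end` in seconds (as WhisperX's `result["segments"]` does) and that the audio duration is known. The function name `uncovered_spans` and the sample values are hypothetical, not taken from the affected file. Gaps can also be genuine silence, so this only flags candidates.

```python
def uncovered_spans(segments, audio_duration, min_gap=1.0):
    """Return (start, end) spans, in seconds, not covered by any segment.

    Gaps shorter than `min_gap` are ignored as normal pauses.
    """
    gaps = []
    cursor = 0.0  # end of the audio covered so far
    for seg in sorted(segments, key=lambda s: s["start"]):
        if seg["start"] - cursor >= min_gap:
            gaps.append((cursor, seg["start"]))
        cursor = max(cursor, seg["end"])
    if audio_duration - cursor >= min_gap:
        gaps.append((cursor, audio_duration))
    return gaps


# Hypothetical segments from a 60-second file (illustrative values only).
segments = [
    {"start": 12.4, "end": 20.0, "text": "..."},
    {"start": 20.5, "end": 41.2, "text": "..."},
]
print(uncovered_spans(segments, audio_duration=60.0))
# [(0.0, 12.4), (41.2, 60.0)]
```

Running the same check on the output of both engines makes the difference at the beginning and end of the file easy to see.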

I found a similar issue in the WhisperX repository: https://github.com/m-bain/whisperX/issues/764

-----

It seems related to the VAD (voice activity detection) feature of the WhisperX engine, which strips out some parts of the audio.
In `mbain_whisperx_engine.py`, the following change seems to fix part of the issue:
```
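        # Keep timestamp tokens in the decoder output.
        asr_options = {"without_timestamps": False}
        # Lower the VAD thresholds so that fewer audio regions are discarded as non-speech.
        vad_options = {"vad_onset": 0.1, "vad_offset": 0.1}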
        self.model['whisperx'] = whisperx.load_model(
            CONFIG.MODEL_NAME,
            device=CONFIG.DEVICE,
            compute_type=CONFIG.MODEL_QUANTIZATION,
            asr_options=asr_options,
            vad_options=vad_options
        )
```
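
As far as I understand, `vad_onset` and `vad_offset` act as hysteresis thresholds on the per-frame speech probability: a region opens when the probability rises above the onset and closes when it falls below the offset. The toy sketch below illustrates that idea only. It is not WhisperX's actual implementation, and the function name, the probabilities, and the stricter 0.5 / 0.363 thresholds are assumptions chosen for illustration.

```python
def speech_regions(probs, onset, offset, frame_s=1.0):
    """Toy hysteresis thresholding over per-frame speech probabilities.

    Returns (start, end) regions in seconds that would be kept as speech.
    """
    regions = []
    start = None  # index of the frame where the current region opened
    for i, p in enumerate(probs):
        if start is None and p > onset:
            start = i
        elif start is not None and p < offset:
            regions.append((start * frame_s, i * frame_s))
            start = None
    if start is not None:  # region still open at the end of the audio
        regions.append((start * frame_s, len(probs) * frame_s))
    return regions


# Hypothetical probabilities: quiet speech at both ends, clear speech in the middle.
probs = [0.2, 0.3, 0.4, 0.8, 0.9, 0.7, 0.3, 0.2, 0.45, 0.3]

print(speech_regions(probs, onset=0.5, offset=0.363))  # [(3.0, 6.0)]
print(speech_regions(probs, onset=0.1, offset=0.1))    # [(0.0, 10.0)]
```

With the stricter thresholds, the low-confidence frames at the start and end are dropped, which matches the symptom described above. Thresholds of 0.1 keep the whole clip, at the cost of possibly passing more non-speech audio to the model.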

