Incomplete transcription of non-English audio (WhisperX) #340

@Renrhaf

When transcribing a French audio file, the WhisperX engine with the large-v3 model returns only a partial transcription.
The beginning and the end are missing. With the OpenAI Whisper engine, the problem does not occur.

I found a similar issue in the WhisperX repository: m-bain/whisperX#764


It seems related to the VAD (voice activity detection) feature of the WhisperX engine, which strips off some parts of the audio.
The following change in mbain_whisperx_engine.py seems to fix part of the issue:

        # Keep timestamps in the ASR output.
        asr_options = {"without_timestamps": False}
        # Lower the VAD thresholds so that fewer audio segments are discarded.
        vad_options = {"vad_onset": 0.1, "vad_offset": 0.1}
        self.model['whisperx'] = whisperx.load_model(
            CONFIG.MODEL_NAME,
            device=CONFIG.DEVICE,
            compute_type=CONFIG.MODEL_QUANTIZATION,
            asr_options=asr_options,
            vad_options=vad_options
        )
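
To illustrate why lower thresholds might recover the missing beginning and end, here is a toy sketch. It assumes the VAD turns per-frame speech scores into regions using onset/offset hysteresis: a region starts when the score rises above the onset value and ends when it falls below the offset value. The function name `speech_regions`, the scores, and the "strict" threshold values are made up for illustration; this is not WhisperX's actual implementation.

```python
def speech_regions(scores, onset, offset):
    """Return (start, end) frame index pairs found by hysteresis thresholding.

    A region opens when a score exceeds `onset` and closes when a later
    score drops below `offset`. `end` is exclusive.
    """
    regions = []
    active = False
    start = 0
    for i, score in enumerate(scores):
        if not active and score > onset:
            active = True
            start = i
        elif active and score < offset:
            active = False
            regions.append((start, i))
    if active:
        # Audio ended while speech was still considered active.
        regions.append((start, len(scores)))
    return regions


# Hypothetical per-frame speech probabilities: quiet start, loud middle, quiet end.
scores = [0.2, 0.3, 0.6, 0.8, 0.7, 0.4, 0.3, 0.2]

strict = speech_regions(scores, onset=0.5, offset=0.363)   # illustrative values
lenient = speech_regions(scores, onset=0.1, offset=0.1)    # values from the change above

print(strict)   # [(2, 6)]  -> the quiet frames at both ends are dropped
print(lenient)  # [(0, 8)]  -> every frame is kept
```

In this toy model, the quieter frames at the start and end are only kept with the lower thresholds, which matches the symptom described above. The trade-off would be that more non-speech audio also reaches the transcription model.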
