paraformer-large-vad-punc_asr_nat-zh model: inaccurate timestamps #1226

Most sentence end timestamps fall inside the speech segment rather than in the silence that follows it, as shown here:

<img width="465" alt="image" src="https://github.com/alibaba-damo-academy/FunASR/assets/18050469/732cde96-edbe-427a-90d0-f66d26166e4e">

- Model: damo/speech_paraformer-large-vad-punc_asr_nat-zh-cn-16k-common-vocab8404-pytorch
- Invocation: modelscope pipeline
- Test audio: [test-funasr.wav.zip](https://github.com/alibaba-damo-academy/FunASR/files/13863290/test-funasr.wav.zip)
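A rough way to check the end-timestamp observation numerically is to measure the audio level in a short window just after each sentence end timestamp: if the end really fell in silence, that window should be quiet. This is a minimal sketch, assuming a mono float waveform in [-1, 1] and sentence end times in milliseconds that you have already extracted from the pipeline result (which output field holds them depends on the modelscope/FunASR version and is not shown here). The helper names, the -40 dBFS silence threshold and the 20 ms window are hypothetical illustrative choices, not FunASR values.

```python
import numpy as np


def rms_dbfs_after(samples, sr, t_ms, win_ms=20.0):
    """RMS level (dBFS) of the win_ms window that starts at t_ms.

    Hypothetical helper; samples is a mono float waveform in [-1, 1].
    """
    start = int(round(t_ms * sr / 1000.0))
    stop = min(len(samples), start + max(1, int(round(win_ms * sr / 1000.0))))
    seg = np.asarray(samples[start:stop], dtype=np.float64)
    if seg.size == 0:
        return float("-inf")  # timestamp at or past the end of the audio
    rms = np.sqrt(np.mean(seg ** 2))
    return float(20.0 * np.log10(max(rms, 1e-12)))  # floor avoids log10(0)


def ends_in_silence(samples, sr, end_times_ms, silence_dbfs=-40.0, win_ms=20.0):
    """For each end timestamp, True if the audio right after it is below silence_dbfs."""
    return [bool(rms_dbfs_after(samples, sr, t, win_ms) < silence_dbfs)
            for t in end_times_ms]
```

To run it on the attached audio you could load it with `soundfile.read("test-funasr.wav")`, which returns the samples and the sample rate (take one channel if the file is stereo). A `False` entry means the audio right after that end timestamp is still loud, i.e. the timestamp stops on speech.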

> Update: it is the timestamps that are inaccurate, not the VAD.

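I tried changing two parameters in `~/.cache/modelscope/hub/damo/speech_fsmn_vad_zh-cn-16k-common-pytorch/vad.yaml`:
- `speech_noise_thres`: changed to -1
- `max_end_silence_time`: increased to 6 s (6000 ms)

The problem did not improve: most sentence end timestamps still land on speech rather than in silence.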
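When editing the cached `vad.yaml` by hand it is easy to change a key that the model never reads, so it can help to confirm that the keys were actually found. A minimal sketch, assuming the file has been loaded into a dict (for example with PyYAML's `yaml.safe_load`); the helper name is hypothetical, the key names follow the ones used above, and because the nesting inside `vad.yaml` is not confirmed here, the function searches every level rather than assuming a section name.

```python
from typing import Any, Dict, List


def update_keys(cfg: Any, updates: Dict[str, Any], path: str = "") -> List[str]:
    """Set every occurrence of the keys in `updates` inside a nested dict/list.

    Hypothetical helper. Returns the dotted paths that were changed; an empty
    list means none of the requested keys exist in this config.
    """
    changed: List[str] = []
    if isinstance(cfg, dict):
        for key, value in cfg.items():
            here = f"{path}.{key}" if path else str(key)
            if key in updates:
                cfg[key] = updates[key]  # replacing a value does not resize the dict
                changed.append(here)
            else:
                changed.extend(update_keys(value, updates, here))
    elif isinstance(cfg, list):
        for i, item in enumerate(cfg):
            changed.extend(update_keys(item, updates, f"{path}[{i}]"))
    return changed
```

After updating, the dict can be written back with `yaml.safe_dump`; note that this drops any comments in the original file.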

```
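VAD common parameter tuning notes (reference: /vad.yaml)
max end silence time: how long a run of trailing silence must be detected before the end point is declared. Range 500 ms to 6000 ms, default 800 ms (too low a value tends to cut speech off early).
speech_noise_thres: a frame is judged as speech when its speech score minus its noise score exceeds this value. Range (-1, 1). The closer the value is to -1, the more likely noise is misclassified as speech and the higher the false-alarm rate (FA); the closer it is to +1, the more likely speech is misclassified as noise and the higher the miss rate (Pmiss).
In general, this value is chosen to balance the two based on the current model's performance on a long-audio test set.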
```
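To make the two rules in the note above concrete, here is a toy frame-level sketch. It is not FSMN-VAD's actual implementation: it assumes per-frame speech and noise scores are already available and a fixed 10 ms frame, the function and argument names are hypothetical, and it applies only the two rules as described: a frame is speech when `speech - noise > speech_noise_thres`, and a segment is closed once `max_end_silence_ms` of continuous non-speech has been seen.

```python
from typing import List, Optional, Tuple


def simple_vad_segments(
    speech_scores: List[float],
    noise_scores: List[float],
    speech_noise_thres: float,
    max_end_silence_ms: int = 800,  # default taken from the note above
    frame_ms: int = 10,             # assumed frame shift
) -> List[Tuple[int, int]]:
    """Toy VAD: return (start_ms, end_ms) speech segments from per-frame scores."""
    segments: List[Tuple[int, int]] = []
    start: Optional[int] = None        # first frame of the open segment
    last_speech: Optional[int] = None  # last speech frame of the open segment
    silence_run = 0                    # non-speech frames since last_speech
    for i, (s, n) in enumerate(zip(speech_scores, noise_scores)):
        if (s - n) > speech_noise_thres:  # speech/noise decision rule
            if start is None:
                start = i
            last_speech = i
            silence_run = 0
        elif start is not None:
            silence_run += 1
            # End point is declared only after enough continuous trailing silence.
            if silence_run * frame_ms >= max_end_silence_ms:
                segments.append((start * frame_ms, (last_speech + 1) * frame_ms))
                start, last_speech, silence_run = None, None, 0
    if start is not None:  # audio ended inside a segment
        segments.append((start * frame_ms, (last_speech + 1) * frame_ms))
    return segments
```

In this toy model the segment end is always the end of the last speech frame; `max_end_silence_ms` only decides how long a pause must last before a segment is closed, so raising it mainly merges segments rather than moving end points into silence. Whether the real VAD behaves the same way is not confirmed here.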

