
With the following code, WAV files of any length are transcribed, but MP3 files only get a very short portion recognized. What's going on? #247

@jinwater88

Description


Notice: To resolve issues more efficiently, please raise your issue following the template and fill in the details.

❓ Questions and Help

Before asking:

  1. search the issues.
  2. search the docs.

What is your question?

Code

```python
from funasr import AutoModel
from funasr.utils.postprocess_utils import rich_transcription_postprocess
import time

model_dir = "./funasr_models/iic/SenseVoiceSmall"
vad_model_dir = "./funasr_models/iic/speech_fsmn_vad_zh-cn-16k-common-pytorch"

s_time = time.time()
model = AutoModel(
    model=model_dir,
    trust_remote_code=False,
    remote_code="./model.py",
    # vad_model=vad_model_dir,
    vad_kwargs={"max_single_segment_time": 30000},
    device="cuda:0",
)
print(model.model_path)
load_time = time.time()
print(f"模型加载时间: {time.time() - s_time:.2f}秒")  # "Model load time: N.NNs"

# en: English sample shipped with the model (overwritten by the next assignment)
input_file = f"{model.model_path}/example/en.mp3"

# The song under test (MP3)
input_file = "./data/像我这样的人-毛不易#hxmnf.mp3"
res = model.generate(
    input=input_file,
    cache={},
    language="auto",  # "zh", "en", "yue", "ja", "ko", "nospeech"
    use_itn=True,
    # batch_size_s=60,
    merge_vad=True,  # merge short VAD segments up to merge_length_s; only used when vad_model is set
    merge_length_s=15,
)
print(res)
text = rich_transcription_postprocess(res[0]["text"])
print(text)
print(f"推理时间: {time.time() - load_time:.2f}秒")  # "Inference time: N.NNs"
```
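A likely factor is that `vad_model` is commented out in the code above, so the whole file is passed to SenseVoiceSmall as a single segment. The SenseVoice README describes the VAD model as the step that splits long audio into shorter clips (each capped at `max_single_segment_time` milliseconds) and says VAD can be dropped when all inputs are short (under about 30 s); `merge_vad` and `merge_length_s` likewise act on VAD segments. Re-enabling `vad_model=vad_model_dir` is worth trying first. The sketch below only illustrates the splitting idea with fixed-length windows; `split_fixed_windows` is a hypothetical helper, and cutting at fixed offsets is a simplification of FSMN VAD, which cuts at detected speech boundaries.

```python
from typing import List, Tuple


def split_fixed_windows(
    num_samples: int,
    sample_rate: int = 16000,
    max_segment_ms: int = 30000,
) -> List[Tuple[int, int]]:
    """Split [0, num_samples) into consecutive (start, end) windows of at most max_segment_ms.

    Hypothetical helper: a fixed-length stand-in for VAD segmentation,
    not FunASR's FSMN VAD (which cuts at detected speech boundaries).
    """
    if num_samples < 0 or sample_rate <= 0 or max_segment_ms <= 0:
        raise ValueError("num_samples must be >= 0; sample_rate and max_segment_ms must be > 0")
    # Samples per window; at least 1 so range() never gets a zero step.
    window = max(1, sample_rate * max_segment_ms // 1000)
    return [(start, min(start + window, num_samples)) for start in range(0, num_samples, window)]


# A 75 s clip at 16 kHz -> two full 30 s windows and one 15 s remainder.
print(split_fixed_windows(75 * 16000))  # [(0, 480000), (480000, 960000), (960000, 1200000)]
```

Each window would then be transcribed on its own and the texts joined; the real VAD path does this inside `generate`, with boundaries placed at pauses instead of every 30 s.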
Output:

```
funasr version: 1.2.7. Check update of funasr, and it would cost few times. You may disable it by set `disable_update=True` in AutoModel
You are using the latest version of funasr-1.2.7
WARNING:root:trust_remote_code: False
./funasr_models/iic/SenseVoiceSmall
模型加载时间: 4.34秒
rtf_avg: 0.007: 100%|███████████████| 1/1 [00:01<00:00, 1.45s/it]
[{'key': '像我这样的人-毛不易#hxmnf', 'text': '<|zh|><|SAD|><|BGM|><|withitn|>优这样迷茫多少人像我这样孤单的人迷茫的人这样碌碌无为的人过多少人像我这样孤单的人这样不甘平凡的人世界上有多少人这样莫名其妙。'}]
🎼优这样迷茫多少人像我这样孤单的人迷茫的人这样碌碌无为的人过多少人像我这样孤单的人这样不甘平凡的人世界上有多少人这样莫名其妙。😔
推理时间: 1.46秒
```
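The log also allows a rough check on how much audio the model actually received. Assuming `rtf_avg` is the real-time factor, i.e. processing time divided by audio duration (an assumption about how FunASR reports it, not stated in the issue), and taking the tqdm figure of 1.45 s/it as the processing time, the printed 0.007 (rounded to three decimals) corresponds to roughly 193 to 223 s of audio. If that reading holds, the MP3 was loaded at full song length, and the short transcript points to recognition of one long segment rather than to the file being cut off on load. The arithmetic, as a small sketch (`duration_bounds` is a hypothetical helper):

```python
from typing import Tuple


def duration_bounds(elapsed_s: float, rtf_rounded: float, decimals: int = 3) -> Tuple[float, float]:
    """Bounds (seconds) on audio duration, given elapsed time and an RTF rounded to `decimals` places.

    Assumption: RTF = processing time / audio duration.
    """
    half_step = 0.5 * 10 ** (-decimals)
    low_rtf, high_rtf = rtf_rounded - half_step, rtf_rounded + half_step
    if low_rtf <= 0:
        raise ValueError("RTF too small to bound at this precision")
    # A higher RTF means less audio for the same elapsed time, so it gives the lower bound.
    return elapsed_s / high_rtf, elapsed_s / low_rtf


# Values from the log above: 1.45 s/it and rtf_avg 0.007.
low_s, high_s = duration_bounds(1.45, 0.007)
print(f"audio duration roughly {low_s:.0f}-{high_s:.0f} s")  # audio duration roughly 193-223 s
```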

What have you tried?

What's your environment?

  • OS (e.g., Linux): Ubuntu 22.04
  • FunASR Version (e.g., 1.0.0): 1.2.7
    funasr-onnx: 0.4.1
  • ModelScope Version (e.g., 1.11.0): 1.29.1
  • PyTorch Version (e.g., 2.0.0): 2.6.0
  • How you installed funasr (pip, source): pip
  • Python version: 3.10
  • GPU (e.g., V100M32): RTX 4070
  • CUDA/cuDNN version (e.g., cuda11.7): CUDA 12.4
  • Docker version (e.g., funasr-runtime-sdk-cpu-0.4.1):
  • Any other relevant information:
    With the code above, WAV input of any length is transcribed in full, but for MP3 input only a very short portion is recognized. What's going on? I also converted the MP3 to WAV and ran the same code: the output was identical, and the full text was still not produced.
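Because the converted WAV reportedly gives the same short transcript, checking the converted file's duration, sample rate and channel count is a cheap way to rule out a problem in the conversion itself. Below is a minimal sketch using only Python's standard `wave` module, which reads uncompressed PCM WAV (not MP3). `wav_info` and `make_silent_wav` are hypothetical helpers, and the in-memory silent file only stands in for the real converted WAV, whose path is not given in the issue.

```python
import io
import wave
from typing import BinaryIO, Union


def wav_info(source: Union[str, BinaryIO]) -> dict:
    """Return sample rate, channel count and duration (s) of a PCM WAV file or file-like object."""
    with wave.open(source, "rb") as wf:
        rate = wf.getframerate()
        return {
            "sample_rate": rate,
            "channels": wf.getnchannels(),
            "duration_s": wf.getnframes() / rate,
        }


def make_silent_wav(seconds: float, rate: int = 16000) -> io.BytesIO:
    """Build an in-memory 16-bit mono WAV of silence (demonstration stand-in only)."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wf:
        wf.setnchannels(1)
        wf.setsampwidth(2)  # 16-bit samples
        wf.setframerate(rate)
        wf.writeframes(b"\x00\x00" * int(seconds * rate))
    buf.seek(0)  # rewind so the buffer can be read back
    return buf


info = wav_info(make_silent_wav(45.0))
print(info)  # {'sample_rate': 16000, 'channels': 1, 'duration_s': 45.0}
# For the real file, pass its path instead, e.g. wav_info("path/to/converted.wav") (placeholder).
```

If `duration_s` matches the song's length, the conversion is not what shortens the transcript.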

Metadata


Assignees: No one assigned
Labels: question (Further information is requested)
Type: No type
Projects: No projects
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests
