feat: support qwen_2_5_omni fine-tuning #169
Conversation
Can t2s (text-to-speech) be supported?
We are actively working on the Talker module; fine-tuning with text-audio input should be ready within a day or two~
Are there plans to support full-parameter fine-tuning across three modalities (text system prompt, image, and audio instruction)?
@DQYZHWK We are very interested in doing this, but we don't have the corresponding data. Do you have any references?
@Gaiejj Does it support training with audio and images together, i.e., a single batch containing both audio and images?
@DQYZHWK @Alex-Songs Thanks for the suggestions; we will try this tri-modal fine-tuning soon!
Has the fine-tuning script for qwen2.5-omni been removed?
@Gaiejj It looks like the transformers version used by qwen2.5-omni added a tp_plan, which requires torch>=2.5. Does the current fine-tuning code also require torch>=2.5?
@Alex-Songs Yes, you need to follow Qwen-2.5-Omni's official dependencies~
@Gaiejj One more question: the qwen2.5-omni-7b weights are keyed like thinker.visual.blocks.11.attn.proj.weight. If we load the thinker directly, do the keys need to be renamed to visual.blocks.11.attn.proj.weight?
Quick question: can this model be fine-tuned directly without TMRoPE being implemented? Thanks.
Hello, I looked through the qwen2.5-omni code. To train the talker, constructing the training data requires a speech tokenizer to first tokenize the audio into speech codec ids, but the speech tokenizer does not seem to be open-sourced. How are you handling this here?
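Not an answer from the maintainers, just a minimal sketch of the key remapping asked about above, assuming the full checkpoint's state dict stores thinker weights under a `thinker.` prefix; the checkpoint path and `thinker_model` instance below are placeholders:
```python
import torch

# Placeholder path: a state dict of the full Qwen2.5-Omni-7B checkpoint, with keys
# such as "thinker.visual.blocks.11.attn.proj.weight".
full_state = torch.load("qwen2_5_omni_7b_state_dict.pt", map_location="cpu")

# Keep only the thinker weights and strip the "thinker." prefix so the keys match a
# standalone thinker module, e.g. "visual.blocks.11.attn.proj.weight".
prefix = "thinker."
thinker_state = {k[len(prefix):]: v for k, v in full_state.items() if k.startswith(prefix)}

# thinker_model is a placeholder for a standalone thinker instance; strict=False
# tolerates heads or buffers that exist on only one side.
missing, unexpected = thinker_model.load_state_dict(thinker_state, strict=False)
print(f"missing: {len(missing)}, unexpected: {len(unexpected)}")
```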
We ran into the same problems during our recent implementation, and @Alex-Songs's take seems right. We will update here as soon as we make progress~
The tp_plan argument causes an error when loading the pretrained model: raise NotImplementedError("This model does not have a tensor parallel plan."). Has anyone run into this?
Is there a reproduction guide? We can help look into it.
Does it currently support fine-tuning with video + audio (the audio within the video) + prompt? The official code only shows image + prompt. Many thanks.
Can the talker module support changing the voice timbre now, e.g., fine-tuning with other voices?
mark |
Like @liu6381810 mentioned, I faced the same issue and posted about it [here](https://huggingface.co/Qwen/Qwen2.5-Omni-7B/discussions/40) for the author's attention. However, there might be a reason why they haven't disclosed their speech tokenizer. Given that, I'm not currently expecting them to release it. It seems we'll likely need to train the |
Is there any progress on fine-tuning the talker part?
CosyVoice also uses a speech tokenizer architecture. Maybe we can refer to it. |
Does this currently support fine-tuning for (system prompt + text instruction + audio --> text)?
I only see text-image input fine-tuning in the code.
There is still no audio fine-tuning in the code.
Hey all! We sincerely apologize for our initial misestimation of the timeline and the delayed response! During this period, we attempted to fine-tune the text+audio-to-text and text-to-audio functionalities. However, due to the highly involved architecture of qwen2.5-omni, our academic team lacked the necessary engineering expertise, which resulted in abnormally poor performance of the trained models. This is the primary reason for our prolonged silence. We are continuing our efforts and will promptly report any breakthroughs. We also welcome community contributions through implementation references, which we will integrate into align-anything. Once again, our deepest apologies.
Description
🎉! We supported the SFT training of Qwen2.5-Omni within 1 hour! Here are the specific training screenshots👇
Test
Please test your changes by running the following command:
```bash
cd scripts
bash test/test_text_to_text.sh ./opt PATH_TO_OUTPUT_ROOT_DIR
```
Here, `./opt` is the directory containing the test scripts for the `opt` model, and `PATH_TO_OUTPUT_ROOT_DIR` is the path to the output root directory. The test scripts will save ~1GB of data to the output root directory and delete it after the test. Please ensure you have enough space on your disk.
Lint
Please run the following command in the root directory to check your code style:
Types of changes
What types of changes does your code introduce? Put an `x` in all the boxes that apply:
Checklist
Go over all the following points, and put an `x` in all the boxes that apply. If you are unsure about any of these, don't hesitate to ask. We are here to help!