
Conversation

Gaiejj
Member

@Gaiejj Gaiejj commented Mar 26, 2025

Description

🎉 We have added support for SFT training of Qwen2.5-Omni within 1 hour! Here are the training screenshots 👇

[image: Qwen2.5-Omni SFT training screenshot]

Test

Please test your changes by running the following command:

cd scripts
bash test/test_text_to_text.sh ./opt PATH_TO_OUTPUT_ROOT_DIR

Here, ./opt is the directory containing the test scripts for the opt model, and PATH_TO_OUTPUT_ROOT_DIR is the path to the output root directory. The test scripts will save ~1 GB of data to the output root directory and delete it after the test. Please ensure you have enough disk space.

Lint

Please run the following command in the root directory to check your code style:

pip install pre-commit
pre-commit run --all-files

Types of changes

What types of changes does your code introduce? Put an x in all the boxes that apply:

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds core functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Documentation (update in the documentation)

Checklist

Go over all the following points, and put an x in all the boxes that apply.
If you are unsure about any of these, don't hesitate to ask. We are here to help!

  • I have read the CONTRIBUTION guide. (required)
  • My change requires a change to the documentation.
  • I have updated the tests accordingly. (required for a bug fix or a new feature)
  • I have updated the documentation accordingly.

@deyituo

deyituo commented Mar 27, 2025

Could t2s (text-to-speech) be supported?

@Gaiejj
Member Author

Gaiejj commented Mar 27, 2025

We are actively working on the Talker module; fine-tuning with text-audio input should be ready today or tomorrow~

@DQYZHWK

DQYZHWK commented Mar 27, 2025

Are there plans to support full-parameter fine-tuning across three modalities (text system prompt, images, and speech instructions)?

@Gaiejj
Member Author

Gaiejj commented Mar 27, 2025

@DQYZHWK We are very interested in doing this, but unfortunately we lack the corresponding data. Do you have any references?

@Alex-Songs

@Gaiejj Does this support training with audio and images together, i.e., a single batch containing both speech and images?

@DQYZHWK

DQYZHWK commented Mar 27, 2025

@DQYZHWK We are very interested in doing this, but unfortunately we lack the corresponding data. Do you have any references?

Sorry, I don't have a relevant dataset.
https://mp.weixin.qq.com/s/hJ5x8xUstBjwNZc1mmqE-g
But you can refer to this article: a VQA dataset could be converted into an SQA dataset via TTS (chattts, fishspeech). I hope this demo can be integrated in the future.
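For illustration only, a minimal sketch of this VQA-to-SQA conversion idea (not from the thread's codebase): text_to_speech is a hypothetical helper that could wrap ChatTTS or FishSpeech, and the JSON field names question/image/answer are assumptions about the VQA format.

import json
from pathlib import Path

def text_to_speech(text: str, out_path: Path) -> None:
    """Hypothetical helper: synthesize `text` into a wav file at `out_path` with a TTS engine."""
    raise NotImplementedError

def vqa_to_sqa(vqa_json: Path, audio_dir: Path, sqa_json: Path) -> None:
    # Turn each VQA record's text question into a spoken instruction, producing an
    # SQA-style record with image + audio inputs and the original answer as the target.
    audio_dir.mkdir(parents=True, exist_ok=True)
    sqa_records = []
    for i, record in enumerate(json.loads(vqa_json.read_text())):
        wav_path = audio_dir / f"question_{i}.wav"
        text_to_speech(record["question"], wav_path)
        sqa_records.append({
            "image": record["image"],    # keep the visual input
            "audio": str(wav_path),      # synthesized speech instruction
            "answer": record["answer"],  # supervision target unchanged
        })
    sqa_json.write_text(json.dumps(sqa_records, ensure_ascii=False, indent=2))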

@Gaiejj
Member Author

Gaiejj commented Mar 27, 2025

@DQYZHWK @Alex-Songs Thanks for the suggestion. We will try this kind of tri-modal fine-tuning in the near future!

@zuitbjc1096

May I ask, has the fine-tuning script for qwen2.5-omni been removed?

@Gaiejj
Member Author

Gaiejj commented Mar 28, 2025

@zuitbjc1096 Hello, the code is here: https://github.com/Gaiejj/align-anything/tree/dev-omni

@Alex-Songs

@Gaiejj It looks like the transformers library used by qwen2.5-omni added a tp_plan parameter that requires torch>=2.5. Does the current fine-tuning code also require torch>=2.5?

@Gaiejj Gaiejj closed this Mar 31, 2025
@Gaiejj Gaiejj deleted the dev-omni branch March 31, 2025 08:48
@Gaiejj Gaiejj restored the dev-omni branch March 31, 2025 08:49
@Gaiejj
Member Author

Gaiejj commented Mar 31, 2025

@Alex-Songs Yes, you need to follow Qwen-2.5-Omni's official dependencies~
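As a side note, a minimal sketch of a version guard reflecting the torch >= 2.5 requirement mentioned above; the floor comes from this discussion, not from an official spec.

from packaging import version

import torch

# Fail fast if the installed torch is older than what the Qwen2.5-Omni
# transformers integration discussed above is said to require.
if version.parse(torch.__version__) < version.parse("2.5"):
    raise RuntimeError(
        f"Found torch {torch.__version__}, but torch >= 2.5 is expected for Qwen2.5-Omni fine-tuning."
    )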

@Gaiejj Gaiejj reopened this Mar 31, 2025
@Alex-Songs

@Gaiejj One more question: the qwen2.5-omni-7b weights are named like thinker.visual.blocks.11.attn.proj.weight. If I load the thinker directly, do I need to rename them to visual.blocks.11.attn.proj.weight?
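For illustration only, a minimal sketch of the renaming described in this question (not the repository's loading code); the checkpoint path is hypothetical.

import torch

full_state_dict = torch.load("qwen2.5-omni-7b.pt", map_location="cpu")  # hypothetical local checkpoint file
thinker_state_dict = {
    name[len("thinker."):]: tensor  # e.g. "thinker.visual.blocks.11.attn.proj.weight" -> "visual.blocks.11.attn.proj.weight"
    for name, tensor in full_state_dict.items()
    if name.startswith("thinker.")
}
# thinker_state_dict can then be passed to load_state_dict on a standalone thinker module.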

@shanhaidexiamo

I'd like to ask: can this model be fine-tuned directly if TMRoPE has not been implemented? Thanks.

@liu6381810

Hello, I looked through the qwen2.5-omni code. If we want to train the talker, constructing the training data requires a speech tokenizer to first tokenize the audio into speech codec ids, but the speech tokenizer doesn't seem to be open-sourced. How do you handle this?

@Gaiejj
Member Author

Gaiejj commented Apr 2, 2025

I'm not really an expert orz. We also ran into these issues during our recent implementation, and I think @Alex-Songs is right. We will post updates here as soon as we make progress~

@sky1170447398

@Alex-Songs Yes, you need to follow Qwen-2.5-Omni's official dependencies~

The tp_plan parameter causes an error when loading the pretrained model: raise NotImplementedError("This model does not have a tensor parallel plan."). Have you encountered this?

@Gaiejj
Member Author

Gaiejj commented Apr 7, 2025

Could you share a reproduction guide? We can help look into it.

@jiahui-w

jiahui-w commented Apr 8, 2025

Is fine-tuning with video + audio (the audio inside the video) + prompt supported now? The official code only shows image + prompt. Thank you very much.

@Kingdroper

We are actively working on the Talker module; fine-tuning with text-audio input should be ready today or tomorrow~

Can the talker module support changing the voice timbre now, for example fine-tuning with some other voices?

@zzchust

zzchust commented Apr 10, 2025

mark

@SeungyounShin

Like @liu6381810 mentioned, I faced the same issue and posted about it [here](https://huggingface.co/Qwen/Qwen2.5-Omni-7B/discussions/40) for the author's attention. However, there might be a reason why they haven't disclosed their speech tokenizer. Given that, I'm not currently expecting them to release it. It seems we'll likely need to train the talker component from scratch using our own voice data.

@pjgao

pjgao commented Apr 28, 2025

Is there any progress on fine-tuning the talker part?

@dongkeun-livetoon

CosyVoice also uses a speech tokenizer architecture. Maybe we can refer to it.

@wwfcnu

wwfcnu commented May 20, 2025

Does this currently support fine-tuning with (system prompt + text instruction + speech --> text)?

@wwfcnu

wwfcnu commented May 20, 2025

We are actively working on the Talker module; fine-tuning with text-audio input should be ready today or tomorrow~

I only see fine-tuning with text-image input in the code.

@candle1220

We are actively working on the Talker module; fine-tuning with text-audio input should be ready today or tomorrow~

The code still does not include fine-tuning for audio.

@Gaiejj
Member Author

Gaiejj commented Jul 4, 2025

Hey all! We sincerely apologize for underestimating the timeline and for the delayed response! During this period, we attempted to fine-tune the text-to-audio-to-text and text-to-audio functionality. However, due to the highly advanced architecture of qwen2.5-omni, our academic team lacked the necessary engineering expertise, which resulted in unexpectedly poor performance of the trained models. This is the primary reason for our prolonged silence.

We are continuing our efforts and will promptly report any breakthroughs. We also welcome community contributions through implementation references, which we will integrate into align-anything.

Once again, our deepest apologies.
