@boji123 boji123 commented Sep 20, 2024

[Image: c6695df4a89fd9984754a37bba6644f]

I'm 柏基.
This is a solution for problem 2 in #379.

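In flow matching, z and mu do not stay fixed for a given index from one chunk to the next. This is one of the factors that blur the spectrum where chunks join. (Fundamentally it is a problem with the flow's attention context, which cannot be solved.)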
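
To illustrate the idea, here is a minimal sketch of caching z and mu across chunks. It is not the code in this PR: `prepare_flow_inputs`, the dict-based cache, and the 1-D per-frame lists are hypothetical simplifications, and `OVERLAP = 34` assumes the "34" mentioned in this comment is the overlap length in mel frames.

```python
import random

OVERLAP = 34  # assumed: number of mel frames shared by consecutive chunks


def prepare_flow_inputs(mu, cache=None, rng=None):
    """Sample noise z for one chunk (hypothetical helper).

    If a cache from the previous chunk is given, the leading frames of
    z and mu are overwritten with the cached values, so the overlapping
    indices see the same z and mu in both chunks.
    """
    rng = rng or random.Random()
    mu = list(mu)
    z = [rng.gauss(0.0, 1.0) for _ in mu]
    if cache is not None:
        n = min(len(cache["z"]), len(mu))
        z[:n] = cache["z"][:n]
        mu[:n] = cache["mu"][:n]
    # Keep the trailing frames for the next chunk's leading overlap.
    new_cache = {"z": z[-OVERLAP:], "mu": mu[-OVERLAP:]}
    return z, mu, new_cache


rng = random.Random(0)
prev_mu = [0.1 * i for i in range(100)]
next_mu = [0.1 * i + 0.05 for i in range(66, 166)]  # same frames, drifted values

z1, mu1, cache = prepare_flow_inputs(prev_mu, rng=rng)
z2, mu2, _ = prepare_flow_inputs(next_mu, cache=cache, rng=rng)
```

Without the cache, `z2` would be fresh noise and `mu2` would keep its drifted values, so the two chunks would start the ODE from different points at the same frame indices.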

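The figure shows the flow's tts_mel output, to compare the effect of context on spectral blurring.
Large figure, column 1: without cache; column 2: with cache.
Small panels, left: the last 34 frames of the previous chunk; middle: (previous + next) / 2; right: the first 34 frames of the next chunk.
With the cache, the tts_mel spectrum is visibly sharper.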
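
The three small panels can be reproduced roughly as follows. This is a sketch under assumptions: `overlap_panels` is a hypothetical helper, a mel spectrogram is represented as a list of rows (one per mel bin, one value per frame), and 34 is taken to be a frame count.

```python
def overlap_panels(prev_mel, next_mel, overlap=34):
    """Return (left, middle, right) panels for the chunk boundary.

    left   : last `overlap` frames of the previous chunk
    right  : first `overlap` frames of the next chunk
    middle : element-wise average of left and right
    """
    left = [row[-overlap:] for row in prev_mel]
    right = [row[:overlap] for row in next_mel]
    middle = [
        [(a + b) / 2 for a, b in zip(l_row, r_row)]
        for l_row, r_row in zip(left, right)
    ]
    return left, middle, right


# One mel bin, 50 frames per chunk, values chosen only for illustration.
prev_mel = [[float(t) for t in range(50)]]
next_mel = [[float(t) + 2.0 for t in range(50)]]
left, middle, right = overlap_panels(prev_mel, next_mel)
```

When the two chunks agree on the overlap, the middle panel matches both sides; when they disagree, the average smears fine detail, which is the blurring being compared here.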

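*Because the later stages (mel fade, hifigan cache, speech fade) already compensate for the seam, this change addresses the cause more directly but is unlikely to improve the final listening quality in most cases. With more testing, though, some bad cases do improve.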

@boji123 boji123 changed the title from "[feature] support flow cache, for sharper tts_mel output" to "[debug] support flow cache, for sharper tts_mel output" Sep 20, 2024
@boji123 boji123 closed this Sep 20, 2024