[Bug] sequence parallel in distillation_pipeline #884

@zengyh1900

Describe the bug

Hi devs,

Thanks for your great work! I'd like to raise an issue with how sequence parallelism is implemented in the distillation pipeline. The relevant code in distillation_pipeline is:

        if self.sp_world_size > 1:
            noise = rearrange(noise,
                              "b (n t) c h w -> b n t c h w",
                              n=self.sp_world_size).contiguous()
            noise = noise[:, self.rank_in_sp_group, :, :, :, :]

This requires frame_num to be divisible by sp_world_size. However, sp_world_size is often equal to the number of GPUs, and frame_num is typically 21 in the Self Forcing framework, so the split cannot be made evenly in common setups. With 8 GPUs, for example, 21 is not divisible by 8 and this section errors out.
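To make the constraint concrete, here is a small sketch in plain Python (no torch needed) that lists which sequence-parallel group sizes would satisfy the divisibility requirement. The helper name `valid_sp_sizes` is made up for illustration and is not part of the repository.

```python
def valid_sp_sizes(frame_num: int, max_gpus: int) -> list[int]:
    """Return every sp_world_size in [1, max_gpus] that divides frame_num evenly.

    Hypothetical helper: it only mirrors the divisibility requirement that the
    "b (n t) c h w -> b n t c h w" rearrange imposes on the frame axis.
    """
    return [n for n in range(1, max_gpus + 1) if frame_num % n == 0]


# frame_num = 21 as in the issue, on a machine with up to 8 GPUs.
sizes_for_21 = valid_sp_sizes(21, 8)
print(sizes_for_21)  # [1, 3, 7]
```

With 21 frames, only group sizes of 1, 3, or 7 work, so the usual choices of 2, 4, or 8 GPUs are all excluded.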

I think a more practical approach would be to split along token_length (i.e., t*h*w//(vae_stride**2)) instead of frame_num.
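A minimal sketch of what splitting along the token axis could look like, assuming the sequence is flattened to token_length tokens and the last rank is padded when the length does not divide evenly. The function `shard_bounds` and the example lengths are hypothetical and do not come from the repository; the real pipeline would also need to strip the padding after gathering.

```python
def shard_bounds(token_length: int, sp_world_size: int, rank: int) -> tuple[int, int, int]:
    """Return (start, stop, pad) for one rank's slice of a flattened token axis.

    Each rank owns ceil(token_length / sp_world_size) slots. tokens[start:stop]
    are the real tokens for this rank, and `pad` is how many padding tokens the
    rank must append so that all ranks hold equally sized shards.
    """
    per_rank = -(-token_length // sp_world_size)  # ceiling division
    start = min(rank * per_rank, token_length)
    stop = min(start + per_rank, token_length)
    pad = per_rank - (stop - start)
    return start, stop, pad


# Hypothetical example: 10 tokens over 4 ranks, which does not divide evenly.
shards = [shard_bounds(10, 4, r) for r in range(4)]
print(shards)  # [(0, 3, 0), (3, 6, 0), (6, 9, 0), (9, 10, 2)]
```

The token axis is much longer than the frame axis, so any padding is a small fraction of the sequence, and when the length already divides evenly the padding is zero.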

Reproduction

bash distill/SFWan2.1-T2V/distill_dmd_t2v_1.3B.sh

Environment

8*GPU
Linux
