
Different generation with Diffusers in I2V tasks for LTX-video #10565

@Kaihui-Cheng

Describe the bug

Hello, I get noticeably different (and worse) results when running the I2V task with Diffusers than with the official LTX-Video inference script. Is there any difference between the Diffusers implementation and the LTX-Video inference script for I2V?

  • The first video below is the result from inference.py; the other two were generated with Diffusers (with and without a negative prompt).
  • Prompt: "A person."
img_to_vid_0_a-person_42_512x512x161_0.mp4
diffusers_512x512_a_person.mp4
diffusers_without_negative_prompt_512x512_a_person.mp4
  • Test image:
    [reference image]

In addition, the text prompt seems to have a significant impact on I2V generation with Diffusers. Am I missing any important arguments?
https://huggingface.co/docs/diffusers/api/pipelines/ltx_video

  • Results for progressively shorter prompts:
demo-A-young-girl-stands-calmly-in-the-foreground.mp4
demo-A-young-girl-stands-calmly.mp4
demo-A-young-girl-stands.mp4
demo-A-young-girl.mp4
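The four results above use successively shorter word prefixes of one prompt. A minimal sketch for generating such a sweep and its output names; the helper names `prompt_prefixes` and `slugify` are hypothetical, and the rule of dropping punctuation from the slug is an assumption inferred from the filenames above.

```python
import re
from typing import List


def prompt_prefixes(prompt: str, min_words: int = 3) -> List[str]:
    """Return word prefixes of `prompt`, longest first, down to `min_words` words."""
    words = prompt.split()
    return [" ".join(words[:n]) for n in range(len(words), min_words - 1, -1)]


def slugify(prompt: str, max_words: int = 14) -> str:
    """Join the first `max_words` words with '-', dropping non-alphanumeric characters."""
    words = [re.sub(r"[^0-9A-Za-z]", "", w) for w in prompt.split()[:max_words]]
    return "-".join(w for w in words if w)


if __name__ == "__main__":
    # Prints one hypothetical output filename per prefix (8 words down to 3).
    for p in prompt_prefixes("A young girl stands calmly in the foreground"):
        print(f"demo-{slugify(p)}.mp4")
```

This yields every prefix; the results above are a subset of them. In a real comparison each prefix would be passed as `prompt=` with a freshly seeded generator, so that only the text changes between runs.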

Reproduction

  • for LTX-Video inference.py:
python inference.py \
    --ckpt_path ./pretrained_models/LTX-Video \
    --output_path './samples' \
    --prompt "A person." \
    --input_image_path ./samples/test_cases.png \
    --height 512 \
    --width 512 \
    --num_frames 49 \
    --seed 42 
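As I understand the LTX-Video README, height and width should be divisible by 32 and the frame count should have the form 8k + 1. A small sketch to check a configuration before generating (the function name `check_ltx_shape` is hypothetical, and the constraints are assumptions taken from that README). Both setups used in this issue, 512x512x49 and 480x704x161, pass the check.

```python
from typing import List


def check_ltx_shape(height: int, width: int, num_frames: int) -> List[str]:
    """Return a list of problems; an empty list means the shape looks valid.

    Assumed constraints (LTX-Video README): height and width divisible by 32,
    and num_frames of the form 8k + 1.
    """
    problems = []
    if height % 32 or width % 32:
        problems.append(f"{height}x{width}: height and width should be multiples of 32")
    if num_frames < 1 or (num_frames - 1) % 8:
        problems.append(f"num_frames={num_frames}: should be 8k + 1 (e.g. 49, 161)")
    return problems


if __name__ == "__main__":
    for shape in [(512, 512, 49), (480, 704, 161), (500, 512, 48)]:
        print(shape, check_ltx_shape(*shape) or "ok")
```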
  • for Diffusers generation: the negative prompt seems to be causing problems, but even after removing it the results are still not satisfactory.
import argparse
import random

import numpy as np
import torch
from diffusers import LTXImageToVideoPipeline
from diffusers.utils import export_to_video, load_image


def seed_everything(seed: int):
    # Seed Python, NumPy and PyTorch RNGs for reproducibility
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    if torch.cuda.is_available():
        torch.cuda.manual_seed(seed)


def generate_video(args):
    pipe = LTXImageToVideoPipeline.from_pretrained(args.ltx_model_path, torch_dtype=torch.bfloat16)
    pipe.to("cuda")

    image = load_image(args.validation_image)
    prompt = "A person."
    negative_prompt = "worst quality, inconsistent motion, blurry, jittery, distorted"
    generator = torch.Generator(
        device="cuda" if torch.cuda.is_available() else "cpu"
    ).manual_seed(42)

    video = pipe(
        image=image,
        prompt=prompt,
        negative_prompt=negative_prompt,
        guidance_scale=3,
        # stg_scale=1,
        generator=generator,
        width=512,
        height=512,
        num_frames=49,
        num_inference_steps=50,
        decode_timestep=0.05,
        decode_noise_scale=0.025,
    ).frames[0]
    export_to_video(video, args.output_file, fps=24)


if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--ltx_model_path", default="./pretrained_models/LTX-Video")
    parser.add_argument("--validation_image", required=True)
    parser.add_argument("--output_file", required=True)
    args = parser.parse_args()

    seed_everything(42)
    generate_video(args)
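To check whether the negative prompt alone causes the degradation, both runs should start from identical noise. A minimal sketch of such an A/B comparison; the helper `negative_prompt_ablation` is hypothetical, and the pipeline and generator factory are passed in so the logic can be exercised without a GPU. Using a fresh generator per run is the key design choice: reusing one `torch.Generator` would advance its state, so the second run would start from different noise.

```python
from typing import Any, Callable, Dict, Optional


def negative_prompt_ablation(
    pipe: Callable[..., Any],
    make_generator: Callable[[int], Any],
    prompt: str,
    negative_prompt: Optional[str],
    seed: int = 42,
    **call_kwargs: Any,
) -> Dict[str, Any]:
    """Run `pipe` twice, with and without `negative_prompt`, from the same seed.

    Returns {"with_negative": frames, "without_negative": frames}, where
    frames is `output.frames[0]` of each pipeline call.
    """
    videos = {}
    for name, neg in (("with_negative", negative_prompt), ("without_negative", None)):
        # Fresh, identically seeded generator so both runs share initial noise.
        generator = make_generator(seed)
        output = pipe(prompt=prompt, negative_prompt=neg, generator=generator, **call_kwargs)
        videos[name] = output.frames[0]
    return videos
```

With the script above, `pipe` would be the loaded `LTXImageToVideoPipeline`, `make_generator` could be `lambda s: torch.Generator(device="cuda").manual_seed(s)`, and `image`, `width`, `height`, `num_frames`, `guidance_scale` and the decode arguments would go through `call_kwargs` unchanged, so the two videos differ only in the negative prompt.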
  • for the prompt comparison (demo-*.mp4 results):
import os

import torch
from diffusers import LTXImageToVideoPipeline
from diffusers.utils import export_to_video, load_image

pipe = LTXImageToVideoPipeline.from_pretrained("./pretrained_models/LTX-Video", torch_dtype=torch.bfloat16)
pipe.to("cuda")

image = load_image("samples/image.png")
prompt = "A young girl stands."
negative_prompt = "worst quality, inconsistent motion, blurry, jittery, distorted"

video = pipe(
    image=image,
    prompt=prompt,
    negative_prompt=negative_prompt,
    width=704,
    height=480,
    num_frames=161,
    num_inference_steps=50,
).frames[0]

# Build the output name from (up to) the first 14 words of the prompt
modified_prompt = "-".join(prompt.split()[:14])
os.makedirs("samples/test_out", exist_ok=True)  # make sure the output directory exists
export_to_video(video, f"samples/test_out/demo-{modified_prompt}.mp4", fps=24)
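The two Diffusers scripts do not use the same settings: the second one omits the seed, `guidance_scale`, `decode_timestep` and `decode_noise_scale`, and changes resolution and frame count. A minimal stdlib sketch for listing such differences before comparing outputs; `diff_settings` is a hypothetical helper, the dictionaries just transcribe the explicit arguments from the two scripts, and `"<not set>"` means the pipeline default applies.

```python
from typing import Any, Dict, Tuple

NOT_SET = "<not set>"


def diff_settings(a: Dict[str, Any], b: Dict[str, Any]) -> Dict[str, Tuple[Any, Any]]:
    """Map each key whose value differs (or is missing on one side) to (a_value, b_value)."""
    return {
        key: (a.get(key, NOT_SET), b.get(key, NOT_SET))
        for key in sorted(a.keys() | b.keys())
        if a.get(key, NOT_SET) != b.get(key, NOT_SET)
    }


NEG = "worst quality, inconsistent motion, blurry, jittery, distorted"

# Explicit arguments from the two Diffusers scripts ("seed" is the generator seed).
script_1 = dict(prompt="A person.", negative_prompt=NEG, guidance_scale=3, seed=42,
                width=512, height=512, num_frames=49, num_inference_steps=50,
                decode_timestep=0.05, decode_noise_scale=0.025)
script_2 = dict(prompt="A young girl stands.", negative_prompt=NEG,
                width=704, height=480, num_frames=161, num_inference_steps=50)

if __name__ == "__main__":
    for key, (v1, v2) in diff_settings(script_1, script_2).items():
        print(f"{key}: {v1!r} vs {v2!r}")
```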

System Info

torch 2.4.1
torchao 0.7.0
torchvision 0.19.1
diffusers 0.32.1
python 3.10
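`diffusers-cli env` prints environment details for bug reports. As an alternative, a small stdlib sketch that captures the same package versions listed above (the helper name `collect_versions` is hypothetical; packages that are not installed are reported as `None`):

```python
import platform
from importlib import metadata
from typing import Dict, Iterable, Optional

PACKAGES = ("torch", "torchao", "torchvision", "diffusers")


def collect_versions(packages: Iterable[str] = PACKAGES) -> Dict[str, Optional[str]]:
    """Return {"python": version, package: version or None} for each package."""
    versions: Dict[str, Optional[str]] = {"python": platform.python_version()}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # not installed in this environment
    return versions


if __name__ == "__main__":
    for name, version in collect_versions().items():
        print(name, version)
```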

Labels

bug (Something isn't working), stale (Issues that haven't received updates)
