Describe the bug

Hello, I encountered an issue with generation when attempting the I2V task using diffusers. Is there any difference between the diffusers implementation and the LTX-Video inference script for the I2V task?

- Prompt: "a person"
- Result from inference.py:
img_to_vid_0_a-person_42_512x512x161_0.mp4
- Results generated with diffusers (with and without the negative prompt):
diffusers_512x512_a_person.mp4
diffusers_without_negative_prompt_512x512_a_person.mp4
Besides, the text prompt seems to have a significant impact on I2V generation with diffusers. Could I be missing any important arguments?
https://huggingface.co/docs/diffusers/api/pipelines/ltx_video

- Results for progressively truncated prompts:
demo-A-young-girl-stands-calmly-in-the-foreground.mp4
demo-A-young-girl-stands-calmly.mp4
demo-A-young-girl-stands.mp4
demo-A-young-girl.mp4
Reproduction
- For LTX-Video generation, using the official inference.py:
https://github.com/Lightricks/LTX-Video/blob/main/inference.py
python inference.py \
--ckpt_path ./pretrained_models/LTX-Video \
--output_path './samples' \
--prompt "A person." \
--input_image_path ./samples/test_cases.png \
--height 512 \
--width 512 \
--num_frames 49 \
--seed 42
- For diffusers generation: it seems that the negative prompt is causing the issues; however, even when I remove it, the results are still not satisfactory. (A sketch of the no-negative-prompt variant follows the script below.)
import argparse
import random

import numpy as np
import torch
from diffusers import LTXImageToVideoPipeline
from diffusers.utils import export_to_video, load_image


def seed_everything(seed: int):
    # Seed all relevant RNGs for reproducibility.
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    if torch.cuda.is_available():
        torch.cuda.manual_seed(seed)


def generate_video(args):
    pipe = LTXImageToVideoPipeline.from_pretrained(args.ltx_model_path, torch_dtype=torch.bfloat16)
    pipe.to("cuda")

    image = load_image(args.validation_image)
    prompt = "A person."
    negative_prompt = "worst quality, inconsistent motion, blurry, jittery, distorted"

    generator = torch.Generator(
        device="cuda" if torch.cuda.is_available() else "cpu"
    ).manual_seed(42)

    video = pipe(
        image=image,
        prompt=prompt,
        negative_prompt=negative_prompt,
        guidance_scale=3,
        # stg_scale=1,
        generator=generator,
        callback_on_step_end=None,
        width=512,
        height=512,
        num_frames=49,
        num_inference_steps=50,
        decode_timestep=0.05,
        decode_noise_scale=0.025,
    ).frames[0]
    export_to_video(video, args.output_file, fps=24)


if __name__ == "__main__":
    # Defaults mirror the paths used in the inference.py command above.
    parser = argparse.ArgumentParser()
    parser.add_argument("--ltx_model_path", default="./pretrained_models/LTX-Video")
    parser.add_argument("--validation_image", default="./samples/test_cases.png")
    parser.add_argument("--output_file", default="./samples/diffusers_512x512_a_person.mp4")
    seed_everything(42)
    generate_video(parser.parse_args())
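For completeness, the "without negative prompt" result above came from the same call with the negative prompt dropped. A minimal sketch of that variant, reusing pipe and image from the script above (the exact call I ran may have differed slightly):

# Variant behind diffusers_without_negative_prompt_512x512_a_person.mp4 (reconstructed):
# the same call as above, but with negative_prompt omitted, so the pipeline
# falls back to an empty negative prompt for classifier-free guidance.
video = pipe(
    image=image,
    prompt="A person.",
    guidance_scale=3,
    generator=torch.Generator(device="cuda").manual_seed(42),
    width=512,
    height=512,
    num_frames=49,
    num_inference_steps=50,
    decode_timestep=0.05,
    decode_noise_scale=0.025,
).frames[0]
export_to_video(video, "samples/diffusers_without_negative_prompt_512x512_a_person.mp4", fps=24)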
- For the demo videos with different text prompts, following the diffusers docs example (a sketch of the prompt sweep follows the script below):
https://huggingface.co/docs/diffusers/api/pipelines/ltx_video
import torch
from diffusers import LTXImageToVideoPipeline
from diffusers.utils import export_to_video, load_image

pipe = LTXImageToVideoPipeline.from_pretrained("./pretrained_models/LTX-Video", torch_dtype=torch.bfloat16)
pipe.to("cuda")

image = load_image("samples/image.png")
prompt = "A young girl stands."
negative_prompt = "worst quality, inconsistent motion, blurry, jittery, distorted"

video = pipe(
    image=image,
    prompt=prompt,
    negative_prompt=negative_prompt,
    width=704,
    height=480,
    num_frames=161,
    num_inference_steps=50,
).frames[0]
modified_prompt = "-".join(prompt.split()[:14])
export_to_video(video, f"samples/test_out/demo-{modified_prompt}.mp4", fps=24)
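The four demo-*.mp4 results above came from running this script with progressively truncated prompts. A minimal sketch of that sweep, with the prompt list reconstructed from the output filenames and pipe, image, and negative_prompt reused from the script above:

# Sweep over the truncated prompts behind the demo-*.mp4 results above
# (prompt list reconstructed from the output filenames; otherwise same settings).
for prompt in [
    "A young girl stands calmly in the foreground.",
    "A young girl stands calmly.",
    "A young girl stands.",
    "A young girl.",
]:
    video = pipe(
        image=image,
        prompt=prompt,
        negative_prompt=negative_prompt,
        width=704,
        height=480,
        num_frames=161,
        num_inference_steps=50,
    ).frames[0]
    name = "-".join(prompt.split()[:14])
    export_to_video(video, f"samples/test_out/demo-{name}.mp4", fps=24)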
System Info

- torch 2.4.1
- torchao 0.7.0
- torchvision 0.19.1
- diffusers 0.32.1
- python 3.10
Who can help?
No response