Base inference with the large 14B Wan 2.1 models can take up to 35GB of VRAM when generating videos at 720p resolution. We'll outline a few memory optimizations we can apply to reduce the VRAM required to run the model.
```py
import numpy as np
from diffusers import AutoencoderKLWan, WanTransformer3DModel, WanImageToVideoPipeline
from diffusers.hooks.group_offloading import apply_group_offloading
from diffusers.utils import export_to_video, load_image
from transformers import UMT5EncoderModel, CLIPVisionModel

model_id = "Wan-AI/Wan2.1-I2V-14B-720P-Diffusers"
image_encoder = CLIPVisionModel.from_pretrained(
    # ... (remaining arguments, component loading, offloading, and pipeline setup elided in this excerpt)
)

prompt = (
    "An astronaut hatching from an egg, on the surface of the moon, the darkness and depth of space realised in "
    "the background. High quality, ultrarealistic detail and breath-taking movie-like camera shot."
)
negative_prompt = "Bright tones, overexposed, static, blurred details, subtitles, style, works, paintings, images, static, overall gray, worst quality, low quality, JPEG compression residue, ugly, incomplete, extra fingers, poorly drawn hands, poorly drawn faces, deformed, disfigured, misshapen limbs, fused fingers, still picture, messy background, three legs, many people in the background, walking backwards"
num_frames = 33

output = pipe(
    # ... (pipeline call arguments elided in this excerpt)
)
export_to_video(output, "wan-i2v.mp4", fps=16)
```
## Using a Custom Scheduler

Wan can be used with many different schedulers, each with their own benefits regarding speed and generation quality. By default, Wan uses the `UniPCMultistepScheduler(prediction_type="flow_prediction", use_flow_sigmas=True, flow_shift=3.0)` scheduler. You can use a different scheduler as follows:
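A minimal sketch of swapping schedulers, assuming `pipe` is the pipeline constructed above (the scheduler choice and `shift`/`flow_shift` values here are illustrative, not a recommendation):

```py
from diffusers import FlowMatchEulerDiscreteScheduler, UniPCMultistepScheduler

# Swap in a flow-match Euler scheduler, reusing the current scheduler's config.
pipe.scheduler = FlowMatchEulerDiscreteScheduler.from_config(pipe.scheduler.config, shift=5.0)

# Or re-create the default UniPC scheduler with a different flow shift.
pipe.scheduler = UniPCMultistepScheduler.from_config(pipe.scheduler.config, flow_shift=5.0)
```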
- Keep `AutoencoderKLWan` in `torch.float32` for better decoding quality.
- `num_frames` should satisfy the constraint `(num_frames - 1) % 4 == 0` (a quick check is sketched below).
- For smaller resolution videos, try lower values of `shift` (between `2.0` and `5.0`) in the [Scheduler](https://huggingface.co/docs/diffusers/main/en/api/schedulers/flow_match_euler_discrete#diffusers.FlowMatchEulerDiscreteScheduler.shift). For larger resolution videos, try higher values (between `7.0` and `12.0`). The default value is `3.0` for Wan.
**docs/source/en/installation.md**

Your Python environment will find the `main` version of 🤗 Diffusers on the next run.
Model weights and files are downloaded from the Hub to a cache which is usually your home directory. You can change the cache location by specifying the `HF_HOME` or `HUGGINGFACE_HUB_CACHE` environment variables or configuring the `cache_dir` parameter in methods like [`~DiffusionPipeline.from_pretrained`].
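For example, a minimal sketch of redirecting downloads with `cache_dir` (the model ID and path are illustrative):

```py
from diffusers import DiffusionPipeline

# Download into (and later re-load from) a custom cache directory
# instead of the default location under HF_HOME.
pipe = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    cache_dir="./my_diffusers_cache",
)
```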
Cached files allow you to run 🤗 Diffusers offline. To prevent 🤗 Diffusers from connecting to the internet, set the `HF_HUB_OFFLINE` environment variable to `1` and 🤗 Diffusers will only load previously downloaded files in the cache.
```shell
export HF_HUB_OFFLINE=1
```
For more details about managing and cleaning the cache, take a look at the [caching](https://huggingface.co/docs/huggingface_hub/guides/manage-cache) guide.
Telemetry is only sent when loading models and pipelines from the Hub, and it is not collected if you're loading local files.

We understand that not everyone wants to share additional information, and we respect your privacy. You can disable telemetry collection by setting the `HF_HUB_DISABLE_TELEMETRY` environment variable from your terminal:
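For example, in a bash-compatible shell:

```shell
export HF_HUB_DISABLE_TELEMETRY=1
```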
**docs/source/en/using-diffusers/loading.md**
Use the Space below to gauge a pipeline's memory requirements before you download it.

### Specifying Component-Specific Data Types

You can customize the data types for individual sub-models by passing a dictionary to the `torch_dtype` parameter. This allows you to load different components of a pipeline in different floating-point precisions. For instance, to load the transformer with `torch.bfloat16` and all other components with `torch.float16`, pass a dictionary mapping component names to dtypes:
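A minimal sketch, assuming a pipeline whose main denoiser lives under the `transformer` component (the model ID is illustrative; the `default` key sets the dtype for every component not listed explicitly):

```py
import torch
from diffusers import DiffusionPipeline

# Load the transformer in bfloat16; all other components fall back to float16.
pipe = DiffusionPipeline.from_pretrained(
    "hunyuanvideo-community/HunyuanVideo",
    torch_dtype={"transformer": torch.bfloat16, "default": torch.float16},
)
print(pipe.transformer.dtype)  # torch.bfloat16
print(pipe.vae.dtype)          # torch.float16
```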
**examples/community/README.md**
| Stable Diffusion XL Attentive Eraser Pipeline |[[AAAI2025 Oral] Attentive Eraser](https://github.com/Anonym0u3/AttentiveEraser) is a novel tuning-free method that enhances object removal capabilities in pre-trained diffusion models.|[Stable Diffusion XL Attentive Eraser Pipeline](#stable-diffusion-xl-attentive-eraser-pipeline)|-|[Wenhao Sun](https://github.com/Anonym0u3) and [Benlei Cui](https://github.com/Benny079)|
| Perturbed-Attention Guidance |StableDiffusionPAGPipeline is a modification of StableDiffusionPipeline to support Perturbed-Attention Guidance (PAG).|[Perturbed-Attention Guidance](#perturbed-attention-guidance)|[Notebook](https://github.com/huggingface/notebooks/blob/main/diffusers/perturbed_attention_guidance.ipynb)|[Hyoungwon Cho](https://github.com/HyoungwonCho)|
| CogVideoX DDIM Inversion Pipeline | Implementation of DDIM inversion and guided attention-based editing denoising process on CogVideoX. |[CogVideoX DDIM Inversion Pipeline](#cogvideox-ddim-inversion-pipeline)| - |[LittleNyima](https://github.com/LittleNyima)|
| FaithDiff Stable Diffusion XL Pipeline | Implementation of [(CVPR 2025) FaithDiff: Unleashing Diffusion Priors for Faithful Image Super-resolution](https://arxiv.org/abs/2411.18824) - FaithDiff is a faithful image super-resolution method that leverages latent diffusion models by actively adapting the diffusion prior and jointly fine-tuning its components (encoder and diffusion model) with an alignment module to ensure high fidelity and structural consistency. |[FaithDiff Stable Diffusion XL Pipeline](#faithdiff-stable-diffusion-xl-pipeline)|[](https://huggingface.co/jychen9811/FaithDiff)|[Junyang Chen, Jinshan Pan, Jiangxin Dong, IMAG Lab, (Adapted by Eliseu Silva)](https://github.com/JyChen9811/FaithDiff)|
To load a custom pipeline, just pass the `custom_pipeline` argument to `DiffusionPipeline`, naming one of the files in `diffusers/examples/community`. Feel free to send a PR with your own pipelines; we will merge them quickly.
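For example, a sketch of loading the community `stable_diffusion_mega` file by name (the base model ID is illustrative; depending on your Diffusers version you may also need to pass `trust_remote_code=True`):

```py
import torch
from diffusers import DiffusionPipeline

# custom_pipeline names a file in diffusers/examples/community.
pipe = DiffusionPipeline.from_pretrained(
    "stable-diffusion-v1-5/stable-diffusion-v1-5",
    custom_pipeline="stable_diffusion_mega",
    torch_dtype=torch.float16,
)
```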
### FaithDiff Stable Diffusion XL Pipeline

This is the implementation of the FaithDiff pipeline for SDXL, adapted to use HuggingFace Diffusers.
For more details see the project links above.
## Example Usage
This example upscales and restores a low-quality image. The input image has a resolution of 512x512 and will be upscaled by a factor of 2x, to a final resolution of 1024x1024. Upscaling to a larger factor is possible, but the input image should be at least 1024x1024 in those cases. To upscale an image like this one by 4x, for example, it is recommended to feed the 2x result back into a second 2x pass, i.e. progressive upscaling.
````py
import random

import numpy as np
import torch
from diffusers import DiffusionPipeline, AutoencoderKL, UniPCMultistepScheduler

# ... (pipeline construction, VAE/scheduler setup, and image loading elided in this excerpt)

prompt = "The image features a woman in her 55s with blonde hair and a white shirt, smiling at the camera. She appears to be in a good mood and is wearing a white scarf around her neck."
````