`docs/features/low-vram.md`: 10 additions & 8 deletions
````diff
@@ -86,24 +86,26 @@ But, if your GPU has enough VRAM to hold models fully, you might get a perf boos
 # As an example, if your system has 32GB of RAM and no other heavy processes, setting the `max_cache_ram_gb` to 28GB
 # might be a good value to achieve aggressive model caching.
 max_cache_ram_gb: 28
+
 # The default max cache VRAM size is adjusted dynamically based on the amount of available VRAM (taking into
 # consideration the VRAM used by other processes).
-# You can override the default value by setting `max_cache_vram_gb`. Note that this value takes precedence over the
-# `device_working_mem_gb`.
-# It is recommended to set the VRAM cache size to be as large as possible while leaving enough room for the working
-# memory of the tasks you will be doing. For example, on a 24GB GPU that will be running unquantized FLUX without any
-# auxiliary models, 18GB might be a good value.
-max_cache_vram_gb: 18
+# You can override the default value by setting `max_cache_vram_gb`.
+# CAUTION: Most users should not manually set this value. See warning below.
+max_cache_vram_gb: 16
 ```
 
-!!! tip "Max safe value for `max_cache_vram_gb`"
+!!! warning "Max safe value for `max_cache_vram_gb`"
+
+    Most users should not manually configure the `max_cache_vram_gb`. This configuration value takes precedence over the `device_working_mem_gb` and any operations that explicitly reserve additional working memory (e.g. VAE decode). As such, manually configuring it increases the likelihood of encountering out-of-memory errors.
 
-    To determine the max safe value for `max_cache_vram_gb`, subtract `device_working_mem_gb` from your GPU's VRAM. As described below, the default for `device_working_mem_gb` is 3GB.
+    For users who wish to configure `max_cache_vram_gb`, the max safe value can be determined by subtracting `device_working_mem_gb` from your GPU's VRAM. As described below, the default for `device_working_mem_gb` is 3GB.
 
     For example, if you have a 12GB GPU, the max safe value for `max_cache_vram_gb` is `12GB - 3GB = 9GB`.
 
     If you had increased `device_working_mem_gb` to 4GB, then the max safe value for `max_cache_vram_gb` is `12GB - 4GB = 8GB`.
 
+    Most users who override `max_cache_vram_gb` are doing so because they wish to use significantly less VRAM, and should be setting `max_cache_vram_gb` to a value significantly less than the 'max safe value'.
+
 ### Working memory
 
 Invoke cannot use _all_ of your VRAM for model caching and loading. It requires some VRAM to use as working memory for various operations.
````
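The "max safe value" rule in the warning is plain arithmetic: total GPU VRAM minus `device_working_mem_gb` (default 3GB). The Python below is a minimal sketch of that rule, useful as a sanity check before overriding `max_cache_vram_gb`. The helper names `max_safe_cache_vram_gb` and `check_cache_vram_setting`, and the clamp to zero, are hypothetical illustrations. They are not part of Invoke's API. Only the subtraction rule and the 3GB default come from the doc.

```python
# Hypothetical helpers (not part of Invoke): they apply the documented rule that the
# max safe `max_cache_vram_gb` is the GPU's VRAM minus `device_working_mem_gb`.

DEFAULT_DEVICE_WORKING_MEM_GB = 3.0  # documented default for `device_working_mem_gb`


def max_safe_cache_vram_gb(gpu_vram_gb: float,
                           device_working_mem_gb: float = DEFAULT_DEVICE_WORKING_MEM_GB) -> float:
    """Return the largest `max_cache_vram_gb` that still leaves the working memory free."""
    if gpu_vram_gb <= 0:
        raise ValueError("gpu_vram_gb must be positive")
    if device_working_mem_gb < 0:
        raise ValueError("device_working_mem_gb must be non-negative")
    # Clamp at zero (an assumption) so an oversized working memory never yields a negative cache.
    return max(float(gpu_vram_gb) - float(device_working_mem_gb), 0.0)


def check_cache_vram_setting(max_cache_vram_gb: float, gpu_vram_gb: float,
                             device_working_mem_gb: float = DEFAULT_DEVICE_WORKING_MEM_GB) -> bool:
    """Return True if a manually configured `max_cache_vram_gb` is within the max safe value."""
    return max_cache_vram_gb <= max_safe_cache_vram_gb(gpu_vram_gb, device_working_mem_gb)


if __name__ == "__main__":
    print(max_safe_cache_vram_gb(12))        # 9.0  (12GB - 3GB, the doc's first example)
    print(max_safe_cache_vram_gb(12, 4))     # 8.0  (12GB - 4GB, the doc's second example)
    print(check_cache_vram_setting(16, 24))  # True (16 <= 24 - 3 on a hypothetical 24GB GPU)
```

Passing the check only means the value is under the ceiling. As the warning notes, users who override the setting to save VRAM should normally pick a value well below it.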