Description
I don't know if I'm right, but when I build with `-DSD_HIPBLAS=ON`, Flux Chroma Q8_0 runs slowly for me: ~38 s/it at 832x1216 (don't laugh). Everything else is fine; I can't run the model in f16 because I only have 16 GB of VRAM. CLIP on the GPU is no problem, all quants are fine, and conditioning is fast (nice):
```
get_learned_condition completed, taking 1428 ms
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 ROCm devices:
  Device 0: AMD Radeon Pro WX 9100, gfx900:xnack- (0x900), VMM: no, Wave Size: 64
```
But if I build with `-DSD_HIPBLAS=ON -DGGML_CUDA_FORCE_CUBLAS=ON`, inference gets about 40% faster (~21 s/it with the same settings), but I get a black picture. Running CLIP on the CPU fixes the issue, though conditioning then takes forever on my Stone Age CPU (bad):
```
get_learned_condition completed, taking 45811 ms
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: yes
ggml_cuda_init: found 1 ROCm devices:
  Device 0: AMD Radeon Pro WX 9100, gfx900:xnack- (0x900), VMM: no, Wave Size: 64
```
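
A black output usually means NaNs or Infs got into the pipeline somewhere. A quick way to check whether the conditioning coming out of the text encoder is already broken when cuBLAS is forced is to copy the tensor back to the host and scan it. This is just a sketch against the public ggml API; where exactly to hook it into stable-diffusion.cpp, and that the tensor is F32, are assumptions on my part:

```cpp
// Sketch only: scan a tensor for NaN/Inf after the text encoder runs.
// ggml_nelements / ggml_nbytes / ggml_backend_tensor_get are real ggml API;
// the hook point in stable-diffusion.cpp and the F32 assumption are guesses.
#include <cmath>
#include <cstdio>
#include <vector>

#include "ggml.h"
#include "ggml-backend.h"

static bool tensor_is_finite(const struct ggml_tensor * t) {
    if (t->type != GGML_TYPE_F32) {
        return true; // only F32 handled in this sketch
    }
    // the data lives on the GPU with HIPBLAS, so copy it back to the host
    std::vector<float> host((size_t) ggml_nelements(t));
    ggml_backend_tensor_get(t, host.data(), 0, ggml_nbytes(t));

    for (size_t i = 0; i < host.size(); ++i) {
        if (!std::isfinite(host[i])) {
            fprintf(stderr, "non-finite value %g at index %zu in '%s'\n",
                    (double) host[i], i, t->name);
            return false;
        }
    }
    return true;
}
```

If the conditioning tensor already contains non-finite values with GGML_CUDA_FORCE_CUBLAS=ON, that would point at the fp16 GEMM path on gfx900 rather than at the diffusion loop itself.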
I got this information from this table:
https://github.com/user-attachments/files/18832391/MI60-MI25-MI100_MAT_MUL.xlsx
(from ggml-org/llama.cpp#11931), which matches exactly what I could replicate.
My idea: maybe it's possible to deactivate cuBLAS for t5xxl and the other text encoders when HIPBLAS=ON. Maybe someone else can test this on other ROCm platforms and GPUs, or on NVIDIA, to see whether it's an AMD-only issue; a sketch of what I mean follows below.
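
Something like this is what I have in mind for the matmul dispatch. To be clear, this is a made-up sketch, not actual ggml code: the function, the hook point, and the weight-name prefixes are all assumptions for illustration:

```cpp
// Purely hypothetical sketch of the idea, not actual ggml code. Assumes the
// cuBLAS-vs-MMQ decision could take the weight tensor into account, and that
// text encoder weights keep a recognizable name prefix at runtime (the
// prefixes below are guesses).
#include <cstring>

#include "ggml.h"

static bool want_cublas(const struct ggml_tensor * weight, bool force_cublas) {
    if (!force_cublas) {
        return false;
    }
    // keep the text encoders on the MMQ path, since that is the part that
    // produces black images on gfx900 when cuBLAS is forced
    const char * prefixes[] = { "text_encoders.", "cond_stage_model." };
    for (const char * p : prefixes) {
        if (strncmp(weight->name, p, strlen(p)) == 0) {
            return false;
        }
    }
    return true;
}
```

A per-tensor switch like that would also avoid needing two separate builds just to get the faster UNet path without breaking the text encoders.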