Description
I don't know if I'm right, but when I build with `-DSD_HIPBLAS=ON`, Flux Chroma Q8_0 runs slowly for me: ~38 s/it at 832x1216 (don't laugh). Everything else is fine; I can't run the model in f16 because I only have 16 GB of VRAM. CLIP on the GPU is no problem, all quants are fine, and conditioning is fast (nice):
```
get_learned_condition completed, taking 1428 ms
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 ROCm devices:
  Device 0: AMD Radeon Pro WX 9100, gfx900:xnack- (0x900), VMM: no, Wave Size: 64
```
But if I build with `-DSD_HIPBLAS=ON -DGGML_CUDA_FORCE_CUBLAS=ON`, inference gets about 40% faster (~21 s/it with the same settings), but I get a black picture. Running CLIP on the CPU fixes the issue, though conditioning then takes forever on my Stone Age CPU (bad):
```
get_learned_condition completed, taking 45811 ms
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: yes
ggml_cuda_init: found 1 ROCm devices:
  Device 0: AMD Radeon Pro WX 9100, gfx900:xnack- (0x900), VMM: no, Wave Size: 64
```
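
A black output usually means NaNs or Infs got into the pipeline somewhere. A quick way to check whether the conditioning coming out of the text encoder is already broken when cuBLAS is forced is to copy the tensor back to the host and scan it. This is just a sketch against the public ggml API; where exactly to hook it into stable-diffusion.cpp, and that the tensor is F32, are assumptions on my part:

```cpp
// Sketch only: scan a tensor for NaN/Inf after the text encoder runs.
// ggml_nelements / ggml_nbytes / ggml_backend_tensor_get are real ggml API;
// the hook point in stable-diffusion.cpp and the F32 assumption are guesses.
#include <cmath>
#include <cstdio>
#include <vector>

#include "ggml.h"
#include "ggml-backend.h"

static bool tensor_is_finite(const struct ggml_tensor * t) {
    if (t->type != GGML_TYPE_F32) {
        return true; // only F32 handled in this sketch
    }
    // the data lives on the GPU with HIPBLAS, so copy it back to the host
    std::vector<float> host((size_t) ggml_nelements(t));
    ggml_backend_tensor_get(t, host.data(), 0, ggml_nbytes(t));

    for (size_t i = 0; i < host.size(); ++i) {
        if (!std::isfinite(host[i])) {
            fprintf(stderr, "non-finite value %g at index %zu in '%s'\n",
                    (double) host[i], i, t->name);
            return false;
        }
    }
    return true;
}
```

If the conditioning tensor already contains non-finite values with GGML_CUDA_FORCE_CUBLAS=ON, that would point at the fp16 GEMM path on gfx900 rather than at the diffusion loop itself.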
I got this information from this table:
https://github.com/user-attachments/files/18832391/MI60-MI25-MI100_MAT_MUL.xlsx
(from ggml-org/llama.cpp#11931), which matches exactly what I could replicate.
My idea: maybe it's possible to deactivate cuBLAS for t5xxl and the other text encoders when HIPBLAS=ON. Maybe someone else can test this on other ROCm platforms and GPUs, or on NVIDIA, to see whether it's an AMD-only issue; a sketch of what I mean follows below.
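
Something like this is what I have in mind for the matmul dispatch. To be clear, this is a made-up sketch, not actual ggml code: the function, the hook point, and the weight-name prefixes are all assumptions for illustration:

```cpp
// Purely hypothetical sketch of the idea, not actual ggml code. Assumes the
// cuBLAS-vs-MMQ decision could take the weight tensor into account, and that
// text encoder weights keep a recognizable name prefix at runtime (the
// prefixes below are guesses).
#include <cstring>

#include "ggml.h"

static bool want_cublas(const struct ggml_tensor * weight, bool force_cublas) {
    if (!force_cublas) {
        return false;
    }
    // keep the text encoders on the MMQ path, since that is the part that
    // produces black images on gfx900 when cuBLAS is forced
    const char * prefixes[] = { "text_encoders.", "cond_stage_model." };
    for (const char * p : prefixes) {
        if (strncmp(weight->name, p, strlen(p)) == 0) {
            return false;
        }
    }
    return true;
}
```

A per-tensor switch like that would also avoid needing two separate builds just to get the faster UNet path without breaking the text encoders.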