
Maybe T5 model on the CPU: black image fix #826


Description

@phil2sat

I don't know if I'm right, but building with:
-DSD_HIPBLAS=ON
Flux Chroma Q8_0 runs slow for me, ~38 s/it at 832x1216 (don't laugh).

Otherwise all is fine; I can't run the model in f16 because I only have 16 GB of VRAM.
CLIP on the GPU is no issue, all quants are OK.
get_learned_condition completed, taking 1428 ms (nice)

ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 ROCm devices:
Device 0: AMD Radeon Pro WX 9100, gfx900:xnack- (0x900), VMM: no, Wave Size: 64
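
For reference, the slow-but-correct build looks roughly like this (a sketch along the lines of the README's hipBLAS instructions; the compiler choice and the gfx900 target are assumptions for my WX 9100):

```sh
# hipBLAS build, default mul_mat path (correct images, ~38 s/it)
cmake .. -G Ninja -DCMAKE_C_COMPILER=clang -DCMAKE_CXX_COMPILER=clang++ \
  -DSD_HIPBLAS=ON -DCMAKE_BUILD_TYPE=Release -DAMDGPU_TARGETS=gfx900
cmake --build . --config Release
```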

But if I build with:
-DSD_HIPBLAS=ON -DGGML_CUDA_FORCE_CUBLAS=ON
inference gets about 40% faster, ~21 s/it with the same settings, but the picture comes out black.
Running CLIP on the CPU fixes the issue:
get_learned_condition completed, taking 45811 ms (bad; again, don't laugh, it's a stone-age CPU)

ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: yes
ggml_cuda_init: found 1 ROCm devices:
Device 0: AMD Radeon Pro WX 9100, gfx900:xnack- (0x900), VMM: no, Wave Size: 64
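
And the fast-but-broken build, plus the CPU workaround (again a sketch; the binary path, model filename, and prompt are placeholders, and the other flux flags like --t5xxl/--clip_l/--vae are omitted):

```sh
# hipBLAS build with cuBLAS forced (~21 s/it, but black images)
cmake .. -G Ninja -DCMAKE_C_COMPILER=clang -DCMAKE_CXX_COMPILER=clang++ \
  -DSD_HIPBLAS=ON -DGGML_CUDA_FORCE_CUBLAS=ON -DCMAKE_BUILD_TYPE=Release \
  -DAMDGPU_TARGETS=gfx900
cmake --build . --config Release

# workaround: keep the text encoders on the CPU with --clip-on-cpu;
# conditioning gets slow, but the image is no longer black
./bin/sd --diffusion-model chroma-q8_0.gguf -p "..." -W 832 -H 1216 --clip-on-cpu
```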

The information above I got from this table:
https://github.com/user-attachments/files/18832391/MI60-MI25-MI100_MAT_MUL.xlsx
ggml-org/llama.cpp#11931

which is exactly what I could replicate.

My idea: maybe it's possible to deactivate cuBLAS just for t5xxl (or other text encoders) when building with SD_HIPBLAS=ON, so the rest of the pipeline keeps the cuBLAS speedup; until then, --clip-on-cpu works as a workaround.

Maybe someone else can test this on other ROCm platforms and GPUs, or on NVIDIA as well, to check whether it's not only an AMD issue.
