
Conversation

@rasmith (Contributor) commented on Dec 3, 2025

This PR skips FLASH_ATTN for test_register_kv_caches, since that backend is not supported on ROCm.

It also updates test_register_kv_caches for TRITON_ATTN, which was failing with the following error:

            # Verify get_reg_descs was called with caches_data
            assert mock_wrapper_instance.get_reg_descs.called
            caches_data, _ = mock_wrapper_instance.get_reg_descs.call_args[0]
>           assert len(caches_data) == 4
E           AssertionError: assert 2 == 4
E            +  where 2 = len([(588510336, 32768, 0, ''), (588769344, 32768, 0, '')])

This is because FLASH_ATTN and TRITON_ATTN use different KV cache shapes, according to get_kv_cache_shape. In particular, TritonAttentionBackend.get_kv_cache_shape returns a shape of the form [num_blocks, 2, H, N, D], which causes TpKVTopology to set self._is_kv_layout_blocks_first to True for TRITON_ATTN but False for FLASH_ATTN.
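
One plausible reading of that failure, shown as a self-contained sketch (the helper and the concrete shapes are hypothetical, not vLLM code): with a blocks-first layout, the K and V entries of a layer are interleaved per block, so each layer contributes one registered region instead of two.

    # Illustrative sketch only -- not the vLLM implementation.
    def regions_per_layer(kv_cache_shape):
        if kv_cache_shape[0] == 2:
            # Layout like [2, num_blocks, H, N, D]: K and V occupy two separate
            # contiguous halves, so each half is registered on its own.
            return 2
        # Blocks-first layout like [num_blocks, 2, H, N, D]: K and V are
        # interleaved per block, so the layer tensor is one contiguous region.
        return 1

    num_layers = 2
    assert num_layers * regions_per_layer((2, 128, 8, 16, 64)) == 4   # FLASH_ATTN-like
    assert num_layers * regions_per_layer((128, 2, 8, 16, 64)) == 2   # TRITON_ATTN-like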

I adjusted the expected values in the test to reflect this difference in layout.
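
A minimal sketch of how the expectations can be keyed on the backend under test (the numbers are placeholders rather than the addresses and lengths used in the real test):

    # Placeholder values; the real test derives its expectations from fixtures.
    EXPECTED = {
        # backend_name: (num_cache_descs, regions_per_layer)
        "FLASH_ATTN": (4, 2),   # separate K and V regions per layer
        "TRITON_ATTN": (2, 1),  # blocks-first layout: one region per layer
    }

    def expected_for(backend_name):
        return EXPECTED[backend_name]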

=================

I found a second problem when using ROCM_ATTN instead of FLASH_ATTN on ROCm. When the test ran with a later backend, the _cached_get_attn_backend function returned the backend class cached from the previous run (FLASH_ATTN on upstream CI, since it runs before the TRITON_ATTN test), so the test was simply re-testing the previous backend.

So I mocked the get_attn_backend function to return the backend that we want to test.
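
A minimal sketch of that approach, assuming unittest.mock and a placeholder patch target (the real test patches get_attn_backend wherever the code under test looks it up):

    # Sketch only: force the parametrized backend instead of whatever
    # _cached_get_attn_backend has cached from an earlier run.
    from unittest import mock

    class _FakeBackend:
        """Stand-in for the attention backend class under test."""

    def force_attn_backend(patch_target, backend_cls=_FakeBackend):
        # patch_target is a placeholder such as "some.module.get_attn_backend";
        # patch the name in the module where it is looked up, not where it is defined.
        return mock.patch(patch_target, return_value=backend_cls)

    # Usage inside a test:
    # with force_attn_backend("some.module.get_attn_backend", backend_cls):
    #     ... run the register_kv_caches assertions ...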

=================

All tests pass now.

@gemini-code-assist bot (Contributor) left a comment

Code Review

This pull request updates the test_register_kv_caches test to correctly handle differences in KV cache shapes between FLASH_ATTN and TRITON_ATTN backends. The changes adjust expected values for tensor sizes, base addresses, and block lengths based on the attention backend being tested. The logic seems correct and addresses the test failure described. I have one suggestion to make the test more robust and readable.

@rasmith changed the title from "[CI/Build] Update test_register_kv_caches for TRITON_ATTN" to "[CI/Build][AMD] Update test_register_kv_caches for ROCm and TRITON_ATTN" on Dec 3, 2025
@rasmith changed the title from "[CI/Build][AMD] Update test_register_kv_caches for ROCm and TRITON_ATTN" to "[CI/Build][AMD] Skip FLASH_ATTN test for test_register_kv_caches for ROCm and update test for TRITON_ATTN" on Dec 3, 2025
@mergify bot added the rocm (Related to AMD ROCm) label on Dec 3, 2025
"FLASH_ATTN",
marks=pytest.mark.skipif(
current_platform.is_rocm(),
reason="Attention backend FLASH_ATTN is not supported on ROCm",

@tjtanaa (Collaborator) commented on Dec 4, 2025

Can you check whether ROCM_AITER_FA or ROCM_ATTN is suitable to replace FLASH_ATTN? FLASH_ATTN, ROCM_AITER_FA, and ROCM_ATTN have the same KV cache layout.
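
For reference, a sketch of what that swap could look like in the parametrization quoted above (illustrative only; current_platform is the helper already used in the hunk):

    # Illustrative: use a ROCm backend with the same KV cache layout instead of
    # skipping the FLASH_ATTN case outright on ROCm.
    import pytest
    from vllm.platforms import current_platform

    FA_LAYOUT_BACKEND = pytest.param(
        "ROCM_ATTN" if current_platform.is_rocm() else "FLASH_ATTN",
        id="fa-layout-backend",
    )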

@rasmith (Contributor, Author) replied

@tjtanaa So, I was able to do this and it worked! However, it turns out that get_attn_backend was using _cached_get_attn_backend, which returned the backend class from the previous run, so FLASH_ATTN was being retested during the TRITON_ATTN test run. I mocked get_attn_backend, and now all tests pass against the correct backend.
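
An alternative to mocking (not what this PR does) would be to reset the cached selection between parametrized runs; the sketch below assumes _cached_get_attn_backend is a functools-cached helper living in vllm.attention.selector, both of which are assumptions.

    # Hypothetical fixture; assumes _cached_get_attn_backend is wrapped with
    # functools.cache/lru_cache and is importable from vllm.attention.selector.
    import pytest

    @pytest.fixture(autouse=True)
    def _clear_attn_backend_cache():
        from vllm.attention.selector import _cached_get_attn_backend
        _cached_get_attn_backend.cache_clear()
        yield
        _cached_get_attn_backend.cache_clear()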

@rasmith changed the title from "[CI/Build][AMD] Skip FLASH_ATTN test for test_register_kv_caches for ROCm and update test for TRITON_ATTN" to "[CI/Build][AMD] Use ROCM_ATTN instead of FLASH_ATTN test for test_register_kv_caches for ROCm and update test for TRITON_ATTN" on Dec 5, 2025

Labels: kv-connector, rocm (Related to AMD ROCm), v1