[CK TILE] Grouped conv fwd split image #2970
Open: JH-Leon-KIM-AMD wants to merge 50 commits into develop from LWPCK-3052-grouped-conv-split-image
+1,130
−314
4403d55
Refactor split-image implementation: simplify code and remove redunda…
JH-Leon-KIM-AMD 74a7179
Add padding debug output to split-image implementation
JH-Leon-KIM-AMD e94a485
Fix sign comparison warning after rebase with origin/develop
JH-Leon-KIM-AMD 9b31169
Fix Split-N with groups bug and clean up unused parameters
JH-Leon-KIM-AMD 9c7df2a
Implement recursive queue-based split-image detection and calculation
JH-Leon-KIM-AMD 1f48a5a
WIP: Split-Image investigation - found architecture mismatch
JH-Leon-KIM-AMD 68af577
Add 1D split-image implementation for grouped convolution (N=1 working)
JH-Leon-KIM-AMD b51fedb
Add basic split-image implementation for 1D/2D/3D grouped convolution
JH-Leon-KIM-AMD 2992c0b
Refactor split-image to unified structure for 1D/2D/3D
JH-Leon-KIM-AMD 5024d43
Add safety checks for split-image in all dimensions
JH-Leon-KIM-AMD 51c77f7
Fix Split-N + Split-Image compatibility issue
JH-Leon-KIM-AMD 74671dd
Implement unified threshold for Split-N and Split-Image
JH-Leon-KIM-AMD 49e46de
Comment out outdated split-image code (SplitConvProblem/LaunchKernelW…
JH-Leon-KIM-AMD eebb88d
Implement recursive split-image with depth limit (MAX_DEPTH=10)
JH-Leon-KIM-AMD 73a6adf
Summary of recursive split-image implementation:
JH-Leon-KIM-AMD 34326e2
Add comment explaining MAX_DEPTH capacity for 2GB threshold
JH-Leon-KIM-AMD 414e9a1
Refactor: move recursive split-image logic to transformer
JH-Leon-KIM-AMD 8c6d280
Apply clang-format-18 formatting
JH-Leon-KIM-AMD 54869a3
Fix clang-format-18 issues in forward kernel
JH-Leon-KIM-AMD 08bc24d
Merge branch 'develop' into LWPCK-3052-grouped-conv-split-image
JH-Leon-KIM-AMD e75944e
Update include/ck_tile/ops/grouped_convolution/utils/transform_conv_f…
JH-Leon-KIM-AMD a1f9d7e
Update include/ck_tile/ops/grouped_convolution/utils/transform_conv_f…
JH-Leon-KIM-AMD ca06bfe
Update include/ck_tile/ops/grouped_convolution/kernel/grouped_convolu…
JH-Leon-KIM-AMD 5fbaecf
Update include/ck_tile/ops/grouped_convolution/kernel/grouped_convolu…
JH-Leon-KIM-AMD 0ea5ece
Merge branch 'develop' into LWPCK-3052-grouped-conv-split-image
JH-Leon-KIM-AMD 03e44ee
Merge branch 'develop' into LWPCK-3052-grouped-conv-split-image
JH-Leon-KIM-AMD 58df1f6
Merge develop into LWPCK-3052: Accept universal GEMM pipeline, commen…
JH-Leon-KIM-AMD 89c6c92
Split-Image implementation with temporary fixed divider
JH-Leon-KIM-AMD a779af6
Fix 1D split-image padding issue with per-piece dimensions
JH-Leon-KIM-AMD a7871c8
Fix 2D/3D split-image with independent split factors per dimension
JH-Leon-KIM-AMD c85669e
Remove unused split-image struct fields
JH-Leon-KIM-AMD d48b4ed
Refactor split-image invoker code for improved readability
JH-Leon-KIM-AMD 49b622e
Refactor split-image code and remove debug prints
JH-Leon-KIM-AMD 7db8d77
Add split-image safety constraints and refactor to utils
JH-Leon-KIM-AMD a87da59
Change split-image from runtime to compile-time branching
JH-Leon-KIM-AMD 29fed44
Change split-image to compile-time branching
JH-Leon-KIM-AMD b28ea3c
Add split-image example as separate binary
JH-Leon-KIM-AMD badffd8
Replace linear search with binary search in find_piece_id
JH-Leon-KIM-AMD 91ffc82
Simplify split-image code and fix integer overflow
JH-Leon-KIM-AMD d6184ed
Merge branch 'develop' into LWPCK-3052-grouped-conv-split-image
JH-Leon-KIM-AMD 85c4c58
Trigger CI rerun - fix merge conflicts
JH-Leon-KIM-AMD 61d0e6a
Fix merge conflict markers
JH-Leon-KIM-AMD 02d33c3
Merge branch 'develop' into LWPCK-3052-grouped-conv-split-image
JH-Leon-KIM-AMD b8a94d5
Fix clang-format: remove space before {}
JH-Leon-KIM-AMD c80e237
Fix clang-format: comment wrapping and Swish constructor
JH-Leon-KIM-AMD 387361d
Merge branch 'develop' into LWPCK-3052-grouped-conv-split-image
JH-Leon-KIM-AMD cc7299b
Rename split_image to large_tensor for clarity
JH-Leon-KIM-AMD 8e03659
Update comments and include in large_tensor example
JH-Leon-KIM-AMD 2fbb436
Remove test code, restore 2GB threshold
JH-Leon-KIM-AMD 8558e07
Update include/ck_tile/ops/grouped_convolution/utils/transform_conv_f…
JH-Leon-KIM-AMD
63 changes: 63 additions & 0 deletions
example/ck_tile/20_grouped_convolution/grouped_convolution_forward_large_tensor.cpp
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,63 @@ | ||
| // SPDX-License-Identifier: MIT | ||
| // Copyright (c) 2025, Advanced Micro Devices, Inc. All rights reserved. | ||
| | ||
| // Large tensor grouped convolution example | ||
| // This example demonstrates convolution for large tensors that exceed memory limits. | ||
| // It uses automatic tensor splitting when needed to handle large images. | ||
| // For regular convolution without tensor splitting, use grouped_convolution_forward.cpp | ||
| | ||
| #include <hip/hip_runtime.h> | ||
| | ||
| #include <cstring> | ||
| #include <iostream> | ||
| #include <ostream> | ||
| #include <string> | ||
| #include <tuple> | ||
| | ||
| #include "ck_tile/host.hpp" | ||
| #include "grouped_convolution_utils.hpp" | ||
| #include "grouped_convolution_forward_large_tensor_invoker.hpp" | ||
| #include "run_grouped_convolution_fwd_example.inc" | ||
| | ||
| template <template <typename PrecType> typename GemmConfig> | ||
| int run_grouped_conv_fwd_example(int argc, char* argv[]) | ||
| { | ||
| using Invoker = GroupedConvolutionForwardInvoker; | ||
| | ||
| auto [result, arg_parser] = create_args(argc, argv); | ||
| if(!result) | ||
| return -1; | ||
| | ||
| std::string data_type = arg_parser.get_str("prec"); | ||
| std::string in_layout = arg_parser.get_str("in_layout"); | ||
| std::string wei_layout = arg_parser.get_str("wei_layout"); | ||
| std::string out_layout = arg_parser.get_str("out_layout"); | ||
| | ||
| if(data_type == "fp16") | ||
| { | ||
| return run_grouped_conv_fwd_example_prec_type<Invoker, | ||
| GemmConfig<ck_tile::half_t>, | ||
| ck_tile::half_t>( | ||
| in_layout, wei_layout, out_layout, argc, argv); | ||
| } | ||
| else if(data_type == "bf16") | ||
| { | ||
| return run_grouped_conv_fwd_example_prec_type<Invoker, | ||
| GemmConfig<ck_tile::bf16_t>, | ||
| ck_tile::bf16_t>( | ||
| in_layout, wei_layout, out_layout, argc, argv); | ||
| } | ||
| else | ||
| { | ||
| throw std::runtime_error("Unsupported data type for this operation !!!"); | ||
| } | ||
| } | ||
| | ||
| int main(int argc, char* argv[]) | ||
| { | ||
| #if CK_TILE_USE_WMMA | ||
| return !run_grouped_conv_fwd_example<GemmConfigComputeV3_WMMA>(argc, argv); | ||
| #else | ||
| return !run_grouped_conv_fwd_example<GemmConfigComputeV3>(argc, argv); | ||
| #endif | ||
| } |
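
The header comment in the example file says the tensor is split automatically when it is too large, and the commit list mentions a 2GB threshold and a binary search in `find_piece_id`. The actual splitting code is not visible in this diff, so the following is only a minimal sketch of the general idea, not the PR's implementation. All names here (`num_pieces`, `piece_starts`, `find_piece`) are hypothetical, and it assumes a split along a single spatial dimension with a fixed byte size per slice.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper (not from the PR): smallest number of pieces along one
// dimension such that every piece occupies at most `max_bytes`.
// 64-bit arithmetic avoids overflow for tensors larger than 2 GiB.
inline std::int64_t
num_pieces(std::int64_t dim_len, std::int64_t bytes_per_slice, std::int64_t max_bytes)
{
    assert(dim_len > 0 && bytes_per_slice > 0 && max_bytes >= bytes_per_slice);
    const std::int64_t slices_per_piece = max_bytes / bytes_per_slice; // at least 1
    return (dim_len + slices_per_piece - 1) / slices_per_piece;        // ceiling division
}

// Start index of each piece, followed by a sentinel equal to dim_len.
// Pieces are as equal as possible; the first (dim_len % pieces) get one extra slice.
inline std::vector<std::int64_t> piece_starts(std::int64_t dim_len, std::int64_t pieces)
{
    assert(pieces > 0 && pieces <= dim_len);
    std::vector<std::int64_t> starts;
    const std::int64_t base = dim_len / pieces;
    const std::int64_t rem  = dim_len % pieces;
    std::int64_t pos        = 0;
    for(std::int64_t i = 0; i < pieces; ++i)
    {
        starts.push_back(pos);
        pos += base + (i < rem ? 1 : 0);
    }
    starts.push_back(dim_len);
    return starts;
}

// Binary search for the piece that contains `index` (0 <= index < dim_len).
inline std::int64_t find_piece(const std::vector<std::int64_t>& starts, std::int64_t index)
{
    // Exclude the sentinel so the result is always a valid piece id.
    const auto it = std::upper_bound(starts.begin(), starts.end() - 1, index);
    return static_cast<std::int64_t>(it - starts.begin()) - 1;
}
```

For instance, under these assumptions a dimension of 4096 slices at 1 MiB each with a 2 GiB limit yields two pieces, and `find_piece` maps any slice index to its piece in logarithmic time. The PR itself reports independent split factors per dimension for 2D/3D, which this one-dimensional sketch does not cover.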