Add epilogue subtiling #948
base: main
stack-info: PR: #948, branch: PaulZhang12/stack/14 (branch updated from cf439ac to fcc7492)
stack-info: PR: #948, branch: PaulZhang12/stack/14 (branch updated from fcc7492 to cdbedf6)
stack-info: PR: #948, branch: PaulZhang12/stack/14 (branch updated from cdbedf6 to 58496fb)
examples/matmul.py (outdated)
config=helion.Config(
    block_sizes=[64, 64, 64],
    loop_orders=[[0, 1]],
    l2_groupings=[4],
    range_unroll_factors=[0, 1],
    range_num_stages=[0, 3],
    range_multi_buffers=[None, False],
    range_flattens=[None, None],
    num_warps=8,
    num_stages=6,
    indexing='tensor_descriptor',
    pid_type='flat'
)
You probably don't want to check this in, since the best config will depend on the machine.
stack-info: PR: #948, branch: PaulZhang12/stack/14 (branch updated from 58496fb to 965b193)
Does this help with matmul perf?
import re
host_function = HostFunction.current()
block_size_expr = ", ".join(map(self.literal_expr, block_size))
pattern = r'triton_helpers\.div_floor_integer\(([^,]+),\s*(\d+)\)'
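For reference, here is a minimal sketch of what a string-level rewrite with this pattern does. The replacement target (host-side Python floor division with //) and the name rewrite_div_floor are assumptions for illustration; the snippet does not show what the PR actually substitutes. Because [^,]+ stops at the first comma, a first argument that itself contains a comma is silently left unrewritten.

```python
import re

# Pattern copied verbatim from the PR snippet.
PATTERN = r"triton_helpers\.div_floor_integer\(([^,]+),\s*(\d+)\)"


def rewrite_div_floor(src: str) -> str:
    """Hypothetical helper: turn div_floor_integer(a, k) into ((a) // k).

    Assumes the result is evaluated as host-side Python, where // is floor
    division (the same semantics as div_floor_integer for integer operands).
    """
    return re.sub(PATTERN, lambda m: f"(({m.group(1)}) // {m.group(2)})", src)


# Simple first argument: rewritten.
print(rewrite_div_floor("triton_helpers.div_floor_integer(x + y, 64)"))
# First argument containing a comma: the pattern cannot match, so nothing changes.
print(rewrite_div_floor("triton_helpers.div_floor_integer(max(a, b), 64)"))
```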
@yf225 didn't you add something to fix this somewhere else?
Yes, we have a sanitization pass for triton_helpers.*, currently in helion/helion/_compiler/device_function.py (lines 519 to 532 in 1aaba3f):
if isinstance(value, sympy.Expr):
    sanitized = value.replace(  # pyright: ignore[reportAttributeAccessIssue]
        lambda node: isinstance(node, sympy.Function)
        and getattr(node.func, "__name__", "")
        == "triton_helpers.div_floor_integer",
        lambda node: sympy.floor(node.args[0] / node.args[1]),  # pyright: ignore[reportAttributeAccessIssue]
    ).replace(  # pyright: ignore[reportAttributeAccessIssue]
        lambda node: isinstance(node, sympy.Function)
        and getattr(node.func, "__name__", "")
        == "triton_helpers.remainder_integer",
        lambda node: sympy.Mod(node.args[0], node.args[1]),  # pyright: ignore[reportAttributeAccessIssue]
    )
    expr = cast("sympy.Expr", sanitized)
    return HostFunction.current().sympy_expr(expr)
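A standalone sketch of the same node-level approach, for comparison with the regex in the PR snippet. Assumption: the helpers are modeled as undefined sympy Functions whose __name__ matches the strings checked in device_function.py; the names div_floor_integer, remainder_integer, _is_helper, and sanitize are hypothetical. Matching on expression nodes rather than text means nested arguments (including ones containing commas) need no extra parsing.

```python
import sympy

# Hypothetical stand-ins: undefined sympy functions named like the Triton helpers.
div_floor_integer = sympy.Function("triton_helpers.div_floor_integer")
remainder_integer = sympy.Function("triton_helpers.remainder_integer")


def _is_helper(node: sympy.Basic, name: str) -> bool:
    # Same test as device_function.py: a Function node whose func is named `name`.
    return isinstance(node, sympy.Function) and getattr(node.func, "__name__", "") == name


def sanitize(expr: sympy.Expr) -> sympy.Expr:
    """Rewrite helper calls into native sympy floor / Mod nodes."""
    expr = expr.replace(
        lambda node: _is_helper(node, "triton_helpers.div_floor_integer"),
        lambda node: sympy.floor(node.args[0] / node.args[1]),
    )
    return expr.replace(
        lambda node: _is_helper(node, "triton_helpers.remainder_integer"),
        lambda node: sympy.Mod(node.args[0], node.args[1]),
    )


n = sympy.Symbol("n", integer=True, positive=True)
print(sanitize(div_floor_integer(n, 64) + remainder_integer(n, 64)))
# Nested helper calls are handled by the same two passes.
print(sanitize(div_floor_integer(n + remainder_integer(n, 8), 64)))
```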
examples/matmul.py (outdated)
    Returns:
        Tensor: Resulting matrix of shape [m, n].
    """

revert unrelated change
    return None


def _supports_epilogue_subtiling():
Should this be the same as the supports_tensor_descriptor helper we already have?
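One way to act on this suggestion, sketched with hypothetical names and thresholds: derive the new gate from the existing helper so the two checks cannot drift apart, and add only whatever requirement is specific to subtiling. The compute-capability cutoffs below are assumptions for illustration, and the real supports_tensor_descriptor helper may take different arguments. If subtiling turns out to have no extra requirement, the new function can simply alias the existing one.

```python
from typing import Tuple

# Assumed thresholds, for illustration only; the real requirements may differ.
TENSOR_DESCRIPTOR_MIN_CC = (9, 0)
EPILOGUE_SUBTILING_MIN_CC = (10, 0)


def supports_tensor_descriptor(cc: Tuple[int, int]) -> bool:
    """Stand-in for the existing helper (its real signature may differ)."""
    return cc >= TENSOR_DESCRIPTOR_MIN_CC


def supports_epilogue_subtiling(cc: Tuple[int, int]) -> bool:
    # Reuse the existing gate, then layer on the subtiling-specific requirement.
    return supports_tensor_descriptor(cc) and cc >= EPILOGUE_SUBTILING_MIN_CC


for cc in [(8, 0), (9, 0), (10, 0)]:
    print(cc, supports_tensor_descriptor(cc), supports_epilogue_subtiling(cc))
```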
helion/autotuner/config_spec.py (outdated)
        config.setdefault(
            "load_eviction_policies", self.load_eviction_policies.default()
        )
        config.setdefault("epilogue_subtiling", False)
Should this be a list since we can have multiple stores in the program?
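A minimal sketch of the list form, assuming one flag per store in program order (similar in spirit to load_eviction_policies, which by its name is already per load). normalize_epilogue_subtiling is a hypothetical helper; still accepting a single bool keeps existing configs working.

```python
from typing import Any, Dict, List


def normalize_epilogue_subtiling(config: Dict[str, Any], num_stores: int) -> List[bool]:
    """Hypothetical helper: expand "epilogue_subtiling" to one flag per store.

    Stores are assumed to be numbered in program order, so entry i controls
    whether the i-th store (and its epilogue) is subtiled.
    """
    value = config.setdefault("epilogue_subtiling", [False] * num_stores)
    if isinstance(value, bool):
        # Backward compatible: a single bool applies to every store.
        value = [value] * num_stores
    value = list(value)
    if len(value) != num_stores:
        raise ValueError(
            f"expected {num_stores} epilogue_subtiling entries, got {len(value)}"
        )
    config["epilogue_subtiling"] = value
    return value


cfg: Dict[str, Any] = {"epilogue_subtiling": True}
print(normalize_epilogue_subtiling(cfg, 2))
```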
        value=store_value,
    )


def _codegen_epilogue_subtile_store(
Is subtiling only the store() sufficient, or do we want a graph-based pass that collects any pointwise ops flowing into the store?
You will want any pointwise ops as well.
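To illustrate the graph-based variant, here is a sketch over a toy IR. The Node class and the POINTWISE_OPS set are hypothetical, not Helion's graph types: walk backwards from a store, collect every pointwise producer, and stop at the first non-pointwise op (such as the dot), whose output is the tile to subtile.

```python
from dataclasses import dataclass, field
from typing import List, Set

# Hypothetical set of ops treated as elementwise.
POINTWISE_OPS = {"add", "mul", "relu", "sigmoid", "cast"}


@dataclass
class Node:
    name: str
    op: str
    inputs: List["Node"] = field(default_factory=list)


def collect_epilogue(store: Node) -> List[Node]:
    """Gather the pointwise chain feeding `store`, inputs before users.

    Traversal stops at non-pointwise producers (e.g. the dot). Loads such as
    a bias are also excluded here; a real pass would need to subtile them too.
    """
    seen: Set[str] = set()
    order: List[Node] = []

    def visit(node: Node) -> None:
        if node.name in seen or node.op not in POINTWISE_OPS:
            return
        seen.add(node.name)
        for inp in node.inputs:
            visit(inp)
        order.append(node)  # post-order, so producers come first

    for inp in store.inputs:
        visit(inp)
    return order


acc = Node("acc", "dot")
bias = Node("bias", "load")
add0 = Node("add0", "add", [acc, bias])
relu0 = Node("relu0", "relu", [add0])
cast0 = Node("cast0", "cast", [relu0])
store = Node("store0", "store", [cast0])
print([node.name for node in collect_epilogue(store)])
```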
Oh, so basically performing the epilogue on each subtile?
Yes, exactly. Epilogue subtiling avoids materializing all the registers needed to compute the result over the entire TMEM-allocated tile.
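To make the register argument concrete, here is a toy model in plain Python (not Triton and not Helion's codegen; all names are hypothetical). The accumulator tile is split along N into subtiles, and the pointwise epilogue runs on one subtile at a time, so only a 1/num_subtiles slice is live at once. This works because the epilogue is pointwise; an op that needs a whole row, such as a row-wise reduction, could not be applied per subtile this way.

```python
from typing import Callable, List, Tuple

Tile = List[List[float]]


def epilogue_full(acc: Tile, fn: Callable[[float], float]) -> Tile:
    # Baseline: the epilogue sees the whole BLOCK_M x BLOCK_N tile at once.
    return [[fn(v) for v in row] for row in acc]


def epilogue_subtiled(
    acc: Tile, fn: Callable[[float], float], num_subtiles: int
) -> Tuple[Tile, int]:
    """Apply a pointwise epilogue one column slice (subtile) at a time.

    Returns the output tile and the largest number of accumulator elements
    held in the working slice at once (a stand-in for live registers).
    """
    rows, cols = len(acc), len(acc[0])
    if cols % num_subtiles:
        raise ValueError("num_subtiles must divide the tile width")
    width = cols // num_subtiles
    out: Tile = [[0.0] * cols for _ in range(rows)]
    peak = 0
    for s in range(num_subtiles):
        lo, hi = s * width, (s + 1) * width
        # "Load" only this slice of the accumulator (on hardware: TMEM -> registers).
        sub = [row[lo:hi] for row in acc]
        peak = max(peak, sum(len(r) for r in sub))
        # Run the pointwise epilogue on the slice and "store" it.
        for r in range(rows):
            out[r][lo:hi] = [fn(v) for v in sub[r]]
    return out, peak


def bias_relu(v: float) -> float:
    return max(v + 1.0, 0.0)


acc = [[float(r * 8 + c - 10) for c in range(8)] for r in range(4)]
out, peak = epilogue_subtiled(acc, bias_relu, num_subtiles=4)
assert out == epilogue_full(acc, bias_relu)
print(peak)  # 8, versus 32 elements for the full 4 x 8 tile
```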
stack-info: PR: #948, branch: PaulZhang12/stack/14 (branch updated from 965b193 to 2bc36d0)
stack-info: PR: #948, branch: PaulZhang12/stack/14 (branch updated from 2bc36d0 to 1c1e282)
Stacked PRs:
Add epilogue subtiling