@PaulZhang12 PaulZhang12 commented Oct 15, 2025

Stacked PRs:


Add epilogue subtiling

PaulZhang12 added a commit that referenced this pull request Oct 15, 2025
stack-info: PR: #948, branch: PaulZhang12/stack/14
@meta-cla meta-cla bot added the CLA Signed This label is managed by the Meta Open Source bot. label Oct 15, 2025
Comment on lines 31 to 43
config=helion.Config(
    block_sizes=[64, 64, 64],
    loop_orders=[[0, 1]],
    l2_groupings=[4],
    range_unroll_factors=[0, 1],
    range_num_stages=[0, 3],
    range_multi_buffers=[None, False],
    range_flattens=[None, None],
    num_warps=8,
    num_stages=6,
    indexing='tensor_descriptor',
    pid_type='flat'
)

You probably don't want to check this in, since the best config will depend on the machine.

@jansel jansel left a comment

Does this help with matmul perf?

import re
host_function = HostFunction.current()
block_size_expr = ", ".join(map(self.literal_expr, block_size))
pattern = r'triton_helpers\.div_floor_integer\(([^,]+),\s*(\d+)\)'

@yf225 didn't you add something to fix this somewhere else?

Yes, we currently have a sanitization pass for triton_helpers.* for constexpr args at

if isinstance(value, sympy.Expr):
    sanitized = value.replace(  # pyright: ignore[reportAttributeAccessIssue]
        lambda node: isinstance(node, sympy.Function)
        and getattr(node.func, "__name__", "")
        == "triton_helpers.div_floor_integer",
        lambda node: sympy.floor(node.args[0] / node.args[1]),  # pyright: ignore[reportAttributeAccessIssue]
    ).replace(  # pyright: ignore[reportAttributeAccessIssue]
        lambda node: isinstance(node, sympy.Function)
        and getattr(node.func, "__name__", "")
        == "triton_helpers.remainder_integer",
        lambda node: sympy.Mod(node.args[0], node.args[1]),  # pyright: ignore[reportAttributeAccessIssue]
    )
    expr = cast("sympy.Expr", sanitized)
    return HostFunction.current().sympy_expr(expr)

Maybe we can extract a common utility function to be used at both sites.
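A rough sketch of what such a shared utility could look like, assuming the two call sites only need the sympy-level rewrite shown in the snippet above. The name `sanitize_triton_helpers` is hypothetical and is not part of Helion; in the real code path the `triton_helpers.*` functions come from the compiler, whereas a quick experiment can create stand-ins with `sympy.Function(...)`.

```python
import sympy


def sanitize_triton_helpers(expr: sympy.Expr) -> sympy.Expr:
    """Rewrite triton_helpers.* calls into plain sympy nodes.

    Hypothetical helper (not an existing Helion function). It mirrors the
    two replacements from the snippet quoted in the review comment.
    """

    def is_named(node: sympy.Basic, name: str) -> bool:
        # Match applied functions by the name attached to their function class.
        return (
            isinstance(node, sympy.Function)
            and getattr(node.func, "__name__", "") == name
        )

    return expr.replace(
        lambda node: is_named(node, "triton_helpers.div_floor_integer"),
        lambda node: sympy.floor(node.args[0] / node.args[1]),
    ).replace(
        lambda node: is_named(node, "triton_helpers.remainder_integer"),
        lambda node: sympy.Mod(node.args[0], node.args[1]),
    )
```

Working on the expression tree rather than on the printed string avoids the limitations of a regex such as the one under review, which only matches a literal integer divisor and a first argument without commas.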

Returns:
Tensor: Resulting matrix of shape [m, n].
"""


Revert this unrelated change.


return None

def _supports_epilogue_subtiling():

Should this be the same as the supports_tensor_descriptor helper we already have?

config.setdefault(
"load_eviction_policies", self.load_eviction_policies.default()
)
config.setdefault("epilogue_subtiling", False)

Should this be a list since we can have multiple stores in the program?

value=store_value,
)

def _codegen_epilogue_subtile_store(

Is subtiling only the store() sufficient? Or do we want a graph-based approach that collects any pointwise ops flowing into the store?

You will want any pointwise ops as well.

Contributor Author


Oh, so basically performing the epilogue on the subtile?

Yes, exactly. Epilogue subtiling is to avoid materializing all the registers needed to compute the result over the entire TMEM-allocated tile.
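To make the idea concrete, here is a CPU-only analogy in plain Python. It is not Helion or Triton codegen, and the function name and parameters are made up for illustration. The accumulator tile is split along its N dimension, and the pointwise epilogue plus the store run one subtile at a time, so only one subtile's worth of epilogue results needs to be live at once.

```python
def store_with_epilogue_subtiling(acc, out, epilogue, num_subtiles=2):
    """Apply a pointwise epilogue and store, one N-subtile at a time.

    Illustrative sketch only:
      acc      -- BLOCK_M x BLOCK_N accumulator tile as a list of rows
      out      -- output buffer with the same shape, written in place
      epilogue -- pointwise function applied to every element before the store
    """
    block_n = len(acc[0])
    assert block_n % num_subtiles == 0, "BLOCK_N must divide evenly into subtiles"
    sub_n = block_n // num_subtiles

    for s in range(num_subtiles):
        lo, hi = s * sub_n, (s + 1) * sub_n
        for i, row in enumerate(acc):
            # Epilogue runs on the subtile slice only, then that slice is stored.
            out[i][lo:hi] = [epilogue(v) for v in row[lo:hi]]
```

The result matches applying the epilogue to the whole tile and storing it in one step; the difference is in how much intermediate state exists at any one time, which on the GPU corresponds to register pressure.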

PaulZhang12 added a commit that referenced this pull request Oct 16, 2025
stack-info: PR: #948, branch: PaulZhang12/stack/14