[sharktank] Refactor mask generation to utils from ops #2496
Conversation
Coverage report (generated by python-coverage-comment-action).
boolean_mask = torch.logical_or(causal_mask, boolean_input_mask[:, None, None, :])
numeric_mask = torch.where(boolean_mask, max_negative_value(dtype, device), 0).to(dtype)
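For orientation, here is a minimal self-contained sketch of the boolean-to-numeric mask construction shown above. The max_negative_value helper and the mask shapes are recreated locally for illustration and are not the sharktank implementations.

```python
import torch


def max_negative_value(dtype: torch.dtype, device: torch.device) -> torch.Tensor:
    # Illustrative stand-in: the most negative finite value representable in dtype.
    return torch.tensor(torch.finfo(dtype).min, dtype=dtype, device=device)


def make_numeric_attention_mask(
    causal_mask: torch.Tensor,         # [1, 1, T, S] bool, True where attention is NOT allowed
    boolean_input_mask: torch.Tensor,  # [B, S] bool, True at padded positions
    dtype: torch.dtype,
    device: torch.device,
) -> torch.Tensor:
    # Combine causal and padding masks; True means "masked out".
    boolean_mask = torch.logical_or(causal_mask, boolean_input_mask[:, None, None, :])
    # Masked positions get a very large negative bias, everything else 0.
    return torch.where(boolean_mask, max_negative_value(dtype, device), 0).to(dtype)


# Tiny usage example.
B, T, S = 2, 4, 4
causal = torch.triu(torch.ones(T, S, dtype=torch.bool), diagonal=1)[None, None]
padding = torch.zeros(B, S, dtype=torch.bool)
padding[0, -1] = True  # last token of the first sequence is padding
mask = make_numeric_attention_mask(causal, padding, torch.float32, torch.device("cpu"))
```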
If we have torch.logical_or and torch.where implemented as sharktank.ops, then this function does not need to be decorated as trivially replicable. The same argument holds for some of the other functions, such as create_attention_mask_for_decode and create_chunked_attention_mask.
We generally want to expand the op coverage to everything that torch has, and definitely to everything that we use.
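A rough sketch of what that could look like, using a stand-in ops namespace; sharktank's actual dispatch machinery and op registration are not reproduced here, and the names are only illustrative.

```python
import torch
from types import SimpleNamespace

# Stand-in for a sharktank-style ops layer. In the real library these ops would
# dispatch on sharded/replicated tensor types; here they simply forward to torch.
ops = SimpleNamespace(logical_or=torch.logical_or, where=torch.where)


def create_boolean_attention_mask(causal_mask, boolean_input_mask):
    # If ops.logical_or knew how to handle sharded inputs, this helper would work
    # for plain and sharded tensors alike, with no trivially-replicable wrapper:
    # the sharding logic lives inside the ops rather than around the function.
    return ops.logical_or(causal_mask, boolean_input_mask[:, None, None, :])
```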
I see that the problem is create_causal_context_mask, because it creates un-replicated tensors with

src = torch.arange(src_len, device=device)[None, None, None, :]
target = torch.arange(target_len, device=device)[None, None, :, None]

We will keep running into the problem of having to construct something that does not depend on a tensor argument, so there is nothing to propagate the sharding from, e.g. constructing an all-zeros or all-ones tensor. To generalize this approach we need to come up with something. Maybe allow downstream ops to mix sharded and unsharded args.
If we want to write generic model code, we need a solution, or we will keep dancing around the problem. Things can get quite nasty if you need to modify existing code to create such tensors in places where sharded tensors are present. This can happen, for example, when extending our LLM to support a new architecture.
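One possible direction, purely as an illustration of the "nothing to propagate sharding from" problem: thread an explicit reference tensor (or a sharding spec) through constructor-style ops. The arange_like helper below is hypothetical, not an existing sharktank API.

```python
import torch


def arange_like(reference: torch.Tensor, n: int) -> torch.Tensor:
    # Purely illustrative: constructor-style ops (arange/zeros/ones) have no
    # tensor argument to inherit placement/sharding from. One way around that
    # is to pass an explicit reference tensor (or a sharding spec), so the
    # result can be created "like" an existing sharded/replicated value.
    return torch.arange(n, device=reference.device, dtype=torch.int64)


# Usage: derive the index tensors used by a causal mask from a reference
# activation instead of constructing them with no context at all.
x = torch.zeros(2, 8)
src = arange_like(x, 8)[None, None, None, :]
target = arange_like(x, 8)[None, None, :, None]
causal = target < src  # True where the key index is ahead of the query index
```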
I have started on this (see #2532 for arange) for a different reason: the current approach introduces transfers that break fusion.
> Maybe allow downstream ops to mix sharded and unsharded args.
This approach is already causing issues with elementwise_binary: the transfers break fusion, so we can't use it. If we need ShardedTensor inputs, we should be making those directly.
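A toy illustration of "making those directly": build the constant on each shard's device instead of creating one unsharded tensor and letting the elementwise op transfer it around. The replicated_arange helper is hypothetical and only hints at what a real ReplicatedTensor construction would do.

```python
import torch


def replicated_arange(n: int, devices: list) -> list:
    # Illustrative only: rather than creating one unsharded arange and letting a
    # downstream elementwise op transfer it to every shard (those transfers break
    # fusion), construct the constant directly on each shard's device so no
    # cross-device movement is needed inside the op.
    return [torch.arange(n, device=d) for d in devices]


# With a real ReplicatedTensor/ShardedTensor type, these per-device pieces would
# be wrapped into a single replicated value before being passed to the op.
shards = replicated_arange(8, [torch.device("cpu"), torch.device("cpu")])
```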
Will implement this separately in a few PRs. I want to land this as-is for now, since this was supposed to be a simple move of the functions from A to B.
Force-pushed from fb339cf to f5e331f.
Signed-off-by: Alex Vasile <[email protected]>
Refactor and clean up unused imports in sharktank (dependent on #2496)
Revert the mask generation functions from ops back to utils