webgpu: split transpose perm{2310} in two steps #26573

xhcao · 2025-11-14T08:59:05Z


Description

Motivation and Context

To use transpose-shared instead of transpose-naive, we can split the transpose with perm{2310} into two steps, which benefits the Conv operator.
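The PR text does not spell out the split. One decomposition that fits the description is {0, 2, 3, 1} followed by {1, 2, 3, 0}. This is an assumption and may not match commit 930bed3. The C++ sketch below uses hypothetical helper names, not ONNX Runtime APIs. It checks that the two steps compose to {2, 3, 1, 0}. It also merges axes that stay contiguous on both sides. After merging, the direct permutation becomes a 3D reversal {2, 1, 0}. The first step becomes a batched 2D transpose {0, 2, 1} and the second a plain 2D transpose {1, 0}. Those are the shapes a shared-memory tiled transpose typically handles, assuming the transpose-shared path targets (batched) 2D cases. For example, a Conv weight laid out as [O, I, kH, kW] would go to [kH, kW, I, O].

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using Shape = std::vector<int64_t>;
using Perm = std::vector<size_t>;

// ONNX Transpose convention: output axis i is input axis perm[i].
const Perm kDirect = {2, 3, 1, 0};
// Hypothetical split (an assumption, not necessarily the one in the PR):
const Perm kStep1 = {0, 2, 3, 1};  // [d0, d1, d2, d3] -> [d0, d2, d3, d1]
const Perm kStep2 = {1, 2, 3, 0};  // [d0, d2, d3, d1] -> [d2, d3, d1, d0]

Shape PermuteShape(const Shape& shape, const Perm& perm) {
  Shape out(perm.size());
  for (size_t i = 0; i < perm.size(); ++i) out[i] = shape[perm[i]];
  return out;
}

// Transposing by `first` and then by `second` equals one transpose by the
// returned permutation: composed[i] = first[second[i]].
Perm Compose(const Perm& first, const Perm& second) {
  Perm composed(second.size());
  for (size_t i = 0; i < second.size(); ++i) composed[i] = first[second[i]];
  return composed;
}

// Merges runs of output axes whose input axes are consecutive and increasing
// (e.g. "2, 3"): such a run is contiguous on both sides and acts as one axis.
// Returns the permutation over the merged axes.
Perm CoalescePerm(const Perm& perm) {
  std::vector<size_t> heads;  // first input axis of each run, in output order
  for (size_t i = 0; i < perm.size(); ++i) {
    if (i == 0 || perm[i] != perm[i - 1] + 1) heads.push_back(perm[i]);
  }
  Perm reduced(heads.size(), 0);
  for (size_t g = 0; g < heads.size(); ++g) {
    // Merged input index of a run = number of runs that start before it.
    for (size_t h : heads) {
      if (h < heads[g]) ++reduced[g];
    }
  }
  return reduced;
}

// Naive reference transpose of a row-major tensor (used only for checking).
std::vector<float> Transpose(const std::vector<float>& in, const Shape& shape,
                             const Perm& perm) {
  const size_t rank = shape.size();
  if (rank == 0) return in;
  const Shape out_shape = PermuteShape(shape, perm);
  std::vector<int64_t> in_strides(rank, 1);  // row-major input strides
  for (size_t a = rank - 1; a > 0; --a) in_strides[a - 1] = in_strides[a] * shape[a];
  std::vector<float> out(in.size());
  std::vector<int64_t> idx(rank, 0);  // current output multi-index
  for (size_t o = 0; o < out.size(); ++o) {
    int64_t src = 0;
    for (size_t i = 0; i < rank; ++i) src += idx[i] * in_strides[perm[i]];
    out[o] = in[static_cast<size_t>(src)];
    for (size_t i = rank; i-- > 0;) {  // advance idx in row-major order
      if (++idx[i] < out_shape[i]) break;
      idx[i] = 0;
    }
  }
  return out;
}

// Returns true if kStep1 followed by kStep2 reproduces the direct transpose.
bool SplitMatchesDirect(const Shape& shape) {
  int64_t n = 1;
  for (int64_t d : shape) n *= d;
  std::vector<float> data(static_cast<size_t>(n));
  for (size_t i = 0; i < data.size(); ++i) data[i] = static_cast<float>(i);
  const std::vector<float> direct = Transpose(data, shape, kDirect);
  const std::vector<float> mid = Transpose(data, shape, kStep1);
  const std::vector<float> split = Transpose(mid, PermuteShape(shape, kStep1), kStep2);
  return direct == split;
}
```

With these definitions, `CoalescePerm(kDirect)` gives {2, 1, 0}, `CoalescePerm(kStep1)` gives {0, 2, 1}, `CoalescePerm(kStep2)` gives {1, 0}, and `SplitMatchesDirect({4, 3, 2, 5})` returns true. Splitting adds one extra read and write of the tensor. It presumably pays off only when the two tiled passes together are faster than the single strided pass.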

xhcao · 2025-11-14T09:02:16Z

This PR improves performance on the sdunet-v1.5-demo-layernorm model: total Conv|Transpose time drops from 224 ms to 135 ms (roughly a 40% reduction).

xhcao · 2025-11-14T09:02:40Z

@jchen10 @daijh PTAL

jchen10 · 2025-11-14T13:31:34Z

Looks great. As we discussed in #26554 (comment), we are going to cache the transposed kernel, so this PR may be less beneficial for Conv|Transpose. Maybe we can find other places to apply this optimization later.

webgpu: split transpose perm{2310} in two steps

930bed3

To use transpose-shared instead of transpose-naive, we can split the transpose with perm{2310} into two steps, which benefits the Conv operator.
