Description
🚀 Feature
Replace usage of nvFuser's broadcast_in_dim in the executor with broadcast + expand operations.
Motivation
Currently, the Thunder executor calls nvFuser's broadcast_in_dim directly for broadcasting (see the current implementation under Additional context).
Jacob suggested, in the review of NVIDIA/Fuser#5507, that using explicit broadcast and expand ops would be preferable. This change would improve the developer experience by creating a closer match between Python-defined fusions and nvFuser's "math/kernel" representation.
Pitch
Update the executor logic to replace the direct usage of broadcast_in_dim with a combination of broadcast and expand operations where applicable.
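A minimal sketch of what the decomposition could look like, not a definitive implementation. The helper names `broadcast_mask` and `broadcast_in_dim_decomposed` are hypothetical. The sketch assumes the nvFuser Python frontend exposes `fd.ops.broadcast(arg, is_broadcast_dim)` and `fd.ops.expand(arg, shape)` with those argument orders, which should be checked against the installed nvFuser version. It also assumes `nva` and `nv_shape` have already been resolved with `getnv`, as in the current implementation.

```python
from typing import Any, Sequence


def broadcast_mask(out_rank: int, broadcast_dimensions: Sequence[int]) -> list[bool]:
    """Return True for every output dim that has no source dim in the input."""
    kept = set(broadcast_dimensions)
    return [dim not in kept for dim in range(out_rank)]


def broadcast_in_dim_decomposed(
    nva: Any, nv_shape: Sequence[Any], broadcast_dimensions: Sequence[int], *, fd: Any
) -> Any:
    # Hypothetical stand-in for the single fd.ops.broadcast_in_dim call.
    mask = broadcast_mask(len(nv_shape), broadcast_dimensions)

    # Assumption: fd.ops.broadcast inserts a size-1 dim wherever mask is True.
    # When no new dims are needed, the broadcast step is skipped here; whether
    # that is the right choice for the executor is a judgment call.
    nvb = fd.ops.broadcast(nva, mask) if any(mask) else nva

    # Assumption: fd.ops.expand stretches size-1 dims to the extents in nv_shape.
    return fd.ops.expand(nvb, nv_shape)
```

For example, with an input of shape (4, 6), a target shape of (4, 5, 6), and broadcast_dimensions of [0, 2], the mask is [False, True, False]: the broadcast step yields shape (4, 1, 6) and the expand step yields (4, 5, 6).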
Alternatives
Continue using broadcast_in_dim as it is currently implemented.
Additional context
- Current implementation:
lightning-thunder/thunder/executors/nvfuserex_impl.py
Lines 1180 to 1189 in 9ca2b0d
    def broadcast_in_dim(
        a: TensorProxy, shape: list[int], broadcast_dimensions: list[int], *, fd: FusionDefinition, lc_to_nv_map: dict
    ) -> Any:
        nva = getnv(a, fd, lc_to_nv_map)
        if any(map(lambda x: isinstance(x, NumberProxy), shape)):
            nv_shape = getnv(shape, fd, lc_to_nv_map)
        else:
            nv_shape = shape

        return fd.ops.broadcast_in_dim(nva, nv_shape, broadcast_dimensions)

- Suggestion source: Remove redundant Set operation from broadcast when no broadcasting occurs NVIDIA/Fuser#5507 (review)
cc @mruberry