@jianyizh jianyizh commented Dec 8, 2025

I found that the XPU host side becomes slow when too many events are created; it can get stuck for ~60s after some iterations. This PR optimizes it by using fewer events. For example, when I benchmark rms_norm, the first shape is [2048, 1024], and tuning benchmarks ~150 kernels with repeat=1000, which creates 300k events (150 * 1000 * 2). We can reuse events instead of maintaining so many.

Before change:
maintain #kernels * #repeat * 2 events, synchronize once
Example of host time per iteration when benchmarking 200 kernels with repeat = 100:
[image: host time per iteration, before change]

After change:
maintain #kernels * 2 events, synchronize #repeat times
[image: host time per iteration, after change]
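
For readers who want the trade-off spelled out, here is a minimal sketch of the two strategies. It is not the code in this PR: `FakeEvent`, `bench_fresh_events`, `bench_reused_events`, and `run_and_count` are hypothetical names, and `FakeEvent` is a pure-Python stand-in that only counts allocations and reads a fake clock, so no XPU device is involved. The `syncs` counter marks where a real implementation would presumably need to synchronize before reading the events.

```python
import itertools


class FakeEvent:
    """Hypothetical stand-in for a device timing event; counts allocations."""

    created = 0

    def __init__(self):
        FakeEvent.created += 1
        self._t = 0

    def record(self, clock):
        # A real event would be recorded on a device stream; here we read a fake clock.
        self._t = next(clock)

    def elapsed_time(self, end):
        return end._t - self._t


def bench_fresh_events(kernels, repeat):
    """Before: a new (start, end) pair per kernel per repeat, one synchronize."""
    clock = itertools.count()
    pairs = []
    for _ in range(repeat):
        for kernel in kernels:
            start, end = FakeEvent(), FakeEvent()
            start.record(clock)
            kernel()
            end.record(clock)
            pairs.append((start, end))
    syncs = 1  # a single synchronize after everything has been enqueued
    totals = [0.0] * len(kernels)
    for i, (start, end) in enumerate(pairs):
        totals[i % len(kernels)] += start.elapsed_time(end)
    return totals, syncs


def bench_reused_events(kernels, repeat):
    """After: one (start, end) pair per kernel, reused; one synchronize per repeat."""
    clock = itertools.count()
    pairs = [(FakeEvent(), FakeEvent()) for _ in kernels]
    totals = [0.0] * len(kernels)
    syncs = 0
    for _ in range(repeat):
        for kernel, (start, end) in zip(kernels, pairs):
            start.record(clock)
            kernel()
            end.record(clock)
        syncs += 1  # timings must be read before the events are overwritten
        for i, (start, end) in enumerate(pairs):
            totals[i] += start.elapsed_time(end)
    return totals, syncs


def run_and_count(bench, n_kernels, repeat):
    """Run a strategy on no-op kernels; return (events created, syncs, totals)."""
    FakeEvent.created = 0
    kernels = [lambda: None for _ in range(n_kernels)]
    totals, syncs = bench(kernels, repeat)
    return FakeEvent.created, syncs, totals
```

With 200 kernels and repeat = 100, `run_and_count(bench_fresh_events, 200, 100)` creates 40,000 events with 1 synchronize, while `run_and_count(bench_reused_events, 200, 100)` creates 400 events with 100 synchronizes. Both return the same per-kernel totals in this toy model.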

cc: @EikanWang

@meta-cla meta-cla bot added the CLA Signed label Dec 8, 2025
@jianyizh jianyizh marked this pull request as draft December 9, 2025 07:43
@jianyizh jianyizh changed the title from "[xpu][fix] fix xpu hang by creating less events" to "[WIP][xpu][fix] improve XPU host time in tuning by creating less events" Dec 9, 2025