Added Radeon RX 9000 GPUs and Zen 5 9000 CPUs #1453
Thank you! Not sure how to distinguish between iGPU and CPU figures for processors with an integrated GPU, but I think we can go with this for now.
"Ryzen Zen5 9000 (Ryzen 9)": {
	tflops: 0.56,
},
"Ryzen Zen5 9000 (Ryzen 7)": {
	tflops: 0.56,
},
"Ryzen Zen5 9000 (Ryzen 5)": {
	tflops: 0.56,
},
These correspond to the integrated GPU, not the CPU; but I guess it's ok.
There's no reliable proxy for CPU performance that I can think of if we want to equate them to GPUs. INT8 / INT4 performance would have been a better metric, but then we'd end up with two different quantities. Best to leave CPUs out of the question.
Also, another problem I noticed: all Ryzen 9s are clubbed together, while chips like the Ryzen 9 9900X and the Ryzen 9 9950X3D have large differences in their performance. Here, I tested the cheapest and the costliest Ryzen 9, for example:
DeBERTa inference:
- Ryzen 9 9900X: 0.7947s
- Ryzen 9 9950X3D: 0.6674s (+19%)
Granite Vision 3.2 2B 4-bit inference (image+text input):
- Ryzen 9 9900X: 32.4109s
- Ryzen 9 9950X3D: 30.5195s (+6.2%)
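For reference, the percentages above match a throughput-style speedup (slower time divided by faster time, minus one) rather than a reduction in wall-clock time. A minimal sketch of that arithmetic, assuming this is how the figures were derived; `speedupPercent` is a hypothetical helper name, not something from the repository:

```typescript
// Hypothetical helper: relative speedup of the faster run over the slower one,
// expressed as a percentage increase in throughput (runs per second).
function speedupPercent(slowSeconds: number, fastSeconds: number): number {
  return (slowSeconds / fastSeconds - 1) * 100;
}

// Timings quoted in the comment above (9900X vs. 9950X3D).
const debertaSpeedup = speedupPercent(0.7947, 0.6674);
const graniteSpeedup = speedupPercent(32.4109, 30.5195);

console.log(debertaSpeedup.toFixed(1)); // "19.1"
console.log(graniteSpeedup.toFixed(1)); // "6.2"
```

Measured as time saved instead (`1 - fast / slow`), the DeBERTa gap would come out closer to 16%, so it is worth stating which convention is used.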
In pure text input, the 9950X3D did 28.01 tok/s with this model. That's already about 1/5th of what the RTX 5060 Ti 16 GB did (137.89 tok/s) in the same test, yet by going with TFLOPS we currently credit that card with 42.32x the compute power.
Hence, the 0.56 TFLOPS number for all Ryzen 9000 and 7000 series CPUs is neither fair nor comparable with the GPU figures.
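To put a number on the mismatch, here is a rough sketch comparing the measured throughput gap with the gap implied by the TFLOPS entries. It uses only the figures quoted in this comment; the variable names are illustrative, and a single model on a single test is of course not a benchmark suite:

```typescript
// Figures quoted in the comment above.
const cpuTokPerSec = 28.01;  // Ryzen 9 9950X3D, text-only input
const gpuTokPerSec = 137.89; // RTX 5060 Ti 16 GB, same test
const tflopsRatio = 42.32;   // GPU-to-CPU ratio implied by the current TFLOPS entries

// How much faster the GPU actually was in this test.
const measuredRatio = gpuTokPerSec / cpuTokPerSec;

// How far the TFLOPS-based ratio overstates the measured gap.
const overstatement = tflopsRatio / measuredRatio;

console.log(measuredRatio.toFixed(2)); // "4.92"
console.log(overstatement.toFixed(1)); // "8.6"
```

In this one test, the TFLOPS entries make the GPU look roughly 8.6 times further ahead of the CPU than it measured.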
Exactly. Those tokens/second are probably coming from CPU inference, which is why I think it's interesting to use CPU numbers if we can (it's what people will use for ML inference). The problem is that CPU manufacturers don't usually publish TFLOPS. We can compute them from the number of cores, the clock frequency, and the number of "representative" operations (multiply-add, for example) per clock cycle. The latter, however, is difficult to find and requires diving into tech sheets; see this comment for an example.
I would suggest we merge this PR and revisit the CPU numbers you reported in a follow-up. Personally, I'm happy to include an estimate based on comparative performance if we can't find anything else!
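A minimal sketch of the estimate described above, assuming peak throughput is simply cores × clock × operations per cycle. The `CpuSpec` shape, the `estimatePeakTflops` name, and the sample values are all hypothetical placeholders, not datasheet figures for any Ryzen part:

```typescript
// Hypothetical description of a CPU for a theoretical-peak estimate.
interface CpuSpec {
  cores: number;         // physical cores
  clockGHz: number;      // sustained all-core clock, in GHz
  flopsPerCycle: number; // floating-point operations per core per clock cycle
}

// Theoretical peak in TFLOPS: cores * GHz * FLOPs/cycle gives GFLOPS; divide by 1000.
function estimatePeakTflops(spec: CpuSpec): number {
  return (spec.cores * spec.clockGHz * spec.flopsPerCycle) / 1000;
}

// Placeholder values for illustration only. flopsPerCycle = 32 assumes
// 2 FMA units * 8 FP32 lanes * 2 operations per fused multiply-add.
const exampleCpu: CpuSpec = { cores: 16, clockGHz: 4.0, flopsPerCycle: 32 };

console.log(estimatePeakTflops(exampleCpu).toFixed(2)); // "2.05"
```

The hard part remains `flopsPerCycle`: it depends on vector width, the number of FMA units, and the data type, which is exactly the figure that has to be dug out of the tech sheets.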
Sure, that sounds good to me. We can revisit this in a separate conversation. Do let me know the best medium for the exchange.
A draft PR or an issue is good imo! 🤗
Added Radeon RX 9000 series GPUs to packages/tasks/src/hardware.ts
Added Zen 5 9000 Ryzen series CPUs to packages/tasks/src/hardware.ts
Source for TFLOPS count: https://www.cpu-monkey.com/en/cpu-amd_ryzen_9_9950x3d