Hi folks,
I'm looking to build a custom PC to minimize training time for my specific use case.
My current laptop, with a Ryzen 9 7940HS CPU and an RTX 4060 GPU, is surprisingly capable and can fit 4096 parallel environments (of my model) into the 8 GB of available VRAM. It benchmarks well, reaching ~1/2 of the RTX 4090's FPS stated here.
The RTX 4090 has more than 5 times as many CUDA cores as the RTX 4060, and there is a similar difference for all other types of cores. It doesn't make sense to me that the RTX 4090 system should only be about twice as fast.
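To put a number on that gap, here is a rough sketch of the arithmetic. The core counts are the published spec-sheet figures as I understand them (an assumption, so check them against your exact card), and the 2x speedup is just the ~1/2 FPS figure from above.

```python
# Published CUDA core counts (assumed here; check the spec sheet for your exact card).
CORES_4090 = 16384
CORES_4060 = 3072

# Observed throughput ratio from the post: the 4060 reaches about half the 4090's FPS.
observed_speedup = 2.0

core_ratio = CORES_4090 / CORES_4060
# Fraction of the "ideal" core-proportional speedup that the 4090 actually delivers.
scaling_efficiency = observed_speedup / core_ratio

print(f"core ratio: {core_ratio:.2f}x")
print(f"observed speedup: {observed_speedup:.1f}x")
print(f"scaling efficiency: {scaling_efficiency:.1%}")
```

In other words, if FPS scaled purely with core count, the 4090 would be delivering well under half of what its cores suggest for this workload.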
Looking at the posted L40 system benchmarks, training FPS increases ~linearly with the number of GPUs run in parallel when the number of environments is held constant. At first glance this would imply that FPS is proportional to the number of cores available, but the contrast between the RTX 4090 and the 4060 calls this into question.
This raises the question of whether you are better off buying several lower-cost GPUs that run in parallel or one expensive GPU with the same total number of cores. If the ~linear multi-GPU scaling seen on the L40 systems also holds for the 4060, then 4 RTX 4060s should achieve twice the performance of a single 4090 at half the cost. Or, more practically, 2 RTX 4060s should match a single 4090 at 1/4 the cost.
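A quick sketch of that estimate, in case anyone wants to plug in their own numbers. Everything here is an assumption for illustration: `relative_throughput` and its `efficiency` parameter are hypothetical, the 0.5 per-GPU figure is my own benchmark ratio, and the relative price is only what the "half the cost" statement implies, not a quoted price.

```python
def relative_throughput(n_gpus, per_gpu=0.5, efficiency=1.0):
    """Throughput of n_gpus RTX 4060s relative to one RTX 4090 (= 1.0).

    per_gpu: one 4060's FPS as a fraction of the 4090's (~0.5 in the post).
    efficiency: hypothetical scaling factor for each extra GPU;
                1.0 means perfectly linear scaling.
    """
    return per_gpu * (1 + (n_gpus - 1) * efficiency)

# Price of one 4060 relative to one 4090, implied by the post
# ("4 RTX 4060s ... at half the cost"); an assumption, not a quoted price.
PRICE_4060 = 0.5 / 4

for n in (1, 2, 4):
    for eff in (1.0, 0.8):
        perf = relative_throughput(n, efficiency=eff)
        cost = n * PRICE_4060
        print(f"{n} x 4060, efficiency {eff:.0%}: {perf:.2f}x perf at {cost:.3f}x cost")
```

With `efficiency=1.0` this reproduces the figures above (2 cards match one 4090, 4 cards double it). Lowering `efficiency` shows how quickly the advantage shrinks if scaling on 4060s turns out to be less than linear.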
If you have any thoughts please chime in - these results are certainly unexpected.
I hope this post can help increase productivity with this great software!