
> For example, when the GPU is fully idle, nvidia-smi tells me that it’s only pulling 88W of power.

I haven't used a non-laptop GPU in some time, but that is a crazy amount of "idle" power consumption. Is this normal for cards like this?




Server cards are not optimized for idle power usage. They’re expected to be fully utilized.

For server gear it’s more common to have less dynamic power and voltage switching because it produces more predictable performance and latency.


For GeForce cards you can get similar behavior by setting “Prefer maximum performance”, which disables some of the low-power states.

CPU sleep states are normally disabled on servers because demand can spike faster than the CPU can wake up and ramp its clock speed. It is much better to have a $20,000 CPU running at max MHz at all times.
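On Linux, the kernel's cpuidle subsystem exposes each CPU's idle (sleep) states under `/sys/devices/system/cpu/cpuN/cpuidle/stateK/`, each with a `name` file and a `disable` flag. A minimal sketch for checking whether a given server actually has them disabled, assuming a Linux host with cpuidle support. It only reflects the kernel's view: states turned off in firmware may simply not be listed. The function name `cpuidle_states` is hypothetical.

```python
from pathlib import Path


def cpuidle_states(cpu: int = 0, root: str = "/sys/devices/system/cpu") -> list[tuple[str, bool]]:
    """Return (state name, disabled?) pairs for one CPU's idle states.

    Returns an empty list if the cpuidle directory does not exist
    (e.g. non-Linux hosts or kernels without cpuidle support).
    """
    base = Path(root) / f"cpu{cpu}" / "cpuidle"
    if not base.is_dir():
        return []
    states = []
    # Sort numerically so state10 comes after state9, not after state1.
    for state_dir in sorted(base.glob("state*"), key=lambda p: int(p.name[len("state"):])):
        name = (state_dir / "name").read_text().strip()
        # The kernel writes "1" to `disable` when the state is turned off.
        disabled = (state_dir / "disable").read_text().strip() == "1"
        states.append((name, disabled))
    return states
```

Usage: `for name, off in cpuidle_states(): print(name, "disabled" if off else "enabled")`. Reading these files needs no root privileges; writing to `disable` does.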

If my GPU is sitting idle, and I mean idle with nothing loaded into its memory, it's sitting at about 18W. If I load a model that uses nearly all of the memory but the model is idle, it's at 36W. If that model is actively thinking, it's around 118W. I think this is likely due to the GPU being aware that there is real data loaded into memory and turning up the DRAM refresh rate, whereas when nothing is loaded, the dynamic power is as low as possible.

Yes, I have some of these cards and AFAICT the HBM2e chips just always run at full speed. I have different variants of the PCIe cards, and while I can get the GPU itself into a lower power state, the memory just runs full tilt. That said, I see 40W on my “normal” cards and 60W on the Frankenstein card that thinks it's an SXM4.

IIRC this was one of the issues with HBM2/2e: the various available memory controllers didn't agree on a standard way to manage timings and power states. I haven't played around with my Radeon VII in a long while now.

That aside, idle power consumption is a driver-to-driver affair from both AMD and novideo: sometimes I'm only pulling 15-30W when nothing is happening, and other times it decides it needs 110W for a static 500Hz screen.


I suspect the act of running nvidia-smi itself prevents the GPU from being put into a low-power state.

From memory, this is true, and NVML (NVIDIA Management Library) is the way to get stats that don't cause the GPU to wake.
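A minimal sketch of polling power draw, performance state, and core vs. memory clocks through NVML's Python bindings instead of running `nvidia-smi` in a loop. It assumes an NVIDIA driver and the `nvidia-ml-py` package (imported as `pynvml`) are installed. Whether this keeps the GPU out of a higher power state is the recollection above, not something this sketch verifies. Comparing the memory clock with the core clock is one way to check the earlier observation that HBM stays at full speed while the core drops to a lower state. The helper names are hypothetical.

```python
import time


def milliwatts_to_watts(mw: int) -> float:
    # NVML reports power usage in milliwatts.
    return mw / 1000.0


def sample_gpus(samples: int = 5, interval_s: float = 1.0) -> None:
    # Imported lazily so the helper above is usable on machines without NVML.
    import pynvml

    pynvml.nvmlInit()
    try:
        handles = [pynvml.nvmlDeviceGetHandleByIndex(i)
                   for i in range(pynvml.nvmlDeviceGetCount())]
        for _ in range(samples):
            for i, h in enumerate(handles):
                watts = milliwatts_to_watts(pynvml.nvmlDeviceGetPowerUsage(h))
                # P0 is maximum performance; higher numbers are lower-power states.
                pstate = pynvml.nvmlDeviceGetPerformanceState(h)
                core = pynvml.nvmlDeviceGetClockInfo(h, pynvml.NVML_CLOCK_GRAPHICS)
                mem = pynvml.nvmlDeviceGetClockInfo(h, pynvml.NVML_CLOCK_MEM)
                print(f"GPU{i}: {watts:6.1f} W  P{pstate}  core {core} MHz  mem {mem} MHz")
            time.sleep(interval_s)
    finally:
        # Release NVML even if a query raises pynvml.NVMLError.
        pynvml.nvmlShutdown()
```

Usage: call `sample_gpus(samples=10, interval_s=2.0)` on a machine with an NVIDIA driver. Keeping a single NVML session open avoids re-initializing the library on every sample, which is what a shell loop around `nvidia-smi` would do.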


