Efficiency

From CMU -- Language Technologies Institute -- HPC Wiki

GPU power limits vs. efficiency

Over the last decade or so, there have been several studies on capping GPU power consumption below the default limit (possible via nvidia-smi) in order to increase performance per watt.

The last 5-10% of power fed to a GPU has significantly diminishing returns compared to the first 90%; this set of graphs for A4000s and RTX 3090s shows this general trend.

Limiting maximum GPU power to around 95% of the default limit and using the freed power headroom to add another GPU is one option for increasing performance where a system is power-limited. The same cap can also be used simply to reduce the noise and heat generated by workstations significantly without noticeably impacting performance.
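As a rough illustration of the 95% cap, the sketch below computes a capped limit and builds the matching nvidia-smi invocation (`-i` selects a GPU, `-pl` sets the power limit in watts). The helper names and the 350W default limit are assumptions chosen for the example, not values from this wiki. The command is only constructed, not executed, since changing a power limit normally requires administrator privileges and a value within the range the card supports.

```python
import math


def capped_limit_w(default_limit_w: float, fraction: float = 0.95) -> int:
    """Return the power cap in whole watts, rounded down."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    return math.floor(default_limit_w * fraction)


def nvidia_smi_command(gpu_index: int, limit_w: int) -> list[str]:
    """Build (but do not run) the nvidia-smi call that sets the cap."""
    return ["nvidia-smi", "-i", str(gpu_index), "-pl", str(limit_w)]


def gpus_within_budget(budget_w: float, per_gpu_w: float) -> int:
    """Count how many GPUs at per_gpu_w fit inside a fixed power budget."""
    return int(budget_w // per_gpu_w)


# Hypothetical example: a GPU with a 350 W default limit, capped to 95%.
limit = capped_limit_w(350)            # 332
print(" ".join(nvidia_smi_command(0, limit)))

# A budget sized for 19 uncapped GPUs holds 20 capped ones.
print(gpus_within_budget(19 * 350, limit))
```

Under these assumptions, a 5% cap frees enough power for one extra GPU only once the budget already covers about 19 GPUs, and the calculation ignores CPU, fan, and power-supply overhead.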



Standard power limits in the Wean / Gates computer rooms (as of 2023-07-20)

Our current policy on how many GPUs can go in a rack is 48 GPUs per 208V 60A PDU. (The real maximum is 48A, or 17293W sustained, due to the 80% rule.)

This works out to a rule of thumb of roughly 360W per GPU (17293W / 48 GPUs).
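The arithmetic behind these figures can be sketched as follows. The 17293W value matches the three-phase formula sqrt(3) x 208V x 48A, so the sketch assumes a three-phase PDU; that is an inference from the numbers, not something stated on this page. The constant and function names are illustrative.

```python
import math

VOLTS = 208.0          # line-to-line voltage of the PDU
BREAKER_AMPS = 60.0    # rated current of the PDU circuit
DERATING = 0.80        # the "80% rule" for sustained load
GPUS_PER_PDU = 48      # policy limit quoted above


def usable_amps(breaker_amps: float, derating: float = DERATING) -> float:
    """Sustained current allowed after applying the derating factor."""
    return breaker_amps * derating


def three_phase_watts(volts: float, amps: float) -> float:
    """Power of a balanced three-phase load (assumed PDU type)."""
    return math.sqrt(3) * volts * amps


budget_w = three_phase_watts(VOLTS, usable_amps(BREAKER_AMPS))
per_gpu_w = budget_w / GPUS_PER_PDU

print(round(usable_amps(BREAKER_AMPS)))  # 48
print(round(budget_w))                   # 17293
print(round(per_gpu_w))                  # 360
```

The per-GPU figure is a share of the whole PDU budget, so it has to cover each GPU's portion of host power (CPUs, fans, power-supply losses) as well as the GPU itself.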

For detailed questions, please contact Kirk Berthold (Computing Operations manager) and/or Junya Inohara (Datacenter Tech Lead).