Efficiency


GPU power limits vs. efficiency

Over the last decade or so, there have been several studies on capping GPU power consumption (which nvidia-smi allows) in order to increase compute performance per watt.

The last 5-10% of power fed to a GPU yields significantly diminishing returns compared to the first 90%; this set of graphs for the A4000 and RTX 3090 (https://www.pugetsystems.com/labs/hpc/nvidia-gpu-power-limit-vs-performance-2296/) illustrates the general trend.
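As a concrete illustration, power limits can be queried and set through nvidia-smi. The sketch below wraps those calls in a small Python helper; the 250W cap in the example is an arbitrary illustrative value, not a site recommendation.

import subprocess

def get_power_limits(gpu_index=0):
    """Query the current and maximum power limits (in watts) for one GPU."""
    out = subprocess.run(
        ["nvidia-smi", "-i", str(gpu_index),
         "--query-gpu=power.limit,power.max_limit",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout.strip()
    current, maximum = (float(x) for x in out.split(","))
    return current, maximum

def set_power_limit(watts, gpu_index=0):
    """Set a sustained power cap. Requires root, and the value must stay
    within the range reported by nvidia-smi -q -d POWER."""
    subprocess.run(
        ["nvidia-smi", "-i", str(gpu_index), "-pl", str(watts)],
        check=True,
    )

if __name__ == "__main__":
    cur, mx = get_power_limits(0)
    print(f"GPU 0 power limit: {cur:.0f} W (board max {mx:.0f} W)")
    # Example only: cap GPU 0 at 250 W (roughly 70% of an RTX 3090's 350 W default).
    # set_power_limit(250, gpu_index=0)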


Standard power limits in the Wean / Gates computer rooms (as of 2023-07-20)

Our current policy on how many GPUs can go in a rack is 48 per 208V 60A PDU. (The usable maximum is 48A, or about 17,293W sustained on a three-phase circuit, because of the 80% continuous-load rule.)

This works out to a rule of thumb of roughly 1 GPU per 360W.
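For reference, the arithmetic behind those numbers (assuming a three-phase 208V PDU, which is consistent with the 17,293W figure, derated to 80% of the 60A breaker):

import math

volts = 208          # line-to-line voltage on a three-phase PDU (assumed)
breaker_amps = 60
derate = 0.80        # 80% continuous-load rule

usable_amps = breaker_amps * derate                  # 48 A
budget_watts = math.sqrt(3) * volts * usable_amps    # ~17,293 W sustained
gpus_per_pdu = 48

print(f"{budget_watts:.0f} W total -> {budget_watts / gpus_per_pdu:.0f} W per GPU")
# Prints roughly: 17293 W total -> 360 W per GPU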

For detailed questions, please contact Kirk Berthold (Computing Operations manager) and/or Junya Inohara (Datacenter Tech Lead).