Slurm Job Efficiency
[[Category:Slurm]] [[Category:Jobs]]

== Slurm Job Efficiency Script ==
The '''Slurm Job Efficiency Script''' is a Python script for analyzing resource utilization of completed Slurm jobs. It uses Slurm's <code>sacct</code> and <code>seff</code> commands to retrieve job data, providing insights into CPU and memory usage, along with optimization recommendations. The script supports filtering by users, job IDs, partitions, and time ranges, and generates detailed or summary reports.

=== Overview ===
This script assists in monitoring and optimizing resource allocation in Slurm-based HPC clusters. It calculates CPU and memory efficiency, identifies under- or over-utilized resources, and outputs a tabular report with a summary of average efficiencies and actionable insights.

=== Example Usage ===
Below are examples of common use cases, assuming the script is saved as <code>/opt/cluster_tools/babel_contrib/slurm_job_efficiency.py</code> and run in a Slurm environment.

==== Analyze Jobs for a Specific User ====
To analyze completed jobs for user <code>alice</code> with the default time range (last 1 day):

 python3 /opt/cluster_tools/babel_contrib/slurm_job_efficiency.py alice

To analyze jobs for user <code>alice</code> over the last 7 days or starting from April 1, 2025:

 python3 /opt/cluster_tools/babel_contrib/slurm_job_efficiency.py alice -t 7d
 python3 /opt/cluster_tools/babel_contrib/slurm_job_efficiency.py alice -t 2025-04-01

'''Output''' (for <code>-t 7d</code>):

 Slurm Job Efficiency: User: alice
 ╒═══════╤════════════╤═════════╤═════════╤════════════════════════════════════════════════╕
 │ JobID │ JobName    │ CPU Eff │ Mem Eff │ Insights                                       │
 ╞═══════╪════════════╪═════════╪═════════╪════════════════════════════════════════════════╡
 │ 12345 │ simulation │ 95.50%  │ 80.20%  │ -                                              │
 ╞═══════╪════════════╪═════════╪═════════╪════════════════════════════════════════════════╡
 │ 12346 │ data_proc  │ 45.30%  │ 20.10%  │ Underutilized CPU; Reduce CPU cores            │
 │       │            │         │         │ Underutilized memory; Reduce memory allocation │
 ╘═══════╧════════════╧═════════╧═════════╧════════════════════════════════════════════════╛
 --------------------------------------
 Total Jobs: 2
 Avg CPU Eff: 70.40%, Avg Mem Eff: 50.15%
 Summary:
 - Underutilized CPU; Reduce requested resources or optimize job
 - Underutilized memory; Reduce memory allocation
 --------------------------------------

==== Show Detailed Resource Usage ====
To analyze jobs for user <code>alice</code> with detailed resource columns:

 python3 /opt/cluster_tools/babel_contrib/slurm_job_efficiency.py alice -r

'''Output''':

 Slurm Job Efficiency: User: alice
 ╒═══════╤══════════════╤═════════╤═════════╤═══════════╤═════════╤══════════╤══════════════════════════════════════════════════════╕
 │ JobID │ JobName      │ CPU Eff │ Mem Eff │ Req Cores │ Req Mem │ Max Mem  │ Insights                                             │
 ╞═══════╪══════════════╪═════════╪═════════╪═══════════╪═════════╪══════════╪══════════════════════════════════════════════════════╡
 │ 12348 │ compute_task │ 85.00%  │ 95.00%  │ 4         │ 16GB    │ 15.20 GB │ High memory usage; Increase memory to avoid swapping │
 ╘═══════╧══════════════╧═════════╧═════════╧═══════════╧═════════╧══════════╧══════════════════════════════════════════════════════╛
 --------------------------------------
 Total Jobs: 1
 Avg CPU Eff: 85.00%, Avg Mem Eff: 95.00%
 Summary:
 - High memory usage; Increase memory to avoid swapping
 --------------------------------------

==== Analyze Specific Job IDs ====
To analyze job IDs <code>12345</code> and <code>12346</code>:

 python3 /opt/cluster_tools/babel_contrib/slurm_job_efficiency.py -j 12345,12346

'''Output''':

 Slurm Job Efficiency: User: Any
 ╒═══════╤════════════╤═════════╤═════════╤════════════════════════════════════════════════╕
 │ JobID │ JobName    │ CPU Eff │ Mem Eff │ Insights                                       │
 ╞═══════╪════════════╪═════════╪═════════╪════════════════════════════════════════════════╡
 │ 12345 │ simulation │ 95.50%  │ 80.20%  │ -                                              │
 ╞═══════╪════════════╪═════════╪═════════╪════════════════════════════════════════════════╡
 │ 12346 │ data_proc  │ 45.30%  │ 20.10%  │ Underutilized CPU; Reduce CPU cores            │
 │       │            │         │         │ Underutilized memory; Reduce memory allocation │
 ╘═══════╧════════════╧═════════╧═════════╧════════════════════════════════════════════════╛
 --------------------------------------
 Total Jobs: 2
 Avg CPU Eff: 70.40%, Avg Mem Eff: 50.15%
 Summary:
 - Underutilized CPU; Reduce requested resources or optimize job
 - Underutilized memory; Reduce memory allocation
 --------------------------------------

==== Analyze Jobs for a Partition ====
To analyze jobs for user <code>alice</code> on the <code>general</code> partition:

 python3 /opt/cluster_tools/babel_contrib/slurm_job_efficiency.py alice -p general

'''Output''':

 Slurm Job Efficiency: User: alice (general)
 ╒═══════╤══════════════╤═════════╤═════════╤══════════════════════════════════════════════════════╕
 │ JobID │ JobName      │ CPU Eff │ Mem Eff │ Insights                                             │
 ╞═══════╪══════════════╪═════════╪═════════╪══════════════════════════════════════════════════════╡
 │ 12348 │ compute_task │ 85.00%  │ 95.00%  │ High memory usage; Increase memory to avoid swapping │
 ╘═══════╧══════════════╧═════════╧═════════╧══════════════════════════════════════════════════════╛
 --------------------------------------
 Total Jobs: 1
 Avg CPU Eff: 85.00%, Avg Mem Eff: 95.00%
 Summary:
 - High memory usage; Increase memory to avoid swapping
 --------------------------------------

==== Summary-Only Report ====
To display only summary statistics for user <code>alice</code> over the last 1 day:

 python3 /opt/cluster_tools/babel_contrib/slurm_job_efficiency.py alice -t 1d -s

'''Output''':

 Slurm Job Efficiency: User: alice
 --------------------------------------
 Total Jobs: 3
 Avg CPU Eff: 75.33%, Avg Mem Eff: 68.67%
 Summary:
 - Underutilized CPU; Reduce requested resources or optimize job
 - Underutilized memory; Reduce memory allocation
 --------------------------------------

=== Explanation of Options ===
* <code>-a, --all</code>: Analyzes all users with completed jobs.
* <code>-j, --job-id</code>: Specifies job IDs to analyze (e.g., <code>-j 12345,12346</code>).
* <code>-t, --time</code>: Sets time range (e.g., <code>7d</code> for 7 days, <code>12h</code> for 12 hours) or start date (e.g., <code>2025-04-01</code> for jobs starting from April 1, 2025). Defaults to <code>1d</code> if not specified.
* <code>-p, --partition</code>: Filters by partitions (e.g., <code>-p compute,gpu</code>).
* <code>-r, --show-resource-details</code>: Includes columns for requested cores, memory, and max memory used.
* <code>-S, --sort-by</code>: Sorts table by <code>jobid</code>, <code>cpu_eff</code>, or <code>mem_eff</code>.
* <code>-s, --summary-only</code>: Shows only summary statistics.
* <code>--no-user</code>: Suppresses printing the username.

=== Notes ===
* Invalid job IDs or users trigger warnings or errors, and the script skips them.
* Time ranges can be specified as <code>Nd</code> (days), <code>Nh</code> (hours), or a date in <code>YYYY-MM-DD</code> format.

=== See Also ===
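
=== Sketch: Interpreting the Time Range ===
The <code>-t</code> formats listed under Notes (<code>Nd</code>, <code>Nh</code>, or <code>YYYY-MM-DD</code>) could be turned into a start timestamp roughly as shown here. This is a minimal sketch and not the script's actual code: the helper names <code>parse_time_range</code> and <code>to_sacct_starttime</code> are hypothetical, treating a plain date as midnight of that day is an assumption, and so is the idea that the result is handed to <code>sacct --starttime</code>.

```python
import re
from datetime import datetime, timedelta


def parse_time_range(spec, now=None):
    """Return the start of the analysis window as a datetime.

    Hypothetical helper: accepts 'Nd' (days), 'Nh' (hours), or 'YYYY-MM-DD'.
    """
    now = now or datetime.now()
    match = re.fullmatch(r"(\d+)([dh])", spec)
    if match:
        amount = int(match.group(1))
        unit = match.group(2)
        delta = timedelta(days=amount) if unit == "d" else timedelta(hours=amount)
        return now - delta
    try:
        # Assumption: a plain date means "from midnight on that day".
        return datetime.strptime(spec, "%Y-%m-%d")
    except ValueError:
        raise ValueError(f"Unrecognized time range: {spec!r}") from None


def to_sacct_starttime(start):
    """Format a datetime as YYYY-MM-DDTHH:MM:SS, a form sacct --starttime accepts."""
    return start.strftime("%Y-%m-%dT%H:%M:%S")


if __name__ == "__main__":
    # Fixed reference time so the printed values are reproducible.
    reference = datetime(2025, 4, 8, 12, 0, 0)
    for spec in ("7d", "12h", "2025-04-01"):
        print(spec, "->", to_sacct_starttime(parse_time_range(spec, reference)))
```

With the fixed reference time of 2025-04-08 12:00, <code>7d</code> resolves to <code>2025-04-01T12:00:00</code> and <code>12h</code> to <code>2025-04-08T00:00:00</code>. Any other input, such as <code>7w</code>, raises a <code>ValueError</code> in this sketch; how the real script reacts to a malformed time range is not documented on this page.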
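
=== Sketch: How the Efficiency Figures Relate ===
The percentages in the sample reports can be reproduced with the usual <code>seff</code>-style definitions: CPU efficiency is CPU time used divided by cores times elapsed time, and memory efficiency is peak memory divided by requested memory. The sketch below illustrates that arithmetic under stated assumptions. All function names are hypothetical, and the 50% and 90% thresholds are invented purely so the sample rows get the insights shown in the reports; the script's real cut-offs are not documented on this page.

```python
def cpu_efficiency(cpu_seconds_used, cores, elapsed_seconds):
    """CPU time used as a percentage of the core-seconds allocated."""
    allocated = cores * elapsed_seconds
    return 100.0 * cpu_seconds_used / allocated if allocated else 0.0


def mem_efficiency(max_mem_gb, requested_mem_gb):
    """Peak memory as a percentage of the memory requested."""
    return 100.0 * max_mem_gb / requested_mem_gb if requested_mem_gb else 0.0


# Hypothetical thresholds (percent), chosen only to match the sample rows.
LOW_THRESHOLD = 50.0
HIGH_MEM_THRESHOLD = 90.0


def insights(cpu_eff, mem_eff):
    """Return per-job insight strings worded like those in the sample tables."""
    notes = []
    if cpu_eff < LOW_THRESHOLD:
        notes.append("Underutilized CPU; Reduce CPU cores")
    if mem_eff < LOW_THRESHOLD:
        notes.append("Underutilized memory; Reduce memory allocation")
    elif mem_eff > HIGH_MEM_THRESHOLD:
        notes.append("High memory usage; Increase memory to avoid swapping")
    return notes


def average(values):
    """Arithmetic mean, or 0.0 for an empty list."""
    return sum(values) / len(values) if values else 0.0


if __name__ == "__main__":
    # Job 12348 in the sample: 15.20 GB peak out of 16 GB requested.
    print(f"Mem Eff: {mem_efficiency(15.20, 16):.2f}%")
    # Jobs 12345 and 12346 in the sample: averages from the summary block.
    print(f"Avg CPU Eff: {average([95.50, 45.30]):.2f}%")
    print(f"Avg Mem Eff: {average([80.20, 20.10]):.2f}%")
    print(insights(45.30, 20.10))
```

The averages in the summary block are plain arithmetic means of the per-job values, which is why jobs <code>12345</code> and <code>12346</code> together report 70.40% CPU and 50.15% memory efficiency.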