Efficiency using sacct¶
There are multiple tools that can help you use your HPC resources efficiently.
This page is about using your HPC resources efficiently
using the sacct tool.
Here is the general strategy to effectively use your HPC resources:
Want to see a video?
Watch the YouTube video 'obtain the CPU and memory usage of a job using sacct' to see how to do so.
Watch the YouTube video 'Efficient HPC resource use, using Slurm and sacct' to see how the reasoning of this strategy works out.
flowchart TD
obtain_data[Obtain CPU and memory usage of a job]
lower_limit_based_on_memory(Book enough memory)
limited_by_cpu(For that amount of cores, would runtime be limited by CPU?)
lower_limit_based_on_cpu(Increase the number of cores, so that on average, the right amount of CPUs is booked)
done(Use that amount of cores)
add_one(Increase the number of cores by one for safety)
obtain_data --> lower_limit_based_on_memory
lower_limit_based_on_memory --> limited_by_cpu
limited_by_cpu --> |no| add_one
limited_by_cpu --> |yes| lower_limit_based_on_cpu
lower_limit_based_on_cpu --> done
add_one --> done
Why not look at CPU usage first?
Because CPU usage is more flexible than memory usage.
For example, imagine a job with a short CPU spike that could be processed by 16 CPUs. If 1 core provides enough memory, book 1 core: the CPU spike will then turn into 100% CPU use (of that one core) for a longer duration.
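The flowchart above can be read as a small decision rule. The sketch below is one possible reading of it, not an official tool: the function name `recommend_cores` and its two inputs (the number of cores needed for memory and the number of cores needed for CPU) are assumptions made for illustration.

```python
def recommend_cores(cores_for_memory: int, cores_for_cpu: int) -> int:
    """Pick a core count following the flowchart (hypothetical helper).

    cores_for_memory: fewest cores that still provide enough memory.
    cores_for_cpu: cores needed so that, on average, the booked CPUs are busy.
    """
    if cores_for_memory < 1 or cores_for_cpu < 1:
        raise ValueError("core counts must be at least 1")
    limited_by_cpu = cores_for_cpu > cores_for_memory
    if limited_by_cpu:
        # "yes" branch: increase the number of cores to what the CPU load needs
        return cores_for_cpu
    # "no" branch: memory sets the limit, add one core for safety
    return cores_for_memory + 1


print(recommend_cores(cores_for_memory=4, cores_for_cpu=2))  # memory-bound job
print(recommend_cores(cores_for_memory=1, cores_for_cpu=8))  # CPU-bound job
```

Memory is checked first because it sets a hard lower limit, while a shortage of CPU only makes the job run longer.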
To obtain the CPU and memory usage of a job, use sacct.
This will produce output such as this:
Elapsed    NCPUS      NTasks   UserCPU    CPUTime    AveCPU     MaxVMSize  ReqMem
---------- ---------- -------- ---------- ---------- ---------- ---------- ----------
00:00:13   38                  00:01.615  00:08:14                         222000M
00:00:13   38         1        00:01.615  00:08:14   00:00:00   3227532K
00:00:13   38         1        00:00:00   00:08:14
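The column names in the output above match sacct format fields, so a command along the following lines could request them. This is a minimal sketch, not necessarily the exact command used for this page: the job ID `12345678` and the helper name `build_sacct_command` are hypothetical, and the code only prints the command instead of running it.

```python
import shlex

# Column names taken from the header of the output above.
FIELDS = [
    "Elapsed", "NCPUS", "NTasks", "UserCPU",
    "CPUTime", "AveCPU", "MaxVMSize", "ReqMem",
]


def build_sacct_command(job_id: str) -> list:
    """Return an sacct argument list requesting FIELDS for one job."""
    return ["sacct", f"--jobs={job_id}", "--format=" + ",".join(FIELDS)]


# 12345678 is a made-up job ID; replace it with the ID of your own job.
command = build_sacct_command("12345678")
print(shlex.join(command))
```

On a system where sacct is installed, the same list could be passed to `subprocess.run(command, capture_output=True, text=True)` to collect the output for further processing.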
Need a worked-out example?
Here is an example output:
Elapsed    NCPUS      NTasks   UserCPU    CPUTime    AveCPU     MaxVMSize  ReqMem
---------- ---------- -------- ---------- ---------- ---------- ---------- ----------
00:00:13   38                  00:01.615  00:08:14                         222000M
00:00:13   38         1        00:01.615  00:08:14   00:00:00   3227532K
00:00:13   38         1        00:00:00   00:08:14
Book enough memory
There were 38 CPUs booked, which provide 222000 megabytes
of memory. The memory used was 3227532 kilobytes, which is around 3227
megabytes. So we only need 3227 megabytes out of 222000 megabytes.
3227 / 222000 = 0.014536036, which is
1.5% of what we requested.
1.5% of 38 CPUs is 0.6 CPU needed. Hence, booking 1 CPU will provide
enough memory.
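The memory calculation above can be reproduced in a few lines of Python. This is a sketch under two assumptions: the units are decimal (1 megabyte = 1000 kilobytes, as in the rounding used in the text), and memory is booked in proportion to the number of cores. The helper names `to_megabytes` and `cores_for_memory` are hypothetical.

```python
import math

# Decimal multipliers, matching the rounding used in the text
# (3227532K is treated as about 3227 megabytes).
MEGABYTES_PER_UNIT = {"K": 1 / 1000, "M": 1, "G": 1000}


def to_megabytes(value: str) -> float:
    """Convert a value such as '3227532K' or '222000M' to megabytes."""
    number, unit = value[:-1], value[-1].upper()
    return float(number) * MEGABYTES_PER_UNIT[unit]


def cores_for_memory(max_vm_size: str, req_mem: str, n_cpus: int) -> int:
    """Fewest cores whose share of the booked memory covers the memory used.

    Assumes memory is booked in proportion to the number of cores.
    """
    fraction_used = to_megabytes(max_vm_size) / to_megabytes(req_mem)
    return max(1, math.ceil(fraction_used * n_cpus))


fraction = to_megabytes("3227532K") / to_megabytes("222000M")
print(f"{fraction:.1%} of the booked memory was used")
print(cores_for_memory("3227532K", "222000M", 38), "core(s) needed for memory")
```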
For that amount of cores, would runtime be limited by CPU?
Yes: we need 2 cores.
On average, each of the 38 cores spent 0 seconds (i.e. at most 0.049 seconds)
working, out of 13 seconds. Using 1 core instead means that all the work,
0.049 seconds per core for 38 cores, can be done in 0.049 * 38 =
1.9 core.
This means that in practice one books 2 cores.
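The arithmetic of this step can be reproduced as follows. This sketch follows the calculation exactly as the text writes it; the function name `cores_for_cpu` is hypothetical, and the value 0.049 is the per-core upper bound quoted above.

```python
import math


def cores_for_cpu(busy_seconds_per_core: float, n_cpus_booked: int) -> int:
    """Round the text's product (time per core * booked cores) up to whole cores."""
    total = busy_seconds_per_core * n_cpus_booked
    # Cores cannot be booked in fractions, so round up
    return math.ceil(total)


total = 0.049 * 38
print(f"{total:.1f}")  # the product as computed in the text
print(cores_for_cpu(0.049, 38))
```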
Increase the number of cores by one for safety
This would result in 3 cores.
Sometimes, however, using resources inefficiently is unavoidable.
Examples¶
Here are some examples of how inefficient jobs can look and what you can do to make them more efficient.
Inefficient job example 1: booking too many cores¶
Elapsed    NCPUS      NTasks   UserCPU    CPUTime    AveCPU     MaxVMSize  ReqMem
---------- ---------- -------- ---------- ---------- ---------- ---------- ----------
00:00:01   64                  00:12.995  00:01:04                         375G
00:00:01   64         1        00:12.995  00:01:04   00:00:00   3424140K
00:00:01   64         1        00:00:00   00:01:04
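The time columns in this table are written as HH:MM:SS or MM:SS.mmm. Below is a minimal sketch for converting them to seconds, assuming only those formats plus an optional leading day count (D-HH:MM:SS); the helper name `to_seconds` is hypothetical. It also checks that, for this job, CPUTime equals Elapsed multiplied by NCPUS.

```python
def to_seconds(value: str) -> float:
    """Convert an sacct time such as '00:01:04' or '00:12.995' to seconds.

    Handles [days-][hours:]minutes:seconds[.fraction].
    """
    days = 0
    if "-" in value:
        day_part, value = value.split("-", 1)
        days = int(day_part)
    seconds = 0.0
    for part in value.split(":"):
        # Each colon shifts the running total up by a factor of 60
        seconds = seconds * 60 + float(part)
    return days * 86400 + seconds


elapsed = to_seconds("00:00:01")
cpu_time = to_seconds("00:01:04")
user_cpu = to_seconds("00:12.995")
n_cpus = 64

# CPUTime is the elapsed time multiplied by the number of booked CPUs
print(cpu_time == elapsed * n_cpus)
print(user_cpu)
```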
Here booking ? cores is considered okay.
Book enough memory
There were 64 CPUs booked, which provide 375 gigabytes
of memory. The memory used was 3424140 kilobytes, which is around 3424
megabytes. So we only need 3424 megabytes out of 375000 megabytes.
3424 / 375000 = 0.009130667, which is
0.9% of what we requested.
0.9% of 64 cores is 0.6 core needed. Hence, booking 1 core will provide
enough memory.
For that amount of cores, would runtime be limited by CPU?
Yes: we need 4 cores.
On average, each of the 64 cores spent 0 seconds (i.e. at most 0.049 seconds)
working, out of 1 second. Using 1 core instead means that all the work,
0.049 seconds per core for 64 cores, can be done in 0.049 * 64 = 3.14 core.
This means that in practice one books 4 cores.
Increase the number of cores by one for safety
This would result in 5 cores.