Efficiency
This page is a stub
As of now, this page is incomplete, possibly incorrect, and open for contributions.
There are multiple types of resources you may need. This page is about using HPC resources efficiently, i.e. how to schedule your HPC jobs optimally.
Profiling computer code is not covered here, as profiling can be done on any local computer instead of on the heavier compute resources.
How was this overview created?
This overview was created by going through all HPC cluster centers and merging the material they provided regarding this topic.
This is the material found:
HPC cluster name | Guide on how to improve efficiency |
---|---|
Alvis | None found |
Bianca | UPPMAX jobstats page |
COSMOS | None found |
COSMOS SENS | None found |
Dardel | None found |
Data Science Platform | None found |
Kebnekaise | None found |
LUMI | No guide |
Rackham | UPPMAX jobstats page |
Sigma | None found |
Tetralith | None found |
Trusted research environment | None found |
Vera | None found |
My center's guide is not linked!
If your center's guide is not linked, please contribute or contact us.
Additionally, searching for this topic turned up these sources:
- Southern Methodist University best practices guide
- Stack Overflow post on how to get the CPU and memory usage
- Blog post on using `seff` and `reportseff`
All this material was merged into one overview.
Here is a strategy to use your HPC resources effectively:
```mermaid
flowchart TD
  obtain_data[Obtain CPU and memory usage of a job]
  lower_limit_based_on_memory(Pick the number of cores to have enough memory)
  limited_by_cpu(For that amount of cores, would runtime be limited by CPU?)
  lower_limit_based_on_cpu(Increase the number of cores, so that on average, the right amount of CPUs is booked)
  done(Use that amount of cores)
  add_one(Increase the number of cores by one for safety)
  obtain_data --> lower_limit_based_on_memory
  lower_limit_based_on_memory --> limited_by_cpu
  limited_by_cpu --> |no| add_one
  limited_by_cpu --> |yes| lower_limit_based_on_cpu
  lower_limit_based_on_cpu --> done
  add_one --> done
```
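The strategy above can be sketched as a small Python function. This is a minimal illustration, not a cluster-specific tool: the function name `pick_cores`, the default of 8 GB of memory per booked core, and all numbers are invented assumptions; check your own cluster's memory-per-core value.

```python
# Minimal sketch of the core-picking strategy in the flowchart above.
# All names and numbers are illustrative assumptions, not cluster-specific values.
import math

def pick_cores(peak_mem_gb, avg_cpu_usage, mem_per_core_gb=8.0):
    """Pick a number of cores for a job: memory first, CPU second.

    peak_mem_gb: peak memory use measured for an earlier run of the job
    avg_cpu_usage: average CPU usage in cores (e.g. 3.5 = 3.5 cores busy)
    mem_per_core_gb: memory that booking one core provides (cluster-dependent)
    """
    # 1. Pick the number of cores to have enough memory
    cores = math.ceil(peak_mem_gb / mem_per_core_gb)
    # 2. For that amount of cores, would runtime be limited by CPU?
    if avg_cpu_usage > cores:
        # Yes: increase the cores so that, on average,
        # the right amount of CPUs is booked
        cores = math.ceil(avg_cpu_usage)
    else:
        # No: increase the number of cores by one for safety
        cores += 1
    return cores

# A job that peaked at 20 GB and averaged 1.2 busy cores:
# memory needs ceil(20 / 8) = 3 cores, CPU is not the limit, book 3 + 1 = 4.
print(pick_cores(20.0, 1.2))
```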
Why not look at CPU usage?
Because CPU usage is more flexible than memory usage: a job that runs out of memory fails, while a job short on CPU merely runs longer.
For example, imagine a job with a short CPU spike that could keep 16 CPUs busy. If one core provides enough memory, book one core: the CPU spike is spread out into 100% CPU use of that one core for a longer duration.
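The spike example can be put in numbers (all invented for illustration): the spike's total work in core-seconds stays the same, so fewer cores just means a proportionally longer spike.

```python
# Illustrative numbers only: a CPU spike's work can be spread over fewer cores.
spike_cores = 16      # cores the spike would keep busy, if available
spike_seconds = 1.0   # duration of the spike when using 16 cores
work = spike_cores * spike_seconds  # total work: 16 core-seconds

booked_cores = 1      # one core has enough memory, so book one core
duration_on_one_core = work / booked_cores
print(duration_on_one_core)  # 16.0 seconds at 100% use of that one core
```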
The first step, 'Obtain CPU and memory usage of a job', depends on your HPC cluster:
HPC cluster name | Tool and guide |
---|---|
Alvis | TODO |
Bianca | jobstats |
COSMOS | TODO |
COSMOS SENS | TODO |
Dardel | TODO |
Data Science Platform | TODO |
Kebnekaise | TODO |
LUMI | TODO |
Rackham | jobstats |
Sigma | TODO |
Tetralith | TODO |
Trusted research environment | TODO |
Vera | TODO |
Need a worked-out example?
Worked-out examples can be found on each page specific to the tool used.