Efficiency

This page is a stub

As of now, this page is incomplete, possibly incorrect, and open for contributions.

There are multiple types of resources you may need. This page is about using HPC resources efficiently, i.e. how to schedule your HPC jobs optimally.

Profiling computer code is not covered here, as profiling can be done on any local computer instead of on the heavier compute resources.

How was this overview created?

This overview was created by going through the documentation of all HPC cluster centers and merging the material they provide on this topic.

This is the material found:

| HPC cluster name | Guide on how to improve efficiency | Center(s) |
| --- | --- | --- |
| Alvis | None found | C3SE, NAISS |
| Bianca | UPPMAX jobstats page | NAISS, UPPMAX |
| COSMOS | None found | LUNARC, NAISS |
| COSMOS SENS | None found | LUNARC |
| Dardel | None found | NAISS, PDC |
| Data Science Platform | None found | AIDA Data Hub |
| Kebnekaise | None found | HPC2N |
| LUMI | No guide | CSC, NAISS |
| Rackham | UPPMAX jobstats page | NAISS, UPPMAX |
| Sigma | None found | NSC |
| Tetralith | None found | NAISS, NSC |
| Trusted research environment | None found | University of Gothenburg |
| Vera | None found | C3SE |
My center's guide is not linked to!

If your center's guide is not linked here, please contribute or contact us.

Additionally, a search on this topic turned up further sources.

All of this material was then merged into one overview.

Here is a strategy to use your HPC resources effectively:

```mermaid
flowchart TD
  obtain_data[Obtain CPU and memory usage of a job]
  lower_limit_based_on_memory(Pick the number of cores to have enough memory)
  limited_by_cpu(For that number of cores, would the runtime be limited by CPU?)
  lower_limit_based_on_cpu(Increase the number of cores, so that on average, the right number of CPUs is booked)

  done(Use that number of cores)

  add_one(Increase the number of cores by one for safety)

  obtain_data --> lower_limit_based_on_memory
  lower_limit_based_on_memory --> limited_by_cpu
  limited_by_cpu --> |no| add_one
  limited_by_cpu --> |yes| lower_limit_based_on_cpu
  lower_limit_based_on_cpu --> done
  add_one --> done
```
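The strategy above can be sketched as a small calculation. This is a minimal illustration, assuming a memory-per-core value and average CPU usage you would read off your cluster's job statistics; the function name, parameters, and example values are illustrative, not part of any cluster's tooling:

```python
import math

def cores_needed(mem_needed_gb, mem_per_core_gb, avg_cpu_usage):
    """Sketch of the core-booking strategy (illustrative, not a cluster tool).

    1. Pick enough cores to satisfy the job's memory footprint.
    2. If the job is CPU-bound at that core count, book cores to
       match the average CPU usage instead.
    3. Otherwise, add one core as a safety margin.
    """
    # Lower limit: enough cores to cover the memory need
    cores = math.ceil(mem_needed_gb / mem_per_core_gb)
    if avg_cpu_usage > cores:
        # Runtime would be limited by CPU: book the average number of busy CPUs
        cores = math.ceil(avg_cpu_usage)
    else:
        # Not CPU-bound: add one core for safety
        cores += 1
    return cores

# Example: a job needing 20 GB on a node with 8 GB per core,
# averaging 1.5 busy CPUs: 3 cores for memory, not CPU-bound, so book 4
print(cores_needed(20, 8, 1.5))  # prints 4
```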
Why not look at CPU usage first?

Because CPU usage is more flexible than memory usage: a job that runs out of memory crashes, whereas a job with too few CPUs simply takes longer.

For example, imagine a job with a short CPU spike that could be processed by 16 CPUs. If 1 core provides enough memory, book just that 1 core: the spike's work is then spread out over a longer duration, at 100% CPU use of that one core.
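The trade-off can be made concrete with some illustrative arithmetic (all numbers below are assumed for the sake of the example, not measurements):

```python
# A 100-second job whose only parallel part is a 1-second spike
# that could keep 16 CPUs busy (16 CPU-seconds of work in the spike).
spike_cpu_seconds = 16   # work done during the spike
serial_seconds = 99      # rest of the job uses a single CPU

for cores in (16, 1):
    # The spike's work spreads out over the cores actually booked
    wall_time = serial_seconds + spike_cpu_seconds / cores
    core_seconds_booked = cores * wall_time
    print(f"{cores:2d} cores: {wall_time:6.1f} s wall time, "
          f"{core_seconds_booked:7.1f} core-seconds booked")
```

Booking 16 cores finishes in 100 seconds but books 1600 core-seconds; booking 1 core takes 115 seconds yet books only 115 core-seconds, so the slightly longer runtime buys a far smaller resource bill.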

The first step, 'Obtain CPU and memory usage of a job', depends on your HPC cluster:

| HPC cluster name | Tool and guide | Center(s) |
| --- | --- | --- |
| Alvis | TODO | C3SE, NAISS |
| Bianca | jobstats | NAISS, UPPMAX |
| COSMOS | TODO | LUNARC, NAISS |
| COSMOS SENS | TODO | LUNARC |
| Dardel | TODO | NAISS, PDC |
| Data Science Platform | TODO | AIDA Data Hub |
| Kebnekaise | TODO | HPC2N |
| LUMI | TODO | CSC, NAISS |
| Rackham | jobstats | NAISS, UPPMAX |
| Sigma | TODO | NSC |
| Tetralith | TODO | NAISS, NSC |
| Trusted research environment | TODO | University of Gothenburg |
| Vera | TODO | C3SE |
Need a worked-out example?

Worked-out examples can be found on each page specific to the tool used.