Using partition gpu_cml

About

Partition gpu_cml, which consists of node879, is a private partition for members of the Institute of Environmental Sciences (CML).

The hardware configuration of the nodes can be found here: About ALICE | Hardware Description

Access

  • Users from CML get access automatically

  • Users need to be a member of the group gpu_cml and have access to the account gpu_cml

    • you can check whether you are a member of the group by running the command id on the command line

    • you can check whether you have access to the account by running sacctmgr show associations user=<username> where <username> should be replaced by your ALICE user name.
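
For example, checking both on a login node could look like this, where <username> is your own ALICE user name:

  # group membership: gpu_cml should appear in the list of groups
  id | grep gpu_cml

  # account access: an association with the account gpu_cml should be listed
  sacctmgr show associations user=<username>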

Partition Settings

Partition settings can be changed to meet the needs of the group. Requests for changes can be made either by the PI or by a group member with confirmation from the PI, and should be sent to the ALICE Helpdesk.

Job submission

  • users have to use their account gpu_cml for running jobs on this partition. This is necessary so that usage of this partition does not impact the fairshare of other users.

    • In your batch script, you have to add: #SBATCH --account=gpu_cml (a combined header example is shown after this list)

      • For other jobs on ALICE, your regular ALICE account is sufficient and you do not need to set this

  • It might be the case that the node is in a reservation, so that members of CML have exclusive access to it. If so, you have to add the name of the reservation to your job in order to be able to submit to the partition

    • First check the name of the reservation

      scontrol show reservation | grep node879
    • Then, add the reservation to your batch file

      #SBATCH --reservation=<name_of_reservation>
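
Putting the account and reservation settings together, the top of a batch script for this partition could look like the following minimal sketch (the job name, walltime and reservation name are placeholders, not prescribed values):

  #!/bin/bash
  #SBATCH --job-name=test_gpu_cml               # placeholder job name
  #SBATCH --partition=gpu_cml                   # the private CML partition
  #SBATCH --account=gpu_cml                     # required account for this partition
  #SBATCH --reservation=<name_of_reservation>   # only needed while the node is in a reservation
  #SBATCH --time=00:10:00                       # example walltime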

Software

Scientific software stack

  • You can make use of the general scientific software stack which can be accessed by running

    module load ALICE/default

    It is recommended to add this to your batch scripts, too.

  • If you want to use software fully optimized for the CPU architecture of the nodes, you have to build the software yourself.
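
For example, a job could load the general stack and then build its own binary on the compute node, so that the compiler targets that node's CPU. The source file name below is just a placeholder and the flags are only an illustration:

  module load ALICE/default
  # -march=native optimizes for the CPU of the node the job runs on (AMD Genoa)
  gcc -O3 -march=native -fopenmp -o my_program my_program.c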

Your own scripts/programmes

  • Because this node has an AMD Genoa CPU, software built on the login nodes, even with AVX512 support, should run on this node

  • If not, you need to compile such scripts/software as part of a batch or interactive job on the node

    • One way to do this is to create a short Slurm batch job specifically for compiling your software, setting up your conda/Python environments, etc. (a sketch of such a build-only job is shown after this list). If you only need to do this once, then there is no need to make this part of your production batch job.

    • Another option is to compile your programme the first time you run it as part of a job. In this first job, you copy the compiled programme back to your shared storage or home directory. In subsequent jobs, you use the already compiled version (see example below).

  • You can still use the login nodes for testing/debugging. In this case, you compile on the login nodes and run your tests there; for your actual job, you compile on the compute node again.
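
A sketch of such a build-only job could look like this (the job name, path and source file are placeholders):

  #!/bin/bash
  #SBATCH --job-name=build_only          # placeholder job name
  #SBATCH --partition=gpu_cml
  #SBATCH --account=gpu_cml
  #SBATCH --time=00:15:00

  module load ALICE/default
  cd /path/to/your/source                # placeholder: your project directory on shared storage
  gcc -O3 -march=native -fopenmp -o my_program my_program.c
  # the binary now sits in shared storage and can be reused by later production jobs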

Example

Here is an example of what a Slurm batch script for using the node could look like, including a HelloWorld OpenMP program to demonstrate compiling and the use of the local scratch storage.

If you are new to HPC, ALICE or Slurm, have a look at https://pubappslu.atlassian.net/wiki/spaces/HPCWIKI/pages/5963809 first.

Batch script
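
The original batch script is not reproduced on this page; below is a minimal sketch of what it could look like. It assumes the file omp_hello.c from the next section is in the submission directory, and the local scratch path is a placeholder (check the ALICE wiki for the actual location on this node):

  #!/bin/bash
  #SBATCH --job-name=omp_hello             # placeholder job name
  #SBATCH --partition=gpu_cml
  #SBATCH --account=gpu_cml
  #SBATCH --cpus-per-task=4                # example core count for the OpenMP threads
  #SBATCH --time=00:05:00
  # add #SBATCH --reservation=<name_of_reservation> if the node is in a reservation
  # add a GPU request if your job needs a GPU; see the ALICE wiki for details

  module load ALICE/default

  # placeholder for the node-local scratch directory; check the ALICE wiki for the actual path
  export SCRATCH_DIR="<path_to_local_scratch>/${SLURM_JOB_ID}"
  mkdir -p "${SCRATCH_DIR}"

  # compile on the compute node so the binary is optimized for its CPU
  gcc -O3 -march=native -fopenmp -o "${SCRATCH_DIR}/omp_hello" omp_hello.c

  # run from local scratch with the cores allocated to the job
  export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK}
  cd "${SCRATCH_DIR}"
  ./omp_hello

  # copy any output you need back to the submission directory before the job ends,
  # then clean up local scratch
  cd "${SLURM_SUBMIT_DIR}"
  rm -rf "${SCRATCH_DIR}"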

 

OpenMP script

The file omp_hello.c is taken from https://computing.llnl.gov/tutorials/openMP/samples/C/omp_hello.c
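
A minimal, equivalent version of that HelloWorld program looks like this:

  #include <omp.h>
  #include <stdio.h>

  int main(void)
  {
      int nthreads, tid;

      /* Fork a team of threads, giving each its own copy of the variables */
      #pragma omp parallel private(nthreads, tid)
      {
          /* Each thread prints its own thread number */
          tid = omp_get_thread_num();
          printf("Hello World from thread = %d\n", tid);

          /* Only the master thread reports the total number of threads */
          if (tid == 0) {
              nthreads = omp_get_num_threads();
              printf("Number of threads = %d\n", nthreads);
          }
      }   /* All threads join the master thread and disband */

      return 0;
  }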

Related content

  • Using partition cpu_lorentz
  • Your first GPU job
  • Using Node802 and Partition mem_mi
  • Partitions on ALICE
  • Your first bash job
  • ALICE partition system update 2023-01