Partitions on ALICE

This page contains information about the available partitions (queues) on ALICE and their resource limits.

For an overview of the hardware configuration of each node, please see https://pubappslu.atlassian.net/wiki/spaces/HPCWIKI/pages/37519378

List of Partitions

| Partition | Timelimit | Default Timelimit | Default Memory per CPU | CPU Type | Nodes | Nodelist | Description |
|---|---|---|---|---|---|---|---|
| testing | 1:00:00 |  | 10000 MB | Intel | 2 | nodelogin[01-02] | For basic, short testing of batch scripts. Additional limits: maximum of 15 CPUs per node; maximum memory per node is 150G. Each login node is equipped with an NVIDIA Tesla T4 which can be used to test GPU jobs. |
| amd-short | 04:00:00 | 01:00:00 | 4000 MB | AMD | 15 | node802, node[863-876] | For short CPU-only jobs on AMD nodes. Limits: maximum 24 cores per node; maximum memory per node is 1TB (node802 only). (See below for further information.) |
| amd-long | 7-00:00:00 | 01:00:00 | 4000 MB | AMD | 10 | node[867-876] | For long CPU-only jobs on AMD nodes. Limits: maximum 24 cores per node. (See below for further information.) |
| amd-gpu-short | 04:00:00 | 01:00:00 | 4000 MB | AMD | 14 | node[863-876] | For jobs that require GPU nodes with AMD CPUs and not more than 4h of running time. |
| amd-gpu-long | 7-00:00:00 | 01:00:00 | 4000 MB | AMD | 11 | node[863-872], node876 | For jobs that require GPU nodes with AMD CPUs and not more than 7d of running time. |
| cpu-short | 4:00:00 | 01:00:00 | 16064 MB | Intel | 20 | node[001-020], node801, node[853-860] | For jobs that require CPU nodes and not more than 4h of running time. This is the default partition. |
| cpu-medium | 1-00:00:00 | 01:00:00 | 16064 MB | Intel | 19 | node[002-020] | For jobs that require CPU nodes and not more than 1d of running time. |
| cpu-long | 7-00:00:00 | 01:00:00 | 16064 MB | Intel | 18 | node[003-020] | For jobs that require CPU nodes and not more than 7d of running time. |
| gpu-short | 4:00:00 | 01:00:00 | 15868 MB | Intel | 10 | node[851-860] | For jobs that require GPU nodes and not more than 4h of running time. |
| gpu-medium | 1-00:00:00 | 01:00:00 | 15868 MB | Intel | 10 | node[851-860] | For jobs that require GPU nodes and not more than 1d of running time. |
| gpu-long | 7-00:00:00 | 01:00:00 | 15868 MB | Intel | 9 | node[852-860] | For jobs that require GPU nodes and not more than 7d of running time. |
| mem | 14-00:00:00 | 01:00:00 | 85369 MB | Intel | 1 | node801 | For jobs that require the high-memory node. |
| mem_mi | 4-00:00:00 | 01:00:00 | 31253 MB | AMD | 1 | node802 | Partition only available to MI researchers. Default running time is 4h. |
| cpu_lorentz | 7-00:00:00 | 01:00:00 | 4027 MB | AMD | 3 | node[022-023] | Partition only available to researchers from the Lorentz Institute. |
| cpu_natbio | 30-00:00:00 | 01:00:00 | 23552 MB | Intel | 1 | node021 | Partition only available to researchers from the group of B. Wielstra. |
| gpu_strw | 7-00:00:00 | 01:00:00 | 2644 MB | AMD | 2 | node[861-862] | Partition only available to researchers from the group of E. Rossi. |
| gpu_lucdh | 14-00:00:00 | 01:00:00 | 4000 MB | AMD | 1 | node877 | Partition only available to researchers from LUCDH. |

You can find the GPUs available on each node here: https://pubappslu.atlassian.net/wiki/spaces/HPCWIKI/pages/37519378/About+ALICE#GPU-overview

Important information about the partition system

ALICE has nodes with Intel and AMD CPUs which are in separate partitions.

cpu-short partition

The cpu-short partition includes additional nodes: the GPU nodes node[853-860] and the high-memory node node801.

MPI jobs / Infiniband in the cpu-short partition

The GPU and high-memory nodes that are part of the cpu-short partition do not have Infiniband.

Any MPI job, or other type of job that requires Infiniband, running in the cpu-short partition should add the following sbatch setting to its batch script:

#SBATCH --constraint=ib

This setting tells Slurm to allocate only nodes that have Infiniband, i.e., the cpu nodes. The setting can also be used for jobs in the other cpu partitions, though it currently has no effect there because those partitions consist only of cpu nodes.
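As a minimal sketch, a batch script for an MPI job in the cpu-short partition could start like this (task count, time limit and the program name my_mpi_app are placeholders):

#!/bin/bash
#SBATCH --partition=cpu-short
#SBATCH --ntasks=32
#SBATCH --time=02:00:00
#SBATCH --constraint=ib   # allocate only nodes with Infiniband

# load your MPI module here (check "module avail" for the exact name)

srun ./my_mpi_app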

Partitions gpu-short/medium/long and mem

In order to ensure that jobs submitted to the gpu-short/medium/long and mem partitions are not blocked by jobs in the cpu-short partition, these partitions have been given a higher priority factor than the cpu-short partition.

Therefore, short jobs that require a GPU or the high-memory node should always be submitted to the gpu-short or mem partition.

Partitions with AMD CPUs: amd-short, amd-long, amd-gpu-short, amd-gpu-long

Software

The nodes in these partitions are equipped with AMD CPUs. Because we build software for the cluster with CPU-specific optimizations for Intel or AMD, there is a separate AMD branch of our software stack.

When you run a batch or interactive job on the AMD nodes, you can access the AMD software stack like this:

module load ALICE/default

Note that this also works on the Intel nodes, so you can always leave it in your batch script.
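For example, a batch script for the AMD nodes might start like the sketch below (the Python module name is hypothetical; check module avail on an AMD node for what is actually available):

#!/bin/bash
#SBATCH --partition=amd-short
#SBATCH --ntasks=1
#SBATCH --time=00:30:00

module load ALICE/default   # selects the software branch matching the node's CPU
module load Python          # hypothetical module name

python my_script.py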

On the login nodes, you can access it using

module load ALICE/AMD

Because the AMD branch of our software stack is newer than the Intel branch, it still contains significantly fewer modules. It is quite possible that modules you have been using so far are missing from the AMD branch. We can always add new modules to it, so if you encounter missing modules or need assistance with getting your job to run on the AMD nodes, do not hesitate to contact the ALICE Helpdesk.

It is not always possible to build or install software for the AMD nodes on the login nodes, because the login nodes have Intel CPUs. In that case, you need to build on one of the AMD nodes through a Slurm job (interactive or batch).
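A sketch of how to request an interactive shell on an AMD node with srun (the resource values are placeholders):

srun --partition=amd-short --ntasks=1 --cpus-per-task=4 --mem=8G --time=01:00:00 --pty /bin/bash

Once the shell starts on the node, you can load the AMD software stack with module load ALICE/default and build your software there.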

Network

Node[863-876] are in a different data center than the rest of ALICE. Until all of ALICE has been moved to one location, there is only a single 10Gb/s connection between the sites, and node[863-876] are internally connected with 1Gb/s. Therefore, we strongly recommend that you move data to local scratch on the nodes before processing it, write data products to local scratch first, and at the end of your job move the data that you need to retain back to shared scratch.
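A minimal staging pattern could look like the sketch below. The paths are placeholders, and it assumes that $TMPDIR points to local scratch on the node; check the wiki for the actual local scratch location on these nodes.

#!/bin/bash
#SBATCH --partition=amd-short
#SBATCH --time=02:00:00

INPUT=/path/to/shared/scratch/input    # placeholder path on shared scratch
WORKDIR="$TMPDIR/$SLURM_JOB_ID"        # assumes $TMPDIR is on local scratch

mkdir -p "$WORKDIR"
cp -r "$INPUT" "$WORKDIR"/             # stage input to local scratch
cd "$WORKDIR"

./my_processing_step input/ results/   # placeholder for your actual processing

cp -r results/ /path/to/shared/scratch/   # move data to retain back to shared scratch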

None of the AMD nodes are currently connected to the Infiniband network.

amd-short

Partition amd-short provides access to part of the resources of node802 and to the GPU nodes node[863-876]. These nodes are in their own partition because they have AMD CPUs. It is possible to request up to 1TB of memory for a single-node job, but this always requires the job to run on node802; all other nodes in the partition have 256GB of memory in total.
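As an illustration (the value is arbitrary), a single-node job that needs more than 256GB of memory, and will therefore be scheduled on node802, could request it like this:

#SBATCH --partition=amd-short
#SBATCH --mem=500G   # anything above 256G can only be satisfied by node802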

amd-short, amd-long

Partitions amd-short and amd-long limit the number of CPUs per node: Slurm can only allocate 12 CPUs per socket on each node, for a total of 24 CPUs per node. This ensures that GPU jobs in partitions amd-gpu-short and amd-gpu-long still have sufficient CPU resources available so that such jobs can start quickly.

amd-gpu-short, amd-gpu-long

In order to ensure that jobs submitted to the amd-gpu-short/long partitions are not blocked by jobs in the amd-short partition, the two amd-gpu partitions have been given a higher priority factor than the amd-short partition. Therefore, short jobs that require a GPU node should always be submitted to the amd-gpu-* partitions.

The GPU nodes in the two partitions offer access to different types of GPUs. A small number of nodes provide full A100 GPUs with 80GB of memory; in most nodes, each A100 has been split into two separate GPUs called MIGs (Multi-Instance GPUs). The MIGs are completely independent of each other. At the moment, the two MIG types have the same amount of memory (40GB) but differ by one compute instance, which is why there is a small performance difference between them. You can choose which type of GPU you want, for example if you need 80GB of memory instead of 40GB. If you do not specify a type of GPU, Slurm will pick one that is available. You can find an overview of the GPU configurations at https://pubappslu.atlassian.net/wiki/spaces/HPCWIKI/pages/37519378 or by running scontrol show node <node name> on ALICE.
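For example, to check the GPU configuration of one of the nodes from the table above:

scontrol show node node863 | grep -i gres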

In order to select a specific type of GPU, use #SBATCH --gres=gpu:<gpu_type>:<number_of_gpus>, for example:

  • if you want one A100: #SBATCH --gres=gpu:a100:1

  • if you want one MIG 4g.40gb: #SBATCH --gres=gpu:4g.40gb:1
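Putting this together, a sketch of a batch script that requests one MIG in the amd-gpu-short partition (the program name is a placeholder):

#!/bin/bash
#SBATCH --partition=amd-gpu-short
#SBATCH --ntasks=1
#SBATCH --time=02:00:00
#SBATCH --gres=gpu:4g.40gb:1   # or gpu:a100:1 for a full 80GB A100

module load ALICE/default

./my_gpu_app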

Because the new GPUs have significantly more memory than the 2080Tis (80GB/40GB versus 11GB), GPU billing is also higher in the two partitions. The billing has been increased based on the memory of the GPU relative to the 2080Tis, which corresponds to a factor of 4. All types of GPUs in the two partitions are billed the same. The billing affects your fair share.

Multi-GPU jobs on MIG GPUs
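The MIGs on a node are fully isolated from each other, and a single process cannot combine several MIGs into one larger GPU. Multi-GPU jobs should therefore request full A100 GPUs rather than MIGs.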

Private partitions

Partition cpu_lorentz

Partition cpu_lorentz is only available for researchers from the Lorentz Institute (LION). We recommend that you read the partition-specific instructions before you start to use the partition.

If you have any questions or if you need assistance with getting your job to run on this partition, do not hesitate to contact the ALICE Helpdesk.

Partition gpu_strw

Partition gpu_strw is only available for researchers from the group of E. Rossi and members of STRW. We recommend that you read the partition-specific instructions before starting to use the partition.

If you have any questions or if you need assistance with getting your job to run on this partition, do not hesitate to contact the ALICE Helpdesk.

Partition mem_mi

Partition mem_mi is available exclusively to users from MI. We recommend that you read the general instructions for using node802 before you start to use the partition.

If you have any questions or if you need assistance with getting your job to run on this partition, do not hesitate to contact the ALICE Helpdesk.

Partition Limits

The following limits currently apply to each partition:

| Partition | #Allocated CPUs per User (running jobs) | #Allocated GPUs per User (running jobs) | #Jobs submitted per User |
|---|---|---|---|
| cpu-short, amd-short | 288 |  |  |
| cpu-medium | 240 |  |  |
| cpu-long, amd-long | 192 |  |  |
| gpu-short, amd-gpu-short | 168 | 28 |  |
| gpu-medium | 120 | 20 |  |
| gpu-long, amd-gpu-long | 96 | 16 |  |
| mem |  |  |  |
| mem_mi |  |  |  |
| cpu_natbio |  |  |  |
| cpu_lorentz |  |  |  |
| gpu_strw |  |  |  |
| gpu_lucdh |  |  |  |
| testing |  |  | 4 |

Only the testing partition has a limit on the number of jobs that you can submit. You can submit as many jobs as you want to the cpu and gpu partitions, but Slurm will only allocate jobs that fit within the CPU and GPU limits above. If you submit multiple jobs, Slurm sums up the CPUs and GPUs that your jobs request; jobs that exceed the limits wait in the queue until running jobs have finished and the total number of allocated CPUs and GPUs falls below the limits, after which Slurm allocates waiting jobs as the limits permit. For jobs that are held back by these limits, squeue will show "(QOSMaxNodePerUserLimit)".
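To see the state and waiting reason of your own jobs, you can, for example, run:

squeue -u $USER -o "%.18i %.12P %.8T %.30R"

The last column shows the reason for pending jobs (such as the QOS limits above) or the allocated nodes for running jobs.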