Partitions on ALICE

This page contains information about the available partitions (queues) on ALICE and their resource limits.

For an overview of the hardware configuration of each node, please see About ALICE.

List of Partitions

| Partition | Timelimit | Default Timelimit | Default Memory Per CPU | GPU available | Nodes | Nodelist | Description |
| --- | --- | --- | --- | --- | --- | --- | --- |
| testing | 1:00:00 |  | 10000 MB | Yes | 2 | nodelogin[01-02] | For some basic and short testing of batch scripts. Additional limits: maximum of 15 CPUs per node; maximum memory per node is 150G. Each login node is equipped with an NVIDIA Tesla T4 which can be used to test GPU jobs. |
| cpu-short | 4:00:00 | 01:00:00 | 4000 MB |  | 44 | node[001-020,801-802,853-860,863-876] | For jobs that require CPU nodes and not more than 4h of running time. Maximum 12 cores per socket. This is the default partition. |
| cpu-medium | 1-00:00:00 | 01:00:00 | 4000 MB |  | 30 | node[002-020,866-876] | For jobs that require CPU nodes and not more than 1d of running time. Maximum 12 cores per socket. |
| cpu-long | 7-00:00:00 | 01:00:00 | 4000 MB |  | 28 | node[003-020,867-876] | For jobs that require CPU nodes and not more than 7d of running time. Maximum 12 cores per socket. |
| gpu-short | 4:00:00 | 01:00:00 | 4000 MB | Yes | 24 | node[851-860,863-876] | For jobs that require GPU nodes and not more than 4h of running time. |
| gpu-medium | 1-00:00:00 | 01:00:00 | 4000 MB | Yes | 23 | node[852-860,863-876] | For jobs that require GPU nodes and not more than 1d of running time. |
| gpu-long | 7-00:00:00 | 01:00:00 | 4000 MB | Yes | 18 | node[853-860,864-872,876] | For jobs that require GPU nodes and not more than 7d of running time. |
| mem | 14-00:00:00 | 01:00:00 | 85369 MB |  | 1 | node801 | For jobs that require the high-memory node. |
| mem_mi | 4-00:00:00 | 01:00:00 | 31253 MB |  | 1 | node802 | Partition only available to MI researchers. Default running time is 4h. |
| cpu_lorentz | 7-00:00:00 | 01:00:00 | 4027 MB |  | 3 | node0[22-23] | Partition only available to researchers from the Lorentz Institute. |
| cpu_natbio | 30-00:00:00 | 01:00:00 | 23552 MB |  | 1 | node021 | Partition only available to researchers from the group of B. Wielstra. |
| gpu_strw | 7-00:00:00 | 01:00:00 | 2644 MB | Yes | 2 | node86[1-2] | Partition only available to researchers from the group of E. Rossi. |
| gpu_lucdh | 14-00:00:00 | 01:00:00 | 4000 MB | Yes | 1 | node877 | Partition only available to researchers from LUCDH. |
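For illustration, a minimal batch script that targets the testing partition and stays within its limits might look like the sketch below (the job name and the final command are placeholders):

#!/bin/bash
#SBATCH --job-name=test_run       # placeholder job name
#SBATCH --partition=testing
#SBATCH --time=00:30:00           # within the 1:00:00 time limit
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=4         # within the 15-CPUs-per-node limit
#SBATCH --mem-per-cpu=2G          # 4 x 2G, well below the 150G-per-node limit

# replace with your own test commands
echo "Running on $HOSTNAME"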

You can find the GPUs available on each node here: About ALICE | GPU overview

Important information about the partition system

Following the maintenance in April 2024, nodes with Intel and AMD CPUs have been merged into the same partitions. The amd-* partitions are deprecated and will be phased out.

Partitions cpu-short/medium/long

The cpu-* partitions include several GPU nodes. However, these partitions are meant to be used for CPU-only jobs: scheduling of GPU jobs is more efficient when using the gpu-* partitions.

The nodes have a mix of Intel (Skylake) and AMD (Zen3) CPUs. If you need a specific type of CPU for your job, you can request one by specifying the corresponding feature, e.g.,

# for nodes with Intel CPUs
#SBATCH --constraint=Intel.Skylake
# for nodes with AMD CPUs
#SBATCH --constraint=AMD.Zen3
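For example, a minimal sketch of a batch script that pins a CPU job to the AMD nodes (the program name is a placeholder):

#!/bin/bash
#SBATCH --partition=cpu-short
#SBATCH --time=02:00:00
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=8
#SBATCH --constraint=AMD.Zen3    # only allocate nodes with AMD Zen3 CPUs

# replace with your actual program
./my_cpu_program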

MPI-jobs / Infiniband in the cpu-short/medium/long partition

The GPU and high-memory nodes that are part of the cpu-short partition do not have Infiniband.

Any MPI job, or any other job that requires Infiniband, submitted to the cpu-short partition should include the following sbatch setting in its batch script:

#SBATCH --constraint=ib

This setting tells Slurm to allocate only nodes that have Infiniband, i.e., the CPU nodes. The setting can also be used for jobs in the other cpu partitions, though it currently has no effect there because those partitions consist only of CPU nodes.
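As a sketch, a minimal MPI batch script for cpu-short could look as follows; the program name is a placeholder, and which module you need to load depends on the ALICE module tree:

#!/bin/bash
#SBATCH --partition=cpu-short
#SBATCH --time=01:00:00
#SBATCH --ntasks=32              # number of MPI ranks
#SBATCH --constraint=ib          # only allocate nodes with Infiniband

# load the MPI module your program was built with here

# srun starts one MPI rank per task
srun ./my_mpi_program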

Partitions gpu-short/medium/long and mem

In order to ensure that jobs submitted to the gpu-short/medium/long and mem partitions are not blocked by jobs in the cpu-short partition, the gpu-short/medium/long and mem partitions have been given a higher priority factor compared to the cpu-short partition.

Therefore, short jobs that require a GPU or the high-memory node should always be submitted to the gpu-short or mem partition.

Requesting a specific GPU

The partitions gpu-short/medium/long consist of nodes with different types of GPUs. If you do not specify a GPU type, Slurm will pick one for you; however, that GPU may not have sufficient memory for your job. In that case, you can specify the type of GPU using

#SBATCH --gres=gpu:<gpu_type>:<number_of_gpus>

Here are some examples:
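The directives below are illustrative sketches; the specific GPU type string (tesla_t4) is an assumption, so substitute one of the type names from the GPU overview linked above:

# let Slurm pick any available GPU
#SBATCH --gres=gpu:1
# request two GPUs on the same node
#SBATCH --gres=gpu:2
# request one GPU of a specific, named type
#SBATCH --gres=gpu:tesla_t4:1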

It is not possible to run multi-GPU (parallel GPU) jobs on the MIG GPUs unless each GPU is used by a completely independent task.

Private partitions

Partition cpu_lorentz

Partition cpu_lorentz is only available for researchers from Lorentz Institute (LION). We recommend that you read the following before you start to use the partition: Using partition cpu_lorentz

If you have any questions or if you need assistance with getting your job to run on this partition, do not hesitate to contact ALICE Helpdesk.

Partition gpu_strw

Partition gpu_strw is only available for researchers from the group of E. Rossi (STRW) and members of STRW. We recommend that you read the following page before starting to use the partition: Using partition gpu_strw

If you have any questions or if you need assistance with getting your job to run on this partition, do not hesitate to contact ALICE Helpdesk.

Partition mem_mi

Partition mem_mi is available exclusively to users from MI. We recommend that you read the general instructions for using node802 before you start to use the partition (Using Node802 and Partition mem_mi).

If you have any questions or if you need assistance with getting your job to run on this partition, do not hesitate to contact ALICE Helpdesk.

Partition Limits

The following limits currently apply to each partition:

| Partition | #Allocated CPUs per User (running jobs) | #Allocated GPUs per User (running jobs) | #Jobs submitted per User |
| --- | --- | --- | --- |
| cpu-short, amd-short | 288 |  |  |
| cpu-medium | 240 |  |  |
| cpu-long, amd-long | 192 |  |  |
| gpu-short, amd-gpu-short | 168 | 28 |  |
| gpu-medium | 120 | 20 |  |
| gpu-long, amd-gpu-long | 96 | 16 |  |
| mem |  |  |  |
| mem_mi |  |  |  |
| cpu_natbio |  |  |  |
| cpu_lorentz |  |  |  |
| gpu_strw |  |  |  |
| gpu_lucdh |  |  |  |
| testing |  |  | 4 |

Only the testing partition limits the number of jobs that you can submit. You can submit as many jobs as you want to the cpu and gpu partitions, but Slurm will only allocate jobs that fit within the CPU and GPU limits above. If you submit multiple jobs, Slurm sums up the number of CPUs or GPUs that your jobs request. Jobs that exceed the limits wait in the queue until running jobs have finished and the total number of allocated CPUs and GPUs falls below the limits; Slurm then allocates waiting jobs as the limits permit. For jobs that are waiting because they exceed the limits, squeue will show "(QOSMaxNodePerUserLimit)".
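To see whether a pending job is being held back by these limits, you can ask squeue for the reason column, for example:

# show job id, name, state, and pending reason for your own jobs
squeue -u $USER -o "%.10i %.20j %.8T %.20R"

Jobs waiting because of the per-user limits show the reason quoted above in the last column.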