Partitions on ALICE
This page contains information about the available partitions (queues) on ALICE and their resource limits.
For an overview of the hardware configuration of each node, please see About ALICE.
List of Partitions
Partition | Timelimit | Default Timelimit | Default Memory Per CPU | GPU available | Nodes | Nodelist | Description |
---|---|---|---|---|---|---|---|
testing | 1:00:00 | | 10000 MB | | 2 | nodelogin[01-02] | For some basic and short testing of batch scripts. |
cpu-short | 4:00:00 | 01:00:00 | 4000 MB | | 44 | node[001-020,801-802,853-860,863-876] | For jobs that require CPU nodes and not more than 4h of running time. Maximum 12 cores per socket. This is the default partition. |
cpu-medium | 1-00:00:00 | 01:00:00 | 4000 MB | | 30 | node[002-020,866-876] | For jobs that require CPU nodes and not more than 1d of running time. Maximum 12 cores per socket. |
cpu-long | 7-00:00:00 | 01:00:00 | 4000 MB | | 28 | node[003-020,867-876] | For jobs that require CPU nodes and not more than 7d of running time. Maximum 12 cores per socket. |
gpu-short | 4:00:00 | 01:00:00 | 4000 MB | yes | 24 | node[851-860,863-876] | For jobs that require GPU nodes and not more than 4h of running time. |
gpu-medium | 1-00:00:00 | 01:00:00 | 4000 MB | yes | 23 | node[852-860,863-876] | For jobs that require GPU nodes and not more than 1d of running time. |
gpu-long | 7-00:00:00 | 01:00:00 | 4000 MB | yes | 18 | node[853-860,864-872,876] | For jobs that require GPU nodes and not more than 7d of running time. |
mem | 14-00:00:00 | 01:00:00 | 85369 MB | | 1 | node801 | For jobs that require the high-memory node. |
mem_mi | 4-00:00:00 | 01:00:00 | 31253 MB | | 1 | node802 | Partition only available to MI researchers. Default running time is 4h. |
cpu_lorentz | 7-00:00:00 | 01:00:00 | 4027 MB | | 3 | node0[22-23] | Partition only available to researchers from the Lorentz Institute. |
cpu_natbio | 30-00:00:00 | 01:00:00 | 23552 MB | | 1 | node021 | Partition only available to researchers from the group of B. Wielstra. |
gpu_strw | 7-00:00:00 | 01:00:00 | 2644 MB | yes | 2 | node86[1-2] | Partition only available to researchers from the group of E. Rossi. |
gpu_lucdh | 14-00:00:00 | 01:00:00 | 4000 MB | yes | 1 | node877 | Partition only available to researchers from LUCDH. |
You can find the GPUs available on each node here: About ALICE | GPU overview
Important information about the partition system
Following the maintenance in April 2024, nodes with Intel and AMD CPUs have been merged into the same partitions. The amd-* partitions are deprecated and will be phased out.
Partitions cpu-short/medium/long
The cpu-* partitions include several GPU nodes. However, these partitions are meant for CPU-only jobs; scheduling of GPU jobs is more efficient when using the gpu-* partitions.
The nodes have a mix of Intel Skylake and AMD (Zen3) CPUs. If you need a specific type of CPU for your job, you can request one by specifying the corresponding feature, e.g.,
# for nodes with Intel CPUs
#SBATCH --constraint=Intel.Skylake
# for nodes with AMD CPUs
#SBATCH --constraint=AMD.Zen3
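Putting this together, a minimal batch script that pins a short CPU job to AMD (Zen3) nodes might look like the following sketch (the job name and the final command are placeholders, not part of the original page):

```shell
#!/bin/bash
#SBATCH --job-name=zen3-demo        # placeholder job name
#SBATCH --partition=cpu-short       # default partition, max 4h
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=4
#SBATCH --time=00:30:00
#SBATCH --constraint=AMD.Zen3       # only schedule on AMD (Zen3) nodes

# Placeholder workload; replace with your actual program.
echo "Running on $(hostname)"
```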
MPI jobs / Infiniband in the cpu-short/medium/long partitions
The GPU and high-memory nodes that are part of the cpu-short partition do not have Infiniband. Any MPI job, or any other job that requires Infiniband, in the cpu-short partition should therefore add the following sbatch setting to its batch script:
#SBATCH --constraint=ib
This setting tells Slurm to allocate only nodes that have Infiniband, i.e., the CPU nodes. The setting can also be used for jobs in the other cpu partitions, though it currently has no effect there because those partitions consist only of CPU nodes.
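As a sketch, an MPI batch script for the cpu-short partition with the Infiniband constraint could look like this (the job name, module name, and executable are placeholders; use the MPI module your site actually provides):

```shell
#!/bin/bash
#SBATCH --job-name=mpi-ib-demo      # placeholder job name
#SBATCH --partition=cpu-short
#SBATCH --ntasks=48                 # e.g. 48 MPI ranks
#SBATCH --time=02:00:00
#SBATCH --constraint=ib             # only nodes with Infiniband (the CPU nodes)

module load OpenMPI                 # placeholder module name
srun ./my_mpi_program               # placeholder executable
```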
Partitions gpu-short/medium/long and mem
In order to ensure that jobs submitted to the gpu-short/medium/long and mem partitions are not blocked by jobs in the cpu-short partition, the gpu-short/medium/long and mem partitions have been given a higher priority factor compared to the cpu-short partition.
Therefore, short jobs that require a GPU or the high-memory node should always be submitted to the gpu-short or mem partition.
Requesting a specific GPU
The partitions gpu-short/medium/long consist of nodes with different types of GPUs. If you do not specify a GPU type, Slurm will pick one for you. However, that GPU may not have sufficient memory for your job. In that case, you can specify the type of GPU using
#SBATCH --gres=gpu:<gpu_type>:<number_of_gpus>
Here are some examples:
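For illustration (the GPU type string below is a placeholder; see the GPU overview page or `sinfo` output for the type names that are actually valid on ALICE):

```shell
# Request one GPU of a specific type (the type name is a placeholder):
#SBATCH --gres=gpu:tesla_t4:1

# Request any two GPUs, letting Slurm pick the type:
#SBATCH --gres=gpu:2
```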
It is not possible to run multi-GPU (parallel GPU) jobs on the MIG GPUs unless each GPU is used by a completely independent task.
Private partitions
Partition cpu_lorentz
Partition cpu_lorentz is only available to researchers from the Lorentz Institute (LION). We recommend that you read the following before you start using the partition: Using partition cpu_lorentz
If you have any questions or if you need assistance with getting your job to run on this partition, do not hesitate to contact ALICE Helpdesk.
Partition gpu_strw
Partition gpu_strw is only available to researchers from the group of E. Rossi (STRW) and members of STRW. We recommend that you read the following page before starting to use the partition: Using partition gpu_strw
If you have any questions or if you need assistance with getting your job to run on this partition, do not hesitate to contact ALICE Helpdesk.
Partition mem_mi
Partition mem_mi is available exclusively to users from MI. We recommend that you read the general instructions for using node802 before you start using the partition (Using Node802 and Partition mem_mi).
If you have any questions or if you need assistance with getting your job to run on this partition, do not hesitate to contact ALICE Helpdesk.
Partition Limits
The following limits currently apply to each partition:
Partition | #Allocated CPUs per User (running jobs) | #Allocated GPUs per User (running jobs) | #Jobs submitted per User |
---|---|---|---|
cpu-short, amd-short | 288 | | |
cpu-medium | 240 | | |
cpu-long, amd-long | 192 | | |
gpu-short, amd-gpu-short | 168 | 28 | |
gpu-medium | 120 | 20 | |
gpu-long, amd-gpu-long | 96 | 16 | |
mem | | | |
mem_mi | | | |
cpu_natbio | | | |
cpu_lorentz | | | |
gpu_strw | | | |
gpu_lucdh | | | |
testing | | | 4 |
Only the testing partition limits the number of jobs that you can submit. You can submit as many jobs as you want to the cpu and gpu partitions, but Slurm will only run jobs that fit within the limits in the table above. If you submit multiple jobs, Slurm sums the CPUs and GPUs that your jobs request; jobs that would exceed the limits wait in the queue until running jobs finish and your total allocation falls below the limits, at which point Slurm allocates waiting jobs as the limits permit. For jobs that are waiting because of these limits, squeue will show "(QOSMaxNodePerUserLimit)".
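To check which of your jobs are waiting and why, you can print the scheduler's state and reason columns with squeue, for example:

```shell
# List your own jobs with job ID, partition, state, and reason/nodelist
squeue -u "$USER" -o "%.12i %.12P %.8T %.30R"
```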