Partitions on ALICE
This page contains information about the available partitions (queues) on ALICE and their resource limits.
For an overview of the hardware configuration of each node, please see https://pubappslu.atlassian.net/wiki/spaces/HPCWIKI/pages/37519378
- 1 List of Partitions
- 1.1 Important information about the partition system
- 1.1.1 cpu-short partition
- 1.1.2 Partitions gpu-short/medium/long and mem
- 1.1.3 Partitions with AMD CPUs: amd-short, amd-long, amd-gpu-short, amd-gpu-long
- 1.1.3.1 Software
- 1.1.3.2 Network
- 1.1.3.3 amd-short
- 1.1.3.4 amd-short, amd-long
- 1.1.3.5 amd-gpu-short, amd-gpu-long
- 1.1.3.6 Multi-GPU jobs on MIG GPUs
- 1.2 Private partitions
- 1.2.1 Partition cpu_lorentz
- 1.2.2 Partition gpu_strw
- 1.2.3 Partition mem_mi
- 2 Partition Limits
List of Partitions
Partition | Timelimit | Default Timelimit | Default Memory Per CPU | CPU Type | GPU available | Nodes | Nodelist | Description
---|---|---|---|---|---|---|---|---
testing | 1:00:00 | | 10000 MB | Intel | | 2 | nodelogin[01-02] | For some basic and short testing of batch scripts.
amd-short | 04:00:00 | 01:00:00 | 4000 MB | AMD | | 15 | node802, node[863-876] | For short cpu-only jobs on AMD nodes. Limits: maximum 24 cores per node; maximum 1TB of memory per node (node802 only). See below for further information.
amd-long | 7-00:00:00 | 01:00:00 | 4000 MB | AMD | | 10 | node[867-876] | For long cpu-only jobs on AMD nodes. Limits: maximum 24 cores per node. See below for further information.
amd-gpu-short | 04:00:00 | 01:00:00 | 4000 MB | AMD | Yes | 14 | node[863-876] | For jobs that require GPU nodes and not more than 4h of running time, using nodes with AMD CPUs.
amd-gpu-long | 7-00:00:00 | 01:00:00 | 4000 MB | AMD | Yes | 11 | node[863-872], node876 | For jobs that require GPU nodes and not more than 7d of running time, using nodes with AMD CPUs.
cpu-short | 4:00:00 | 01:00:00 | 16064 MB | Intel | | 20 | node[001-020], node801, node8[53-60] | For jobs that require CPU nodes and not more than 4h of running time. This is the default partition.
cpu-medium | 1-00:00:00 | 01:00:00 | 16064 MB | Intel | | 19 | node[002-020] | For jobs that require CPU nodes and not more than 1d of running time.
cpu-long | 7-00:00:00 | 01:00:00 | 16064 MB | Intel | | 18 | node[003-020] | For jobs that require CPU nodes and not more than 7d of running time.
gpu-short | 4:00:00 | 01:00:00 | 15868 MB | Intel | Yes | 10 | node[851-860] | For jobs that require GPU nodes and not more than 4h of running time.
gpu-medium | 1-00:00:00 | 01:00:00 | 15868 MB | Intel | Yes | 10 | node[851-860] | For jobs that require GPU nodes and not more than 1d of running time.
gpu-long | 7-00:00:00 | 01:00:00 | 15868 MB | Intel | Yes | 9 | node[852-860] | For jobs that require GPU nodes and not more than 7d of running time.
mem | 14-00:00:00 | 01:00:00 | 85369 MB | Intel | | 1 | node801 | For jobs that require the high-memory node.
mem_mi | 4-00:00:00 | 01:00:00 | 31253 MB | AMD | | 1 | node802 | Partition only available to MI researchers. Default running time is 4h.
cpu_lorentz | 7-00:00:00 | 01:00:00 | 4027 MB | AMD | | 3 | node0[22-23] | Partition only available to researchers from the Lorentz Institute.
cpu_natbio | 30-00:00:00 | 01:00:00 | 23552 MB | Intel | | 1 | node021 | Partition only available to researchers from the group of B. Wielstra.
gpu_strw | 7-00:00:00 | 01:00:00 | 2644 MB | AMD | Yes | 2 | node86[1-2] | Partition only available to researchers from the group of E. Rossi.
gpu_lucdh | 14-00:00:00 | 01:00:00 | 4000 MB | AMD | Yes | 1 | node877 | Partition only available to researchers from LUCDH.
You can find the GPUs available on each node here: https://pubappslu.atlassian.net/wiki/spaces/HPCWIKI/pages/37519378/About+ALICE#GPU-overview
Important information about the partition system
ALICE has nodes with Intel CPUs and nodes with AMD CPUs, which are placed in separate partitions.
cpu-short partition
The cpu-short partition includes additional nodes: the GPU nodes node8[53-60] and the high-memory node node801.
MPI-jobs / Infiniband in the cpu-short partition
The GPU and high-memory nodes that are part of the cpu-short partition do not have Infiniband.
Any MPI job, or any other job that requires Infiniband, submitted to the cpu-short partition should add the following sbatch setting to the batch script:
#SBATCH --constraint=ib
This setting tells Slurm to allocate only nodes that have Infiniband, i.e., the cpu nodes. The setting can also be used for jobs in the other cpu partitions, though it currently has no effect there because those partitions consist only of cpu nodes.
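For example, a minimal batch-script sketch for an MPI job in cpu-short that restricts the job to Infiniband nodes (the job name, MPI module name, and executable are placeholders, not ALICE-specific values):

```bash
#!/bin/bash
#SBATCH --job-name=mpi_example       # placeholder job name
#SBATCH --partition=cpu-short
#SBATCH --ntasks=32
#SBATCH --time=02:00:00
#SBATCH --constraint=ib              # only allocate nodes that have Infiniband

# load an MPI module from the software stack (module name is illustrative)
module load OpenMPI

srun ./my_mpi_program                # placeholder executable
```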
Partitions gpu-short/medium/long and mem
In order to ensure that jobs submitted to the gpu-short/medium/long and mem partitions are not blocked by jobs in the cpu-short partition, these partitions have been given a higher priority factor than the cpu-short partition.
Therefore, short jobs that require a GPU or the high-memory node should always be submitted to the gpu-short or mem partition.
Partitions with AMD CPUs: amd-short, amd-long, amd-gpu-short, amd-gpu-long
Software
The nodes in these partitions are equipped with AMD CPUs. Because we build software for the cluster with CPU-specific optimizations for Intel or AMD, there is a separate AMD branch of our software stack.
When you run a batch or interactive job on the AMD nodes, you can access the AMD software stack like this:
module load ALICE/default
Note that this also works on the Intel nodes, so you can always leave it in your batch job.
On the login nodes, you can access it using
module load ALICE/AMD
Because the AMD branch of our software stack is newer than the Intel branch, it still contains significantly fewer modules. It is quite possible that modules you have been using so far are missing from the AMD branch. We can always add new modules to the AMD branch, so if you encounter missing modules or need assistance with getting your job to run on the AMD nodes, do not hesitate to contact the ALICE Helpdesk.
It is not always possible to build or install software for the AMD nodes on the login nodes, because the login nodes have Intel CPUs. In that case, you need to do the build on one of the AMD nodes through a Slurm job (interactive or batch).
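As a minimal sketch, this is how the module command fits into a batch job on the AMD nodes (the partition choice and resource requests are only illustrative):

```bash
#!/bin/bash
#SBATCH --partition=amd-short
#SBATCH --ntasks=1
#SBATCH --time=01:00:00

# switch to the AMD branch of the software stack
# (on a login node you would use: module load ALICE/AMD instead)
module load ALICE/default

# list the modules that are available in the AMD branch
module avail
```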
Network
Node8[63-76] are in a different data center than the rest of ALICE. Until all of ALICE is moved to one location, there is only a single 10Gb/s connection between the sites, and node8[63-76] are internally connected with 1Gb/s. Therefore, we strongly recommend that you copy input data to local scratch on the node before processing it, write data products to local scratch first, and at the end of your job move the data you need to retain back to shared scratch.
None of the AMD nodes are currently connected to the Infiniband network.
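Given these constraints, a staging sketch for a job on these nodes could look as follows; the scratch paths and the executable are placeholders that you would replace with the actual locations used on ALICE:

```bash
#!/bin/bash
#SBATCH --partition=amd-gpu-short
#SBATCH --ntasks=1
#SBATCH --time=01:00:00

# Placeholder paths: replace with the actual local and shared scratch
# locations documented for ALICE.
LOCAL_SCRATCH="/scratchdata/${USER}/${SLURM_JOB_ID}"   # hypothetical local scratch path
SHARED_SCRATCH="/home/${USER}/data"                    # hypothetical shared scratch path

mkdir -p "$LOCAL_SCRATCH"

# stage the input data to local scratch before processing it
cp -r "$SHARED_SCRATCH/input" "$LOCAL_SCRATCH/"

# run the analysis and write data products to local scratch first
cd "$LOCAL_SCRATCH"
./my_program input/ results/                           # placeholder executable

# at the end of the job, move the data you need to retain back to shared scratch
cp -r "$LOCAL_SCRATCH/results" "$SHARED_SCRATCH/"
```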
amd-short
Partition amd-short provides access to part of the resources of node802 and to the GPU nodes node8[63-76]. These nodes are in their own partition because they have AMD CPUs. It is possible to request up to 1TB of memory for a single-node job, but this will always require the job to run on node802; all other nodes in the partition have 256GB of memory in total.
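For illustration, a sketch of a single-node amd-short job whose memory request exceeds 256GB and can therefore only be placed on node802 (the executable and exact numbers are placeholders):

```bash
#!/bin/bash
#SBATCH --partition=amd-short
#SBATCH --nodes=1
#SBATCH --ntasks=24              # per-node maximum in this partition
#SBATCH --mem=900G               # more than 256G, so only node802 can satisfy the request
#SBATCH --time=02:00:00

./my_highmem_program             # placeholder executable
```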
amd-short, amd-long
Partitions amd-short and amd-long have limits on the number of CPUs per node: Slurm can only allocate 12 CPUs per socket on each node, for a total of 24 CPUs per node. This is to ensure that GPU jobs in partitions amd-gpu-short and amd-gpu-long still have sufficient CPU resources available so that such jobs can start quickly.
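A minimal sketch of a job that uses this per-node maximum (the executable is a placeholder):

```bash
#!/bin/bash
#SBATCH --partition=amd-long
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=24     # 24 CPUs is the per-node maximum in amd-short/amd-long
#SBATCH --time=2-00:00:00

srun ./my_cpu_program            # placeholder executable
```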
amd-gpu-short, amd-gpu-long
In order to ensure that jobs submitted to the amd-gpu-short/long partitions are not blocked by jobs in the amd-short partition, the two amd-gpu partitions have been given a higher priority factor than the amd-short partition. Therefore, short jobs that require a GPU node should always be submitted to the amd-gpu-* partitions.
The GPU nodes in the two partitions offer access to different types of GPUs. While a small number of nodes provide full A100 GPUs with 80GB of memory, in most nodes each A100 has been split into two separate GPUs, called MIGs. The MIGs are completely independent of each other. At the moment, the two MIG types have the same amount of memory (40GB) but differ by one compute instance, which is why there is a small performance difference between them. You can choose which type of GPU you want, for example if you need 80GB of memory instead of 40GB. If you do not specify a type of GPU, Slurm will pick one that is available. You can find an overview of the GPU configurations on https://pubappslu.atlassian.net/wiki/spaces/HPCWIKI/pages/37519378 or by using scontrol show node <node name> on ALICE.
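For example, to check which GPU types a particular node offers (node863 is just an illustrative choice from the node list above):

```bash
# show the GPU resources configured on a specific node
scontrol show node node863 | grep -i gres
```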
In order to select a specific type of GPU, you need to use #SBATCH --gres=gpu:<gpu_type>:<number_of_gpus>. For example,
if you want one A100:
#SBATCH --gres=gpu:a100:1
if you want one MIG 4g.40gb:
#SBATCH --gres=gpu:4g.40gb:1
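Putting this together, a minimal sketch of a GPU batch job on the AMD GPU partitions (resource requests and the executable are placeholders):

```bash
#!/bin/bash
#SBATCH --partition=amd-gpu-short
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=4
#SBATCH --time=01:00:00
#SBATCH --gres=gpu:4g.40gb:1     # one MIG; use gpu:a100:1 for a full 80GB A100

# AMD branch of the software stack (see the Software section above)
module load ALICE/default

# show the GPU that Slurm allocated to the job
nvidia-smi

./my_gpu_program                 # placeholder executable
```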
Because the new GPUs have significantly more memory than the 2080Tis (80GB/40GB versus 11GB), the GPU billing is also higher for the two partitions. The GPU billing has been increased based on the memory of the GPU relative to the 2080Tis, which corresponds to a factor of 4. All types of GPUs in the two partitions are billed the same. The billing affects your fair share.
Multi-GPU jobs on MIG GPUs
Private partitions
Partition cpu_lorentz
Partition cpu_lorentz is only available to researchers from the Lorentz Institute (LION). We recommend that you read the following before you start to use the partition:
If you have any questions or if you need assistance with getting your job to run on this partition, do not hesitate to contact ALICE Helpdesk.
Partition gpu_strw
Partition gpu_strw is only available to researchers from the group of E. Rossi (STRW) and members of STRW. We recommend that you read the following page before starting to use the partition:
If you have any questions or if you need assistance with getting your job to run on this partition, do not hesitate to contact ALICE Helpdesk.
Partition mem_mi
Partition mem_mi is available exclusively to users from MI. We recommend that you read the general instructions for using node802 before you start to use the partition.
If you have any questions or if you need assistance with getting your job to run on this partition, do not hesitate to contact ALICE Helpdesk.
Partition Limits
The following limits currently apply to each partition:
Partition | #Allocated CPUs per User (running jobs) | #Allocated GPUs per User (running jobs) | #Jobs submitted per User
---|---|---|---
cpu-short, amd-short | 288 | |
cpu-medium | 240 | |
cpu-long, amd-long | 192 | |
gpu-short, amd-gpu-short | 168 | 28 |
gpu-medium | 120 | 20 |
gpu-long, amd-gpu-long | 96 | 16 |
mem | | |
mem_mi | | |
cpu_natbio | | |
cpu_lorentz | | |
gpu_strw | | |
gpu_lucdh | | |
testing | | | 4
Only the testing partition has a limit on the number of jobs that you can submit. You can submit as many jobs as you want to the cpu and gpu partitions, but Slurm will only allocate jobs that fit within the limits above. If you submit multiple jobs, Slurm sums up the resources that your jobs request. Jobs that exceed the limits wait in the queue until running jobs have finished and the total allocation falls below the limits; Slurm will then allocate waiting jobs if the limits permit it. For jobs that exceed the limits and wait in the queue, squeue will show "(QOSMaxNodePerUserLimit)".
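To see the state of your own jobs and the reason pending jobs are still waiting, you can use squeue with a format string along these lines:

```bash
# list your own jobs; for pending jobs the last column shows the reason,
# e.g. (QOSMaxNodePerUserLimit) when a per-user limit is reached
squeue -u $USER -o "%.10i %.12P %.20j %.8T %.20R"
```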