Off to research computing Wonderland

What is ALICE

ALICE (Academic Leiden Interdisciplinary Cluster Environment) is the high-performance computing (HPC) facility of the partnership between Leiden University and Leiden University Medical Center (LUMC). It is available to any researcher from both partners. Leiden University and LUMC aim to help deliver cutting-edge research using innovative technology within the broad area of data-centric HPC. Both partners are responsible for the hosting, system support, scientific support and service delivery of several large supercomputing and research data storage resources for the Leiden research community.

ALICE is one of the projects that provide research IT infrastructure and support centrally for Leiden University. If you want to know more about other projects, please have a look at the Confluence space of the RICC team: RICC


Why ALICE

High-performance computing (HPC), previously the domain of theoretical scientists and computer and software developers, is becoming an ever more important research tool in many research areas. An HPC facility that provides serious computational capabilities, combined with easy and flexible local access, is a strong advantage for these areas. ALICE is the HPC facility that meets those needs for Leiden University (LU) and Leiden University Medical Center (LUMC). It is available to all researchers and students from both LU and LUMC.

The ALICE facility currently implemented is the first phase of what will become a larger hybrid HPC facility for research. It will exceed the capabilities of what individual institutes can build and provide a stepping stone to the larger national facilities.

The facility aims to be an easily accessible, easily usable system with extensive local support at all levels of expertise. Given the expected diversity of use, diversity is built into all aspects of computing: the number of CPUs and GPUs and the ratio between them; the amount of memory relative to the number of CPUs; the size and location of data storage; and the speed of the network.

ALICE is not only a sophisticated production machine but also a tool for teaching all aspects of HPC and a learning machine for young researchers to prepare themselves for national and international HPC facilities.

Overview of the cluster

Conceptual View of ALICE

The ALICE cluster is a hybrid cluster consisting of

2 login nodes (4 TFlops)
20 CPU nodes (40 TFlops)
24 GPU nodes (68 GPUs, 104 TFlops CPU + 1082 TFlops GPU)
1 High Memory CPU node (4 TFlops)
Storage devices (local storage + home + shared scratch): 45 x 15 TB + 70 TB + 364 TB = 1109 TB

In summary: 1234 TFlops, 1712 cores (3424 threads), 17.9 TB RAM.
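The totals above can be checked with a few lines of arithmetic. The sketch below simply re-adds the per-component figures exactly as listed (assuming they are the intended exact values); it does not include the dedicated group hardware mentioned below, and the variable names are illustrative only.

```python
# Peak performance per component as listed above, in TFlops.
tflops = {
    "login nodes": 4,
    "CPU nodes": 40,
    "GPU nodes (CPU part)": 104,
    "GPU nodes (GPU part)": 1082,
    "high-memory node": 4,
}

# Storage in TB: 45 local disks of 15 TB each, plus home and shared scratch.
local_tb = 45 * 15
home_tb = 70
scratch_tb = 364

total_tflops = sum(tflops.values())
total_storage_tb = local_tb + home_tb + scratch_tb

print(f"Peak performance: {total_tflops} TFlops")
print(f"Storage: {local_tb} + {home_tb} + {scratch_tb} = {total_storage_tb} TB")
```

Both sums reproduce the summary figures: 1234 TFlops and 1109 TB.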

In addition, several research groups have dedicated hardware within ALICE that is not listed above.

You can find a more comprehensive description of the individual components of ALICE in the Hardware Description section.

ALICE is a pre-configuration system that lets the university gain experience with managing, supporting and operating a university-wide HPC system. Once the system and its governance have proven to be a functional research asset, it will be extended and continued in the coming years.

Future plans

Apart from our own expansion plans, we are always open to collaborating with other groups and institutes on expanding and improving ALICE.

The expansions currently planned include:

Additional CPU, high-memory and GPU nodes (estimated Q3 2025)
Switch from InfiniBand (IB) to a 100GbE Ethernet network (estimated Q3 2025)
Increase the storage capacity (Q2 2025)

In addition, we have the following major changes planned

Costs overview

The ALICE cluster is a shared facility owned by the participating groups and the University. Currently, access to the ALICE cluster and related services is provided free of charge to all researchers and students at LU and LUMC.

Hardware Description

List of Nodes

Node Name	CPU	Cores	Memory	Local scratch	GPUs per node	Public Access	Purpose	Type
nodelogin0[1-2]	2 x Intel Xeon Gold 6126 2.6 GHz 12 cores	24	384 GB	15 TB	1		Login nodes	Huawei FusionServer 2288H V5
node0[01-20]	2 x Intel Xeon Gold 6126 2.6 GHz 12 cores	24	384 GB	15 TB	0		CPU Node	Huawei FusionServer X6000 V5
node021	2 x Intel Xeon Gold 6226R 2.9 GHz 16 cores	32	768 GB	11 TB	0		CPU Node	Supermicro SC116AC2-R706WB2
node0[22-24]	AMD EPYC 9534 3.55 GHz 64 cores	64	256 GB	5 TB	0		CPU Node	Supermicro AS -1115CS-TNR
node801	4 x Intel Xeon Gold 6128 3.4 GHz 6 cores	24	2048 GB	20 TB	0		High-Memory Node	Dell PowerEdge R840
node802	2 x AMD EPYC 7662 2.0 GHz 64 cores	128	4096 GB	19 TB	0	(Partially)	High-Memory Node	Supermicro SERVERline Individual
node8[51-60]	2 x Intel Xeon Gold 6126 2.6 GHz 12 cores	24	384 GB	15 TB	4		GPU Node	Huawei FusionServer G5500 / G560 V5
node86[1-2]	1 x AMD EPYC 7443P 2.85 GHz 24 cores	48	64 GB	15 TB	1		GPU Node	DellEMC PowerEdge R7515
node8[63-64]	2 x AMD EPYC 7513 2.6 GHz 32 cores	64	256 GB	15 TB	2		GPU Node	Gigabyte R282-Z93
node8[65-76]	2 x AMD EPYC 7513 2.6 GHz 32 cores	64	256 GB	15 TB	4 (MIG)		GPU Node	Gigabyte R282-Z93
node877	2 x AMD EPYC 9334 2.7 GHz 32 cores	64	256 GB	14 TB	2		GPU Node	Dell PowerEdge R7625
node878	2 x AMD EPYC 9354 3.25 GHz 32 cores	64	768 GB	7 TB	1		GPU Node	Gigabyte R283-ZF0-AAL1-000
node879	1 x AMD EPYC 9534 2.45 GHz 64 cores	64	384 GB	3 TB	1	(Partially)	GPU Node	Dell PowerEdge R7615
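The node names in the table use Slurm's compressed hostlist notation, where node8[65-76] stands for node865 through node876. On the cluster itself, `scontrol show hostnames` performs this expansion. The sketch below is only a simplified illustration of the notation: the function name `expand_hostlist` is hypothetical, and it handles a single bracket group (with comma-separated items and zero padding) but not nested or multiple groups.

```python
import re


def expand_hostlist(expr: str) -> list[str]:
    """Expand a single-bracket hostlist such as 'node8[65-76]'.

    Comma-separated items inside the brackets (e.g. 'node[01-02,07]') are
    supported; expressions with several bracket groups are not.
    """
    match = re.fullmatch(r"([^\[\]]*)\[([^\[\]]+)\]([^\[\]]*)", expr)
    if match is None:
        return [expr]  # plain hostname such as 'node877': nothing to expand
    prefix, body, suffix = match.groups()
    hosts = []
    for item in body.split(","):
        if "-" in item:
            start, end = item.split("-", 1)
            width = len(start)  # preserve zero padding, e.g. '01' keeps width 2
            for n in range(int(start), int(end) + 1):
                hosts.append(f"{prefix}{n:0{width}d}{suffix}")
        else:
            hosts.append(f"{prefix}{item}{suffix}")
    return hosts


print(expand_hostlist("node0[01-20]"))
print(expand_hostlist("node8[65-76]"))
```

The zero padding matters: node0[01-20] expands to node001 ... node020, not node01 ... node020.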

GPU overview

The table below lists the types of GPUs available on the GPU-equipped nodes.

Hostname	Public Access	GPU	GPU type in Slurm	Memory	Shader cores	Tensor Cores	CUDA Compute Capability
nodelogin0[1-2]		Tesla T4		16 GB	2560	320	7.5
node8[51-60]		4 x PNY GeForce RTX 2080 Ti	`2080_ti`	11 GB	4352	544	7.5
node86[1-2]		Tesla T4		16 GB	2560	320	7.5
node8[63-64], node8[75-76]		2 x A100	`a100`	80 GB	6912	432	8.0
node8[65-74]		2 x A100 MIG 4g.40gb	`4g.40gb`	40 GB	3949	246	8.0
node8[65-74]		2 x A100 MIG 3g.40gb	`3g.40gb`	40 GB	2962	185	8.0
node877		2 x A40	`a40`	48 GB	10752	336	8.6
node878		1 x A100	`a100`	80 GB	6912	432	8.0
node879	(Partially)	1 x L40S	`l40s`	48 GB	18176	568	8.9
node8[80-83]		4 x L4	`l4`	24 GB	7424	240	8.9
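The "GPU type in Slurm" column is what you pass when requesting a specific GPU. Slurm's generic syntax for this is `--gres=gpu:<type>:<count>`. The sketch below picks the GPU types from the table that meet a memory and compute-capability requirement and builds the matching directive. It is an illustration under assumptions: the helpers `matching_types` and `gres_line` are hypothetical, partition and account options are not covered on this page and are omitted, public-access restrictions (such as the partially public L40S) are not modelled, and for MIG types the memory is per MIG instance.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GpuType:
    slurm_type: str           # value from the "GPU type in Slurm" column
    memory_gb: int            # memory per GPU (or per MIG instance)
    compute_capability: float


# GPU types from the table above that have a Slurm type listed.
GPU_TYPES = [
    GpuType("2080_ti", 11, 7.5),
    GpuType("a100", 80, 8.0),
    GpuType("4g.40gb", 40, 8.0),
    GpuType("3g.40gb", 40, 8.0),
    GpuType("a40", 48, 8.6),
    GpuType("l40s", 48, 8.9),
    GpuType("l4", 24, 8.9),
]


def matching_types(min_memory_gb: int, min_cc: float) -> list[str]:
    """Return the Slurm GPU types that meet both requirements, in table order."""
    return [
        g.slurm_type
        for g in GPU_TYPES
        if g.memory_gb >= min_memory_gb and g.compute_capability >= min_cc
    ]


def gres_line(slurm_type: str, count: int = 1) -> str:
    """Build an sbatch directive using Slurm's --gres=gpu:<type>:<count> syntax."""
    return f"#SBATCH --gres=gpu:{slurm_type}:{count}"


candidates = matching_types(min_memory_gb=40, min_cc=8.0)
print(candidates)
print(gres_line(candidates[0]))
```

Filtering on compute capability is useful when a code base requires, for example, Ampere-generation (8.0) features; the RTX 2080 Ti and T4 GPUs (7.5) drop out of such a selection.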
