About ALICE

Off to research computing Wonderland

What is ALICE

 

ALICE (Academic Leiden Interdisciplinary Cluster Environment) is the high-performance computing (HPC) facility of the partnership between Leiden University and Leiden University Medical Center (LUMC). It is available to researchers from both partners. Leiden University and LUMC aim to help deliver cutting-edge research using innovative technology within the broad area of data-centric HPC. Both partners are responsible for the hosting, system support, scientific support and service delivery of several large supercomputing and research data storage resources for the Leiden research community.

ALICE is one of the projects that provide research IT infrastructure and support centrally for Leiden University. If you want to know more about the other projects, please have a look at the Confluence space of the RICC team: RICC

Why ALICE

High-Performance Computing (HPC), previously the domain of theoretical scientists and computer and software developers, is becoming ever more important as a research tool in many research areas. An HPC facility that provides serious computational capabilities, combined with easy and flexible local access, is a strong advantage for these research areas. ALICE is the HPC facility that answers those needs for Leiden University (LU) and Leiden University Medical Center (LUMC). It is available to all researchers and students from both LU and LUMC.

The current ALICE facility is the first phase of what will become a larger hybrid HPC facility for research, one that exceeds what individual institutes can build and provides a stepping stone to the larger national facilities.

The facility aims to be an easily accessible, easily usable system with extensive local support at all levels of expertise. Given the expected diverse use, diversity is implemented in all aspects of computing, namely: the number of CPUs and GPUs and the ratio between them; the amount of memory per CPU; the data storage size and location; and the speed of the network.

ALICE is not only a sophisticated production machine but also a tool for teaching all aspects of HPC and a training machine on which young researchers can prepare for national and international HPC facilities.

Overview of the cluster

Conceptual View of ALICE

The ALICE cluster is a hybrid cluster consisting of:

  • 2 login nodes (4 TFlops)

  • 20 CPU nodes (40 TFlops)

  • 24 GPU nodes (68 GPUs, 104 TFlops CPU + 1082 TFlops GPU)

  • 1 High Memory CPU node (4 TFlops)

  • Storage devices (local storage + home + shared scratch): 45 x 15 TB + 70 TB + 364 TB = 1109 TB

In summary: 1234 TFlops, 1712 cores (3424 threads), 17.9 TB RAM.
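The totals above can be checked with a few lines of Python. This is only a sketch that re-adds the figures from the list; the dictionary keys are labels chosen for illustration, not official names.

```python
# Peak-performance figures in TFlops, copied from the overview list.
tflops = {
    "login nodes": 4,
    "CPU nodes": 40,
    "GPU nodes (CPU part)": 104,
    "GPU nodes (GPU part)": 1082,
    "high-memory node": 4,
}

# Storage figures in TB, copied from the overview list.
storage_tb = {
    "local storage": 45 * 15,  # 45 x 15 TB
    "home": 70,
    "shared scratch": 364,
}

total_tflops = sum(tflops.values())
total_storage_tb = sum(storage_tb.values())

print(f"Total peak performance: {total_tflops} TFlops")
print(f"Total storage: {total_storage_tb} TB")
```

Both sums agree with the numbers quoted in the overview (1234 TFlops and 1109 TB).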

In addition, several research groups also have dedicated hardware within ALICE that is not listed above.

You can find a more comprehensive description of the individual components of ALICE in the section Hardware description.

ALICE is a pre-configuration system for the university to gain experience with managing, supporting and operating a university-wide HPC system. Once the system and governance have proven to be a functional research asset, it will be extended and continued for the coming years.

Future plans

Apart from our own expansion plans, we are always open to collaborate with other groups/institutes on expanding and improving ALICE.

The expansions currently being discussed include:

  • Additional CPU, high-memory and GPU nodes (estimated end of Q2 2024)

  • 100GbE network (estimated end of Q1 2025)

In addition, we have the following major change planned:

  • Migration of ALICE to a new operating system (estimated end of Q1 2024)

Costs overview

The ALICE cluster is a shared facility owned by the participating groups and the University. Currently, access to the ALICE cluster and related services is provided free of charge to all researchers and students at LU and LUMC.

Hardware Description

List of Nodes

| Node Name | CPU | Cores | Memory | Local scratch | GPUs per node | Infiniband | Public Access | Purpose | Type |
|---|---|---|---|---|---|---|---|---|---|
| nodelogin0[1-2] | 2 x Intel Xeon Gold 6126, 2.6 GHz, 12 cores | 24 | 384 GB | 15 TB | 1 | | | Login nodes | Huawei FusionServer 2288H V5 |
| node0[01-20] | 2 x Intel Xeon Gold 6126, 2.6 GHz, 12 cores | 24 | 384 GB | 15 TB | 0 | | | CPU Node | Huawei FusionServer X6000 V5 |
| node021 | 2 x Intel Xeon Gold 6226R, 2.90 GHz, 16 cores | 32 | 768 GB | 11 TB | 0 | | | CPU Node | Supermicro SC116AC2-R706WB2 |
| node0[22-24] | AMD EPYC 9534, 3.55 GHz, 64 cores | 64 | 256 GB | 5 TB | 0 | | | CPU Node | Supermicro AS -1115CS-TNR |
| node801 | 4 x Intel Xeon Gold 6128, 3.4 GHz, 6 cores | 24 | 2048 GB | 20 TB | 0 | | | High-Memory Node | Dell PowerEdge R840 |
| node802 | 2 x AMD EPYC 7662, 2.0 GHz, 64 cores | 128 | 4096 GB | 19 TB | 0 | | (Partially) | High-Memory Node | Supermicro SERVERline Individual |
| node8[51-60] | 2 x Intel Xeon Gold 6126, 2.6 GHz, 12 cores | 24 | 384 GB | 15 TB | 4 | | | GPU Node | Huawei FusionServer G5500 / G560 V5 |
| node86[1-2] | 1 x AMD EPYC 7443P, 2.85 GHz, 24 cores | 48 | 64 GB | 15 TB | 1 | | | GPU Node | DellEMC PowerEdge R7515 |
| node8[63-64] | 2 x AMD EPYC 7513, 2.6 GHz, 32 cores | 64 | 256 GB | 15 TB | 2 | | | GPU Node | Gigabyte R282-Z93 |
| node8[65-76] | 2 x AMD EPYC 7513, 2.6 GHz, 32 cores | 64 | 256 GB | 15 TB | 4 (MIG) | | | GPU Node | Gigabyte R282-Z93 |
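The node names in this list use a bracketed range shorthand, so node8[51-60] stands for node851 through node860. As a minimal sketch of how that shorthand can be expanded, the hypothetical helper `expand_nodes` below handles a single numeric range per name. It is an illustration only and not a tool provided on ALICE.

```python
import re


def expand_nodes(pattern: str) -> list[str]:
    """Expand a name such as 'node8[51-60]' into individual hostnames.

    Hypothetical helper: supports one '[first-last]' range per name and
    keeps the zero padding of the first number.
    """
    match = re.fullmatch(r"(.*)\[(\d+)-(\d+)\](.*)", pattern)
    if match is None:
        # No range in the name, e.g. 'node021': return it unchanged.
        return [pattern]
    prefix, first, last, suffix = match.groups()
    width = len(first)
    return [
        f"{prefix}{number:0{width}d}{suffix}"
        for number in range(int(first), int(last) + 1)
    ]


print(expand_nodes("nodelogin0[1-2]"))
print(len(expand_nodes("node0[01-20]")))
```

With the names from the list, "nodelogin0[1-2]" expands to nodelogin01 and nodelogin02, and "node0[01-20]" expands to the 20 names node001 to node020.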

GPU overview

The table below lists the types of GPUs available on the GPU-equipped nodes.

| Hostname | Public Access | GPU type | Memory | Shader cores | Tensor cores | CUDA Compute Capability |
|---|---|---|---|---|---|---|
| nodelogin0[1-2] | | Tesla T4 | 16 GB | 2560 | 320 | 7.5 |
| node8[51-60] | | 4 x PNY GeForce RTX 2080 Ti | 11 GB | 4352 | 544 | 7.5 |
| node86[1-2] | | Tesla T4 | 16 GB | 2560 | 320 | 7.5 |
| node8[63-64], node8[75-76] | | A100 | 80 GB | 6912 | 432 | 8.0 |
| node8[65-74] | | 2 x A100 MIG 4g.40gb | 40 GB | 3949 | 246 | 8.0 |
| node8[65-74] | | 2 x A100 MIG 3g.40gb | 40 GB | 2962 | 185 | 8.0 |
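The shader and tensor core counts listed for the two A100 MIG profiles match 4/7 and 3/7 of the full A100 figures, rounded down. The sketch below reproduces them under that assumption. The proportional split is inferred from the numbers themselves and is not stated on this page, so treat it as a reading aid and not as a description of how the hardware assigns cores. The function name `mig_cores` is made up for this example.

```python
# Full A100 figures as listed in the GPU overview.
A100_SHADER_CORES = 6912
A100_TENSOR_CORES = 432

# An A100 in MIG mode is divided into 7 compute slices; the leading number
# of a profile name ("4g", "3g") is the number of slices it receives.
A100_COMPUTE_SLICES = 7


def mig_cores(slices: int) -> tuple[int, int]:
    """Return (shader cores, tensor cores) for a MIG instance.

    Assumption: cores scale proportionally with the slice count and the
    result is rounded down.
    """
    shader = A100_SHADER_CORES * slices // A100_COMPUTE_SLICES
    tensor = A100_TENSOR_CORES * slices // A100_COMPUTE_SLICES
    return shader, tensor


for profile, slices in (("4g", 4), ("3g", 3)):
    shader, tensor = mig_cores(slices)
    print(f"{profile}: {shader} shader cores, {tensor} tensor cores")
```

Under this assumption the 4-slice profile comes out at 3949 shader cores and 246 tensor cores, and the 3-slice profile at 2962 and 185, which are the values listed for node8[65-74].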