Maintenance on ALICE
This section is used to announce upcoming maintenance and provide information before, during and after it. For general information about our maintenance policy, please have a look here: https://pubappslu.atlassian.net/wiki/spaces/HPCWIKI/pages/37519739
- 1 ALICE User Migration to new Cluster Management system
- 1.1 Status of Migration
- 1.1.1 Open issues
- 1.1.2 Questions and Assistance
- 1.1.3 Access
- 1.1.4 Slurm changes
- 1.1.5 Software
- 1.1.6 Migration of compute resources on 09 Oct 2025
- 1.1.7 Migration of compute resources on 15 Oct 2025
- 1.2 Overview
- 1.3 User Migration
- 1.4 What does this mean for you?
- 1.5 Timeline
- 1.6 FAQ and issues
- 1.7 Questions and Assistance
ALICE User Migration to new Cluster Management system
This migration affects all users on ALICE.
Status of Migration
finishing - Last Updated: Nov 5, 2025
5 Nov
Logging into the gateway (ssh-gw) with new passwords now works.
The Open OnDemand portal has been migrated and offers new resources in the interactive partition.
29 Oct
All nodes have been migrated and the old queueing system is down, as the license of the old management system has expired.
The Open OnDemand portal and the ssh gateway are still being worked on.
eduVPN is an alternative to the ssh gateway if you have a ULCN account: eduVPN (with ULCN) - HPC wiki
15 Oct
The third batch of nodes has been migrated to the new environment and is available through Slurm.
10 Oct
The third batch of nodes has been placed in a reservation and will be migrated on 15 October (see “Migration of compute resources on 15 Oct 2025” below).
The private partitions have all been migrated to the new system.
The second batch of nodes has been placed in a reservation and will be migrated on 9 October.
2 Oct
Early access to the new system is now available through the new login nodes login3 and login4.
Not all compute resources have been migrated, because the “old” system is still in use during the transition phase. Until the end of the transition phase, we will gradually migrate all compute resources to the new system.
Open issues
RDP on login3
RDP on login3 is not working properly yet. Apps can start, but you will see a black background. RDP is working fine on login4.
Open OnDemand has not yet been migrated.
Passwords can differ between the ssh gateway and the login nodes, because the ssh gateway has not been migrated yet.
Questions and Assistance
If you have any questions or need assistance, do not hesitate to contact us through the ALICE Helpdesk email address.
Access
The new ssh gateway for the new system is not yet online.
For users created before 01 Oct. 2025
For LEI users
you can connect directly to login3 and login4 without the ssh gateway using eduVPN
or connect through the ssh gateway, but we recommend that you set up ssh keys for this
Other users
you can reach login3 and login4 by tunneling through the current ALICE ssh gateway (see the example ssh config after this list). If you have not done so yet, we recommend setting up ssh keys.
For users created after 01 Oct. 2025
For LEI users
you can connect directly to login3 and login4 without the ssh gateway using eduVPN
or connect through the ssh gateway, but you will have to set up ssh keys for this
Other users
you can reach login3 and login4 by tunneling through the current ALICE ssh gateway, but you will have to set up ssh keys.
The Open OnDemand portal is not yet available on the new system
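If you tunnel through the current ALICE ssh gateway to reach login3 and login4, an entry in your ~/.ssh/config can reduce this to a single command. The sketch below is only an illustration: the gateway and login node hostnames are written as placeholders and must be replaced with the addresses from the ALICE documentation, together with your own username and key.
# ~/.ssh/config -- minimal sketch; values in <...> are placeholders, not real addresses
Host alice-gw
    HostName <ssh-gateway-address>        # the current ALICE ssh gateway
    User <your-alice-username>
    IdentityFile ~/.ssh/id_ed25519        # the ssh key you registered for ALICE

Host alice-login3
    HostName <login3-address>             # replace with the actual login node address
    User <your-alice-username>
    ProxyJump alice-gw                    # tunnel through the gateway
    IdentityFile ~/.ssh/id_ed25519
With this in place, running "ssh alice-login3" connects you to login3 through the gateway in one step.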
Slurm changes
long and medium partitions have been merged and replaced by hardware-specific partitions (e.g., cpu-zen4, gpu-l4)
selecting specific hardware for the cpu-short and gpu-short partitions can still be done through features using --constraint
you will always have to specify a time limit for jobs (see the example job script after this list)
gpu partitions are only for jobs that need a gpu
separate testing partition for testing and debugging jobs
separate interactive partition
check available partitions and nodes with
sinfo
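As an illustration of these changes, here is a minimal job script sketch. The partition and feature names are only examples based on the ones mentioned above (cpu-short, cpu-zen4); use sinfo on login3 or login4 to see which partitions and features actually exist.
#!/bin/bash
#SBATCH --job-name=example
#SBATCH --partition=cpu-short        # or a hardware-specific partition such as cpu-zen4
#SBATCH --constraint=zen4            # example feature name; list real features with: sinfo -o "%P %f"
#SBATCH --time=01:00:00              # a time limit must now always be specified
#SBATCH --ntasks=1
#SBATCH --mem=4G

module load ALICE                    # load the module stack if it is not loaded automatically
srun hostname
Remember that the gpu partitions are only for jobs that actually request a GPU (for example with --gpus=1).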
Software
There have been no changes to the module stack. If the stack is not automatically available, just run
module load ALICE
If you see the following warning after logging in:
Lmod has detected the following error: The following module(s) are unknown: "slurm"
and/or
Lmod has detected the following error: The following module(s) are unknown: "gcc"
you have to remove the line “module load slurm” and/or “module load gcc” from your .bashrc. On the new system, slurm is not a module anymore and gcc should be loaded with a different module.
Software installed in your own user environment should continue to work on the new system.
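As an example, a ~/.bashrc that previously contained the lines below would trigger these warnings. A minimal sketch of the fix (adapt it to what is actually in your file):
# remove these lines from ~/.bashrc on the new system:
# module load slurm        (slurm is no longer provided as a module)
# module load gcc          (check "module avail" for a suitable compiler module on the new system)

# keep or add this line so the ALICE module stack is available:
module load ALICE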
Migration of compute resources on 09 Oct 2025
On 09 Oct 2025, we will migrate the next set of compute resources from all partitions, which is why they have been placed in a reservation.
We will also migrate all private partitions:
cpu_natbio
cpu_lorentz
gpu_strw
gpu_lion
gpu_cml
gpu_lucdh
This is the complete list of nodes that will be migrated on this day:
node[011-015,018-024,801-802,857-862,867-873,877-879]
During the migration, the nodes will be temporarily offline.
After the migration is complete, the compute resources will become available on the new system.
If you want to make use of the private partitions, you will have to switch to login3 or login4.
Migration of compute resources on 15 Oct 2025
On 15 Oct 2025, we will migrate the next set of compute resources from all partitions, which is why they have been placed in a reservation. They will continue to process jobs that will finish before the reservation starts. If you need to run longer jobs, please migrate to the new environment (see the commands below for checking the reservation).
This is the complete list of nodes that will be migrated on this day:
node[005-010,030-033,851-852,863-864,880-883]
During the migration, the nodes will be temporarily offline.
After the migration is complete, the compute resources will become available on the new system.
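If you want to check which nodes are already in a reservation, when the reservation starts, and how much time your running jobs have left, standard Slurm commands such as the following can be used (a generic Slurm sketch, not ALICE-specific output):
scontrol show reservation            # lists reservations with their start times and affected nodes
sinfo -t resv -N                     # lists nodes that are currently in a reserved state
squeue -u $USER -o "%i %j %l %L"     # your jobs with their time limit and remaining time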
Overview
We have been building a new management node, because we are migrating to a new cluster management system (TrinityX) for ALICE. This step is necessary because of increasing licensing costs for the current cluster management system (Bright).
The process requires a complete rebuild of the ALICE cluster nodes (except the storage). On the plus side, it means that we can finally make all the new compute hardware that was bought earlier this year available to you. The new system will also make it easier for us to integrate new hardware into the cluster.
User Migration
Instead of a hard switch of all users from one system to the other, we aim for a transition phase of limited duration. The transition phase will start on 01 Oct. and end on 26 Oct. 2025.
During this transition phase both systems will run and users can start getting to know the new system, while we move nodes from the old to the new system.
There is, however, a hard deadline for the migration, set by the expiration of the license of the current cluster management system. After 26 Oct., the “old” environment, including access to login1 and login2, will no longer be available.
What does this mean for you?
During this migration, you will continue to have access to your data and be able to submit jobs. The storage will be unaffected and shared between the systems.
We will update the user documentation for ALICE to reflect the changes.
On the new system, accessible via two new login nodes, the partition layout will be slightly different. We hope that this will make the partitions more intuitive.
the testing partition will be moved to compute nodes and will no longer run on the login nodes
we will add a dedicated partition for interactive jobs
there will only be a short and long partition. The medium partition will be dropped.
separate partitions per gpu type (e.g., gpu-l4, gpu-2080ti, gpu-a100, ...)
The Slurm accounting will start fresh, so job IDs will start (almost) from the beginning. We recommend that you store job output files for jobs on the new system in a different location so that old files do not get overwritten.
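A simple way to do this is to point the job's output and error files to a fresh directory in your job scripts on the new system; a minimal sketch (the directory name is just an example, and Slurm will not create it for you):
mkdir -p results-new-system                      # create the directory once, in the directory you submit from

# then, in your job scripts on the new system:
#SBATCH --output=results-new-system/%x_%j.out    # %x = job name, %j = job id
#SBATCH --error=results-new-system/%x_%j.err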
A new ssh gateway and new login nodes will be used for the new system. You will have to adjust your ssh config settings accordingly. Your ssh keys will remain unaffected. We will communicate when the new gateway is available. Until then, you can use the current gateway to access the new login nodes, but we recommend that you set up ssh keys.
The software and ALICE module stack will be shared. There will be no changes to the ALICE module stack. The current stack will continue to work.
For some time now, we have been recommending RDP as a replacement for X2Go. On the new system, only RDP will be available. There is still an alternative to RDP through the ALICE Open OnDemand portal. We are planning to migrate the Open OnDemand portal on 27 Oct 2025 after the end of the transition phase.
Please try running your jobs on the new system as soon as it becomes available to you (a separate announcement will follow). The closer we get to the end of the transition phase, the fewer resources will be available on the old system. If you run into issues, please let us know.
Timeline
01 Oct. 2025: New system will come online. First users will get access
09 Oct. 2025: New system will be generally available to all users
15 Oct. 2025: Third batch of resources will be moved
26 Oct. 2025: Transition phase will end. Old system will become unavailable. Only the new system will be available
27 Oct. 2025: Migration of Open OnDemand
FAQ and issues
The host key of login.alice.universiteitleiden.nl has changed: the alias to the login servers now points to login3 and login4.
If you get the warning below, please update your ~/.ssh/known_hosts file by removing the old line:
ssh-keygen -R 'login.alice.universiteitleiden.nl'
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@ WARNING: REMOTE HOST IDENTIFICATION HAS CHANGED! @
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
IT IS POSSIBLE THAT SOMEONE IS DOING SOMETHING NASTY!
Someone could be eavesdropping on you right now (man-in-the-middle attack)!
It is also possible that a host key has just been changed.
The fingerprint for the ED25519 key sent by the remote host is SHA256:7+NddwW6meoT3eFEUUt5dt1juFnvmmgllhnsrih0AQo.

Some users had problems loading Matlab in the graphical user interface, with the error: Failed to load module "canberra-gtk-module".
This was solved by first loading GTK3 from the software stack: module load GTK3
Questions and Assistance
If you have any questions or need assistance, do not hesitate to contact us through the ALICE Helpdesk email address.