General
Maintenance generally occurs on Mondays during working hours.
Minor Maintenance
Minor maintenance affects only a small part of the cluster (typically on the order of one node) and will not significantly affect users. Examples include:
Testing a new node image on a single node
Testing new features that require reserving a node
Restarting nodes to update images (including reserving/draining the nodes in question)
Info: Minor maintenance can occur every Monday without prior notice to users.
Major Maintenance
Major maintenance affects large parts of the cluster and your ability to run jobs and/or access data. Examples include:
Updates to one or more compute node groups that would make entire partitions unavailable
Updates to the cluster management nodes
Updates to the storage server
Info: Major maintenance can be scheduled on the first Monday of every month and will be announced at least one week in advance. If no major maintenance is needed, none will take place and no announcement will be made. That Monday remains available for minor maintenance.
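Because major maintenance is tied to the first Monday of the month, it can help to know those dates when planning long-running jobs. The following is a minimal Python sketch, using only the standard library, that computes the first Monday of a given month. The first_monday helper is hypothetical and not an official ALICE tool. A date it returns is only a possible major-maintenance day. Maintenance takes place on that day only if it has been announced.

```python
import calendar
import datetime


def first_monday(year: int, month: int) -> datetime.date:
    """Return the first Monday of the given month (hypothetical helper)."""
    first = datetime.date(year, month, 1)
    # date.weekday() returns Monday == 0 ... Sunday == 6; calendar.MONDAY == 0.
    # Days to advance from the 1st to reach the next (or same-day) Monday.
    offset = (calendar.MONDAY - first.weekday()) % 7
    return first + datetime.timedelta(days=offset)


if __name__ == "__main__":
    today = datetime.date.today()
    # Print the first Monday of the current month.
    print(first_monday(today.year, today.month))
```

For example, first_monday(2024, 2) returns 2024-02-05, because 1 February 2024 was a Thursday.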
System Maintenance
System maintenance refers to maintenance that requires the entire cluster to be taken offline. Such maintenance may take place about once every six months and will be announced in advance.
Critical/Emergency Maintenance
Critical/emergency maintenance refers to maintenance required by a critical and sudden issue that needs immediate attention. While it can happen at any time, regular maintenance is intended to keep such events rare. Naturally, critical/emergency maintenance cannot be announced in advance. However, we will strive to inform you when it happens, most likely through the maintenance page (see below).
Updates and Announcements of Maintenance
Information about upcoming or ongoing maintenance can be found here: Maintenance announcements on ALICE. Major and system maintenance will also be announced via email.