Storage on ALICE

The file system is one of the critical components of the service, and users should aim to make the best use of the resources. This chapter details the different file systems in use.

A number of best practices for reading and writing data (I/O) and for basic housekeeping of your files can be found here: https://pubappslu.atlassian.net/wiki/spaces/HPCWIKI/pages/37749471/Best+Practices#Best-Practices---Shared-File-System

By default, users have access to two directories: a home directory and a directory on scratch-shared. Both directories are owned by the user, and the Unix group is set to that of the institute or faculty to which the user belongs.

In addition, a group of users who want to work collaboratively on ALICE can request a project directory to which all members of the group have access.

Summary of available file systems

| File system | Mount point | User-relevant directory | Disk Quota | Speed | Shared between nodes | Expiration | Backup | Files removed? |
|---|---|---|---|---|---|---|---|---|
| Home | /home | /home/<username> | 15 GB | Normal | Yes | None | Nightly incremental | No |
| Local scratch | /scratchdata | /scratchdata/<username>/<slurm_job_id> | 10 TB | Fast | No | End of job | No | Yes, at end of job |
| scratch-shared (user directories) | /data1 | /data1/<username> (same as /home/<username>/data1) | 5 TB | Fast for nodes with Infiniband, normal for others | Yes | At most 28 days | No | No automatic deletion currently |
| scratch-shared (project directories) | /data1 | /data1/projects/pi-<pi_name> | Upon request | Fast for nodes with Infiniband, normal for others | Yes | At most 28 days | No | No automatic deletion currently |
| Cluster-wide software | /cm/shared | N/A (not for user storage) | N/A (not for user storage) | Normal | Yes | None | Nightly incremental | N/A (not for user storage) |

The home file system

The home file system contains the files you normally use to run your jobs, such as programs, scripts, and job files. There is only limited space available for each user, which is why you should never store the data that you want to process, or that your jobs will produce, in your home directory.

By default, you have 15 GB disk space available. Your current usage is shown when you type in the command:

quota -s

The home file system is a network file system (NFS) that is available on all login and compute nodes. Thus, your jobs can access the home file system from all nodes. The downside is that the home file system is not particularly fast, especially for metadata-heavy operations: creating and deleting files, opening and closing files, making many small updates to files, and so on.

For newly created accounts, the home directory only contains a link to your directory on scratch-shared.

We periodically create backups of data in the home directory.

On ALICE, your home directory is located at /home/<username> and, by default, it is only accessible by the user.

Do NOT change permissions on your home directory unless you know exactly what you are doing. Do not use group-level or world-level read permissions to share data, because this will most likely give more users access to your data than you intend. If you want to share data with one or more users on ALICE, request a project directory from the ALICE Helpdesk.
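
If you want to verify the current permissions on your home directory without changing anything, you can list them with a simple read-only check; by default, only you should have access:

ls -ld /home/$USER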

The scratch-shared file system on /data1

In addition to the home file system, you also need storage for large files that is shared among the nodes. For this, we have a new shared scratch space mounted at /data1. The scratch-shared file system hosts user and project directories.

User directories on scratch-shared

Your user directory on scratch-shared can be reached with either of the following commands:

cd /data1/<username>
cd ~/data1

The total size of this shared scratch space is currently 370 TB, which is significantly more than the old shared scratch space. There is a default quota of 5 TB for individual users. Project/PI directories are also hosted on the shared scratch space, but quotas are assigned for each project individually.

You can check your user quota by running

beegfs-ctl --getquota --uid $USER

Please note that we do not generate backups of data on the shared scratch system.

Your user directory on the shared scratch is located at /data1/<username> and, by default, it is only accessible by the user.

Do NOT change permissions on your scratch-shared directory unless you know exactly what you are doing. Do not use group-level or world-level read permissions to share data, because this will most likely give more users access to your data than you intend. If you want to share data with one or more users on ALICE, request a project directory from the ALICE Helpdesk.

There is also a link to your scratch-shared directory in your home directory: /home/<username>/data1. Keep in mind that even though it appears in your home directory, it is only a link: data written through it is stored on the shared scratch space. Therefore, this data is not part of the backup generated for the home directory.
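
If you want to confirm where this entry points, a simple read-only check is:

ls -ld ~/data1

The listing should show whether it is a symbolic link and, if so, its target on the shared scratch space (/data1/<username>).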

Consider the following properties of the shared scratch space when you use it for your jobs:

  • The shared scratch space is a BeeGFS-based parallel file system internally connected with Infiniband (100Gb/s).

  • The best I/O performance can be achieved with parallel I/O workloads.

  • The cpu and login nodes are connected via Infiniband (100Gb/s) to the shared scratch storage, whereas all other nodes are connected via standard 10Gb/s ethernet. Therefore, speed and data transfer rate for /data1 will be best for the cpu and login nodes.

  • In most cases the speed and data transfer rate will be better than for the home file system, but still slower than the local scratch disk (see next section).

  • There is a quota on your user directory on /data1, so you may not have enough space to write all the files you want. Carefully consider how your job will behave if it tries to write to /data1 and there is insufficient space: it would be a waste of budget if the results of a long computation were lost because of it (a quota check is sketched after this list).

  • Even though we currently do not enforce it by automatically deleting data, your data should not be older than a month. If you need to keep data for longer, please contact the ALICE Helpdesk. In general, large amounts of data should only be stored for as long as is necessary to complete your jobs.
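
As mentioned in the list above, it is worth checking how much of your /data1 quota is still free before a job writes large outputs there. A minimal sketch that could go at the start of a job script, using the same beegfs-ctl command shown earlier (the du line is optional and only an example of finding large directories):

# Report current usage and limits on scratch-shared before writing large outputs
beegfs-ctl --getquota --uid $USER
# Optionally, list which of your directories on /data1 use the most space
du -sh /data1/$USER/* | sort -h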

Project directories

For teams or groups of users that want to work collaboratively on ALICE, we offer project directories as a shared storage space. Project directories can be used for example to share data, software/pipelines and data products.

The PI of the group or project has to request the project directory from the ALICE Helpdesk and functions as the main contact person for us.

Project directories get assigned a quota based on the needs of the project, but which can be increased when necessary.

The default setup is that team members cannot delete or (over-)write each other's data. The alternative currently offered is the opposite, i.e., team members can delete and (over-)write each other's data.

If you would like to make use of a project directory, the PI should contact the ALICE Helpdesk and provide the following information:

  • ALICE user name of the PI

  • ALICE user name of group/project members

  • Which mode for write permissions is needed? The default setup is that team members cannot delete or (over-)write each other's data. It is possible to write into directories created by another team member, but only after manually changing the write permissions on that directory (see the example after this list). The alternative currently is that team members can delete and (over-)write each other's data.
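
As an illustration of the default mode described in the list above: if a team member wants to let the rest of the group write into a subdirectory they created inside the project directory, they can add group write permissions to that subdirectory themselves. This is only a sketch; shared_results is a placeholder name:

# Grant the project group write access to a directory you created
chmod g+w /data1/projects/pi-<pi_name>/shared_results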

Any changes to the project directory such as adding new users or requesting more quota need to be authorized by the PI.

You can check the quota of a project directory by running

beegfs-ctl --getquota --gid pi-<pi_username>

The local scratch file system on /scratchdata

The scratch file system on /scratchdata is a local file system on each node. It is intended as fast, temporary storage that can be used while running a job. The local scratch file system of a node can only be accessed when you run a job on that node.

There is no quota for the scratch file system, but use of it is eventually limited by the available storage space (see the Table in Summary of available file systems). Scratch disks are not backed up and are cleaned at the end of a job. This means that you have to move your data back to the shared scratch space at the end of your job or all your data will be lost.

Since the disks are local, read and write operations on /scratchdata are much faster than on the home file system or the shared scratch file system. This makes it very suitable for I/O intensive operations.

How to best use local scratch

In general, accessing the local scratch file system on /scratchdata should be incorporated into your job. For example, copy your input files from your directory on /home or /data1 to the local scratch at the start of the job, create all temporary files needed by your job on the local scratch (assuming they do not need to be shared with other nodes), and copy all output files back to your /home or /data1 directory at the end of the job. An example job script that follows this pattern is sketched after the notes below.

There are two things to note:

  • On the node that your job is running on, a directory will be created for you when the job starts. The directory name is /scratchdata/${SLURM_JOB_USER}/${SLURM_JOB_ID}, where SLURM_JOB_USER is your ALICE username and SLURM_JOB_ID is the ID of the job. You do not have to define these two variables yourself; they will be available for you to use in your job script.

  • Do not forget to copy your results back to /home or /data1! The local scratch space will be cleaned and the directory will be removed after your job finishes, so your results will be lost if you forget this step.
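
The following is a minimal sketch of a job script that follows this pattern. The resource requests, file names, and the program my_program are placeholders, not actual ALICE settings; add further #SBATCH options (such as a partition) as required for your job:

#!/bin/bash
#SBATCH --job-name=local-scratch-example
#SBATCH --ntasks=1
#SBATCH --time=01:00:00

# Directory created automatically for this job on the node's local scratch
SCRATCH_DIR=/scratchdata/${SLURM_JOB_USER}/${SLURM_JOB_ID}

# 1. Copy input data from scratch-shared to the fast local scratch
cp /data1/${SLURM_JOB_USER}/input.dat "${SCRATCH_DIR}/"

# 2. Run the computation inside the local scratch directory
cd "${SCRATCH_DIR}"
my_program input.dat > output.dat

# 3. Copy results back before the job ends; local scratch is cleaned afterwards
cp output.dat /data1/${SLURM_JOB_USER}/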

Software file system

The software file system provides a consistent set of software packages for the entire cluster. It is mounted at /cm/shared on every node.

You do not need to access this file system directly, because we provide a much easier way of using the available software. Also, as a user, you cannot change the content of this file system.

  • We do nightly incremental backups of the home and software file system.

  • Files that are open at the time of the backup will be skipped.

  • We can restore files and/or directories that you accidentally removed up to 15 days back, provided they already existed during the last successful backup.

  • There is no backup for the shared scratch file system.

Compute Local

Each worker node has multiple file system mounts.

  • /dev/shm - On each worker, you may also create a virtual file system directly in memory, for extremely fast data access (see the sketch below). It is the fastest available file system, but be advised that this will count against the memory used for your job. The maximum size is set to half the physical RAM size of the worker node.
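
A minimal sketch of how /dev/shm could be used inside a job is given below. The directory layout and file names are placeholders, and everything written here counts against your job's memory, as noted above:

# Create a job-specific directory in the in-memory file system
mkdir -p /dev/shm/${SLURM_JOB_USER}/${SLURM_JOB_ID}
# Copy a small input file there for very fast access
cp /data1/${SLURM_JOB_USER}/small_input.dat /dev/shm/${SLURM_JOB_USER}/${SLURM_JOB_ID}/
# ... run your program against this copy ...
# Remove the directory before the job ends so that the memory is released
rm -rf /dev/shm/${SLURM_JOB_USER}/${SLURM_JOB_ID}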