On Linux and macOS, transferring files to and from the cluster on the command line is a very common approach.

There are of course also tools that provide a graphical user interface.

A distinct advantage of rsync over scp and sftp is that it can not only copy data but also synchronise directories (updating only the files that need it).

Do not forget to check out our general recommendations for data transfer: Best Practices | Best Practices Data Transfer

This section assumes that you set up your ssh config as described here: Login to ALICE or SHARK from Linux | For regular users or “the more elegant way” (with or without the use of ssh keys).

We will also use the same aliases and limit the examples to using one login node (i.e., alice1, shark1) though you can always use the other login node, too.

1 SCP for data transfer
2 RSYNC for data transfer and synchronising directories
3 SFTP for data transfer
- 3.1 Tools with graphical user interface

SCP for data transfer

Unless otherwise mentioned, the commands specified below are written as if you were running on your local workstation and not on the cluster. If you are already logged into the cluster, you will have to run the commands using a second terminal window where you do not log into the cluster.

Always make sure that you know where you want to put your data on the cluster or where you want to copy it from.

Copying files to the cluster

Once you know where you want to copy data to or from, you can use the scp command as follows:

ALICE

copy data from your local workstation to your user directory on the shared scratch

scp <local_file_name> alice1:data1/<some_directory>

Here, we make use of the fact that there is a symbolic link to your user directory on the shared scratch in your home directory.

copy data from your local workstation directly to your home directory

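scp <local_file_name> alice1:

The trailing colon is required: without it, scp creates a local copy named alice1 instead of contacting the cluster.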

SHARK

copy data from your local workstation to a share on the fast HPC storage to which you have access:

scp <local_file_name> shark1:/exports/<storage-share-name>/<some_directory>

where you should replace <storage-share-name> by the name of the share.

copy data from your local workstation directly to your home directory

scp <local_file_name> shark1:

These commands copy a local file to the cluster. Where a target directory <some_directory> is given, it must already exist. If it does not exist yet, create it on the cluster first, e.g.:

ALICE

SHARK
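A minimal sketch of this step, assuming the aliases alice1 and shark1 and the same target paths as in the scp examples above. The function names and the share/directory arguments are hypothetical and only serve as illustration:

```shell
# Create <some_directory> below your scratch directory on ALICE.
# $1: directory name (relative to data1/ in your home directory)
# mkdir -p also creates missing parent directories and does not
# complain if the directory already exists.
make_dir_on_alice() {
    ssh alice1 "mkdir -p data1/$1"
}

# Create <some_directory> on a SHARK storage share.
# $1: name of the share, $2: directory name below the share
make_dir_on_shark() {
    ssh shark1 "mkdir -p /exports/$1/$2"
}

# Usage (hypothetical names):
#   make_dir_on_alice my_project
#   make_dir_on_shark my_share my_project
# Directory names containing spaces would need additional quoting.
```

Running mkdir through ssh avoids opening a separate login session just to create the directory.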

The path to the directory that you want to copy to can of course be longer.

Copying files from the cluster

To copy a file from the cluster back to your local desktop or storage medium, you can use, for example:

ALICE

from the shared scratch storage to the current working directory on your local workstation

from your home directory on the cluster:

SHARK

from the HPC storage to the current working directory on your local workstation

from your home directory on the cluster:

In these commands, replace <path_to_remote_file> by the directory on the cluster that contains the file and <remote_file_name> by the name of the file. The path to the directory that you want to copy from can of course be longer and more complex.
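The four download cases listed above could look roughly like the following sketch. It assumes the aliases alice1 and shark1 and the same remote paths as in the upload examples; the function names are hypothetical:

```shell
# Download a file from the ALICE scratch storage into the current
# local directory (".").
# $1: path below data1/, $2: file name
fetch_from_alice_scratch() {
    scp "alice1:data1/$1/$2" .
}

# Download a file from your home directory on ALICE.
# A relative remote path is resolved relative to your home directory.
# $1: file name (or path relative to your home directory)
fetch_from_alice_home() {
    scp "alice1:$1" .
}

# Download a file from a SHARK storage share.
# $1: share name, $2: path below the share, $3: file name
fetch_from_shark_share() {
    scp "shark1:/exports/$1/$2/$3" .
}

# Download a file from your home directory on SHARK.
fetch_from_shark_home() {
    scp "shark1:$1" .
}

# Usage (hypothetical names):
#   fetch_from_alice_scratch my_project results.txt
#   fetch_from_shark_share my_share my_project results.txt
```

The final "." is the local destination; replace it with another local path to store the file elsewhere.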

Copying entire directories

You can also copy an entire directory (including its sub-directories) to and from the cluster. This only requires adding the -r option to scp, e.g.,

ALICE

Copying from your local computer to your directory on the shared scratch storage:

where you need to replace “<local_directory>” by the directory on your local workstation that you want to copy.

Copying from your user directory on the shared scratch storage to your local computer

where you need to replace “<remote_directory>” by the directory on ALICE that you want to copy.

SHARK

Copying from your local computer to your directory on the HPC storage:

where you need to replace “<local_directory>” by the directory on your local workstation that you want to copy and <storage-share-name> by the share that you have access to.

Copying from your directory on the HPC storage to your local computer

where you need to replace “<remote_directory>” by the directory on SHARK that you want to copy.

Of course, you need to adjust the path that you copy to as needed.
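As a sketch of the recursive copies described above, assuming the aliases alice1 and shark1 and the remote paths used earlier on this page (the function names are hypothetical):

```shell
# Upload a local directory, including its sub-directories, to your
# scratch directory on ALICE.
# $1: local directory
upload_dir_to_alice() {
    scp -r "$1" alice1:data1/
}

# Download a directory from your scratch directory on ALICE into the
# current local directory.
# $1: remote directory below data1/
download_dir_from_alice() {
    scp -r "alice1:data1/$1" .
}

# Upload a local directory to a SHARK storage share.
# $1: share name, $2: local directory
upload_dir_to_shark() {
    scp -r "$2" "shark1:/exports/$1/"
}

# Download a directory from a SHARK storage share.
# $1: share name, $2: remote directory below the share
download_dir_from_shark() {
    scp -r "shark1:/exports/$1/$2" .
}

# Usage (hypothetical names):
#   upload_dir_to_alice my_data
#   download_dir_from_shark my_share my_data
```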

For more details on how to use scp, see the man page for scp (man scp).

RSYNC for data transfer and synchronising directories

rsync is a powerful tool for data transfer. In addition to copying files and directories, it can also synchronise them. This makes it possible to copy only the files that need to be updated, which reduces the amount of traffic. No additional options are needed to enable synchronisation: rsync automatically checks whether all files need to be copied or only the updated ones.

This is an example of how you can copy/synchronise data between the cluster and your local workstation:

ALICE

for transferring the directory “<local_directory>” from your local workstation to the shared scratch storage:

for transferring the directory “<remote_directory>” from the shared scratch storage to the current working directory (“./”) on your local workstation

SHARK

for transferring the directory “<local_directory>” from your local workstation to a share on the HPC storage:

for transferring the directory “<remote_directory>” from a share on the HPC storage to the current working directory (“./”) on your local workstation

Here, we have used the following options

-a (or --archive): a short-cut for a combination of options that includes recursion and preserves almost everything, including symlinks and user permissions
-z (or --compress): compress the file data during the transfer
-u (or --update): skip files that are newer on the receiving side
-v (or --verbose): enable verbose output and print out all files that are transferred
-e ssh: use ssh as the remote shell for communication and copying

See the man page for rsync for more information about all its options.
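The options -a, -z, -u, -v and -e ssh can be combined as in the following sketch. It assumes the aliases alice1 and shark1 and the remote paths used in the scp section; the function names are hypothetical:

```shell
# Synchronise a local directory to your scratch directory on ALICE.
# $1: local directory
# Note: "my_data" copies the directory itself, whereas a trailing
# slash on the source ("my_data/") copies only its contents.
sync_to_alice() {
    rsync -azuv -e ssh "$1" alice1:data1/
}

# Synchronise a directory from the ALICE scratch storage to the
# current local working directory ("./").
# $1: remote directory below data1/
sync_from_alice() {
    rsync -azuv -e ssh "alice1:data1/$1" ./
}

# Synchronise a local directory to a SHARK storage share.
# $1: share name, $2: local directory
sync_to_shark() {
    rsync -azuv -e ssh "$2" "shark1:/exports/$1/"
}

# Synchronise a directory from a SHARK storage share to "./".
# $1: share name, $2: remote directory below the share
sync_from_shark() {
    rsync -azuv -e ssh "shark1:/exports/$1/$2" ./
}

# Usage (hypothetical names):
#   sync_to_alice my_data
#   sync_from_shark my_share my_data
```

Running the same command a second time transfers only files that have changed since the first run.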

SFTP for data transfer

Data can also be transferred using the sftp copy program. However, sftp works differently from rsync and scp. With sftp, you first connect to the server that you want to copy to or from, and then you upload (“put”) files from your local workstation to the server or download (“get”) files from the server to your local workstation.

Assuming that you have set up your ssh config accordingly, you can use:

ALICE

This way, you will have tunnelled through the ssh gateway and connected with a login node on ALICE. In the beginning, sftp will put you in your home directory.

If you did not set up ssh keys on ALICE, you will be asked to provide your user password first for the gateway and then for the login node.

SHARK

For demonstration purposes, we have tunnelled through the LUMC ssh gateway. However, if you are working from within the LUMC network, you can directly connect to one of the login nodes.

At this point, you are in your home directory on SHARK.

If you have set up ssh keys on the gateway and the login nodes, sftp will not ask you for your user password.

We can then use various commands to traverse and manipulate both file systems. Some of the most useful commands are listed below:

Command	Function	Example
cd	Changes the directory of the remote computer	cd remote_directory
lcd	Changes the directory of the local computer	lcd local_directory
ls	Lists the contents of the remote directory	ls
lls	Lists the contents of the local directory	lls
pwd	Prints working directory of the remote computer	pwd
lpwd	Prints working directory of the local computer	lpwd
get	Copies a file from the remote directory to the local directory	get remote_file
put	Copies a file from the local directory to the remote directory	put local_file
exit	Closes the connection to the remote computer and exits the program	exit
help	Displays application information on using commands	help
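An interactive session is started with, for example, sftp alice1 or sftp shark1 (assuming those aliases exist in your ssh config), after which the commands in the table are typed at the sftp prompt. The same commands can also be scripted with sftp's batch mode (-b). The following is only a sketch; the function name and arguments are hypothetical:

```shell
# Upload one file with sftp in batch mode.
# $1: host alias (e.g. alice1), $2: remote directory, $3: local file
# "-b -" makes sftp read its commands from standard input. Batch mode
# cannot prompt for a password, so it is meant to be used with
# non-interactive authentication such as ssh keys.
sftp_upload() {
    printf 'cd %s\nput %s\n' "$2" "$3" | sftp -b - "$1"
}

# Usage (hypothetical names):
#   sftp_upload alice1 data1/my_project results.txt
```

For a download, the put line would be replaced by a get line with the remote file name.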

Tools with graphical user interface

Various sftp-based tools with graphical user interfaces are available on the web. Before you choose one, make sure that it supports tunnelling through the ssh gateway (proxy server) in case you need it.
