Data transfer to and from Linux and macOS

On Linux and macOS, using the command line to transfer files to and from the cluster is a very common approach. The tools covered here are scp, rsync and sftp.

There are of course also tools that provide a graphical user interface.

A distinct advantage of rsync over scp and sftp is that it can not only copy data but also synchronise directories, i.e. update only the files that need it.

Do not forget to check out our general recommendations for data transfer: Best Practices | Best Practices Data Transfer

This section assumes that you set up your ssh config as described here: Login to ALICE or SHARK from Linux | For regular users or “the more elegant way” (with or without the use of ssh keys).

We will also use the same aliases and limit the examples to using one login node (i.e., alice1, shark1) though you can always use the other login node, too.

SCP for data transfer

Unless otherwise mentioned, the commands specified below are written as if you were running on your local workstation and not on the cluster. If you are already logged into the cluster, you will have to run the commands using a second terminal window where you do not log into the cluster.

Always make sure that you know where you want to put your data on the cluster or where you want to copy it from.

Copying files to the cluster

Once you know the source and the destination of your data, you can use the scp command to perform the copy operation in this manner:


ALICE

  • copy data from your local workstation to your user directory on the shared scratch

scp <local_file_name> alice1:data1/<some_directory>

Here, we make use of the fact that there is a symbolic link to your user directory on the shared scratch in your home directory.

  • copy data from your local workstation directly to your home directory

scp <local_file_name> alice1:

SHARK

  • copy data from your local workstation to a share on the fast HPC storage to which you have access:

scp <local_file_name> shark1:/exports/<storage-share-name>/<some_directory>

where you should replace <storage-share-name> by the name of the share.

  • copy data from your local workstation directly to your home directory

scp <local_file_name> shark1:


This copies a local file to the cluster, assuming you have already created the directory <some_directory>. If the directory does not exist yet, you have to create it on the cluster first, e.g.:


ALICE


SHARK


The path to the directory that you want to copy to can of course be longer.
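One way to create the directory without opening an interactive session is to run mkdir through ssh from your local workstation. The following is a minimal sketch, assuming the ssh aliases alice1 and shark1 from your ssh config; the directory name my_project and the share name my_share are hypothetical placeholders.

```shell
# Hypothetical names; replace them with your own directory and share.
alice_dir="data1/my_project"
shark_dir="/exports/my_share/my_project"

# mkdir -p also creates missing parent directories and does not
# complain if the directory already exists.
alice_cmd="ssh alice1 mkdir -p ${alice_dir}"
shark_cmd="ssh shark1 mkdir -p ${shark_dir}"

# Print the commands for inspection; run the printed lines to execute them.
echo "$alice_cmd"
echo "$shark_cmd"
```

A relative path such as data1/my_project is resolved relative to your home directory on the cluster.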

Copying files from the cluster

To copy a file from the cluster back to your local desktop or storage medium, you can use, for example:


ALICE

  • from the shared scratch storage to the current working directory on your local workstation

  • from your home directory on the cluster:


SHARK

  • from the HPC storage to the current working directory on your local workstation

  • from your home directory on the cluster:


where you need to replace <path_to_remote_file> by the directory that contains the file on the cluster and <remote_file_name> by the name of the file on the cluster. The path to the directory that you want to copy from can of course be longer and more complex.
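As an illustration of the four cases listed above, the commands could take the following form. This is a sketch, assuming the ssh aliases alice1 and shark1 and the data1 link in your ALICE home directory; the path, file and share names are hypothetical placeholders.

```shell
# Hypothetical names; replace them with your own paths, file and share.
remote_path="my_project/results"
remote_file="output.dat"
share="my_share"

# ALICE: from the shared scratch (via the data1 link) and from the home directory.
alice_scratch_cmd="scp alice1:data1/${remote_path}/${remote_file} ."
alice_home_cmd="scp alice1:${remote_path}/${remote_file} ."

# SHARK: from a share on the HPC storage and from the home directory.
shark_share_cmd="scp shark1:/exports/${share}/${remote_path}/${remote_file} ."
shark_home_cmd="scp shark1:${remote_path}/${remote_file} ."

# Print the commands for inspection; run the printed lines to execute them.
printf '%s\n' "$alice_scratch_cmd" "$alice_home_cmd" "$shark_share_cmd" "$shark_home_cmd"
```

The trailing "." is the destination and stands for the current working directory on your local workstation. A remote path that does not start with "/" is interpreted relative to your home directory on the cluster.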

Copying entire directories

You can also copy an entire directory (including its sub-directories) to and from the cluster. This only requires adding the -r option to scp, e.g.,


ALICE

  • Copying from your local computer to your directory on the shared scratch storage:

where you need to replace “<local_directory>” by the directory on your local workstation that you want to copy.

  • Copying from your user directory on the shared scratch storage to your local computer

where you need to replace “<remote_directory>” by the directory on ALICE that you want to copy.


SHARK

  • Copying from your local computer to your directory on the HPC storage:

where you need to replace “<local_directory>” by the directory on your local workstation that you want to copy and <storage-share-name> by the share that you have access to.

  • Copying from your directory on the HPC storage to your local computer

where you need to replace “<remote_directory>” by the directory on SHARK that you want to copy.


Of course, you need to adjust the destination path as needed.
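The recursive copies described above could look as follows. This is a sketch, assuming the ssh aliases alice1 and shark1; the directory and share names are hypothetical placeholders.

```shell
# Hypothetical names; replace them with your own directories and share.
local_dir="my_local_dir"
remote_dir="my_remote_dir"
share="my_share"

# ALICE: to and from your user directory on the shared scratch.
to_alice_cmd="scp -r ${local_dir} alice1:data1/"
from_alice_cmd="scp -r alice1:data1/${remote_dir} ."

# SHARK: to and from a share on the HPC storage.
to_shark_cmd="scp -r ${local_dir} shark1:/exports/${share}/"
from_shark_cmd="scp -r shark1:/exports/${share}/${remote_dir} ."

# Print the commands for inspection; run the printed lines to execute them.
printf '%s\n' "$to_alice_cmd" "$from_alice_cmd" "$to_shark_cmd" "$from_shark_cmd"
```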

For more details on how to use scp, see its man page (man scp).

RSYNC for data transfer and synchronising directories

rsync is a powerful tool for data transfer. In addition to copying files and directories, it can also synchronise them. This makes it possible to transfer only the files that need to be updated, which reduces the amount of traffic. No additional options are necessary to enable synchronisation: rsync automatically checks whether all files need to be copied or only the updated ones.

This is an example of how you can copy/synchronise data between the cluster and your local workstation:


ALICE

  • for transferring the directory “<local_directory>” from your local workstation to the shared scratch storage:

  • for transferring the directory “<remote_directory>” from the shared scratch storage to the current working directory (“./”) on your local workstation


SHARK

  • for transferring the directory “<local_directory>” from your local workstation to a share on the HPC storage:

  • for transferring the directory “<remote_directory>” from a share on the HPC storage to the current working directory (“./”) on your local workstation
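A sketch of what these four transfers could look like with the options -a, -z, -u, -v and -e ssh is given here. It assumes the ssh aliases alice1 and shark1; the directory and share names are hypothetical placeholders.

```shell
# Hypothetical names; replace them with your own directories and share.
local_dir="my_local_dir"
remote_dir="my_remote_dir"
share="my_share"

# Archive mode, compression, update-only, verbose output, transport over ssh.
rsync_opts="-azuv -e ssh"

# ALICE: to and from your user directory on the shared scratch.
to_alice_cmd="rsync ${rsync_opts} ${local_dir} alice1:data1/"
from_alice_cmd="rsync ${rsync_opts} alice1:data1/${remote_dir} ./"

# SHARK: to and from a share on the HPC storage.
to_shark_cmd="rsync ${rsync_opts} ${local_dir} shark1:/exports/${share}/"
from_shark_cmd="rsync ${rsync_opts} shark1:/exports/${share}/${remote_dir} ./"

# Print the commands for inspection; run the printed lines to execute them.
printf '%s\n' "$to_alice_cmd" "$from_alice_cmd" "$to_shark_cmd" "$from_shark_cmd"
```

Note that rsync treats a trailing slash on the source specially: without it, the directory itself is created inside the destination; with it, only the contents of the directory are copied.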


Here, we have used the following options

  • -a (or --archive): a short-cut for a combination of options that includes recursion and preserves almost everything, including symlinks and user permissions

  • -z: compress the file stream

  • -u (or --update): skip files that are newer on the destination

  • -v: enable verbose output and print out all files that are transferred

  • -e ssh: use ssh as the remote shell for the transfer

See the man page for rsync for more information about all its options.

SFTP for data transfer

Data can also be transferred using the sftp program. However, sftp works differently from rsync and scp. With sftp, you connect to the server that you want to copy to or from, and then you upload (“put”) files to the server from your local workstation or download (“get”) files from the server to your local workstation.

Assuming that you have set up your ssh config accordingly, you can use:
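Because sftp uses ssh for the connection, it can use the same host aliases as ssh. A minimal sketch of the connection commands, assuming the aliases alice1 and shark1 are defined in your ssh config together with the gateway settings:

```shell
# Aliases taken from the ssh config; adjust them if you use other names.
alice_cmd="sftp alice1"
shark_cmd="sftp shark1"

# Print the commands for inspection; run one of the printed lines
# to open an interactive sftp session.
echo "$alice_cmd"
echo "$shark_cmd"
```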


ALICE

This way, you will have tunnelled through the ssh gateway and connected with a login node on ALICE. In the beginning, sftp will put you in your home directory.

If you did not set up ssh keys on ALICE, you will be asked to provide your user password first for the gateway and then for the login node.


SHARK

For demonstration purposes, we have tunnelled through the LUMC ssh gateway. However, if you are working from within the LUMC network, you can directly connect to one of the login nodes.

At this point, you are in your home directory on SHARK.

If you have set up ssh keys on the gateway and the login nodes, sftp will not ask you for your user password.


We can then use various commands to traverse and manipulate both file systems. The most common commands are listed below:

Command   Function                                                             Example
-------   ------------------------------------------------------------------   -------------------
cd        Changes the directory of the remote computer                         cd remote_directory
lcd       Changes the directory of the local computer                          lcd local_directory
ls        Lists the contents of the remote directory                           ls
lls       Lists the contents of the local directory                            lls
pwd       Prints working directory of the remote computer                      pwd
lpwd      Prints working directory of the local computer                       lpwd
get       Copies a file from the remote directory to the local directory       get remote_file
put       Copies a file from the local directory to the remote directory       put local_file
exit      Closes the connection to the remote computer and exits the program   exit
help      Displays application information on using commands                   help
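The same commands can also be collected in a file and passed to sftp with its -b (batch) option instead of typing them interactively. The following is a sketch, assuming the alias alice1; the directory and file names are hypothetical placeholders. Batch mode cannot prompt for a password, so it relies on non-interactive authentication such as ssh keys.

```shell
# Temporary file that will hold the sftp commands.
batch_file="$(mktemp)"

# Hypothetical directories and files; replace them with your own.
cat > "$batch_file" <<'EOF'
cd data1/my_project
lcd results
put input.dat
get output.dat
exit
EOF

# Print the command for inspection; run the printed line to execute the batch.
cmd="sftp -b ${batch_file} alice1"
echo "$cmd"
```

In batch mode, sftp stops at the first command that fails, so later transfers are not attempted after an error.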

Tools with graphical user interface

Various sftp-based tools with graphical user interfaces are available on the web. Before you choose one, make sure that it supports tunnelling through the ssh gateway (proxy server) in case you need it.