Data transfer to and from Linux and macOS
On Linux and macOS, the command line is a very common way to transfer files to and from the cluster. This section covers scp, rsync and sftp.
There are of course also tools that provide a graphical user interface.
A distinct advantage of rsync over scp and sftp is that it can not only copy data but also synchronise directories, i.e., update only the files that have changed.
Do not forget to check out our general recommendations for data transfer: Best Practices | Best Practices Data Transfer
This section assumes that you set up your ssh config as described here: Login to ALICE or SHARK from Linux | For regular users or “the more elegant way” (with or without the use of ssh keys).
We will also use the same aliases and limit the examples to using one login node (i.e., alice1, shark1) though you can always use the other login node, too.
SCP for data transfer
Unless otherwise mentioned, the commands specified below are written as if you were running on your local workstation and not on the cluster. If you are already logged into the cluster, you will have to run the commands using a second terminal window where you do not log into the cluster.
Always make sure that you know where you want to put your data on the cluster or where you want to copy it from.
Copying files to the cluster
Once you know where you want to copy data to or from, you can use the scp command to perform the copy operation as follows:
ALICE
copy data from your local workstation to your user directory on the shared scratch
scp <local_file_name> alice1:data1/<some_directory>
Here, we make use of the fact that there is a symbolic link to your user directory on the shared scratch in your home directory.
copy data from your local workstation directly to your home directory
scp <local_file_name> alice1:
SHARK
copy data from your local workstation to a share on the fast HPC storage to which you have access:
scp <local_file_name> shark1:/exports/<storage-share-name>/<some_directory>
where you should replace <storage-share-name> by the name of the share.
copy data from your local workstation directly to your home directory
scp <local_file_name> shark1:
These commands copy a local file to the cluster, assuming that the directory <some_directory> already exists. If the directory does not exist yet, you have to create it on the cluster first, e.g.:
ALICE
SHARK
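One way to create the directory without opening an interactive session is to pass mkdir -p to ssh. The following is a minimal sketch, assuming the alice1 and shark1 aliases from your ssh config; the function names make_alice_dir and make_shark_dir are hypothetical and only serve to pass the placeholders as arguments.

```shell
# Hypothetical helpers; alice1 and shark1 are the ssh config aliases used in this section.

# Create <some_directory> below the data1 link in your ALICE home directory.
#   $1 = directory to create
make_alice_dir() {
    ssh alice1 "mkdir -p 'data1/$1'"
}

# Create <some_directory> on a SHARK storage share.
#   $1 = name of the storage share, $2 = directory to create
make_shark_dir() {
    ssh shark1 "mkdir -p '/exports/$1/$2'"
}

# Usage (run on your local workstation):
#   make_alice_dir my_project
#   make_shark_dir <storage-share-name> my_project
```

The -p option makes mkdir create missing parent directories and not complain if the directory already exists.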
The path to the directory that you want to copy to can of course be longer.
Copying files from the cluster
To copy a file from the cluster back to your local desktop or storage medium, you can use, for example:
ALICE
from the shared scratch storage to the current working directory on your local workstation
from your home directory on the cluster:
SHARK
from the HPC storage to the current working directory on your local workstation
from your home directory on the cluster:
where you need to replace “<path_to_remote_file>” by the directory that contains the file on the cluster and <remote_file_name> by the name of the file. The path to the directory that you want to copy from can of course be longer and more complex.
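The download cases above can be sketched with a single helper. This is an illustration, not a fixed recipe: the function name fetch_file is hypothetical, and the remote paths in the usage comments assume the data1 link on ALICE and the /exports/<storage-share-name> layout on SHARK described earlier.

```shell
# Hypothetical helper: copy one remote file into the current local directory.
#   $1 = ssh alias (alice1 or shark1)
#   $2 = path of the file on the cluster (relative paths start in your home directory)
fetch_file() {
    scp "$1:$2" .
}

# Usage (run on your local workstation, placeholders as in the text):
#   fetch_file alice1 data1/<path_to_remote_file>/<remote_file_name>    # shared scratch
#   fetch_file alice1 <path_to_remote_file>/<remote_file_name>          # home directory
#   fetch_file shark1 /exports/<storage-share-name>/<path_to_remote_file>/<remote_file_name>
```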
Copying entire directories
You can also copy an entire directory (including its sub-directories) to and from the cluster. This only requires adding the -r option to scp, e.g.,
ALICE
Copying from your local computer to your directory on the shared scratch storage:
where you need to replace “<local_directory>” by the directory on your local workstation that you want to copy.
Copying from your user directory on the shared scratch storage to your local computer
where you need to replace “<remote_directory>” by the directory on ALICE that you want to copy.
SHARK
Copying from your local computer to your directory on the HPC storage:
where you need to replace “<local_directory>” by the directory on your local workstation that you want to copy and <storage-share-name> by the share that you have access to.
Copying from your directory on the HPC storage to your local computer:
where you need to replace “<remote_directory>” by the directory on SHARK that you want to copy.
Of course, you need to adjust the destination path as needed.
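As a sketch of the recursive copies described above, the following wraps scp -r in two small functions. The function names upload_dir and download_dir are hypothetical, and the paths in the usage comments are placeholders that assume the same directory layout as the single-file examples.

```shell
# Hypothetical helpers built on scp -r (recursive copy).

# Upload a local directory.
#   $1 = ssh alias, $2 = local directory, $3 = destination on the cluster
upload_dir() {
    scp -r "$2" "$1:$3"
}

# Download a remote directory into the current local directory.
#   $1 = ssh alias, $2 = directory on the cluster
download_dir() {
    scp -r "$1:$2" .
}

# Usage (run on your local workstation):
#   upload_dir   alice1 <local_directory> data1/
#   download_dir alice1 data1/<remote_directory>
#   upload_dir   shark1 <local_directory> /exports/<storage-share-name>/
#   download_dir shark1 /exports/<storage-share-name>/<remote_directory>
```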
For more details on how to use scp, see its man page (man scp).
RSYNC for data transfer and synchronising directories
rsync is a powerful tool for data transfer. In addition to copying files and directories, it can also synchronise them. This makes it possible to copy only the files that need to be updated, which reduces the amount of traffic. No additional options are necessary to enable synchronisation: rsync automatically checks whether all files need to be copied or only the updated ones.
This is an example of how you can copy/synchronise data between the cluster and your local workstation:
ALICE
for transferring the directory “<local_directory>” from your local workstation to the shared scratch storage:
for transferring the directory “<remote_directory>” from the shared scratch storage to the current working directory (“./”) on your local workstation
SHARK
for transferring the directory “<local_directory>” from your local workstation to a share on the HPC storage:
for transferring the directory “<remote_directory>” from a share on the HPC storage to the current working directory (“./”) on your local workstation
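The transfers listed above might look as follows. This is a minimal sketch that assumes the option set -a, -z, -u, -v and -e ssh and the same directory layout as in the scp examples; the function names push_dir and pull_dir are hypothetical.

```shell
# Hypothetical helpers using rsync over ssh.

# Transfer a local directory to the cluster.
#   $1 = ssh alias, $2 = local directory, $3 = destination on the cluster
push_dir() {
    rsync -azuv -e ssh "$2" "$1:$3"
}

# Transfer a directory from the cluster to the current working directory.
#   $1 = ssh alias, $2 = directory on the cluster
pull_dir() {
    rsync -azuv -e ssh "$1:$2" ./
}

# Usage (run on your local workstation):
#   push_dir alice1 <local_directory> data1/
#   pull_dir alice1 data1/<remote_directory>
#   push_dir shark1 <local_directory> /exports/<storage-share-name>/
#   pull_dir shark1 /exports/<storage-share-name>/<remote_directory>
```

Note that rsync treats a trailing slash on the source specially: "dir/" transfers the contents of the directory, while "dir" transfers the directory itself.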
Here, we have used the following options:
-a (or --archive): a shortcut for a combination of options that includes recursion and preserves almost everything, including symlinks and user permissions
-z: compress the file stream
-u (or --update): skip files that are newer on the receiving side
-v: enable verbose output and print all files that are transferred
-e ssh: use ssh for communication and copying
See the man page for rsync for more information about all its options.
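Before synchronising a large directory, it can be useful to check what rsync would transfer. The sketch below adds the --dry-run option (short form -n), which makes rsync list the files without copying anything; the function name preview_push is hypothetical and the other options are the ones assumed throughout this section.

```shell
# Hypothetical helper: show which files would be transferred, without changing anything.
#   $1 = ssh alias, $2 = local directory, $3 = destination on the cluster
preview_push() {
    rsync -azuv --dry-run -e ssh "$2" "$1:$3"
}

# Usage (run on your local workstation):
#   preview_push alice1 <local_directory> data1/
```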
SFTP for data transfer
Data can also be transferred using the sftp copy program. However, sftp works differently from rsync and scp. With sftp, you connect to the server that you want to copy to or from, and then you upload (“put”) files from your local workstation to the server or download (“get”) files from the server to your local workstation.
Assuming that you have set up your ssh config accordingly, you can use:
ALICE
This way, you will have tunnelled through the ssh gateway and connected with a login node on ALICE. In the beginning, sftp will put you in your home directory.
If you did not set up ssh keys on ALICE, you will be asked to provide your user password first for the gateway and then for the login node.
SHARK
For demonstration purposes, we have tunnelled through the LUMC ssh gateway. However, if you are working from within the LUMC network, you can directly connect to one of the login nodes.
At this point, you are in your home directory on SHARK.
If you have set up ssh keys on the gateway and the login nodes, sftp will not ask you for your user password.
We can then use various commands to traverse and manipulate both file systems. The most common commands are listed below:
Command | Function | Example |
---|---|---|
cd | Changes the directory of the remote computer | cd remote_directory |
lcd | Changes the directory of the local computer | lcd local_directory |
ls | Lists the contents of the remote directory | ls |
lls | Lists the contents of the local directory | lls |
pwd | Prints working directory of the remote computer | pwd |
lpwd | Prints working directory of the local computer | lpwd |
get | Copies a file from the remote directory to the local directory | get remote_file |
put | Copies a file from the local directory to the remote directory | put local_file |
exit | Closes the connection to the remote computer and exits the program | exit |
help | Displays application information on using commands | help |
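The commands in the table can also be run non-interactively. The following is a sketch that uses the batch mode of sftp (option -b, where "-" means that the commands are read from standard input). Batch mode cannot prompt for a password, so this assumes that ssh keys are set up; the function name sftp_exchange and all file and directory names are hypothetical.

```shell
# Hypothetical helper: download one file and upload another in a single sftp session.
#   $1 = ssh alias (alice1 or shark1)
#   $2 = remote directory to change into
#   $3 = remote file to download, $4 = local file to upload
# Names containing spaces are not handled in this sketch.
sftp_exchange() {
    sftp -b - "$1" <<EOF
cd $2
get $3
put $4
EOF
}

# Usage (run on your local workstation):
#   sftp_exchange alice1 data1/<some_directory> <remote_file_name> <local_file_name>
#
# Interactive equivalent: run "sftp alice1" and type the same
# cd/get/put commands at the sftp> prompt, then "exit".
```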
Tools with graphical user interface
Various sftp-based tools with graphical user interfaces are available on the web. Before you choose one, make sure that it supports tunnelling through the ssh gateway (proxy server) in case you need it.