SCP for data transfer
Unless otherwise mentioned, the commands specified below are written as if you were running on your local workstation and not on the cluster.
Always make sure that you know where you want to put your data on the cluster or where you want to copy it from.
Copying files to the cluster
Once you know where you want to copy data to or from, you can use the scp command to perform the copy operation as follows:
ALICE
copy data from your local workstation to your user directory on the shared scratch
scp <local_file_name> alice1:data1/<some_directory>
Here, we make use of the fact that there is a symbolic link to your user directory on the shared scratch in your home directory.
copy data from your local workstation directly to your home directory
scp <local_file_name> alice1:
SHARK
copy data from your local workstation to a share on the fast HPC storage to which you have access:
scp <local_file_name> shark1:/exports/<storage-share-name>/<some_directory>
where you should replace <storage-share-name> by the name of the share.
copy data from your local workstation directly to your home directory
scp <local_file_name> shark1:
If you have not set up key-based authentication, scp will ask you for your password for the ssh gateway and the login node.
These commands copy a local file to the cluster, assuming that you have already created the directory <some_directory>. If the directory does not exist yet, you have to create it on the cluster first, e.g.:
ALICE
ssh alice1 mkdir -p data1/<some_directory>
SHARK
ssh shark1 mkdir -p /exports/<storage-share-name>/<some_directory>
The path to the directory that you want to copy to can of course be longer.
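The two steps above (create the directory, then copy) can be combined in a small shell helper. The following is a minimal sketch, not part of the cluster setup: the function names run and upload_to_cluster, the variable PRINT_ONLY, the file name results.csv and the directory my_project are all hypothetical. With PRINT_ONLY=1 the commands are only printed, so you can check them before anything is transferred.

```shell
# Sketch only: run, upload_to_cluster, PRINT_ONLY, results.csv and
# my_project are hypothetical names chosen for illustration.

# Print the command when PRINT_ONLY=1, otherwise execute it.
run() {
    if [ "${PRINT_ONLY:-0}" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Create <remote_dir> on <host> if needed, then copy <file> into it.
# Usage: upload_to_cluster <file> <host> <remote_dir>
upload_to_cluster() (
    file=$1
    host=$2
    remote_dir=$3
    run ssh "$host" mkdir -p "$remote_dir" &&
        run scp "$file" "$host:$remote_dir/"
)

# Preview the commands without contacting the cluster.
PRINT_ONLY=1
upload_to_cluster results.csv alice1 data1/my_project
```

Unset PRINT_ONLY (or set it to 0) to actually run the commands. For SHARK, the same call would use shark1 and a path below /exports/<storage-share-name>/. Directory names that contain spaces would need extra quoting for the remote shell, which this sketch does not handle.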
Copying files from the cluster
To copy a file from the cluster back to your local desktop or storage medium, you can use, for example:
ALICE
from the shared scratch storage to the current working directory on your local workstation
scp alice1:data1/<remote_directory>/<remote_file_name> ./
from your home directory on the cluster:
scp alice1:<remote_directory>/<remote_file_name> ./
SHARK
from the HPC storage to the current working directory on your local workstation
scp shark1:/exports/<storage-share-name>/<remote_directory>/<remote_file_name> ./
from your home directory on the cluster:
scp shark1:<remote_directory>/<remote_file_name> ./
where you need to replace <remote_directory> by the directory on the cluster that contains the file and <remote_file_name> by the name of the file on the cluster. The path to the directory that you want to copy from can of course be longer and more complex.
Copying entire directories
You can also copy an entire directory (including its sub-directories) to and from the cluster. This only requires adding the -r option to scp, e.g.:
ALICE
Copying from your local computer to your directory on the shared scratch storage:
scp -r <local_directory> alice1:data1/
where you need to replace “<local_directory>” by the directory on your local workstation that you want to copy.
Copying from your user directory on the shared scratch storage to your local computer
scp -r alice1:data1/<remote_directory> ./
where you need to replace “<remote_directory>” by the directory on ALICE that you want to copy.
SHARK
Copying from your local computer to your directory on the HPC storage:
scp -r <local_directory> shark1:/exports/<storage-share-name>/
where you need to replace “<local_directory>” by the directory on your local workstation that you want to copy and <storage-share-name> by the share that you have access to.
Copying from your directory on the HPC storage to your local computer:
scp -r shark1:/exports/<storage-share-name>/<remote_directory> ./
where you need to replace “<remote_directory>” by the directory on SHARK that you want to copy and <storage-share-name> by the share that you have access to.
Of course, you need to adjust the destination path as needed.
For more details on how to use scp, see its man page (man scp).
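Since the only difference between copying a file and copying a directory is the -r option, an upload helper can choose it automatically. This is a minimal sketch under stated assumptions: the names run, copy_to_cluster and PRINT_ONLY are hypothetical, and the helper only inspects local sources, so it is meant for uploads and not for downloads.

```shell
# Sketch only: run, copy_to_cluster and PRINT_ONLY are hypothetical names.

# Print the command when PRINT_ONLY=1, otherwise execute it.
run() {
    if [ "${PRINT_ONLY:-0}" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Copy a local <source> to <destination>, adding -r for directories.
# Usage: copy_to_cluster <source> <destination>
copy_to_cluster() (
    src=$1
    dest=$2
    if [ -d "$src" ]; then
        run scp -r "$src" "$dest"
    else
        run scp "$src" "$dest"
    fi
)

# Preview with a temporary directory and file (nothing is transferred).
PRINT_ONLY=1
demo_dir=$(mktemp -d)
touch "$demo_dir/notes.txt"
copy_to_cluster "$demo_dir" alice1:data1/
copy_to_cluster "$demo_dir/notes.txt" alice1:data1/
rm -r "$demo_dir"
```

The first call prints an scp -r command because the source is a directory; the second prints a plain scp command for the single file.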
RSYNC for data transfer and synchronising directories
rsync is a powerful tool for data transfer. In addition to copying files and directories, it can also synchronise them. This makes it possible to copy only the files that need to be updated, which reduces the amount of traffic. No additional options are necessary to enable synchronisation: rsync automatically checks whether all files need to be copied or only the updated ones.
This is an example of how you can copy/synchronise data between the cluster and your local workstation:
ALICE
for transferring the directory “<local_directory>” from your local workstation to the shared scratch storage:
rsync -azuve ssh <local_directory> alice1:data1/
for transferring the directory “<remote_directory>” from the shared scratch storage to the current working directory (“./”) on your local workstation
rsync -azuve ssh alice1:data1/<remote_directory> ./
SHARK
for transferring the directory “<local_directory>” from your local workstation to a share on the HPC storage:
rsync -azuve ssh <local_directory> shark1:/exports/<storage-share-name>/
for transferring the directory “<remote_directory>” from a share on the HPC storage to the current working directory (“./”) on your local workstation
rsync -azuve ssh shark1:/exports/<storage-share-name>/<remote_directory> ./
If you have not set up key-based authentication, rsync will ask you for your password for the ssh gateway and the login node.
Here, we have used the following options:
-a (or --archive): a shortcut for a combination of options that includes recursion and preserves almost everything, including symlinks and user permissions
-z: compress the file stream
-u: skip files that are newer at the destination
-v: enable verbose output and print all files that are transferred
-e ssh: use ssh for communication and copying
See the man page for rsync for more information about all its options.
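Before synchronising a large directory, it can help to let rsync report what it would transfer without actually copying anything. rsync provides the --dry-run (or -n) option for this. The following is a minimal sketch: the names run, sync_dir, PRINT_ONLY and my_project are hypothetical. PRINT_ONLY exists only so that the sketch can be tried without access to the cluster; it prints the rsync command instead of executing it.

```shell
# Sketch only: run, sync_dir, PRINT_ONLY and my_project are hypothetical.

# Print the command when PRINT_ONLY=1, otherwise execute it.
run() {
    if [ "${PRINT_ONLY:-0}" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Synchronise <source> to <destination>.
# Mode "preview" (default) adds rsync's --dry-run option, so rsync only
# reports what it would transfer; mode "apply" performs the transfer.
# Usage: sync_dir <source> <destination> [preview|apply]
sync_dir() (
    src=$1
    dest=$2
    mode=${3:-preview}
    if [ "$mode" = "apply" ]; then
        run rsync -azuv -e ssh "$src" "$dest"
    else
        run rsync -azuv -e ssh --dry-run "$src" "$dest"
    fi
)

# Show both command lines without contacting the cluster.
PRINT_ONLY=1
sync_dir my_project alice1:data1/
sync_dir my_project alice1:data1/ apply
```

Note that rsync treats my_project and my_project/ differently as a source: without the trailing slash, the directory itself is created at the destination; with it, only the contents of the directory are copied.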
SFTP for data transfer
Data can also be transferred using the sftp program. However, sftp works differently from rsync and scp. With sftp, you connect to the server that you want to copy to or from, and then you upload (“put”) files to the server from your local workstation or download (“get”) files from the server to your local workstation.
Assuming that you have set up your ssh config accordingly, you can use:
ALICE
$ sftp alice1
#############################################################
Welkom bij de Alice SSH Gateway van de Universiteit Leiden
Deze gateway dient slechts als SSH toegang tot ALICE systemen.
Oneigenlijk gebruik van deze server kan leiden tot het ontzeggen van toegang.
Welcome to the Leiden University ALICE SSH Gateway
The only purpose of this gateway is SSH access to ALICE system.
Improper use can lead to denial of access.
More information is found at https://wiki.alice.universiteitleiden.nl
helpdesk@alice.leidenuniv.nl
#############################################################
Connected to alice1.
sftp>
This way, you will have tunnelled through the ssh gateway and connected with a login node on ALICE. In the beginning, sftp will put you in your home directory.
If you did not set up ssh keys on ALICE, you will be asked to provide your user password first for the gateway and then for the login node.
SHARK
$ sftp shark1
<username>@res-ssh-alg01.researchlumc.nl's password:
<username>@res-hpc-lo02.researchlumc.nl's password:
Connected to shark2.
sftp>
For demonstration purposes, we have tunnelled through the LUMC ssh gateway. However, if you are working from within the LUMC network, you can directly connect to one of the login nodes.
At this point, you are in your home directory on SHARK.
If you have set up ssh keys on the gateway and the login nodes, sftp will not ask you for your user password.
We can then use various commands to traverse and manipulate both file systems. The most common commands are listed below:
Command | Function | Example |
---|---|---|
cd | Changes the directory of the remote computer | cd remote_directory |
lcd | Changes the directory of the local computer | lcd local_directory |
ls | Lists the contents of the remote directory | ls |
lls | Lists the contents of the local directory | lls |
pwd | Prints working directory of the remote computer | pwd |
lpwd | Prints working directory of the local computer | lpwd |
get | Copies a file from the remote directory to the local directory | get remote_file |
put | Copies a file from the local directory to the remote directory | put local_file |
exit | Closes the connection to the remote computer and exits the program | exit |
help | Displays application information on using commands | help |
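The commands in the table can also be collected in a file and passed to sftp with its -b option (batch mode), so that a fixed sequence of transfers runs without typing at the sftp> prompt. The following is a minimal sketch: the function name write_sftp_batch and the names my_project, results.csv and summary.txt are hypothetical. Batch mode is intended for non-interactive authentication, so key-based authentication should be set up first.

```shell
# Sketch only: write_sftp_batch and all file and directory names are
# hypothetical and chosen for illustration.

# Write an sftp batch file that changes to <remote_dir>, uploads one
# file and downloads another.
# Usage: write_sftp_batch <batch_file> <remote_dir> <upload> <download>
write_sftp_batch() {
    cat > "$1" <<EOF
cd $2
put $3
get $4
exit
EOF
}

# Create the batch file and show its contents.
batch_file=$(mktemp)
write_sftp_batch "$batch_file" data1/my_project results.csv summary.txt
cat "$batch_file"

# To run the batch on the cluster (requires key-based authentication):
#   sftp -b "$batch_file" alice1
rm "$batch_file"
```

Each line of the batch file is executed in order, exactly as if it had been entered at the sftp> prompt.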
Tools with graphical user interface
Various sftp-based tools with graphical user interfaces are available on the web. Before you choose one, make sure that it supports tunnelling through the ssh gateway (proxy server) in case you need it.