More on using R
This section provides additional information on different aspects of using R on ALICE and SHARK.
For a basic example of submitting an R job on both clusters, please see https://pubappslu.atlassian.net/wiki/spaces/HPCWIKI/pages/37027982
For background information on installing R packages yourself, please see https://pubappslu.atlassian.net/wiki/spaces/HPCWIKI/pages/37749400
For further reading, check out some of the references here: https://pubappslu.atlassian.net/wiki/spaces/HPCWIKI/pages/37749488/References+and+further+reading#R
Running an R script on the command line
There are several ways to launch an R script on the command line:
Rscript yourfile.R
R CMD BATCH yourfile.R
R --no-save < yourfile.R
The first approach (using the Rscript command) writes the output to stdout. The second approach (using the R CMD BATCH command) writes its output to a file (in this case yourfile.Rout). The third approach redirects the file yourfile.R into the standard input of the R executable. Note that with the third approach you must specify one of the following flags: --save, --no-save or --vanilla. Be careful with the option --vanilla, because it also tells R not to read your user profile and environment.
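To see where each approach puts its output, the three commands can be tried side by side. The following is a minimal sketch rather than part of the cluster setup: the demo script content, the temporary directory and the output file names rscript.out and stdin.out are made up for illustration, and it assumes that an R module has already been loaded.

```shell
# Create a throwaway directory with a tiny demo script (illustrative content)
workdir=$(mktemp -d)
cat > "$workdir/yourfile.R" <<'EOF'
cat("Hello from R\n")
EOF

if command -v Rscript >/dev/null 2>&1; then
    (
        cd "$workdir" || exit 1
        # 1) Rscript writes to stdout; redirect it if you want a file
        Rscript yourfile.R > rscript.out
        # 2) R CMD BATCH writes its output to yourfile.Rout
        R CMD BATCH yourfile.R
        # 3) Script is read from stdin; needs --save, --no-save or --vanilla
        R --no-save < yourfile.R > stdin.out
        ls -l rscript.out yourfile.Rout stdin.out
    )
else
    echo "Rscript not found: load an R module first" >&2
fi
```

Comparing the three resulting files shows that the second and third approaches also echo the R commands, while Rscript only prints what the script itself writes.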
Using R with OpenMPI
In addition to the examples for running R in parallel in https://pubappslu.atlassian.net/wiki/spaces/HPCWIKI/pages/37027982 , we provide here a basic Hello World example for using R with OpenMPI.
The R script
We will use the following R script, saved in a file called test_r_mpi.R:
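library(Rmpi)
# Rank of this process and total number of processes in communicator 0
id <- mpi.comm.rank(comm = 0)
np <- mpi.comm.size(comm = 0)
# Name of the node this process is running on
hostname <- mpi.get.processor.name()
msg <- sprintf("Hello world from process %03d of %03d, on host %s\n", id, np, hostname)
cat(msg)
# Wait for all processes, then shut down MPI cleanly
mpi.barrier(comm = 0)
mpi.finalize()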
The Slurm batch file
Now we need a Slurm batch file to run the R script as a batch job, which we call test_r_mpi.slurm:
ALICE
#!/bin/bash
#SBATCH --job-name=test_r_mpi # Job name
#SBATCH --output=%x_%j.out # Output file name
#SBATCH --partition=testing # Partition
#SBATCH --time=00:05:00 # Time limit
#SBATCH --nodes=2 # Number of nodes
#SBATCH --ntasks-per-node=4 # MPI processes per node
module purge
module load slurm
module add R/4.0.5-foss-2020b
srun Rscript test_r_mpi.R
SHARK
#!/bin/bash
#SBATCH --job-name=test_r_mpi # Job name
#SBATCH --output=%x_%j.out # Output file name
#SBATCH --partition=short # Partition
#SBATCH --time=00:05:00 # Time limit
#SBATCH --nodes=2 # Number of nodes
#SBATCH --ntasks-per-node=4 # MPI processes per node
module purge
module load slurm
module add statistical/R/4.1.2/gcc.8.3.1
module add library/mpi/openmpi/4.1.1/gcc-8.3.1
srun Rscript test_r_mpi.R
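After the job above has finished, Slurm will have created a file called test_r_mpi_<job_id>.out. With the settings above (2 nodes with 4 MPI processes each), it should contain eight lines of the form "Hello world from process <rank> of 008, on host <hostname>", one per MPI process and typically in no particular order.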
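To check the result without reading the file by hand, you can count the greeting lines and compare the count with the number of MPI processes requested. This is a minimal sketch: the helper name count_hello is hypothetical, and <job_id> has to be replaced by the job ID that Slurm reported for your job.

```shell
# Number of MPI processes requested in the batch file: nodes x tasks per node
nodes=2
ntasks_per_node=4
expected=$(( nodes * ntasks_per_node ))

# count_hello FILE: print how many greeting lines FILE contains
# (hypothetical helper; the pattern matches the sprintf format in test_r_mpi.R)
count_hello() {
    grep -c '^Hello world from process' "$1"
}

echo "Expecting $expected greeting lines"
# After the job has finished, for example:
#   [ "$(count_hello test_r_mpi_<job_id>.out)" -eq "$expected" ] && echo "all ranks reported"
```

If the count is lower than expected, the job most likely did not start all MPI processes, and the rest of the output file is the first place to look for error messages.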
Running R interactively
You can run R interactively for exercises and testing. For regular work, the recommended way is to run R in batch mode.
ALICE
SHARK
RStudio
RStudio is an Integrated Development Environment (IDE) for R. It includes a console and a syntax-highlighting editor that supports direct code execution, as well as tools for plotting, debugging, history and workspace management. For more information, see the RStudio webpage.
RStudio is installed on both clusters and can be invoked on a login node as follows:
ALICE
Note that you also need to load a version of R. Here, we are loading R/4.4.0.
SHARK
Note that you also need to load a version of R.
RStudio cannot be executed from a Slurm job submitted with sbatch, but you can use it by running an interactive job.
Interactive jobs for RStudio
Interactive jobs can be submitted to the queue with the Slurm command salloc, which takes the same options as Slurm batch files.
Since interactive jobs also go into the queue, it can take some time until your job runs depending on the load on the cluster. Therefore, it is best to submit the interactive job from a screen or tmux session.
Here is an example of a salloc command, where <partition_name> needs to be replaced by a valid partition. The option --x11 is important for forwarding X11 from the compute node on which the job is running. The job uses a rather short running time because it is intended for testing how to launch RStudio.
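Since the exact request depends on your needs, the following is only a sketch of what such a command could look like. The time limit, the single task and the variable names are illustrative assumptions and not recommended site settings; --partition, --time, --ntasks and --x11 are standard salloc options.

```shell
# Illustrative values: replace them to match your own needs
partition="<partition_name>"   # must be replaced by a valid partition
walltime="00:15:00"            # short run time, enough for a first test

# Assemble the request; --x11 enables X11 forwarding from the compute node
cmd="salloc --partition=$partition --time=$walltime --ntasks=1 --x11"

# Print the command so you can check it before running it yourself
echo "$cmd"
```

Run the printed command from inside your screen or tmux session, so that the request survives a dropped connection while it waits in the queue.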
Note that the above command will already log you into the compute node that was assigned to you.
Once your interactive job is running, you can launch RStudio in the following way:
ALICE
Please note that you can safely ignore the following error message when starting rstudio: Failed to connect to the bus:
SHARK
RStudio on the OOD (OpenOnDemand portal)
OOD is only available to SHARK users.
You can also start an RStudio server on the OOD portal.