More on using R

This section provides additional information on different aspects of using R on ALICE and SHARK.

For a basic example of submitting an R job on both clusters, please see Your first R job

For background information on installing R packages yourself, please see Installing R packages

For further reading, check out some of the references here: References and further reading | R

Running an R script on the command line

There are several ways to launch an R script on the command line:

  1. Rscript yourfile.R

  2. R CMD BATCH yourfile.R

  3. R --no-save < yourfile.R

The first approach (using the Rscript command) writes its output to stdout. The second approach (using the R CMD BATCH command) writes its output to a file (in this case yourfile.Rout). The third approach redirects the content of the file yourfile.R to the standard input of the R executable. Note that with the third approach you must specify one of the following flags: --save, --no-save or --vanilla. Be careful with the option --vanilla, because it also tells R not to read your user profile and environment.
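The three invocations can be compared side by side with a small test. The following is a minimal sketch, assuming R is already available in your shell (for example after loading an R module). The one-line content of yourfile.R and the scratch directory are hypothetical and only serve the illustration.

```shell
#!/bin/bash
# Work in a scratch directory so the generated files are easy to find and remove.
workdir=$(mktemp -d)
cd "$workdir"

# Hypothetical one-line R script, for illustration only.
printf 'cat("Hello from R\\n")\n' > yourfile.R

if command -v Rscript >/dev/null 2>&1; then
    Rscript yourfile.R          # 1. output goes to stdout
    R CMD BATCH yourfile.R      # 2. output goes to the file yourfile.Rout
    cat yourfile.Rout
    R --no-save < yourfile.R    # 3. script is read from stdin; a save flag is required
else
    echo "Rscript not found: load an R module first" >&2
fi
```

Comparing the terminal output of the first and third commands with the content of yourfile.Rout shows where each approach sends its output.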

Using R with OpenMPI

In addition to the examples for running R in parallel in Your first R job, we provide here a basic "Hello World" example for using R with OpenMPI.

The R script

We will use the following R script, saved in a file called test_r_mpi.R:

    library(Rmpi)

    id <- mpi.comm.rank(comm = 0)
    np <- mpi.comm.size(comm = 0)
    hostname <- mpi.get.processor.name()

    msg <- sprintf("Hello world from process %03d of %03d, on host %s\n", id, np, hostname)
    cat(msg)

    mpi.barrier(comm = 0)
    mpi.finalize()

The Slurm batch file

Now we need a Slurm batch file, which we call test_r_mpi.slurm, to run the R script as a batch job:


ALICE

    #!/bin/bash
    #SBATCH --job-name=test_r_mpi     # Job name
    #SBATCH --output=%x_%j.out        # Output file name
    #SBATCH --partition=testing       # Partition
    #SBATCH --time=00:05:00           # Time limit
    #SBATCH --nodes=2                 # Number of nodes
    #SBATCH --ntasks-per-node=4       # MPI processes per node

    module purge
    module load slurm
    module add R/4.0.5-foss-2020b

    srun Rscript test_r_mpi.R

SHARK

    #!/bin/bash
    #SBATCH --job-name=test_r_mpi     # Job name
    #SBATCH --output=%x_%j.out        # Output file name
    #SBATCH --partition=short         # Partition
    #SBATCH --time=00:05:00           # Time limit
    #SBATCH --nodes=2                 # Number of nodes
    #SBATCH --ntasks-per-node=4       # MPI processes per node

    module purge
    module load slurm
    module add statistical/R/4.1.2/gcc.8.3.1
    module add library/mpi/openmpi/4.1.1/gcc-8.3.1

    srun Rscript test_r_mpi.R
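Either batch file is submitted with sbatch. The following is a minimal sketch, assuming the batch file is saved as test_r_mpi.slurm in the current directory on a login node; the variable names are hypothetical.

```shell
#!/bin/bash
batch_file="test_r_mpi.slurm"

if command -v sbatch >/dev/null 2>&1 && [ -f "$batch_file" ]; then
    # --parsable makes sbatch print only the job ID (optionally followed by ";cluster").
    job_id=$(sbatch --parsable "$batch_file" | cut -d';' -f1)
    echo "Submitted job $job_id"

    # Show the state of the job in the queue.
    squeue --jobs="$job_id"

    # %x_%j.out in the batch file expands to <job-name>_<job-id>.out.
    echo "Output will be written to test_r_mpi_${job_id}.out"
else
    echo "sbatch or $batch_file not found: run this on a login node next to the batch file" >&2
fi
```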

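After running the job above, Slurm will have created a file called test_r_mpi_<job_id>.out. With 2 nodes and 4 tasks per node, it contains one line per MPI process (8 in total, in no particular order), each of the form:

    Hello world from process <rank> of 008, on host <hostname>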

Running R interactively

You can run R interactively as an exercise or for testing. The recommended way to run R on the clusters, however, is in batch mode.


ALICE


SHARK


RStudio

RStudio is an Integrated Development Environment (IDE) for R. It includes a console and a syntax-highlighting editor that supports direct code execution, as well as tools for plotting, debugging, history and workspace management. For more information, see the RStudio webpage.

RStudio is installed on both clusters and can be invoked on a login node as follows:


ALICE

Note that you also need to load a version of R. Here, we are loading R/4.4.0


SHARK

Note that you also need to load a version of R


RStudio cannot be executed from a Slurm job submitted with sbatch, but you can use it by running an interactive job.

Interactive jobs for RStudio

Interactive jobs can be submitted to the queue with the Slurm command salloc, which takes the same options as Slurm batch files.

Since interactive jobs also go into the queue, it can take some time until your job runs depending on the load on the cluster. Therefore, it is best to submit the interactive job from a screen or tmux session.

Here is an example of a salloc command

where <partition_name> needs to be replaced by a valid partition. The option --x11 is important for forwarding X11 from the compute node on which the job is running. The job uses a rather short running time because it is intended for testing how to launch RStudio.
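A command of this kind might look like the following sketch. The resource values (one task, 30 minutes) are illustrative assumptions rather than site recommendations, and the variable PARTITION stands in for <partition_name>.

```shell
#!/bin/bash
# PARTITION stands in for <partition_name>; "testing" is the ALICE partition
# used in the batch file earlier on this page and is only a default here.
PARTITION="${PARTITION:-testing}"

# Illustrative resource values (assumptions): 1 task, 30 minutes, X11 forwarding.
salloc_cmd=(salloc --partition="$PARTITION" --ntasks=1 --time=00:30:00 --x11)

# Print the command so it can be checked before use.
echo "${salloc_cmd[*]}"

# Request the allocation only where Slurm is available.
if command -v salloc >/dev/null 2>&1; then
    "${salloc_cmd[@]}"
fi
```

For the RStudio window to reach your own screen, the connection to the login node usually needs X11 forwarding as well (for example ssh -X). Starting the command inside a screen or tmux session keeps the request alive while it waits in the queue.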

Note that the above command will already log you into the compute node that was assigned to you.

Once your interactive job is running, you can launch RStudio in the following way:


ALICE

Please note that you can safely ignore the following error message when starting rstudio: Failed to connect to the bus:


SHARK


RStudio on the OOD (Open OnDemand) portal

OOD is only available to SHARK users.

You can also start an RStudio server on the OOD portal.