More on using R

This section provides additional information on different aspects of using R on ALICE and SHARK.

For a basic example of submitting an R job on both clusters, please see https://pubappslu.atlassian.net/wiki/spaces/HPCWIKI/pages/37027982

For background information on installing R packages yourself, please see https://pubappslu.atlassian.net/wiki/spaces/HPCWIKI/pages/37749400

For further reading, check out some of the references here: https://pubappslu.atlassian.net/wiki/spaces/HPCWIKI/pages/37749488/References+and+further+reading#R

Running an R script on the command line

There are several ways to launch an R script on the command line:

  1. Rscript yourfile.R

  2. R CMD BATCH yourfile.R

  3. R --no-save < yourfile.R

The first approach (the Rscript command) writes its output to stdout. The second approach (the R CMD BATCH command) writes its output to a file, in this case yourfile.Rout. The third approach redirects the contents of yourfile.R to the standard input of the R executable. In this last approach you must specify one of the following flags: --save, --no-save or --vanilla. Be careful with --vanilla: it also tells R not to read your user profile and environment files (for example ~/.Rprofile and ~/.Renviron).
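To show where each approach sends its output, the following shell sketch writes a small test script and runs it in all three ways. The file name hello.R is hypothetical and used only for illustration. The sketch assumes you have already loaded an R module with module add. If Rscript is not on your PATH, it only prints a reminder.

```shell
# Create a tiny R script (hypothetical name hello.R) to try the three styles on.
cat > hello.R <<'EOF'
x <- 1:10
cat("mean of 1:10 is", mean(x), "\n")
EOF

if command -v Rscript >/dev/null 2>&1; then
    # 1. Rscript: the output goes straight to the terminal (stdout).
    Rscript hello.R

    # 2. R CMD BATCH: the output, including the echoed commands, goes to hello.Rout.
    R CMD BATCH hello.R
    cat hello.Rout

    # 3. Redirected input: a save flag is required because R is not interactive.
    R --no-save < hello.R
else
    echo "Rscript not found: load an R module first (module add ...)" >&2
fi
```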

Using R with OpenMPI

In addition to the other examples for running R in parallel, here we provide a basic "Hello World" example of using R with OpenMPI through the Rmpi package.

The R script

We will use the following R script, saved in a file called test_r_mpi.R:

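library(Rmpi)

# In Rmpi, comm = 0 refers to MPI_COMM_WORLD
id <- mpi.comm.rank(comm = 0)          # rank of this process
np <- mpi.comm.size(comm = 0)          # total number of MPI processes
hostname <- mpi.get.processor.name()   # node this process runs on

msg <- sprintf("Hello world from process %03d of %03d, on host %s\n", id, np, hostname)
cat(msg)

mpi.barrier(comm = 0)                  # wait until all processes get here
mpi.finalize()                         # shut down MPI cleanly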

The Slurm batch file

Next, we need a Slurm batch file, which we call test_r_mpi.slurm, to run the R script as a batch job:


ALICE

#!/bin/bash
#SBATCH --job-name=test_r_mpi      # Job name
#SBATCH --output=%x_%j.out         # Output file name
#SBATCH --partition=testing        # Partition
#SBATCH --time=00:05:00            # Time limit
#SBATCH --nodes=2                  # Number of nodes
#SBATCH --ntasks-per-node=4        # MPI processes per node

module purge
module load slurm
module add R/4.0.5-foss-2020b

srun Rscript test_r_mpi.R

SHARK

#!/bin/bash
#SBATCH --job-name=test_r_mpi      # Job name
#SBATCH --output=%x_%j.out         # Output file name
#SBATCH --partition=short          # Partition
#SBATCH --time=00:05:00            # Time limit
#SBATCH --nodes=2                  # Number of nodes
#SBATCH --ntasks-per-node=4        # MPI processes per node

module purge
module load slurm
module add statistical/R/4.1.2/gcc.8.3.1
module add library/mpi/openmpi/4.1.1/gcc-8.3.1

srun Rscript test_r_mpi.R

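After the job above has finished, Slurm will have created a file called test_r_mpi_<job_id>.out. It should contain one line per MPI process: eight lines in total, from 2 nodes with 4 tasks each. Each line has the form "Hello world from process 000 of 008, on host <hostname>", and the lines may appear in any order.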
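The MPI ranks can print their lines in any order, so checking the output file by eye becomes tedious for larger jobs. The shell function below is a hypothetical helper; it is not installed on ALICE or SHARK. It checks that every rank from 0 to NTASKS-1 wrote its line, and it assumes the exact message format produced by test_r_mpi.R.

```shell
# check_r_mpi_output FILE NTASKS
# Hypothetical helper: succeeds only if FILE contains the "Hello world" line
# that test_r_mpi.R prints for every rank 0 .. NTASKS-1 (plain POSIX sh).
check_r_mpi_output() {
    file=$1
    ntasks=$2
    total=$(printf '%03d' "$ntasks")   # the R script zero-pads to 3 digits
    rank=0
    while [ "$rank" -lt "$ntasks" ]; do
        id=$(printf '%03d' "$rank")
        if ! grep -q "Hello world from process $id of $total," "$file"; then
            echo "rank $rank is missing from $file" >&2
            return 1
        fi
        rank=$((rank + 1))
    done
    echo "all $ntasks ranks reported in $file"
}
```

For example, submit the job with jobid=$(sbatch --parsable test_r_mpi.slurm). When the job has finished, run check_r_mpi_output "test_r_mpi_${jobid}.out" 8, where 8 is --nodes multiplied by --ntasks-per-node.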

Running R interactively

You can run R interactively, for example as an exercise or for a quick test. For real work, however, the recommended way is to run R in batch mode.


ALICE


SHARK


RStudio

RStudio is an Integrated Development Environment (IDE) for R. It includes a console, a syntax-highlighting editor that supports direct code execution, and tools for plotting, debugging, history and workspace management. For more information, see the RStudio webpage.

RStudio is installed on both clusters and can be invoked on a login node as follows:


ALICE

Note that you also need to load a version of R.


SHARK


RStudio cannot be run from a Slurm job submitted with sbatch, but you can use it in an interactive job.

Interactive jobs for RStudio

Interactive jobs can be submitted to the queue with the Slurm command salloc, which accepts largely the same options as sbatch (that is, the #SBATCH lines of a batch file).

Because interactive jobs also go through the queue, it can take a while before your job starts, depending on the load on the cluster. It is therefore best to submit the interactive job from within a screen or tmux session, so that the request is not lost if your connection to the login node drops.
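If you start salloc from your own wrapper script, a small guard like the one below can warn you when you are not inside a terminal multiplexer. It relies on the environment variables that tmux ($TMUX) and GNU screen ($STY) set inside their sessions. The function name in_multiplexer is hypothetical.

```shell
# in_multiplexer: succeed if this shell runs inside tmux or GNU screen.
# tmux sets $TMUX and GNU screen sets $STY for the shells they start.
in_multiplexer() {
    [ -n "${TMUX:-}" ] || [ -n "${STY:-}" ]
}

if in_multiplexer; then
    echo "Inside tmux/screen: safe to wait for salloc here."
else
    echo "Not inside tmux/screen: consider running 'tmux new -s rstudio' first." >&2
fi
```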

Here is an example of a salloc command

where <partition_name> needs to be replaced by a valid partition. The option --x11 is important because it enables X11 forwarding from the compute node on which the job is running, so that the RStudio window can be displayed. The job uses a rather short time limit because it is only intended for testing how to launch RStudio.
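The function below is a sketch that wraps salloc so that the partition must always be given explicitly. The function name rstudio_alloc, the single task and the 30-minute default time limit are assumptions chosen for a short test. Adjust them to your needs and to the limits of the partition you use.

```shell
# rstudio_alloc PARTITION [TIME]
# Hypothetical wrapper around salloc for a short interactive RStudio test.
# --x11 enables X11 forwarding from the compute node, which RStudio needs.
rstudio_alloc() {
    if [ -z "${1:-}" ]; then
        echo "usage: rstudio_alloc PARTITION [TIME]" >&2
        return 2
    fi
    salloc --partition="$1" --ntasks=1 --time="${2:-00:30:00}" --x11
}
```

For example, rstudio_alloc <partition_name> requests 30 minutes, and rstudio_alloc <partition_name> 01:00:00 requests one hour.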

Once your interactive job is running, you can launch RStudio as follows:


ALICE

where you should replace <your_alice_user_name> by your username on ALICE. The last step will launch RStudio on the compute node that has been assigned to you.


SHARK


RStudio on OOD (the Open OnDemand portal)

OOD is only available to SHARK users.

You can also start an RStudio server on the OOD portal.