OpenMPI on ALICE and SHARK
There are various versions of OpenMPI available on ALICE and SHARK. You can get an overview by running the following command:
ALICE
module -r avail ^OpenMPI
Various modules on ALICE have been built with OpenMPI. When you load one of these modules, the version of OpenMPI that was used to build it will be loaded automatically.
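For example, when you load a module that was built with OpenMPI, the matching OpenMPI module appears among your loaded modules. The module name below is only a placeholder; use the module avail command above to find modules that are actually installed:
# load a module that was built with OpenMPI (placeholder name, pick one from "module avail")
module load <module-built-with-OpenMPI>
# the matching OpenMPI module should now show up in the list of loaded modules
module list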
SHARK
module avail /mpi/
For this tutorial, we will be using OpenMPI 4.1.1.
Preparations
Log in to ALICE or SHARK if you have not done so yet.
Before you set up or submit your job, it is always best to have a look at the current job load on the cluster and at the partitions that are available to you.
Also, it helps to run some short, resource-friendly tests to check that your setup is working and that your batch file is correct. The “testing” partition on ALICE or the “short” partition on SHARK can be used for this purpose. The examples in this tutorial are safe to run on those partitions.
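A quick way to check the partitions and the current load is with the standard Slurm commands sinfo and squeue, for example:
# list the partitions that are available to you and their current state
sinfo
# show the jobs that are currently queued or running (add -u <username> to see only your own jobs)
squeue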
Here, we will assume that you have already created a directory called user_guide_tutorials
in your $HOME
from the previous tutorials. For this job, let's create a sub-directory and change into it:
mkdir -p $HOME/user_guide_tutorials/first_MPI_job
cd $HOME/user_guide_tutorials/first_MPI_job
We will first create the MPI program and then write the slurm batch file.
MPI program
This is a very basic Hello-World type of MPI program. It will print out information about the rank and node that it is running on. We will name this file helloworld_mpi.c
#include <stdio.h>
#include <mpi.h>

int main (int argc, char *argv[])
{
    int rank, size, processor_name_len;
    char name[MPI_MAX_PROCESSOR_NAME];

    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Get_processor_name(name, &processor_name_len);

    printf("Hello World from rank %03d out of %03d running on %s!\n", rank, size, name);

    if (rank == 0)
        printf("MPI World size = %d processes\n", size);

    MPI_Finalize();
    return 0;
}
Next, we load a version of OpenMPI and then we use mpicc
to compile our program:
ALICE
module load OpenMPI/4.1.1-GCC-10.3.0
mpicc helloworld_mpi.c -o helloworld_mpi
SHARK
module load library/mpi/openmpi/4.1.1
mpicc helloworld_mpi.c -o helloworld_mpi
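If you want to verify which compiler and OpenMPI installation the mpicc wrapper uses, the following commands should work on both clusters once the module is loaded:
# show the version of the underlying compiler
mpicc --version
# show the full compile command that mpicc wraps, including the OpenMPI include and library paths
mpicc -show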
Slurm batch file
The Slurm batch script helloworld_mpi.slurm
for our MPI example program looks like this:
ALICE
#!/bin/bash
#SBATCH --job-name=helloworld_mpi
#SBATCH --mail-user="<your-email-address>"
#SBATCH --mail-type="ALL"
#SBATCH --time=00:00:10
#SBATCH --partition=testing
#SBATCH --output=%x_%j.out
#SBATCH --nodes=2
#SBATCH --ntasks=10
#SBATCH --mem-per-cpu=10M
#SBATCH --constraint=ib

# making sure we start with a clean module environment
module purge

echo "## Loading module"
module load slurm
module load OpenMPI/4.1.1-GCC-10.3.0

TEST_DIR=$(pwd)
echo "## Current directory $TEST_DIR"

echo "## Running test"
srun ./helloworld_mpi
# alternative command, but not needed because srun takes care of it
# mpirun -np $SLURM_NTASKS ./helloworld_mpi

echo "## Test finished. Goodbye"
SHARK
#!/bin/bash
#SBATCH --job-name=helloworld_mpi
#SBATCH --mail-user="<your-email-address>"
#SBATCH --mail-type="ALL"
#SBATCH --time=00:00:10
#SBATCH --partition=short
#SBATCH --output=%x_%j.out
#SBATCH --nodes=2
#SBATCH --ntasks=10
#SBATCH --mem-per-cpu=10M

# making sure we start with a clean module environment
module purge

echo "## Loading module"
module load slurm
module load library/mpi/openmpi/4.1.1

TEST_DIR=$(pwd)
echo "## Current directory $TEST_DIR"

echo "## Running test"
srun ./helloworld_mpi
# alternative command, but not needed because srun takes care of it
# mpirun -np $SLURM_NTASKS ./helloworld_mpi

echo "## Test finished. Goodbye"
where you should replace <your-email-address>
with your e-mail address. Here, we have requested two nodes to run 10 tasks. Slurm will distribute the tasks automatically over the two nodes.
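If you prefer to control the distribution yourself rather than leaving it to Slurm, you could, for example, fix the number of tasks per node. A minimal sketch of the relevant #SBATCH lines (not needed for this tutorial):
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=5   # 2 nodes x 5 tasks per node = 10 tasks in total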
The output from our MPI program will go into the Slurm output file. This is fine for the example here, but in general it is not the best approach, because all processes running in parallel write to the same file.
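If you do want one output file per MPI task, one option is to let srun write each task's output to a separate file. srun accepts filename patterns such as %x (job name), %j (job ID) and %t (task number), so a sketch of an alternative srun line would be:
# write the output of each task to its own file, e.g. helloworld_mpi_<jobid>_0.out, helloworld_mpi_<jobid>_1.out, ...
srun --output=%x_%j_%t.out ./helloworld_mpi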
The resources set in the batch script have been determined after running the job at least once with more conservative estimates. With this configuration, it is fine to run the job on the testing partition on ALICE or the short partition on SHARK.
Job submission
Let us submit this MPI job to Slurm:
sbatch helloworld_mpi.slurm
Immediately after you have submitted this job, you should see something like this:
[me@<login_node> first_MPI_job]$ sbatch helloworld_mpi.slurm
Submitted batch job <job_id>
Job output
In the directory where you launched your job, there should be a new file created by Slurm: helloworld_mpi_<jobid>.out
. It contains all the output from your job that would normally have been written to the command line. Check the file for any possible error messages. The content of the file should look something like this:
## Loading module
## Current directory <your_path>
## Running test
Hello World from rank 000 out of 010 running on nodelogin01!
MPI World size = 10 processes
Hello World from rank 001 out of 010 running on nodelogin01!
Hello World from rank 002 out of 010 running on nodelogin01!
Hello World from rank 004 out of 010 running on nodelogin01!
Hello World from rank 003 out of 010 running on nodelogin01!
Hello World from rank 006 out of 010 running on nodelogin02!
Hello World from rank 007 out of 010 running on nodelogin02!
Hello World from rank 009 out of 010 running on nodelogin02!
Hello World from rank 008 out of 010 running on nodelogin02!
Hello World from rank 005 out of 010 running on nodelogin02!
## Test finished. Goodbye
Because this is a parallel job, the output from each process may appear out of order.
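If you want to read the per-rank lines in order, you can simply sort them afterwards; for example, using the output file name from above:
# print the Hello World lines sorted by rank (the zero-padded rank numbers make a plain sort sufficient)
grep "Hello World" helloworld_mpi_<jobid>.out | sort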
You can get a quick overview of the resources actually used by your job by running:
seff <job_id>
It might look something like this:
Job ID: <job_id>
Cluster: <cluster_name>
User/Group: <user_name>/<group_name>
State: COMPLETED (exit code 0)
Nodes: 2
Cores per node: 5
CPU Utilized: 00:00:01
CPU Efficiency: 5.00% of 00:00:20 core-walltime
Job Wall-clock time: 00:00:02
Memory Utilized: 1.35 MB
Memory Efficiency: 0.13% of 1000.00 MB
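If you need more detail than seff provides, sacct can report accounting information for the job and its individual steps, for example:
# show elapsed time, number of tasks, memory high-water mark and final state for the job and its steps
sacct -j <job_id> --format=JobID,JobName,Elapsed,NTasks,MaxRSS,State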
Cancelling your job
If you need to cancel your job, you can do so with:
scancel <job_id>
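If you want to cancel all of your own jobs at once, you can pass your username instead of a single job ID:
# cancel every job that belongs to you
scancel -u <username>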