Your first MPI job
About this tutorial
This tutorial will guide you through running a very simple Hello-World-type job with OpenMPI.
What you will learn?
Setting up the batch script for an OpenMPI job
Loading the necessary modules
Submitting your job
Monitoring your job
Collecting information about your job
What this example will not cover?
Using other MPI compilers
Writing code for MPI
Optimizing your MPI code
What you should know before starting?
Basic understanding of MPI
Basic knowledge of how to use a Linux OS from the command line.
How to connect to ALICE or SHARK: https://pubappslu.atlassian.net/wiki/spaces/HPCWIKI/pages/37748771
How to move files to and from ALICE or SHARK: https://pubappslu.atlassian.net/wiki/spaces/HPCWIKI/pages/37749117
How to set up a simple batch job as shown in: https://pubappslu.atlassian.net/wiki/spaces/HPCWIKI/pages/37027928
OpenMPI on ALICE and SHARK
There are various versions of OpenMPI available on ALICE and SHARK. You can get an overview by running the following command for your cluster:
ALICE
module -r avail ^OpenMPI
Various modules on ALICE have been built with OpenMPI. When you load these modules, the version of OpenMPI that was used to build the module will be loaded automatically.
SHARK
module avail /mpi/
For this tutorial, we will be using OpenMPI 4.1.1.
Preparations
Log in to ALICE or SHARK if you have not done so yet.
Before you set up your job or submit it, it is always best to have a look at the current job load on the cluster and what partitions are available to you.
Also, it helps to run some short, resource-friendly tests to see if your setup is working and your batch file is correct. The “testing” partition on ALICE or the “short” partition on SHARK can be used for this purpose. The examples in this tutorial are safe to use on those partitions.
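The checks mentioned above can be sketched with the standard Slurm commands sinfo and squeue. The partition name below is taken from the text (use short on SHARK); the guard around the commands is only there so that the snippet does nothing harmful on a machine without Slurm.

```shell
# Partition to inspect: "testing" on ALICE, "short" on SHARK.
PARTITION="${PARTITION:-testing}"

if command -v sinfo >/dev/null 2>&1; then
    # One summary line per partition: availability, time limit, node counts.
    sinfo --summarize
    # Jobs currently pending or running in the chosen partition.
    squeue --partition="$PARTITION"
else
    echo "Slurm commands not found: run this on a cluster login node"
fi
```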
Here, we will assume that you have already created a directory called user_guide_tutorials in your $HOME from the previous tutorials. For this job, let's create a sub-directory and change into it:
mkdir -p $HOME/user_guide_tutorials/first_MPI_job
cd $HOME/user_guide_tutorials/first_MPI_job
We will first create the MPI program and then write the Slurm batch file.
MPI program
This is a very basic Hello-World type of MPI program. It will print out information about the rank and node that it is running on. We will name this file helloworld_mpi.c
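A minimal sketch of what such a program could look like is given below. It is written to helloworld_mpi.c through a shell here-document so that it can be pasted directly into the terminal. It uses only standard MPI calls (MPI_Init, MPI_Comm_rank, MPI_Comm_size, MPI_Get_processor_name, MPI_Finalize); the message text is an assumption, and the program used elsewhere in this guide may differ in detail.

```shell
# Write an illustrative MPI hello-world program to helloworld_mpi.c.
# The quoted 'EOF' keeps the shell from expanding anything in the C source.
cat > helloworld_mpi.c <<'EOF'
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, size, name_len;
    char node_name[MPI_MAX_PROCESSOR_NAME];

    MPI_Init(&argc, &argv);                        /* start the MPI runtime */
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);          /* rank of this process */
    MPI_Comm_size(MPI_COMM_WORLD, &size);          /* total number of processes */
    MPI_Get_processor_name(node_name, &name_len);  /* node this rank runs on */

    /* Illustrative message format, one line per rank. */
    printf("Hello world from rank %d of %d on node %s\n", rank, size, node_name);

    MPI_Finalize();                                /* shut down the MPI runtime */
    return 0;
}
EOF
```

Each process started by the launcher runs the same code and prints its own rank together with the name of the node it runs on.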
Next, we load a version of OpenMPI and then we use mpicc to compile our program:
ALICE
SHARK
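The load-and-compile step can be sketched as follows for either cluster. The module name OpenMPI/4.1.1 is a placeholder and an assumption: module names differ between ALICE and SHARK, so substitute the exact OpenMPI 4.1.1 entry reported by the module avail command for your cluster. The output name helloworld_mpi is likewise only a choice made for this sketch.

```shell
# Placeholder module name (assumption): replace it with the exact
# OpenMPI 4.1.1 entry listed by "module avail" on your cluster.
MPI_MODULE="${MPI_MODULE:-OpenMPI/4.1.1}"

if command -v module >/dev/null 2>&1; then
    module load "$MPI_MODULE" || echo "could not load $MPI_MODULE"
fi

if command -v mpicc >/dev/null 2>&1 && [ -f helloworld_mpi.c ]; then
    # mpicc wraps the C compiler and adds the MPI include and library flags.
    mpicc helloworld_mpi.c -o helloworld_mpi
else
    echo "mpicc or helloworld_mpi.c not found: load OpenMPI and create the source file first"
fi
```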
Slurm batch file
The Slurm batch script helloworld_mpi.slurm for our MPI example program looks like this:
ALICE
SHARK
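A sketch of such a batch script is shown below, written to helloworld_mpi.slurm through a here-document. The job name, the output file pattern, the two nodes, the 10 tasks, and the partition follow from the text of this tutorial. The time limit, the memory request, the mail type, the module name, and the use of srun as the launcher are assumptions to adapt (depending on how OpenMPI was built on your cluster, mpirun may be needed instead of srun).

```shell
# Write an illustrative batch script; 'EOF' is quoted so that $VARIABLES
# are expanded when the job runs, not when the file is created.
cat > helloworld_mpi.slurm <<'EOF'
#!/bin/bash
#SBATCH --job-name=test_openmpi
#SBATCH --output=%x_%j.out
#SBATCH --mail-user="<your-email-address>"
#SBATCH --mail-type="ALL"
#SBATCH --partition=testing
#SBATCH --time=00:05:00
#SBATCH --nodes=2
#SBATCH --ntasks=10
#SBATCH --mem-per-cpu=100M

# Notes on the directives above:
#   --output=%x_%j.out expands to test_openmpi_<jobid>.out
#   --partition: "testing" on ALICE, "short" on SHARK
#   --time and --mem-per-cpu are assumed values for a quick test

# Placeholder module name (assumption); use the name from "module avail".
module load OpenMPI/4.1.1

echo "Running on nodes: $SLURM_JOB_NODELIST"

# Start one copy of the program per task (10 in total).
srun ./helloworld_mpi
EOF
```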
where you should replace <your-email-address> with your e-mail address. Here, we have requested two nodes to run 10 tasks. The tasks will be distributed automatically over the two nodes.
The output from our MPI program will go into the Slurm output file. This is fine for the example here, but not the best approach because the processes running in parallel have to write to the same file.
The resources set in the batch script have been determined after running the job at least once with more conservative estimates. In this configuration, it is fine to run the job on the testing partition.
Job submission
Let us submit this MPI job to Slurm:
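Submission and a first look at the queue can be sketched with the standard Slurm commands sbatch and squeue. The script name is the one used in this tutorial; the guard only prevents errors on machines without Slurm.

```shell
JOB_SCRIPT="helloworld_mpi.slurm"

if command -v sbatch >/dev/null 2>&1 && [ -f "$JOB_SCRIPT" ]; then
    # Hand the batch script to the scheduler.
    sbatch "$JOB_SCRIPT"
    # List your own jobs to monitor the state (PD = pending, R = running).
    squeue -u "$USER"
else
    echo "sbatch or $JOB_SCRIPT not found: run this on a login node in the job directory"
fi
```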
Immediately after you have submitted this job, Slurm should confirm the submission with a message of the form: Submitted batch job <jobid>
Job output
In the directory where you launched your job, there should be a new file created by Slurm: test_openmpi_<jobid>.out. It contains all the output from your job that would normally have been written to the command line. Check the file for any error messages. Apart from that, it should contain the rank and node information printed by each of the 10 MPI processes.
Because this is a parallel job, the output from the individual processes can appear out of order.
You can get a quick overview of the resources actually used by your job by running:
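One way to do this, assuming the seff utility is installed on your cluster, is sketched below; sacct, which is part of Slurm's accounting tools, serves as a fallback. JOBID is a variable introduced for this sketch and has to be set to the ID that Slurm reported at submission.

```shell
# Set JOBID to your job's ID first, for example: JOBID=<jobid>
JOBID="${JOBID:-}"

if [ -z "$JOBID" ]; then
    echo "Set JOBID to the ID of your job first"
elif command -v seff >/dev/null 2>&1; then
    # Efficiency summary for a finished job.
    seff "$JOBID"
elif command -v sacct >/dev/null 2>&1; then
    # Accounting record with a few selected fields.
    sacct -j "$JOBID" --format=JobID,JobName,State,Elapsed,NNodes,NTasks,MaxRSS
else
    echo "Neither seff nor sacct found: run this on a cluster login node"
fi
```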
The report typically lists the job state, the number of nodes and cores, the CPU time used, the wall-clock time, and the memory used.
Cancelling your job
If you need to cancel your job, you can do so with:
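Cancelling is done with the standard Slurm command scancel followed by the job ID. In this sketch, JOBID is a variable introduced for illustration that you set to the ID of the job you want to stop.

```shell
# Set JOBID to your job's ID first, for example: JOBID=<jobid>
JOBID="${JOBID:-}"

if [ -n "$JOBID" ] && command -v scancel >/dev/null 2>&1; then
    # Remove the job from the queue, or stop it if it is already running.
    scancel "$JOBID"
else
    echo "Set JOBID and run this on a cluster login node"
fi
```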