Using Node802 and Partition mem_mi
About node802
Node802 has been purchased by the Mathematical Institute (MI). Initially, access was exclusive to MI researchers for some time. Recently, we opened up part of the resources on node802 to all ALICE users through the partition amd-short, while MI researchers retain priority access through the partition mem_mi.
We have created the partition amd-short because node802 is currently the only node with AMD CPUs. Some software needs to be compiled specifically for AMD, which is why we maintain a separate AMD branch in our software stack. This branch is still smaller than the Intel branch that we use for all other nodes. The partition also allows us to gradually expand the AMD branch of the software stack.
We welcome you to try out the new amd-short partition. The partition definition can be found here: Partitions on ALICE
The hardware configuration of node802 can be found here: About ALICE | Hardware Description
- 1.1 About node802
- 2 Overview
- 3 Access
- 3.1 For all ALICE users
- 3.2 For MI users
- 4 Hardware
- 5 Software
- 5.1 Scientific software stack
- 5.2 Your own scripts/programmes
- 5.2.1 Example
- 5.2.1.1 Batch script
- 5.2.1.2 OpenMP script
Overview
Access
For all ALICE users
All ALICE users can access part of the resources on node802 using the partition "amd-short".
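As a minimal sketch (the job name, time limit and resource values below are placeholders, not recommendations), a batch script simply requests the partition in its header:

#!/bin/bash
#SBATCH --partition=amd-short
#SBATCH --job-name=amd_test
#SBATCH --time=0-00:10:00
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1
#SBATCH --mem=1G

module load ALICE/default
echo "Running on $HOSTNAME"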
For MI users
ALICE users from MI also have access to the partition "mem_mi". This partition is only accessible to MI users.
Jobs submitted to this partition should always get a higher priority than jobs submitted to amd-short. If you notice any issues with this, please contact the ALICE Helpdesk.
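If you want to check the state of the partition or your jobs in it, the standard Slurm commands can be used, for example:

# Show the state of the mem_mi partition and its node
sinfo --partition=mem_mi
# Show your own jobs in this partition
squeue -u $USER --partition=mem_mi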
Hardware
The basic hardware configuration of node802 is available here: About ALICE | Hardware Description
- Hyperthreading is active.
- Local node scratch: the two 10 TB HDDs have been combined into a single volume of about 20 TB, mounted at /scratchdata.
You can use the local scratch on node802 in the same way that you would use it on other ALICE nodes.
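For example, inside a job running on node802 you could inspect the volume and stage data to your job-specific scratch directory like this (mydata.txt is a placeholder file name):

# Check the available space on the local scratch volume
df -h /scratchdata
# $SCRATCH points to the job-specific directory on the local scratch
echo "My local scratch directory: $SCRATCH"
cp mydata.txt $SCRATCH/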
Software
Scientific software stack
You can make use of the general scientific software stack which can be accessed by running
module load ALICE/default
It is recommended to add this to your batch scripts, too.
If you want to use software that is fully optimized for the CPU architecture of the node, you have to build it yourself.
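For example, inside a job on node802 you can verify which branch of the stack is active and list what it provides (a minimal sketch):

module load ALICE/default
# The first entry of MODULEPATH shows which branch of the software stack is active
echo ${MODULEPATH%%:*}
# List the modules available in this branch
module avail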
Your own scripts/programmes
Because this node has a different CPU architecture, conda environments or other software that you built on the login nodes may not work if they were built with optimizations for the login nodes' CPU architecture.
In this case, you need to compile such scripts/software as part of a batch or interactive job.
One way to do this is to create a short Slurm batch job specifically for compiling your software, setting up your conda/Python environments, etc. If you only need to do this once, then there is no need to make this part of your production batch job.
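A compile-only job could look like the following sketch (MI users could use mem_mi instead of amd-short; the file names, compiler flags and resource values are placeholders):

#!/bin/bash
#SBATCH --partition=amd-short
#SBATCH --job-name=compile_only
#SBATCH --time=0-00:15:00
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1
#SBATCH --mem=2G

module load ALICE/default
# Build the programme on the AMD node so that it is optimized for this CPU
gcc -O2 -march=native -fopenmp -o my_program_amd my_program.c
# Alternatively, set up your conda/Python environment here instead of on the login nodes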
Another option is to compile the first time you run your programme as part of a job. In this first job, you copy the compiled programme back to your shared storage or home directory. For the next job, you use the already compiled version (see the example below).
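A simple way to combine both runs in one script is to compile only when no previously built binary exists. The following sketch assumes the same setup as the example below ($CWD is the submission directory and the job has already changed into $SCRATCH):

if [ ! -f "$CWD/omp_hello_amd" ]; then
    # First job: compile on the node and keep a copy for later jobs
    cp $CWD/omp_hello.c .
    gcc -o omp_hello_amd -fopenmp omp_hello.c
    cp omp_hello_amd $CWD/
else
    # Later jobs: reuse the binary compiled in an earlier job
    cp $CWD/omp_hello_amd .
fi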
You can still use the login nodes for testing/debugging. In this case, you compile on the login nodes, run your tests there, and then compile again on the compute node as part of your job.
Example
Here is an example of what a Slurm batch script for using the node could look like, including a Hello World OpenMP program that demonstrates compiling on the node and using the local scratch storage.
If you are new to HPC, ALICE or Slurm, have a look at https://pubappslu.atlassian.net/wiki/spaces/HPCWIKI/pages/5963809 first.
Batch script
#!/bin/bash
#SBATCH --partition=mem_mi
#SBATCH --job-name=test_job
#SBATCH --time=0-00:02:00
#SBATCH --output=%x_%j.out
#SBATCH --nodes=1
#SBATCH --ntasks=5
#SBATCH --cpus-per-task=3
#SBATCH --mem=10G
#SBATCH --mail-user="your-email-address"
#SBATCH --mail-type="ALL"
module load ALICE/default
module load OpenMPI/4.0.5-GCC-9.3.0
echo "#### Test started"
# return the name of the node
echo "## Which node is this: $HOSTNAME"
# check the number of cores (ntasks*cpus-per-task)
echo "How many cores do I have access to: ${SLURM_CPUS_ON_NODE}"
# Just to check that the AMD software stack is loaded
echo "Am I loading the from the right module path"
echo ${MODULEPATH%%:*}
# get the current working directory
CWD=$(pwd)
echo "## Where am I: ${CWD}"
# check out the nodes local scratch
echo "## My local scratch space on the node is: ${SCRATCH}"
cd $SCRATCH
echo "## Let us go there: $(pwd)"
# In case the file has already been compiled
# and stored in $CWD, the following six lines
# are not necessary
echo "## Let us copy the C script to it"
cp $CWD/omp_hello.c $SCRATCH/
echo "## Is the file there?"
ls -la omp_hello.c
echo "## Now we compile it on the node"
gcc -o omp_hello_amd -fopenmp omp_hello.c
# In case the file is already compiled
# the next four lines would copy it
# and check that it is there:
#echo "## Let us copy the compiled C programme to it"
#cp $CWD/omp_hello_amd $SCRATCH/
#echo "## Is the file there?"
#ls -la omp_hello_amd
echo "## Let us run it"
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
srun ./omp_hello_amd
# Copy those files back to shared scratch or home
# that should be kept for later.
# Here, it is just the compiled C programme.
# It does not need to be copied back of course
# if it came from shared scratch or home.
echo "## Saving files that should be saved."
cp $SCRATCH/omp_hello_amd $CWD/
echo "## Now that this is done, I want to go home"
cd $CWD
echo "## Good to be back $(pwd)"
echo "#### Test finished"
OpenMP script
Here is the content of the file omp_hello.c from https://computing.llnl.gov/tutorials/openMP/samples/C/omp_hello.c
/******************************************************************************
 * FILE: omp_hello.c
 * DESCRIPTION:
 *   OpenMP Example - Hello World - C/C++ Version
 *   In this simple example, the master thread forks a parallel region.
 *   All threads in the team obtain their unique thread number and print it.
 *   The master thread only prints the total number of threads. Two OpenMP
 *   library routines are used to obtain the number of threads and each
 *   thread's number.
 * AUTHOR: Blaise Barney 5/99
 * LAST REVISED: 04/06/05
 ******************************************************************************/
#include <omp.h>
#include <stdio.h>
#include <stdlib.h>
int main (int argc, char *argv[])
{
  int nthreads, tid;

  /* Fork a team of threads giving them their own copies of variables */
  #pragma omp parallel private(nthreads, tid)
  {
    /* Obtain thread number */
    tid = omp_get_thread_num();
    printf("Hello World from thread = %d\n", tid);

    /* Only master thread does this */
    if (tid == 0)
    {
      nthreads = omp_get_num_threads();
      printf("Number of threads = %d\n", nthreads);
    }
  } /* All threads join master thread and disband */
}