Moab

From Montana Tech Computer Science Department

Adaptive Computing's Moab [1] job scheduler was installed in July 2013. Moab is an advanced scheduling and management system that supplies additional end-user commands [2], described below.


Submitting Jobs with msub

A job is created by submitting an executable script to the Moab Workload Manager with msub [3]. The msub documentation describes the command-line arguments for requesting resources, declaring the job name, specifying the priority or destination queue, defining the mail options, etc. The script contains the commands that will be executed on the compute node assigned by Moab/TORQUE for the job. For jobs that request multiple nodes, the script runs on a single node and must contain the commands necessary to use all the processors assigned to the job. An example of an MPI job script is below. Job scripts can also contain PBS directives, which remove the need for msub command-line arguments.

Requesting Resources

There are 22 compute nodes in the cluster, each with 32 processors. If no resources are requested, a single processor on one node is assigned. Use the -l flag to request resources [4]. For example, "msub -l nodes=4" allocates 1 processor on each of four nodes, because the default is 1 processor per requested node. To request all the processors on a node, use ppn=32 (e.g., msub -l nodes=4:ppn=32). Other resources that are often requested are memory size and walltime.
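
The arithmetic behind a nodes/ppn request can be sketched as a small shell helper. The function name total_procs and the two limit variables are hypothetical; the limits are taken from the cluster figures above. The helper only computes a processor count and does not talk to Moab.

```shell
#!/bin/sh
# Hypothetical helper: number of processors a "-l nodes=N:ppn=P" request yields.
# Limits come from the cluster description above (22 nodes, 32 processors each).
MAX_NODES=22
MAX_PPN=32

total_procs() {
    nodes=$1
    ppn=${2:-1}    # ppn defaults to 1 when it is not given
    if [ "$nodes" -lt 1 ] || [ "$nodes" -gt "$MAX_NODES" ] ||
       [ "$ppn" -lt 1 ] || [ "$ppn" -gt "$MAX_PPN" ]; then
        echo "request exceeds cluster limits" >&2
        return 1
    fi
    echo $((nodes * ppn))
}

total_procs 4       # nodes=4         -> 4
total_procs 4 32    # nodes=4:ppn=32  -> 128
```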

Examples

Interactive Job

To run a program interactively on a compute node:

msub -I

If you want to request a specific node, use the -l option with the resource request:

msub -I -l nodes=n9

(Note: Moab simply calls Torque's qsub -I for interactive jobs. Moab is currently experiencing communication problems with Torque, so using qsub instead is okay.)

Script without PBS directives

A script does not require PBS directives. For instance, a simple testjob script to print the host name and ping the management node would contain:

#!/bin/sh
hostname
ping -c 30 scyld

To request 2 nodes and 4 processors per node with a mail message when the job ends, the command line would look like:

msub testjob -l nodes=2:ppn=4 -m e -M username@mtech.edu

An output file will be created that contains the name of the host the script ran on and the output from pinging the management node 30 times (about 30 seconds).

Script with PBS directives

Since scripts are normally submitted several times, it is more convenient to include the msub options in the script file as PBS directives. The previous testjob script would become:

#!/bin/sh
#PBS -l nodes=2:ppn=4
#PBS -N PingJob
#PBS -m e
#PBS -M username@mtech.edu
#PBS -l walltime=00:01:00
cd $PBS_O_WORKDIR
hostname
pwd
ping -c 30 scyld

The job is now simply submitted with:

msub testjob
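
To see which options a script carries before submitting it, the directive lines can be turned back into their command-line form. This is a minimal sketch; the function name pbs_options is hypothetical, and it assumes every directive starts at the beginning of a line with #PBS.

```shell
#!/bin/sh
# Hypothetical helper: print the options embedded in a job script,
# one per line, with the leading "#PBS" marker removed.
pbs_options() {
    sed -n 's/^#PBS[[:space:]]*//p' "$1"
}

# Usage (assumes the testjob script above is in the current directory):
#   pbs_options testjob
# The first line printed for that script would be: -l nodes=2:ppn=4
```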

Another example is using R to read/write file data:

#!/bin/sh
#PBS -l nodes=1:ppn=32
#PBS -N PingJob
#PBS -m e
#PBS -M username@mtech.edu
#PBS -l walltime=00:01:00
cd $PBS_O_WORKDIR
module load R/3.1.0
R < parLapply_test.R > parLapply_test.output --no-save

If a job is submitted from one directory but its data and programs are in another, the working directory can be specified in the script with the -d flag:

#!/bin/sh
#PBS -l nodes=2:ppn=4
#PBS -N PingJob
#PBS -d /home/mtech/username/working_dir
#PBS -m e
#PBS -M username@mtech.edu
#PBS -l walltime=00:01:00
hostname
ping -c 30 scyld

Memory Resources

To allocate the correct amount of memory for a job, specify how much memory the job will need. This can be done on the command line or with a PBS directive:

#PBS -l mem=16gb

The above will allocate 16 GB for the job, split evenly among the processes or tasks assigned to it. If one node with ppn=4 is requested, each process gets 4 GB. If only one processor (ppn=1) is requested, it gets all 16 GB. Moab will assign the job to a node that has at least 16 GB free. A hard limit of 1.1 is set, so a process that exceeds its requested amount by 10% for more than one minute will be cancelled.
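
The per-process share and the threshold implied by the 1.1 hard limit can be worked out as follows. This is a rough sketch with hypothetical function names; it uses integer arithmetic in MB and assumes the limit applies to each process's share, as the paragraph above suggests.

```shell
#!/bin/sh
# Hypothetical helpers; values are in MB and use integer arithmetic.

# Per-process share of a "-l mem=<total_gb>gb" request split over <procs> processes.
per_proc_mem_mb() {
    total_gb=$1
    procs=$2
    echo $((total_gb * 1024 / procs))
}

# Threshold implied by the 1.1 hard limit: 110% of the per-process share.
mem_limit_mb() {
    share=$(per_proc_mem_mb "$1" "$2")
    echo $((share * 110 / 100))
}

per_proc_mem_mb 16 4    # 4096 MB, i.e. 4 GB per process
mem_limit_mb 16 4       # 4505 MB
```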

There are 5 nodes with 128 GB of memory. These nodes can be accessed by requesting the memnode feature:

#PBS -l feature=memnode

Script for MPI job

Applications that use MPI require slightly more sophisticated scripts that set the shell and MPI version, identify the compute nodes allocated for the job, and initiate the mpd daemons on the assigned compute nodes. An example for MPICH2:

#!/bin/bash
#PBS -l nodes=4:ppn=32
#PBS -N MPIJob
#PBS -d /home/mtech/username
#PBS -S /bin/bash
#PBS -m e
#PBS -M username@mtech.edu
#PBS -l walltime=00:10:00
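# Build a host file with one line per allocated node
MPDHOSTS=mpd.hosts.$PBS_JOBID
sort -u "$PBS_NODEFILE" > "$MPDHOSTS"
# Count the unique nodes and the total processors assigned to the job
NODES=$(wc -l < "$MPDHOSTS")
NPROCS=$(wc -l < "$PBS_NODEFILE")
echo "NODES=$NODES"
echo "NPROCS=$NPROCS"
module load mpich2/gnu
mpirun -np "$NPROCS" --hostfile "$MPDHOSTS" mympiapp
rm "$MPDHOSTS"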
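
How NODES and NPROCS are derived can be tried outside a job with a stand-in for $PBS_NODEFILE. TORQUE writes one line to that file per allocated processor, so each host name repeats ppn times. The host names and function names below are made up for illustration.

```shell
#!/bin/sh
# Stand-in for $PBS_NODEFILE as it would look for nodes=2:ppn=2.
# The host names n1 and n2 are invented for this example.
nodefile=$(mktemp)
printf 'n1\nn1\nn2\nn2\n' > "$nodefile"

count_nodes() { sort -u "$1" | wc -l | tr -d ' '; }   # unique host names
count_procs() { wc -l < "$1" | tr -d ' '; }           # one line per processor

echo "NODES=$(count_nodes "$nodefile")"     # NODES=2
echo "NPROCS=$(count_procs "$nodefile")"    # NPROCS=4
rm -f "$nodefile"
```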

InfiniBand with OpenMPI

By default, the 1 Gigabit Ethernet network is used. To make OpenMPI use the InfiniBand network instead, include --mca btl openib,sm,self on the mpirun command line:

module load openmpi/gnu
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/lib64
mpirun --mca btl openib,sm,self -np $NPROCS --hostfile $MPDHOSTS mympiapp

Monitoring jobs with showq and checkjob

showq shows the status of your jobs and the number of nodes in use. For more details, including the nodes assigned, use showq -r.

To get information on an individual job, use the checkjob command [5]. checkjob -v gives more verbose information on the job.

To check the status and availability of all nodes, use mdiag -n.
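
Submission and monitoring can be chained by capturing the job ID at submit time. The sketch below assumes msub prints the new job's ID on standard output, possibly surrounded by blank lines; the function name extract_jobid and the ID 12345 are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: strip blank lines and surrounding whitespace from
# msub's output so that only the job ID remains.
extract_jobid() {
    tr -d '[:space:]'
}

# Usage on the cluster (not run here; testjob is a job script as above):
#   jobid=$(msub testjob | extract_jobid)
#   checkjob "$jobid"

# Demonstration with an invented ID in place of real msub output:
printf '\n12345\n' | extract_jobid; echo    # prints 12345
```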

Canceling Jobs

To terminate a job that is currently running or in the queue, use the mjobctl -c [6] command. The canceljob [7] command can also be used, but it is deprecated.

Admin Notes

Some default parameters can be set in both Torque and Moab; the Moab settings take precedence.

To view, set, and unset parameters set in Torque for the batch queue:

qmgr -c "list queue batch"
qmgr -c "set queue batch resources_default.walltime=3600"
qmgr -c "unset queue batch resources_default.walltime"

To set the default wallclock limit in Moab, edit moab.cfg and add:

CLASSCFG[DEFAULT] DEFAULT.WCLIMIT=3600
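
The defaults above are written in seconds, while the job scripts on this page use the HH:MM:SS form for walltime. A small conversion helper (the name secs_to_walltime is hypothetical) makes it easy to compare the two:

```shell
#!/bin/sh
# Hypothetical helper: convert a limit in seconds to HH:MM:SS.
secs_to_walltime() {
    s=$1
    printf '%02d:%02d:%02d\n' $((s / 3600)) $((s % 3600 / 60)) $((s % 60))
}

secs_to_walltime 3600    # 01:00:00
secs_to_walltime 60      # 00:01:00
```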

Attempts to change the default memory allocation in Moab were unsuccessful. In Torque:

qmgr -c "set queue batch resources_default.mem=4gb"

will set the total memory allocation for a job. The amount assigned to each process is its proportional share of the total; with ppn=4, each process gets 1 GB in this example. Note that resources_assigned.mem = 4294967296b is set automatically. Setting these values in Torque does not appear to enforce any memory restrictions on jobs.