Why use Slurm?

Within the SCG cluster, Slurm functions as a resource broker. Submitting a job to Slurm requests a set of CPU and memory resources. Slurm orders these requests and gives them a priority based on the cluster configuration and runs each job on the most appropriate available resource in the order that respects the job priority or, when possible, squeezes in short jobs via a backfill scheduler to harvest unused cpu time. Using Slurm allows many users to fairly share a set of computational resources with greatly reduced danger of negatively impacting anotehr persons jobs or work while also allowing the resources to be better and more fully utilized over time.

In addition to making it easier to effectively share a resource, Slurm also acts as a powerful tool for managing workflows. By encapsulating steps in job scripts it becomes possible to easily repeat and reuse workflows and having such job scripts adds to the documentation of the associated process.

What is a job script?

A job script is a normal unix shell script which optionally contains lines flagged such that the Slurm submission process can read arguments from them. A simple hello world job script using Bash would be

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
#!/bin/bash

# See `man sbatch` or https://slurm.schedmd.com/sbatch.html for descriptions
# of sbatch options.
#SBATCH --job-name=hello_world
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1
#SBATCH --partition=interactive
#SBATCH --account=default
#SBATCH --time=1:00:00

echo 'Hello World!'

Partition info

Partition Max mem Max CPUs Time limit
batch - - 14 Days
nih_s10 97GB 344 7 Days
interactive 128GB 16 24 Hours

Submitting a job

Saving the script as hello_world.sh and submitting it to Slurm with sbatch results in the job running and producing an output file. The default output file is slurm-JOB_ID.out located in the directory from which the job was submitted. For example:

[griznog@smsx10srw-srcf-d15-37 jobs]$ sbatch hello_world.sh 
Submitted batch job 6592914
[griznog@smsx10srw-srcf-d15-37 jobs]$ cat slurm-6592914.out
Hello World!

The sbatch man page lists all sbatch options.

Managing Slurm Jobs

squeue

Once a job is submitted, it either immediately runs if resources are available and there are no jobs ahead of it in the queue or it is queued and marked as Pending. Pending and running jobs can be monitored with the squeue command.

The default squeue output shows all jobs:

[griznog@smsx10srw-srcf-d15-37 jobs]$ squeue
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
           6564781 interacti     wrap   crolle PD       0:00      1 (Resources)
           6564782 interacti     wrap   crolle PD       0:00      1 (Priority)
           6564783 interacti     wrap   crolle PD       0:00      1 (Priority)
           6564784 interacti     wrap   crolle PD       0:00      1 (Priority)
           6564785 interacti     wrap   crolle PD       0:00      1 (Priority)
           6564786 interacti     wrap   crolle PD       0:00      1 (Priority)
           6564787 interacti     wrap   crolle PD       0:00      1 (Priority)
           6564788 interacti     wrap   crolle PD       0:00      1 (Priority)
           6564789 interacti     wrap   crolle PD       0:00      1 (Priority)
           6564790 interacti     wrap   crolle PD       0:00      1 (Priority)
           6564791 interacti     wrap   crolle PD       0:00      1 (Priority)
           6564792 interacti     wrap   crolle PD       0:00      1 (Priority)
           6592902       dtn seqctr_s     root PD       0:00      1 (BeginTime)
           6564793 interacti     wrap   crolle PD       0:00      1 (Priority)
           6564794 interacti     wrap   crolle PD       0:00      1 (Priority)
           6564795 interacti     wrap   crolle PD       0:00      1 (Priority)
...

Some useful squeue commands are:

Command Description
squeue -u $USER Show only jobs owned by $USER.
squeue -t pd Show only pending jobs. -t r to show only running jobs’
squeue -j JOBID Show details for job JOBID.
squeue -j JOBID --format="%m" Use --format to show only the memory requested.

scancel

The simplest method of cancelling a running job or job array task is to use scancel. For example:

[griznog@smsx10srw-srcf-d15-37 jobs]$ squeue -u $USER
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
           6593409 interacti     wrap  griznog PD       0:00      1 (Priority)
[griznog@smsx10srw-srcf-d15-37 jobs]$ scancel 6593409
[griznog@smsx10srw-srcf-d15-37 jobs]$ squeue -u $USER
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
[griznog@smsx10srw-srcf-d15-37 jobs]$ 

The scancel man page lists the options to scancel, many of which are useful for selecting subsets of jobs to operate on. scancel can also be used to send specific signals to a jobs processes. This can be useful for more advanced jobs which are designed to perform different functions in response to receiving a signal, for instance, applications that can perform a checkpoint might be triggered to do so manually with a signal sent via scancel.

sinfo

The sinfo command can be used to list available partitions, show status and list node and partition configurations. To list partitions:

[griznog@smsx10srw-srcf-d15-37 jobs]$ sinfo
PARTITION   AVAIL  TIMELIMIT  NODES  STATE NODELIST
batch*         up 120-00:00:     22    mix dper730xd-srcf-d16-[01,03,05,07,09,11,13,15],dper7425-srcf-d15-13,sgisummit-frcf-111-[08,12,14,16,18,20,24,26,28,34,36,38],sgiuv20-rcf-111-32
batch*         up 120-00:00:      3  alloc sgisummit-frcf-111-[10,22,30]
batch*         up 120-00:00:     24   idle dper730xd-srcf-d16-[17,19,21,23,25,27,29,31,33,35,37,39],dper930-srcf-d15-05,dper7425-srcf-d15-[09,11,15,17,19,21,23,25,27,29,31]
interactive    up 120-00:00:      2 drain* hppsl230s-rcf-412-01-l,hppsl230s-rcf-412-02-l
interactive    up 120-00:00:      5    mix dper910-rcf-412-20,hppsl230s-rcf-412-01-r,hppsl230s-rcf-412-02-r,hppsl230s-rcf-412-03-l,hppsl230s-rcf-412-03-r
interactive    up 120-00:00:     18   idle hppsl230s-rcf-412-04-l,hppsl230s-rcf-412-04-r,hppsl230s-rcf-412-05-l,hppsl230s-rcf-412-05-r,hppsl230s-rcf-412-06-l,hppsl230s-rcf-412-06-r,hppsl230s-rcf-412-07-l,hppsl230s-rcf-412-07-r,hppsl230s-rcf-412-08-l,hppsl230s-rcf-412-08-r,hppsl230s-rcf-412-09-l,hppsl230s-rcf-412-09-r,hppsl230s-rcf-412-10-l,hppsl230s-rcf-412-10-r,hppsl230s-rcf-412-11-l,hppsl230s-rcf-412-11-r,hppsl230s-rcf-412-12-l,hppsl230s-rcf-412-12-r
nih_s10        up 4-00:00:00      1    mix sgiuv300-srcf-d10-01
nih_s10_gpu    up 4-00:00:00      1    mix sgiuv300-srcf-d10-01
dtn            up 120-00:00:      2   idle cfxs2600gz-rcf-114-[06,08]
apps           up 120-00:00:      1   idle dper7425-srcf-d10-37

sacct

The sacct command can be used to retrieve infomration about jobs that have completed. For example:

[griznog@smsx10srw-srcf-d15-37 jobs]$ sacct -j  6593409
       JobID    JobName  Partition    Account  AllocCPUS      State ExitCode 
------------ ---------- ---------- ---------- ---------- ---------- -------- 
6593409            wrap interacti+    default          1 CANCELLED+      0:0 
6593409.bat+      batch               default          1  CANCELLED     0:15 
6593409.ext+     extern               default          1  COMPLETED      0:0 

When troubleshooting jobs, the ExitCode can be useful to determine how the job ended as it shows the exit code of the job script and the signal which caused the process to terminate. For the example above, the job was cancelled and it’s script was killed by signal 15 with the job script returning 0.

The sacct command can return many details about jobs, consult the sacct man page for more information about options and values that can be retrieved for jobs.

More information

For more information about additional Slurm commands, see the Slurm documentation.