RACC2 – Slurm commands

Essential SLURM commands

This is a compact reference sheet for the most essential SLURM commands and their usage. It is not exhaustive. For more information about command options, flags, and additional commands, please refer to SLURM’s own manual pages or their summary sheet. Please note that some of the commands described there are available to system administrators only.

Command   Description                                 Usage
sbatch    Submits a batch job to the queue.           sbatch jobscript.sh
squeue    Displays the state of a submitted job.      squeue -u username
          Use with -u for job information for a
          specific username.
scancel   Kills an existing job.                      scancel jobID
          Infer the jobID with squeue.
sinfo     Displays all available cluster resources.   sinfo
sacct     View job accounting data.                   sacct -j jobID
          Use with -j for specific jobIDs.
salloc    Allocates compute node resources for        salloc
          interactive use.
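
The commands above are often chained: a job is submitted, its ID is noted, and that ID is then passed to squeue, sacct or scancel. As a minimal sketch, the helper below pulls the job ID out of the confirmation line that sbatch normally prints ("Submitted batch job <jobID>"). The function name extract_jobid is hypothetical and not part of SLURM.

```shell
# Hypothetical helper (not part of SLURM): read sbatch's confirmation
# line on stdin and print only the numeric job ID.
extract_jobid() {
    awk '/^Submitted batch job [0-9]+/ { print $4; exit }'
}

# Demonstration with a canned confirmation line, so no scheduler is needed.
echo "Submitted batch job 123456" | extract_jobid
```

On the cluster this could be used as jobid=$(sbatch jobscript.sh | extract_jobid), followed by squeue -j "$jobid" or scancel "$jobid".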

 

Monitoring cluster resources with ‘sinfo’:

Because ‘short’ is the default partition, it is convenient to display the resources for this partition only, by adding ‘-p short’ to the ‘sinfo’ command. By default, nodes that are in the same state are grouped together.

$ sinfo -p short

PARTITION AVAIL TIMELIMIT NODES STATE NODELIST
short* up 1-00:00:00 1 mix# racc2-comp-2
short* up 1-00:00:00 29 idle~ racc2-comp-[3-31]
short* up 1-00:00:00 2 mix racc2-comp-[0-1]

The above output shows that nodes 3-31 are idle, and ‘~’ means they are switched off to save power. Nodes 0 and 1 are in a ‘mix’ state, meaning that some of the cores on the node are in use and some of them are free. Node 2 is also in a ‘mix’ state, and ‘#’ means it is being started. Fully allocated nodes show the status ‘alloc’, meaning they will not be available for new jobs until the jobs currently running on them are finished.
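
The state codes described above can be summarised in a small lookup. This is only a sketch covering the states and suffixes mentioned on this page (idle, mix, alloc, ‘~’ and ‘#’). The function name describe_state is hypothetical, and any other state is reported as not covered.

```shell
# Hypothetical helper: translate a sinfo state code such as "idle~" or
# "mix#" into the plain-language meaning given on this page.
describe_state() {
    state=$1
    # The trailing character, if any, describes the power status.
    case "$state" in
        *'~') suffix="switched off to save power" ;;
        *'#') suffix="being started" ;;
        *)    suffix="" ;;
    esac
    base=${state%[~#]}   # strip one trailing ~ or #
    case "$base" in
        idle)  meaning="no cores in use" ;;
        mix)   meaning="some cores in use, some free" ;;
        alloc) meaning="all cores in use" ;;
        *)     meaning="state not covered in this sketch" ;;
    esac
    if [ -n "$suffix" ]; then
        echo "$base: $meaning ($suffix)"
    else
        echo "$base: $meaning"
    fi
}

describe_state 'idle~'
describe_state 'mix#'
```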

Further details can be displayed using the ‘-o’ flag. See the manual page, ‘man sinfo’, for more details on format specifiers. In this example, the CPU core counts are displayed with the command:

$ sinfo -p short -o "%P %.6t %C"

PARTITION STATE CPUS(A/I/O/T)
short* mix# 48/80/0/128
short* idle~ 0/3712/0/3712
short* mix 240/16/0/256

A/I/O/T stands for Allocated/Idle/Other/Total. The idle, switched-off nodes have 3712 cores available. Two nodes are partially allocated (mix), and another is partially allocated and is being started (mix#). In total, there are 3808 cores (80 + 3712 + 16) available for new jobs.
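
The total of free cores can also be computed rather than added up by hand. The sketch below sums the Idle figure (the second number of each A/I/O/T field) over all lines it is given. The function name sum_idle_cores is hypothetical, and the sample input is the output shown above.

```shell
# Hypothetical helper: sum the "I" (idle) value of every A/I/O/T field
# found in the last column of the input. Header lines are skipped
# because their second component is not a number.
sum_idle_cores() {
    awk '{
        n = split($NF, c, "/")
        if (n == 4 && c[2] ~ /^[0-9]+$/) total += c[2]
    }
    END { print total + 0 }'
}

# Demonstration using the sinfo output from this page.
sum_idle_cores <<'EOF'
PARTITION STATE CPUS(A/I/O/T)
short* mix# 48/80/0/128
short* idle~ 0/3712/0/3712
short* mix 240/16/0/256
EOF
```

On the cluster, the same function could read live data with sinfo -p short -o "%P %.6t %C" | sum_idle_cores.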

Nodes can be listed individually by adding the ‘-N’ flag:

$ sinfo -p short -N -o "%N %.6t %C"
NODELIST  STATE CPUS(A/I/O/T)
racc2-comp-0 idle 0/128/0/128
racc2-comp-1 idle 0/128/0/128
racc2-comp-2 idle~ 0/128/0/128
racc2-comp-3 idle~ 0/128/0/128
racc2-comp-4 idle~ 0/128/0/128
racc2-comp-5 idle~ 0/128/0/128
racc2-comp-6 idle~ 0/128/0/128
racc2-comp-7 idle~ 0/128/0/128
racc2-comp-8 idle~ 0/128/0/128
racc2-comp-9 idle~ 0/128/0/128
racc2-comp-10 idle~ 0/128/0/128
racc2-comp-11 idle~ 0/128/0/128
racc2-comp-12 idle~ 0/128/0/128
racc2-comp-13 idle~ 0/128/0/128
racc2-comp-14 idle~ 0/128/0/128
racc2-comp-15 idle~ 0/128/0/128
racc2-comp-16 idle~ 0/128/0/128
racc2-comp-17 idle~ 0/128/0/128
racc2-comp-18 idle~ 0/128/0/128
racc2-comp-19 idle~ 0/128/0/128
racc2-comp-20 idle~ 0/128/0/128
racc2-comp-21 idle~ 0/128/0/128
racc2-comp-22 idle~ 0/128/0/128
racc2-comp-23 idle~ 0/128/0/128
racc2-comp-24 idle~ 0/128/0/128
racc2-comp-25 idle~ 0/128/0/128
racc2-comp-26 idle~ 0/128/0/128
racc2-comp-27 idle~ 0/128/0/128
racc2-comp-28 idle~ 0/128/0/128
racc2-comp-29 idle~ 0/128/0/128
racc2-comp-30 idle~ 0/128/0/128
racc2-comp-31 idle~ 0/128/0/128
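
A per-node listing this long is easier to read when condensed. As a minimal sketch, the helper below counts how many nodes are in each state, in the order the states first appear. The function name count_nodes_by_state is hypothetical, and the sample input is the first four node lines from the listing above.

```shell
# Hypothetical helper: count nodes per state in per-node sinfo output
# (columns: node name, state, A/I/O/T). Only lines whose last column is
# a numeric A/I/O/T field are counted, which skips the header.
count_nodes_by_state() {
    awk '$NF ~ /^[0-9]+\/[0-9]+\/[0-9]+\/[0-9]+$/ {
        if (!($2 in count)) order[++n] = $2   # remember first-seen order
        count[$2]++
    }
    END { for (i = 1; i <= n; i++) print order[i], count[order[i]] }'
}

# Demonstration with the first four nodes of the listing above.
count_nodes_by_state <<'EOF'
NODELIST  STATE CPUS(A/I/O/T)
racc2-comp-0 idle 0/128/0/128
racc2-comp-1 idle 0/128/0/128
racc2-comp-2 idle~ 0/128/0/128
racc2-comp-3 idle~ 0/128/0/128
EOF
```

On the cluster, the input could come from sinfo -p short -N -o "%N %.6t %C" | count_nodes_by_state.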

 

