Slurm

SLURM is the workload manager and job scheduler on tron.ift.uni.wroc.pl.

Basic usage

sinfo -alN                          Show node information
squeue                              Show the job queue
squeue -u <username>                List all current jobs for a user
squeue -u <username> -t RUNNING     List all running jobs for a user
squeue -u <username> -t PENDING     List all pending jobs for a user
scancel <jobid>                     Cancel one job
scancel -u <username>               Cancel all jobs for a user
scancel -t PENDING -u <username>    Cancel all pending jobs for a user
scancel --name myJobName            Cancel one or more jobs by name
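A typical sequence for checking on and cleaning up your own jobs might look like the sketch below; the job ID 123456 is a placeholder.

# Show all nodes and your own jobs
sinfo -alN
squeue -u $USER

# Cancel a single job by its ID (placeholder value)
scancel 123456

# Cancel all of your pending jobs
scancel -t PENDING -u $USER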

Slurm batch

The following parameters can be used as command line options with *sbatch* and *srun*, or inside a job script; see also the Job script example below.

Basic settings

Parameter                       Function
--job-name=<name>               Job name to be displayed by, for example, squeue
--output=<path>                 Path to the file where the job (error) output is written
--mail-type=<type>              Turn on mail notification; type can be one of BEGIN, END, FAIL, REQUEUE or ALL
--mail-user=<email_address>     Email address to send notifications to
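These options usually go near the top of a job script. A minimal sketch, in which the job name, output path and e-mail address are placeholders:

#SBATCH --job-name=my_analysis         # placeholder name shown by squeue
#SBATCH --output=my_analysis-%j.out    # %j is replaced by the job ID
#SBATCH --mail-type=END,FAIL           # notify when the job finishes or fails
#SBATCH --mail-user=user@example.org   # placeholder address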

Resources

Parameter                         Function
--time=<d-hh:mm:ss>               Time limit for the job. The job will be killed by SLURM after the time has run out. Format: days-hours:minutes:seconds
--nodes=<num_nodes>               Number of nodes. Multiple nodes are only useful for distributed-memory jobs (e.g. MPI).
--mem=<MB>                        Memory (RAM) per node. Number followed by a unit prefix, e.g. 16G
--mem-per-cpu=<MB>                Memory (RAM) per requested CPU core
--ntasks-per-node=<num_procs>     Number of (MPI) processes per node. More than one is useful only for MPI jobs. The maximum depends on the number of cores per node.
--cpus-per-task=<num_threads>     CPU cores per task. Use one for MPI. For multithreaded applications this is the number of threads.
--exclusive                       The job will not share nodes with other running jobs. You will be charged for the complete nodes even if you asked for less.
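For illustration, a resource request for a small MPI run could look like the sketch below; the process count, memory and time limit are assumed example values, not recommendations for this cluster.

#SBATCH --time=0-01:00:00         # one hour wall time (example value)
#SBATCH --nodes=1                 # a single node
#SBATCH --ntasks-per-node=4       # four MPI processes (example value)
#SBATCH --cpus-per-task=1         # one core per MPI process
#SBATCH --mem=4G                  # memory for the whole node (example value)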

Additional

Parameter                      Function
--array=<indexes>              Submit a collection of similar jobs, e.g. --array=1-10 (sbatch command only). See the official SLURM documentation.
--dependency=<state:jobid>     Wait with the start of the job until the specified dependencies have been satisfied, e.g. --dependency=afterok:123456
--ntasks-per-core=2            Enables hyperthreading. Only useful in special circumstances.
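A common use of --dependency is to chain two jobs so that the second starts only after the first has finished successfully. A minimal sketch, assuming two job scripts named step1.sh and step2.sh:

# --parsable makes sbatch print only the job ID, so it can be captured
JOBID=$(sbatch --parsable step1.sh)

# The second job starts only if the first one ends without error
sbatch --dependency=afterok:${JOBID} step2.sh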

Job array

Job arrays offer a mechanism for submitting and managing collections of similar jobs quickly and easily; job arrays with millions of tasks can be submitted in milliseconds (subject to configured size limits). All jobs must have the same initial options (e.g. size, time limit, etc.).

#SBATCH --array 1-200
#SBATCH --array 1-200%5 # the %N suffix limits the number of simultaneously running tasks to N (here 5)

Variables

SLURM_ARRAY_JOB_ID        will be set to the first job ID of the array
SLURM_ARRAY_TASK_ID       will be set to the job array index value
SLURM_ARRAY_TASK_COUNT    will be set to the number of tasks in the job array
SLURM_ARRAY_TASK_MAX      will be set to the highest job array index value
SLURM_ARRAY_TASK_MIN      will be set to the lowest job array index value
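A minimal job array sketch that uses SLURM_ARRAY_TASK_ID to select an input file; the input_<N>.dat naming scheme and the echo command stand in for a real workload and are assumptions.

#!/bin/bash -l
#SBATCH --job-name=array_example
#SBATCH --output=array-%A_%a.out   # %A = array job ID, %a = task index
#SBATCH --array=1-10%5             # 10 tasks, at most 5 running at once

# Each task picks its own input file (naming scheme assumed for illustration)
INPUT=input_${SLURM_ARRAY_TASK_ID}.dat
echo "Task ${SLURM_ARRAY_TASK_ID} of ${SLURM_ARRAY_TASK_COUNT} processing ${INPUT}"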

Job script example

Before use, please adjust the parameters to your needs.
#!/bin/bash -l
# Give your job a name, so you can recognize it in the queue overview
#SBATCH --job-name=example
 
#SBATCH -o slurm-%j.output # %j will be replaced by the job ID (SLURM_JOB_ID)
#SBATCH -e slurm-%j.error
 
# Define how many nodes you need. Here, we ask for 1 node.
# Each node has 8 cores.
#SBATCH --nodes=1
# You can further define the number of tasks with --ntasks-per-*
# See "man sbatch" for details. e.g. --ntasks=4 will ask for 4 cpus.
#SBATCH --ntasks=4
 
# How much memory you need.
# --mem will define memory per node and
# --mem-per-cpu will define memory per CPU/core. Choose one of those.
#SBATCH --mem=5GB   
##SBATCH --mem-per-cpu=1500MB  # this one is not in effect, due to the double hash
 
# Turn on mail notification. There are many possible self-explanatory values:
# NONE, BEGIN, END, FAIL, ALL (ALL includes all of the aforementioned)
# For more values, check "man sbatch"
##SBATCH --mail-type=END,FAIL # this one is not in effect, due to the double hash
 
# Do not place any commands before the last #SBATCH directive
 
# Define workdir for this job
WORK_DIRECTORY=/home/${USER}/test
cd ${WORK_DIRECTORY}
 
# This is where the actual work is done. In this case, the script only waits.
# The time command is optional, but it may give you a hint on how long the
# command worked
time sleep 10
#sleep 10
 
# Finish the script
exit 0

Submit the script to the job queue with

sbatch ~/sampleScript.sh
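After submitting, you can check the status of your job; the name shown by squeue will match the --job-name set in the script.

squeue -u $USER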

Interactive mode

Get interactive shell access on a compute node

srun --nodes=1 --ntasks-per-node=1 --time=01:00:00 --pty bash -i
# Request specific node by name
srun --nodelist=node2 --ntasks-per-node=1 --time=01:00:00 --pty bash -i
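Type exit to leave the interactive shell and release the allocation. If you need more than one core for a quick interactive test, a request along these lines should work; the core count and time limit below are only example values.

srun --nodes=1 --cpus-per-task=4 --time=00:30:00 --pty bash -i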