
Training Tracks Guide

This guide organizes the 44 training modules into recommended reading paths by audience. Each track is self-contained -- start with the track that matches your role and add modules from other tracks as needed.

Audience Levels

| Level | Role | Description |
|-------|------|-------------|
| L1 | Curious End-User | "What is this scheduler thing and why should I care?" |
| L2 | Working End-User | "I need to submit jobs and get work done" |
| L3 | Power User | "I want to optimize my workflows and use advanced features" |
| L4 | Administrator | "I need to install, configure, and manage Slurm" |
| L5 | IT Leadership | "I need to understand capacity, cost, and strategy" |

Track 1: Getting Started (L1-L2)

For users new to HPC or Slurm. Start here if you have never used a job scheduler before.

| # | Module | What You'll Learn |
|---|--------|-------------------|
| 1 | What is HPC Scheduling? | Why clusters need a scheduler, the "contract" between you and the system |
| 2 | Slurm Overview | Key concepts: nodes, partitions, jobs, steps |
| 3 | Getting Started | Log in, write your first job script, submit with sbatch |
| 4 | Submitting Jobs | sbatch options, directives, output files, --wrap |
| 5 | Monitoring Jobs | squeue, scontrol show job, sacct, sinfo |
| 6 | Managing Jobs | scancel, hold/release, modify pending jobs |

After this track: You can submit jobs, check their status, and manage them. Continue to Track 2 for resource management and advanced features.
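To make the Track 1 workflow concrete, here is a minimal sketch of a first job script. This assumes your site accepts jobs without an explicit partition; otherwise add a `--partition=<name>` directive:

```shell
#!/bin/bash
#SBATCH --job-name=hello        # name shown in squeue output
#SBATCH --output=hello_%j.out   # %j expands to the job ID
#SBATCH --time=00:05:00         # wall-clock limit (HH:MM:SS)

echo "Running on $(hostname)"
```

Submit it with `sbatch hello.sbatch`, watch it with `squeue -u $USER`, inspect it after completion with `sacct -j <jobid>`, and cancel it with `scancel <jobid>`.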

Quick reference: Command Cheatsheet | Environment Variables


Track 2: End-User Essentials (L2)

For working users who submit jobs regularly. Builds on Track 1.

| # | Module | What You'll Learn |
|---|--------|-------------------|
| 1-6 | Track 1 modules (prerequisite) | — |
| 7 | Resource Requests | --mem, --cpus-per-task, --time, --exclusive, GRES |
| 8 | Interactive Jobs | srun, salloc, X11 forwarding, Jupyter on compute nodes |
| 9 | Environment Modules | module load/unload, Lmod, using modules in job scripts |
| 10 | Containers on Slurm | Singularity/Apptainer, OCI containers, --container |
| 11 | Best Practices | Resource estimation, efficiency, being a good cluster citizen |

After this track: You can write efficient job scripts, request appropriate resources, and use the cluster effectively.
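A sketch of a Track 2-style script combining explicit resource requests with an environment module. The module name `samtools` is a hypothetical example; substitute something from your site's `module avail` output:

```shell
#!/bin/bash
#SBATCH --cpus-per-task=4   # threads for one multithreaded task
#SBATCH --mem=8G            # memory for the whole job
#SBATCH --time=02:00:00     # request modestly more than your measured runtime

module load samtools        # hypothetical module name; check `module avail`
samtools --version
```

For interactive work, the same resource flags apply to `salloc` and `srun`, e.g. `srun --cpus-per-task=4 --mem=8G --time=01:00:00 --pty bash` to get a shell on a compute node.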

Quick reference: Job State Codes


Track 3: Power User (L3)

For users running complex workflows, MPI jobs, GPU workloads, and automated pipelines. Builds on Track 2.

| # | Module | What You'll Learn |
|---|--------|-------------------|
| 1-11 | Track 2 modules (prerequisite) | — |
| 12 | Job Arrays | --array, SLURM_ARRAY_TASK_ID, parameter sweeps, throttling |
| 13 | Job Dependencies | --dependency, afterok/afterany, building pipelines |
| 14 | Parallel & MPI Jobs | --ntasks vs --cpus-per-task, srun as MPI launcher, hybrid jobs |
| 15 | GPU Jobs | --gres=gpu, GPU types, multi-GPU, CUDA_VISIBLE_DEVICES |
| 16 | Recurring Jobs (scrontab) | Slurm's built-in cron for scheduled cluster jobs |

After this track: You can build complex multi-step pipelines, run MPI and GPU workloads, and automate recurring analysis.
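Job arrays and dependencies combine naturally into pipelines. A minimal sketch, where `./process` and the input-file naming scheme are hypothetical placeholders:

```shell
#!/bin/bash
#SBATCH --array=1-100%10    # 100 tasks, at most 10 running at once (throttling)
#SBATCH --time=00:30:00

# Each array task picks its own input file via its task ID.
INPUT="input_${SLURM_ARRAY_TASK_ID}.dat"
./process "$INPUT"          # hypothetical per-file analysis step
```

A downstream step can wait for the entire array to succeed: `jid=$(sbatch --parsable array.sbatch); sbatch --dependency=afterok:$jid merge.sbatch`.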


Track 4: Administrator (L4)

For cluster administrators. This is the most comprehensive track, covering Slurm installation through production operations.

Core Admin Modules

| # | Module | What You'll Learn |
|---|--------|-------------------|
| 1 | Slurm Architecture | Daemons, communication model, authentication options |
| 2 | Installation | Prerequisites, MUNGE/auth/slurm, packages, MariaDB, slurmdbd |
| 3 | Configuration | slurm.conf, slurmdbd.conf, cgroup.conf, node definitions |
| 4 | Partitions & QOS | Partition design, QOS limits, preemption, access control |
| 5 | Accounts & Fairshare | sacctmgr, account hierarchy, fairshare algorithm, shares |
| 6 | Resource Management | GRES (GPUs, licenses), consumable resources, cgroup enforcement |
| 7 | Monitoring & Accounting | sacct, sreport, sdiag, Prometheus/Grafana integration |
| 8 | Policies & Priority | Multifactor priority, preemption, backfill, reservations |
| 9 | Troubleshooting | Pending job diagnosis, node states, log files, common problems |
| 10 | Maintenance & Operations | Draining, upgrades, backups, config versioning |
| 11 | High Availability | slurmctld failover, slurmdbd HA, state directory, testing |
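As a taste of the accounts and fairshare material, a sketch of building a small accounting hierarchy with `sacctmgr`; the cluster, account, and user names are placeholders:

```shell
# Register the cluster, create a department account, and attach a user to it.
sacctmgr add cluster mycluster
sacctmgr add account chemistry Description="Chemistry dept" Organization=science
sacctmgr add user alice Account=chemistry Fairshare=10

# Verify the resulting association tree and share values.
sacctmgr show assoc format=cluster,account,user,fairshare
```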

Deployment Modules (choose your platform)

| Module | When to Read |
|--------|--------------|
| On-Premise Deployment | Deploying Slurm on bare-metal or VM infrastructure |
| AWS ParallelCluster | Self-managed Slurm on AWS with dynamic scaling |
| AWS PCS | AWS-managed Slurm (minimal admin overhead) |

Read the deployment module(s) that match your environment. Most admins should also skim the other deployment modules for cross-platform awareness.

Quick Reference

| Reference | Use For |
|-----------|---------|
| Command Cheatsheet | Quick command lookup |
| Environment Variables | SLURM_* variables in job scripts |
| Job State Codes | Decoding job states (PD, R, CG, F, etc.) |
| Node State Codes | Decoding node states (idle, alloc, drain, down) |

Recommendation: Admins should also complete Tracks 1-3 (the user modules) to understand the end-user experience on the cluster they manage.


Track 5: IT Leadership (L5)

For managers, directors, and decision-makers evaluating or overseeing HPC infrastructure. No command-line prerequisites.

| # | Module | What You'll Learn |
|---|--------|-------------------|
| 1 | What is HPC Scheduling? | Why shared computing needs a scheduler (non-technical overview) |
| 2 | Why Slurm | Market position, GPU support, cloud integration, cost model, talent pool |
| 3 | Slurm Architecture | Technical architecture at a glance (skim for context) |
| 4 | Capacity Planning | Utilization metrics, sreport, planning strategies, cloud bursting economics |
| 5 | Cost Allocation | Chargeback/showback, account hierarchy, QOS budgets, cloud cost attribution |
| 6 | Accounts & Fairshare | How resource allocation and fairshare work (skim for policy context) |
| 7 | Policies & Priority | Scheduling policies, preemption, reservations (skim for policy context) |

After this track: You can make informed decisions about HPC infrastructure investments, evaluate Slurm vs. alternatives, and understand the reporting tools available for cost management.
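Much of the reporting behind the capacity-planning and cost-allocation modules comes from `sreport`. A sketch with placeholder date ranges:

```shell
# Overall cluster utilization (allocated vs idle vs down) for one month.
sreport cluster utilization start=2024-01-01 end=2024-02-01

# Usage broken down by account and user, shown as percentages --
# the raw material for showback/chargeback reporting.
sreport cluster AccountUtilizationByUser start=2024-01-01 end=2024-02-01 -t percent
```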


Application Tracks

These tracks are for organizations running specific life science applications on Slurm. Each has an admin module (setup/configuration) and a user module (daily use).

Schrodinger Suite + Slurm

| # | Module | Audience | What You'll Learn |
|---|--------|----------|-------------------|
| 1 | Schrodinger Admin Setup | L4 | SLM licensing, hosts file/hosts.yml, GPU config, Job Server, license-aware scheduling |
| 2 | Schrodinger User Guide | L2-L3 | -HOST flag, -NPROC, Maestro submission, monitoring, troubleshooting |

Prerequisites: Track 1-2 (user basics) for the user module, Track 4 (admin) for the admin module.

CryoSPARC + Slurm

| # | Module | Audience | What You'll Learn |
|---|--------|----------|-------------------|
| 1 | CryoSPARC Admin Setup | L4 | Cluster lanes, cluster_info.json, cluster_script.sh, GPU management, CryoSPARC Live |
| 2 | CryoSPARC User Guide | L2-L3 | Lane selection, resource settings, monitoring, failure diagnosis, SSD caching |

Prerequisites: Track 1-2 (user basics) for the user module, Track 4 (admin, especially GPU and GRES) for the admin module.


Migration Tracks

For users transitioning from another scheduler to Slurm. Each guide provides command mapping tables, job script translations, and behavioral differences.

| Source Scheduler | Module | Focus |
|------------------|--------|-------|
| Sun Grid Engine (SGE) | SGE to Slurm | qsub/sbatch, PE/--ntasks, queues/partitions |
| PBS/Torque | PBS to Slurm | qsub/sbatch, PBS directives to #SBATCH |
| IBM LSF | LSF to Slurm | bsub/sbatch, LSF queues to Slurm partitions |

Best approach: Read your migration guide first, then work through Track 1-2 (or Track 3 if you were a power user on the old scheduler).
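The guides center on command mappings in the spirit of this SGE-to-Slurm sketch (the `smp` parallel environment name is site-specific, and the Slurm flags shown are one common translation, not the only one):

```shell
# Submit an 8-slot shared-memory job:
#   SGE:   qsub -N myjob -pe smp 8 script.sh
#   Slurm: sbatch --job-name=myjob --cpus-per-task=8 script.sh

# Check your queued/running jobs:
#   SGE:   qstat             Slurm: squeue -u $USER
# Cancel a job by ID:
#   SGE:   qdel 12345        Slurm: scancel 12345
```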


Deployment Overlays

Every module in the training set covers concepts that apply across all deployment platforms. Deployment-specific differences are highlighted in callout blocks throughout the modules:

ParallelCluster Note: AWS ParallelCluster-specific behavior or configuration

PCS Note: AWS PCS-specific behavior or configuration

On-Prem Note: On-premise-specific considerations

For deep dives into deployment-specific topics, see:

| Deployment Module | Key Topics |
|-------------------|------------|
| On-Premise | Infrastructure, networking, storage, identity, configless mode, large cluster tuning |
| AWS ParallelCluster | YAML config, static/dynamic nodes, FSx/EFS, EFA, Spot instances, cost management |
| AWS PCS | Managed Slurm, cluster sizing, custom settings, multi-cluster sackd, accounting |

Suggested Learning Paths by Role

Structural Biologist (cryo-EM)

  1. Track 1 (Getting Started)
  2. Track 2 modules 7-8 (Resources, Interactive Jobs)
  3. GPU Jobs
  4. CryoSPARC User Guide

Computational Chemist (drug discovery)

  1. Track 1 (Getting Started)
  2. Track 2 (End-User Essentials)
  3. GPU Jobs
  4. Schrodinger User Guide

Bioinformatician (genomics pipelines)

  1. Track 1 (Getting Started)
  2. Track 2 (End-User Essentials)
  3. Job Arrays + Job Dependencies
  4. Containers
  5. Best Practices

HPC System Administrator (new to Slurm)

  1. Track 1-3 (all user modules, to understand the user experience)
  2. Track 4 (all admin modules)
  3. Deployment module for your platform
  4. Application modules for your site's software

IT Director evaluating Slurm

  1. Track 5 (IT Leadership)
  2. Skim deployment modules for your target platform

Migrating from SGE/PBS/LSF

  1. Your migration rosetta stone
  2. Track 1-2 (build Slurm muscle memory)
  3. Track 3 if you were a power user
  4. Track 4 if you were an admin