Training Tracks Guide¶
This guide organizes the 44 training modules into recommended reading paths by audience. Each track is self-contained: start with the track that matches your role and add modules from other tracks as needed.
Audience Levels¶
| Level | Role | Description |
|---|---|---|
| L1 | Curious End-User | "What is this scheduler thing and why should I care?" |
| L2 | Working End-User | "I need to submit jobs and get work done" |
| L3 | Power User | "I want to optimize my workflows and use advanced features" |
| L4 | Administrator | "I need to install, configure, and manage Slurm" |
| L5 | IT Leadership | "I need to understand capacity, cost, and strategy" |
Track 1: Getting Started (L1-L2)¶
For users new to HPC or Slurm. Start here if you have never used a job scheduler before.
| # | Module | What You'll Learn |
|---|---|---|
| 1 | What is HPC Scheduling? | Why clusters need a scheduler, the "contract" between you and the system |
| 2 | Slurm Overview | Key concepts: nodes, partitions, jobs, steps |
| 3 | Getting Started | Log in, write your first job script, submit with sbatch |
| 4 | Submitting Jobs | sbatch options, directives, output files, --wrap |
| 5 | Monitoring Jobs | squeue, scontrol show job, sacct, sinfo |
| 6 | Managing Jobs | scancel, hold/release, modify pending jobs |
After this track: You can submit jobs, check their status, and manage them. Continue to Track 2 for resource management and advanced features.
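The first-job workflow from modules 3-5 boils down to a script like this (the job name, output pattern, and time limit are placeholder values):

```shell
#!/bin/bash
#SBATCH --job-name=hello        # name shown in squeue
#SBATCH --output=hello-%j.out   # %j expands to the job ID
#SBATCH --time=00:05:00
#SBATCH --time=00:05:00         # wall-time limit (HH:MM:SS)
#SBATCH --ntasks=1              # a single task

echo "Hello from $(hostname)"
```

Submit with `sbatch hello.sh`, watch it with `squeue --me` (on older Slurm versions, `squeue -u $USER`), and review it after completion with `sacct -j <jobid>`.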
Quick reference: Command Cheatsheet | Environment Variables
Track 2: End-User Essentials (L2)¶
For working users who submit jobs regularly. Builds on Track 1.
| # | Module | What You'll Learn |
|---|---|---|
| 1-6 | Track 1 modules | (prerequisite) |
| 7 | Resource Requests | --mem, --cpus-per-task, --time, --exclusive, GRES |
| 8 | Interactive Jobs | srun, salloc, X11 forwarding, Jupyter on compute nodes |
| 9 | Environment Modules | module load/unload, Lmod, using modules in job scripts |
| 10 | Containers on Slurm | Singularity/Apptainer, OCI containers, --container |
| 11 | Best Practices | Resource estimation, efficiency, being a good cluster citizen |
After this track: You can write efficient job scripts, request appropriate resources, and use the cluster effectively.
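The resource-request directives from module 7 slot into the same script skeleton. A sketch, where the CPU, memory, and time values are illustrative and `my_tool` is a hypothetical application:

```shell
#!/bin/bash
#SBATCH --cpus-per-task=8   # CPU cores for a multithreaded tool
#SBATCH --mem=16G           # memory for the whole job
#SBATCH --time=02:00:00     # request a bit more than your longest expected run

# Match the tool's thread count to the allocation instead of hard-coding it
export OMP_NUM_THREADS="${SLURM_CPUS_PER_TASK:-1}"
echo "Running with $OMP_NUM_THREADS threads"
# my_tool --threads "$OMP_NUM_THREADS" input.dat   (hypothetical application)
```

After the job finishes, compare requested versus used resources with `sacct -j <jobid> -o JobID,Elapsed,MaxRSS,ReqMem` and tune future requests accordingly, which is the efficiency theme of module 11.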
Quick reference: Job State Codes
Track 3: Power User (L3)¶
For users running complex workflows, MPI jobs, GPU workloads, and automated pipelines. Builds on Track 2.
| # | Module | What You'll Learn |
|---|---|---|
| 1-11 | Track 2 modules | (prerequisite) |
| 12 | Job Arrays | --array, SLURM_ARRAY_TASK_ID, parameter sweeps, throttling |
| 13 | Job Dependencies | --dependency, afterok/afterany, building pipelines |
| 14 | Parallel & MPI Jobs | --ntasks vs --cpus-per-task, srun as MPI launcher, hybrid jobs |
| 15 | GPU Jobs | --gres=gpu, GPU types, multi-GPU, CUDA_VISIBLE_DEVICES |
| 16 | Recurring Jobs (scrontab) | Slurm's built-in cron for scheduled cluster jobs |
After this track: You can build complex multi-step pipelines, run MPI and GPU workloads, and automate recurring analysis.
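Modules 12 and 13 combine naturally: an array fans work out, and a dependency fans it back in. A sketch, assuming a hypothetical `samples.txt` manifest with one input per line:

```shell
#!/bin/bash
#SBATCH --array=1-100%10          # tasks 1-100, at most 10 running at once
#SBATCH --output=task-%A_%a.out   # %A = array job ID, %a = array task ID

# Each task selects the line of samples.txt matching its own task ID
sample=$(sed -n "${SLURM_ARRAY_TASK_ID}p" samples.txt)
echo "Processing $sample"
```

To build the pipeline, capture the array's job ID with `jid=$(sbatch --parsable array.sh)` and submit the follow-up step with `sbatch --dependency=afterok:$jid summarize.sh`; the `afterok` condition runs it only if every array task exits successfully.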
Track 4: Administrator (L4)¶
For cluster administrators. This is the most comprehensive track, covering Slurm installation through production operations.
Core Admin Modules¶
| # | Module | What You'll Learn |
|---|---|---|
| 1 | Slurm Architecture | Daemons, communication model, authentication options |
| 2 | Installation | Prerequisites, authentication (MUNGE or auth/slurm), packages, MariaDB, slurmdbd |
| 3 | Configuration | slurm.conf, slurmdbd.conf, cgroup.conf, node definitions |
| 4 | Partitions & QOS | Partition design, QOS limits, preemption, access control |
| 5 | Accounts & Fairshare | sacctmgr, account hierarchy, fairshare algorithm, shares |
| 6 | Resource Management | GRES (GPUs, licenses), consumable resources, cgroup enforcement |
| 7 | Monitoring & Accounting | sacct, sreport, sdiag, Prometheus/Grafana integration |
| 8 | Policies & Priority | Multifactor priority, preemption, backfill, reservations |
| 9 | Troubleshooting | Pending job diagnosis, node states, log files, common problems |
| 10 | Maintenance & Operations | Draining, upgrades, backups, config versioning |
| 11 | High Availability | slurmctld failover, slurmdbd HA, state directory, testing |
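To make the node and partition material in modules 3-4 concrete, configuration entries have this general flavor. The hostnames, counts, and limits below are purely illustrative; consult the slurm.conf documentation for your version before copying anything:

```
# slurm.conf fragment (illustrative values)
NodeName=node[01-04] CPUs=64 RealMemory=256000 Gres=gpu:4 State=UNKNOWN
PartitionName=batch Nodes=node[01-04] Default=YES MaxTime=7-00:00:00 State=UP
```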
Deployment Modules (choose your platform)¶
| Module | When to Read |
|---|---|
| On-Premise Deployment | Deploying Slurm on bare-metal or VM infrastructure |
| AWS ParallelCluster | Self-managed Slurm on AWS with dynamic scaling |
| AWS PCS | AWS-managed Slurm (minimal admin overhead) |
Read the deployment module(s) that match your environment. Most admins should also skim the other deployment modules for cross-platform awareness.
Quick Reference¶
| Reference | Use For |
|---|---|
| Command Cheatsheet | Quick command lookup |
| Environment Variables | SLURM_* variables in job scripts |
| Job State Codes | Decoding job states (PD, R, CG, F, etc.) |
| Node State Codes | Decoding node states (idle, alloc, drain, down) |
Recommendation: Admins should also complete Tracks 1-3 (the user modules) to understand the end-user experience on the cluster they manage.
Track 5: IT Leadership (L5)¶
For managers, directors, and decision-makers evaluating or overseeing HPC infrastructure. No command-line prerequisites.
| # | Module | What You'll Learn |
|---|---|---|
| 1 | What is HPC Scheduling? | Why shared computing needs a scheduler (non-technical overview) |
| 2 | Why Slurm | Market position, GPU support, cloud integration, cost model, talent pool |
| 3 | Slurm Architecture | Technical architecture at a glance (skim for context) |
| 4 | Capacity Planning | Utilization metrics, sreport, planning strategies, cloud bursting economics |
| 5 | Cost Allocation | Chargeback/showback, account hierarchy, QOS budgets, cloud cost attribution |
| 6 | Accounts & Fairshare | How resource allocation and fairshare work (skim for policy context) |
| 7 | Policies & Priority | Scheduling policies, preemption, reservations (skim for policy context) |
After this track: You can make informed decisions about HPC infrastructure investments, evaluate Slurm vs. alternatives, and understand the reporting tools available for cost management.
Application Tracks¶
These tracks are for organizations running specific life science applications on Slurm. Each has an admin module (setup/configuration) and a user module (daily use).
Schrodinger Suite + Slurm¶
| # | Module | Audience | What You'll Learn |
|---|---|---|---|
| 1 | Schrodinger Admin Setup | L4 | SLM licensing, hosts file/hosts.yml, GPU config, Job Server, license-aware scheduling |
| 2 | Schrodinger User Guide | L2-L3 | -HOST flag, -NPROC, Maestro submission, monitoring, troubleshooting |
Prerequisites: Tracks 1-2 (user basics) for the user module; Track 4 (admin) for the admin module.
CryoSPARC + Slurm¶
| # | Module | Audience | What You'll Learn |
|---|---|---|---|
| 1 | CryoSPARC Admin Setup | L4 | Cluster lanes, cluster_info.json, cluster_script.sh, GPU management, CryoSPARC Live |
| 2 | CryoSPARC User Guide | L2-L3 | Lane selection, resource settings, monitoring, failure diagnosis, SSD caching |
Prerequisites: Tracks 1-2 (user basics) for the user module; Track 4 (admin, especially GPU and GRES) for the admin module.
Migration Tracks¶
For users transitioning from another scheduler to Slurm. Each guide provides command mapping tables, job script translations, and behavioral differences.
| Source Scheduler | Module | Focus |
|---|---|---|
| Sun Grid Engine (SGE) | SGE to Slurm | qsub/sbatch, PE/--ntasks, queues/partitions |
| PBS/Torque | PBS to Slurm | qsub/sbatch, PBS directives to #SBATCH |
| IBM LSF | LSF to Slurm | bsub/sbatch, LSF queues to Slurm partitions |
Best approach: Read your migration guide first, then work through Tracks 1-2 (or Track 3 if you were a power user on the old scheduler).
Deployment Overlays¶
Every module in the training set covers concepts that apply across all deployment platforms. Deployment-specific differences are highlighted in callout blocks throughout the modules:
- ParallelCluster Note: AWS ParallelCluster-specific behavior or configuration
- PCS Note: AWS PCS-specific behavior or configuration
- On-Prem Note: On-premise-specific considerations
For deep dives into deployment-specific topics, see:
| Module | Key Topics |
|---|---|
| On-Premise Deployment | Infrastructure, networking, storage, identity, configless mode, large cluster tuning |
| AWS ParallelCluster | YAML config, static/dynamic nodes, FSx/EFS, EFA, Spot instances, cost management |
| AWS PCS | Managed Slurm, cluster sizing, custom settings, multi-cluster sackd, accounting |
Suggested Learning Paths by Role¶
Structural Biologist (cryo-EM)¶
- Track 1 (Getting Started)
- Track 2 modules 7-8 (Resources, Interactive Jobs)
- GPU Jobs
- CryoSPARC User Guide
Computational Chemist (drug discovery)¶
- Track 1 (Getting Started)
- Track 2 (End-User Essentials)
- GPU Jobs
- Schrodinger User Guide
Bioinformatician (genomics pipelines)¶
- Track 1 (Getting Started)
- Track 2 (End-User Essentials)
- Job Arrays + Job Dependencies
- Containers on Slurm
- Best Practices
HPC System Administrator (new to Slurm)¶
- Tracks 1-3 (all user modules, to understand the user experience)
- Track 4 (all admin modules)
- Deployment module for your platform
- Application modules for your site's software
IT Director evaluating Slurm¶
- Track 5 (IT Leadership)
- Skim deployment modules for your target platform
Migrating from SGE/PBS/LSF¶
- The migration guide for your old scheduler
- Tracks 1-2 (build Slurm muscle memory)
- Track 3 if you were a power user
- Track 4 if you were an admin