# Slurm Command Cheatsheet

## Job Submission

| Command | Description |
|---------|-------------|
| `sbatch script.sh` | Submit batch job |
| `sbatch --wrap="command"` | Submit one-liner |
| `sbatch --array=1-100%50 script.sh` | Submit job array (max 50 concurrent) |
| `sbatch --dependency=afterok:123 script.sh` | Submit with dependency |
| `sbatch --hold script.sh` | Submit in held state |

## Job Monitoring

| Command | Description |
|---------|-------------|
| `squeue --me` | My active jobs |
| `squeue` | All active jobs |
| `squeue -p gpu` | Jobs in GPU partition |
| `squeue --me --start` | Estimated start times |
| `squeue --me -o "%.8i %.9P %.20j %.2t %.10M %R"` | Custom format |
| `scontrol show job 123` | Full job details |
| `sacct -j 123` | Completed job info |
| `sacct -j 123 --format=JobID,Elapsed,MaxRSS,State,ExitCode` | Resource usage |
| `sacct --starttime=now-7days --state=FAILED` | Recent failures |

## Job Management

| Command | Description |
|---------|-------------|
| `scancel 123` | Cancel job |
| `scancel --me` | Cancel all my jobs |
| `scancel --me --state=PENDING` | Cancel only pending jobs |
| `scancel 123_[1-10]` | Cancel array tasks 1-10 |
| `scontrol hold 123` | Hold pending job |
| `scontrol release 123` | Release held job |
| `scontrol update JobId=123 TimeLimit=24:00:00` | Change time limit |
| `scontrol requeue 123` | Requeue job |

## Cluster Information

| Command | Description |
|---------|-------------|
| `sinfo` | Partition/node summary |
| `sinfo -N -l` | Detailed node list |
| `sinfo -p gpu -o "%N %G %c %m"` | GPU partition details |
| `sinfo -t idle` | Only idle nodes |
| `scontrol show partition` | Partition configuration |
| `scontrol show node node001` | Node details |
| `scontrol ping` | Controller health check |

## Accounting & Priority

| Command | Description |
|---------|-------------|
| `sacct` | My jobs since midnight today |
| `sreport cluster utilization` | Cluster utilization |
| `sreport user TopUsage --tres=cpu TopCount=10` | Top CPU users |
| `sshare -l` | Fairshare values |
| `sprio` | Priority breakdown for pending jobs |
| `sdiag` | Scheduler diagnostics |

## Interactive Jobs

| Command | Description |
|---------|-------------|
| `srun --pty bash` | Interactive shell |
| `srun --pty -p gpu --gres=gpu:1 bash` | Interactive GPU shell |
| `srun --pty --x11 bash` | Interactive shell with X11 forwarding |
| `salloc --time=4:00:00` | Reserve resources |
| `srun hostname` | Run command on compute node |

## Admin Commands

| Command | Description |
|---------|-------------|
| `scontrol reconfigure` | Reload slurm.conf |
| `scontrol update NodeName=X State=DRAIN Reason="text"` | Drain node |
| `scontrol update NodeName=X State=RESUME` | Resume node |
| `scontrol create reservation ...` | Create reservation |
| `scontrol setdebug debug` | Increase log verbosity |
| `scontrol setdebug info` | Reset log verbosity |
| `sacctmgr show account` | List accounts |
| `sacctmgr show association tree` | Show account tree |
| `sacctmgr show qos` | List QOS |
| `sacctmgr add user name account=acct` | Add user |

## Common sbatch Options

```bash
#SBATCH --job-name=NAME          # Job name
#SBATCH --output=FILE            # Stdout (%j=job ID, %x=job name, %a=array task index)
#SBATCH --error=FILE             # Stderr
#SBATCH --partition=PART         # Partition
#SBATCH --time=HH:MM:SS          # Walltime limit
#SBATCH --ntasks=N               # Number of tasks (MPI ranks)
#SBATCH --cpus-per-task=N        # CPUs per task (threads)
#SBATCH --nodes=N                # Number of nodes
#SBATCH --mem=NG                 # Memory per node
#SBATCH --mem-per-cpu=NG         # Memory per CPU
#SBATCH --gres=gpu:N             # GPUs per node
#SBATCH --array=RANGE%LIMIT      # Job array (LIMIT = max concurrent tasks)
#SBATCH --dependency=TYPE:ID     # Dependency
#SBATCH --account=ACCT           # Account
#SBATCH --exclusive              # Exclusive node
#SBATCH --mail-type=END,FAIL     # Email notifications
#SBATCH --mail-user=EMAIL        # Email address
#SBATCH --constraint=FEATURE     # Node feature
```

## References

- [SchedMD: Rosetta Stone](https://slurm.schedmd.com/rosetta.html)
- [SchedMD: sbatch](https://slurm.schedmd.com/sbatch.html)
- [SchedMD: squeue](https://slurm.schedmd.com/squeue.html)
- [SchedMD: sacct](https://slurm.schedmd.com/sacct.html)
- [SchedMD: sinfo](https://slurm.schedmd.com/sinfo.html)
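
## Example: Generating a Batch Script

The sbatch options listed in this cheatsheet can be combined into a complete job script. The following is a minimal sketch, not a site-specific template: the function name `write_job_script`, the job name `demo`, the partition `gpu`, and all resource values are hypothetical placeholders that will likely need adjusting for a given cluster. The function only prints the script text, so it runs on a machine without Slurm installed.

```bash
#!/usr/bin/env bash
# Sketch: print a batch script built from common #SBATCH options.
# All names and resource values below are illustrative assumptions.

write_job_script() {
    local name="$1"       # job name (hypothetical)
    local partition="$2"  # partition name (site-specific)
    local walltime="$3"   # HH:MM:SS
    cat <<EOF
#!/bin/bash
#SBATCH --job-name=${name}
#SBATCH --output=%x-%j.out
#SBATCH --partition=${partition}
#SBATCH --time=${walltime}
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=4
#SBATCH --mem=8G
#SBATCH --gres=gpu:1

srun hostname
EOF
}

# Print the generated script to stdout.
write_job_script demo gpu 01:00:00

# To submit on a cluster, redirect to a file and pass it to sbatch:
#   write_job_script demo gpu 01:00:00 > demo_job.sh
#   sbatch demo_job.sh
#   squeue --me
```

With `--output=%x-%j.out`, Slurm substitutes the job name and job ID into the output file name, so each run writes to its own file.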