
Monitoring Jobs

Exercises

  1. Custom squeue format

Use squeue with a custom output format to show your jobs with these columns: job ID, partition, job name (up to 30 characters), state, elapsed time, time limit, and node list. Submit a test job first so you have something to see.

Hint / Solution
# Submit a test job
sbatch --time=00:10:00 --wrap="sleep 300" --job-name=format_test

# Custom format
# %i job ID, %P partition, %j job name, %t state (compact), %M elapsed time,
# %l time limit, %R nodelist (or pending reason); ".30" etc. set column widths
squeue --me -o "%.8i %.9P %.30j %.2t %.10M %.10l %R"
  2. Inspect a job with scontrol

Submit a job that sleeps for 5 minutes. While it is running, use scontrol show job to find: (a) the working directory, (b) the stdout file path, (c) the exact submit time, and (d) the TRES (trackable resources) allocated.

Hint / Solution
sbatch --time=00:10:00 --mem=2G --cpus-per-task=2 --wrap="sleep 300"

scontrol show job <jobid>
# Look for: WorkDir, StdOut, SubmitTime, and TRES fields
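In a script you may want to pull out just those fields rather than read the full dump. A minimal sketch, assuming the key=value layout of `scontrol show job` output; the sample text below is fabricated for illustration, and on a real cluster you would capture the output with `out=$(scontrol show job <jobid>)` instead:

```shell
# Fabricated sample of `scontrol show job` output, for illustration only
out='JobId=12345 JobName=sleep_test
   SubmitTime=2024-05-01T10:00:00
   WorkDir=/home/user/project
   StdOut=/home/user/project/slurm-12345.out
   TRES=cpu=2,mem=2G,node=1'

# grep -o prints only the matching key=value token for each field of interest
echo "$out" | grep -oE '(WorkDir|StdOut|SubmitTime|TRES)=[^ ]*'
```

Each field comes out on its own line, which is easier to feed into further scripting than the multi-line block format.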
  3. Find a completed job's memory usage with sacct

After a job completes, use sacct to compare the memory you requested (ReqMem) with the actual peak memory used (MaxRSS). Format the output to include JobID, JobName, ReqMem, MaxRSS, and State.

Hint / Solution
sacct -j <jobid> --format=JobID,JobName,ReqMem,MaxRSS,State

# Note: MaxRSS appears on the .batch step, not the parent job line
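Because MaxRSS lives on the `.batch` step, a script comparing requested and used memory has to filter for that row. A minimal sketch: the sacct output below is fabricated sample data for illustration, and on a real cluster you would pipe `sacct -j <jobid> --parsable2 --format=JobID,JobName,ReqMem,MaxRSS,State` instead:

```shell
# Fabricated sample of `sacct --parsable2` output (pipe-delimited), for illustration
sample='JobID|JobName|ReqMem|MaxRSS|State
12345|mem_test|4G||COMPLETED
12345.batch|batch|4G|1.2G|COMPLETED'

# MaxRSS is recorded on the .batch step, so select that row before comparing
echo "$sample" | awk -F'|' '$1 ~ /\.batch$/ { print "requested:", $3, "peak used:", $4 }'
```

If peak use is far below the request, lowering `--mem` on future submissions frees memory for other jobs and can shorten your queue time.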
  4. Use sinfo to find idle nodes

Use sinfo to find all idle nodes in the default partition. Display hostnames, CPU count, and memory for each idle node.

Hint / Solution
# %n hostname, %c CPU count, %m memory per node (MB)
sinfo -p batch -t idle -o "%n %c %m"
Replace `batch` with your cluster's default partition name if different.
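The per-node listing also lends itself to quick aggregation, e.g. totalling idle CPUs. A minimal sketch; the two-node sample below is fabricated for illustration, and on a real cluster you would pipe the `sinfo` command above instead:

```shell
# Fabricated sample of `sinfo -p batch -t idle -o "%n %c %m"` output
sample='node001 48 192000
node002 48 192000'

# Column 2 is the CPU count per node; sum it across all idle nodes
echo "$sample" | awk '{ cpus += $2 } END { print "idle CPUs:", cpus }'
```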
  5. Track estimated start time of a pending job

Submit a job requesting resources that may cause it to pend (e.g., a large memory request). While it is pending, use squeue --start to check its estimated start time.

Hint / Solution
sbatch --time=01:00:00 --mem=200G --wrap="sleep 60"

squeue --me --start
# The START_TIME column shows when Slurm estimates the job will begin
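To use the estimate in a script, you can extract just the START_TIME column. A minimal sketch, assuming the default `squeue --start` column order; the sample output below is fabricated for illustration, and on a real cluster you would pipe `squeue --me --start` instead:

```shell
# Fabricated sample of `squeue --me --start` output, for illustration
sample='JOBID PARTITION NAME USER ST START_TIME NODES SCHEDNODES NODELIST(REASON)
12345 batch mem_test alice PD 2024-05-01T12:30:00 1 (null) (Resources)'

# Column 6 is START_TIME; N/A means Slurm has not produced an estimate yet
echo "$sample" | awk 'NR > 1 { print $6 }'
```

Note that the estimate is a scheduler projection, not a guarantee: it shifts as other jobs finish early or higher-priority work arrives.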

References