Monitoring Jobs
Exercises
- Custom squeue format
Use squeue with a custom output format to show your jobs with these columns: job ID, partition, job name (up to 30 characters), state, elapsed time, time limit, and node list. Submit a test job first so you have something to see.
Hint / Solution
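One possible solution, sketched under the assumption that the standard `squeue` format specifiers are available: `%i` (job ID), `%P` (partition), `%j` (job name), `%T` (state), `%M` (elapsed time), `%l` (time limit), and `%N` (node list). The column widths are arbitrary choices, apart from the 30-character job name. The `command -v` guard is only there so the snippet does nothing on a machine without Slurm.

```shell
# Seven columns: job ID, partition, job name (30 chars), state,
# elapsed time, time limit, node list. Widths are illustrative.
fmt='%.10i %.12P %.30j %.10T %.10M %.10l %N'

if command -v squeue >/dev/null 2>&1; then
    # Submit a short test job so there is something to display.
    sbatch --wrap='sleep 120'
    # List only your own jobs using the custom format.
    squeue -u "$(id -un)" -o "$fmt"
fi
```

The job may still be pending when `squeue` runs, in which case the node list column is empty.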
- Inspect a job with scontrol
Submit a job that sleeps for 5 minutes. While it is running, use scontrol show job to find: (a) the working directory, (b) the stdout file path, (c) the exact submit time, and (d) the TRES (trackable resources) allocated.
Hint / Solution
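A minimal sketch of one approach: submit the job, then filter the `scontrol show job` output down to the lines that contain the four fields. The helper name `job_fields` is hypothetical, and the field names (`WorkDir`, `StdOut`, `SubmitTime`, and a field ending in `TRES`) are assumed to match what your Slurm version prints; the TRES field in particular has varied between releases.

```shell
# Keep only the scontrol output lines that carry the requested fields.
# 'TRES=' also matches variants such as ReqTRES= and AllocTRES=.
job_fields() {
    grep -E '(WorkDir|StdOut|SubmitTime|TRES)='
}

if command -v sbatch >/dev/null 2>&1; then
    # --parsable prints just the job ID (optionally followed by ";cluster").
    jobid=$(sbatch --parsable --wrap='sleep 300')
    jobid=${jobid%%;*}
    scontrol show job "$jobid" | job_fields
fi
```

The filter works line by line, so the `SubmitTime` line will also show whatever other fields share that line.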
- Find a completed job's memory usage with sacct
After a job completes, use sacct to compare the memory you requested (ReqMem) with the actual peak memory used (MaxRSS). Format the output to include JobID, JobName, ReqMem, MaxRSS, and State.
Hint / Solution
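A possible solution, assuming job accounting is enabled on your cluster. The job ID `12345` is a hypothetical placeholder; set `JOBID` to the ID of one of your own completed jobs.

```shell
# Columns requested by the exercise.
sacct_fields='JobID,JobName,ReqMem,MaxRSS,State'

# Hypothetical placeholder ID; override with: JOBID=<your job id>
jobid="${JOBID:-12345}"

if command -v sacct >/dev/null 2>&1; then
    sacct -j "$jobid" --format="$sacct_fields"
fi
```

`MaxRSS` is recorded per job step, so expect it on step rows such as the `.batch` step rather than on the top-level job row.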
- Use sinfo to find idle nodes
Use sinfo to find all idle nodes in the default partition. Display hostnames, CPU count, and memory for each idle node.
Hint / Solution
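One way to do this, assuming the default partition is named `batch`: ask `sinfo` for node-oriented output restricted to idle nodes, with `%N` (node name), `%c` (CPUs per node), and `%m` (memory per node, in megabytes).

```shell
# Assumption: the default partition is called "batch".
partition='batch'

# Node name, CPU count, memory (MB).
sinfo_fmt='%N %c %m'

if command -v sinfo >/dev/null 2>&1; then
    # --Node prints one line per node instead of grouping nodes together.
    sinfo --partition="$partition" --states=idle --Node --format="$sinfo_fmt"
fi
```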
Replace `batch` with your cluster's default partition name if different.
- Track estimated start time of a pending job
Submit a job requesting resources that may cause it to pend (e.g., a large memory request). While it is pending, use squeue --start to check its estimated start time.
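A sketch of one way to try this. The `200G` memory request is a hypothetical value; pick something large enough to queue on your cluster but not so large that the scheduler rejects it outright. The helper `parse_jobid` is also hypothetical and only strips the optional `;cluster` suffix that `sbatch --parsable` can append.

```shell
# Hypothetical oversized memory request; adjust for your cluster.
mem_request='200G'

# sbatch --parsable prints "jobid" or "jobid;cluster"; keep the ID only.
parse_jobid() {
    printf '%s\n' "${1%%;*}"
}

if command -v sbatch >/dev/null 2>&1; then
    jobid=$(parse_jobid "$(sbatch --parsable --mem="$mem_request" --wrap='sleep 60')")
    # Show the scheduler's estimated start time for the pending job.
    squeue --start -j "$jobid"
fi
```

The start time may be shown as `N/A` until the scheduler has produced an estimate for the job.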