Job Arrays
Exercises
- Submit a 10-task array job
Write a job script that prints "Task [index] running on [hostname]" for each array task. Submit it with --array=1-10. After all tasks complete, check that you have 10 output files.
Hint / Solution
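One possible solution is sketched below. The script name `hello_array.sh`, the output pattern, and the time and memory limits are illustrative choices rather than part of the exercise, and your cluster may also require a partition or account option. The submission step is skipped when `sbatch` is not on the PATH.

```shell
cat > hello_array.sh << 'EOF'
#!/bin/bash
#SBATCH --job-name=hello_array
#SBATCH --output=hello_%A_%a.out
#SBATCH --time=00:05:00
#SBATCH --mem=1G
# Each array task sees its own index in SLURM_ARRAY_TASK_ID
echo "Task ${SLURM_ARRAY_TASK_ID} running on $(hostname)"
EOF

if command -v sbatch >/dev/null 2>&1; then
    sbatch --array=1-10 hello_array.sh
else
    echo "sbatch not found; run this on a Slurm login node" >&2
fi

# After all tasks complete, count the output files (expect 10):
#   ls hello_*_*.out | wc -l
```

In the output pattern, `%A` expands to the array's master job ID and `%a` to the task index, so each task writes its own file. To check the script without a scheduler, set the variable by hand: `SLURM_ARRAY_TASK_ID=3 bash hello_array.sh`.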
- Use SLURM_ARRAY_TASK_ID to process different input files
Create 5 small input files (sample_1.txt through sample_5.txt), each containing a unique line of text. Write an array job that reads the file corresponding to its task ID and prints the contents.
Hint / Solution
# Create sample input files
for i in $(seq 1 5); do
echo "Data for sample $i" > sample_${i}.txt
done
cat > process_samples.sh << 'EOF'
#!/bin/bash
#SBATCH --job-name=process_samples
#SBATCH --output=logs/sample_%A_%a.out
#SBATCH --time=00:05:00
#SBATCH --mem=1G
INPUT=sample_${SLURM_ARRAY_TASK_ID}.txt
echo "Processing $INPUT:"
cat "$INPUT"
EOF
# Slurm does not create the output directory, so it must exist before submission
mkdir -p logs
sbatch --array=1-5 process_samples.sh
- Throttle an array to 3 concurrent tasks
Submit a 10-task array job where each task sleeps for 60 seconds, but limit concurrency to at most 3 tasks running at the same time. While it runs, use squeue --me to verify that no more than 3 tasks are in the R (running) state at any time.
Hint / Solution
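A minimal sketch of one solution: appending `%3` to the array range asks Slurm to keep at most 3 tasks of the array running at once. The script name `throttle_array.sh` and the resource limits are illustrative assumptions, and the submission step is skipped when `sbatch` is not available.

```shell
cat > throttle_array.sh << 'EOF'
#!/bin/bash
#SBATCH --job-name=throttle
#SBATCH --output=throttle_%A_%a.out
#SBATCH --time=00:05:00
#SBATCH --mem=1G
echo "Task ${SLURM_ARRAY_TASK_ID} started at $(date)"
sleep 60
EOF

if command -v sbatch >/dev/null 2>&1; then
    # The %3 suffix caps the number of simultaneously running tasks at 3
    sbatch --array=1-10%3 throttle_array.sh
    # Re-run this while the array is active; at most 3 tasks should show state R
    squeue --me
else
    echo "sbatch not found; run this on a Slurm login node" >&2
fi
```

The remaining tasks stay pending until a running task finishes, so the whole array should take roughly four rounds of 60 seconds, assuming the tasks start promptly.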
- Cancel a single array task
Submit a 10-task array job where each task sleeps for 5 minutes. Once tasks are running, cancel only task 5. Verify with squeue that the other tasks are still active.
Hint / Solution
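One way to approach this, as a sketch: capture the job ID at submission time, then pass `scancel` the ID in the form `<jobid>_<taskid>`. The script name `sleep_array.sh`, the `JOBID` variable, and the resource limits are illustrative assumptions. The Slurm commands are skipped when `sbatch` is not available.

```shell
cat > sleep_array.sh << 'EOF'
#!/bin/bash
#SBATCH --job-name=sleep_array
#SBATCH --output=sleep_%A_%a.out
#SBATCH --time=00:10:00
#SBATCH --mem=1G
sleep 300
EOF

if command -v sbatch >/dev/null 2>&1; then
    # --parsable prints only the job ID, optionally followed by ";cluster"
    JOBID=$(sbatch --parsable --array=1-10 sleep_array.sh)
    JOBID=${JOBID%%;*}   # drop the cluster name if present

    squeue --me          # check which tasks are running
    # Cancel only task 5; this works whether the task is pending or running
    scancel "${JOBID}_5"
    squeue --me          # tasks 1-4 and 6-10 should still be listed
else
    echo "sbatch not found; run this on a Slurm login node" >&2
fi
```

If you submitted the job by hand, the same cancellation is `scancel <jobid>_5`, where `<jobid>` is the array's job ID as reported by `sbatch`. Running `scancel <jobid>` without the suffix cancels the entire array.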
- Use a file list with an array job
Create a file called file_list.txt with 4 file paths (one per line). Write an array job with --array=1-4 that reads the Nth line from file_list.txt using sed, where N is the task ID, and prints it. This pattern is useful for processing samples whose names are not numeric.
Hint / Solution
cat > file_list.txt << 'EOF'
/data/samples/patient_alpha.bam
/data/samples/patient_beta.bam
/data/samples/patient_gamma.bam
/data/samples/patient_delta.bam
EOF
cat > filelist_array.sh << 'EOF'
#!/bin/bash
#SBATCH --job-name=filelist
#SBATCH --output=filelist_%A_%a.out
#SBATCH --time=00:05:00
#SBATCH --mem=1G
INPUT=$(sed -n "${SLURM_ARRAY_TASK_ID}p" file_list.txt)
echo "Task $SLURM_ARRAY_TASK_ID would process: $INPUT"
EOF
sbatch --array=1-4 filelist_array.sh