
Job Arrays

Exercises

  1. Submit a 10-task array job

Write a job script that prints "Task [index] running on [hostname]" for each array task. Submit it with --array=1-10. After all tasks complete, check that you have 10 output files.

Hint / Solution
cat > array_test.sh << 'EOF'
#!/bin/bash
#SBATCH --job-name=array_test
#SBATCH --output=array_%A_%a.out
#SBATCH --time=00:05:00
#SBATCH --mem=1G

echo "Task $SLURM_ARRAY_TASK_ID running on $(hostname)"
EOF

sbatch --array=1-10 array_test.sh

# After completion:
ls array_*.out | wc -l
# Should show 10
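
A file count alone does not say which task failed to write output. The following is a minimal sketch of a per-task check, assuming the array_%A_%a.out naming pattern used above. The function name missing_tasks and the job ID 12340 are hypothetical; substitute the ID that sbatch printed.

```shell
#!/bin/bash
# List array task IDs whose output file is missing.
# Usage: missing_tasks JOBID FIRST LAST
# Assumes output files are named array_<jobid>_<taskid>.out in the
# current directory (the pattern set by --output=array_%A_%a.out).
missing_tasks() {
    local jobid=$1 first=$2 last=$3 i
    for i in $(seq "$first" "$last"); do
        [ -e "array_${jobid}_${i}.out" ] || echo "$i"
    done
}

# Hypothetical job ID, for illustration only.
missing_tasks 12340 1 10
```

An empty result means every task in the range produced its file.
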
  2. Use SLURM_ARRAY_TASK_ID to process different input files

Create 5 small input files (sample_1.txt through sample_5.txt), each containing a unique line of text. Write an array job that reads the file corresponding to its task ID and prints the contents.

Hint / Solution
# Create sample input files
for i in $(seq 1 5); do
    echo "Data for sample $i" > sample_${i}.txt
done

cat > process_samples.sh << 'EOF'
#!/bin/bash
#SBATCH --job-name=process_samples
#SBATCH --output=logs/sample_%A_%a.out
#SBATCH --time=00:05:00
#SBATCH --mem=1G

# Quote the file name so paths with spaces or glob characters stay intact
INPUT="sample_${SLURM_ARRAY_TASK_ID}.txt"
echo "Processing $INPUT:"
cat "$INPUT"
EOF

mkdir -p logs
sbatch --array=1-5 process_samples.sh
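
Before submitting, the mapping from task ID to input file can be checked locally by setting SLURM_ARRAY_TASK_ID by hand. This is a rough sketch, not a substitute for a real submission: Slurm is not involved, so other SLURM_* variables stay unset and the output goes to the terminal rather than to logs/. The helper name dry_run is hypothetical.

```shell
#!/bin/bash
# Run an array script locally once per task ID, without Slurm.
# bash treats the #SBATCH lines as ordinary comments.
# Usage: dry_run SCRIPT FIRST LAST
dry_run() {
    local script=$1 first=$2 last=$3 id
    for id in $(seq "$first" "$last"); do
        SLURM_ARRAY_TASK_ID=$id bash "$script"
    done
}

# Only attempt the dry run if the script from this exercise is present.
if [ -f process_samples.sh ]; then
    dry_run process_samples.sh 1 5
fi
```

If a task ID points at a file that does not exist, the error from cat shows up here rather than in a log file after the job has queued and run.
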
  3. Throttle an array to 3 concurrent tasks

Submit a 10-task array job in which each task sleeps for 60 seconds, and limit concurrency to at most 3 tasks running at the same time. While it runs, use squeue --me to verify that no more than 3 tasks are in the R (running) state.

Hint / Solution
sbatch --array=1-10%3 --time=00:05:00 --wrap="sleep 60" --job-name=throttle_test

# Check that at most 3 are running
squeue --me
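
Counting running tasks by eye gets tedious for larger arrays. The sketch below counts them from squeue output, assuming the format string "%i %t" (job ID and compact state code), with -h to suppress the header and -r to print one line per array task. The function name count_running and the sample job IDs are hypothetical.

```shell
#!/bin/bash
# Count lines whose second field is the compact state code "R" (running).
# Expects "JOBID STATE" pairs on stdin, one array task per line.
count_running() {
    awk '$2 == "R" { n++ } END { print n + 0 }'
}

# On a cluster, feed it squeue output directly:
#   squeue --me -h -r -o "%i %t" | count_running
# Hypothetical sample input, for illustration only:
printf '%s\n' "12340_1 R" "12340_2 R" "12340_3 R" "12340_4 PD" | count_running
# prints 3
```

With the %3 throttle in place, the count should not exceed 3 at any point while the array runs.
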
  4. Cancel a single array task

Submit a 10-task array job where each task sleeps for 5 minutes. Once tasks are running, cancel only task 5. Verify with squeue that the other tasks are still active.

Hint / Solution
sbatch --array=1-10 --time=00:10:00 --wrap="sleep 300" --job-name=cancel_one

# Note the array job ID (e.g., 12340)
squeue --me -r

# Cancel only task 5
scancel 12340_5

# Verify task 5 is gone but others remain
squeue --me -r

# Clean up
scancel 12340
  5. Use a file list with an array job

Create a file called file_list.txt with 4 file paths (one per line). Write an array job with --array=1-4 that reads the Nth line from file_list.txt using sed and prints it. This pattern is useful whenever input names do not follow a numeric sequence, because the task ID selects a line number rather than forming part of the file name.

Hint / Solution
cat > file_list.txt << 'EOF'
/data/samples/patient_alpha.bam
/data/samples/patient_beta.bam
/data/samples/patient_gamma.bam
/data/samples/patient_delta.bam
EOF

cat > filelist_array.sh << 'EOF'
#!/bin/bash
#SBATCH --job-name=filelist
#SBATCH --output=filelist_%A_%a.out
#SBATCH --time=00:05:00
#SBATCH --mem=1G

INPUT=$(sed -n "${SLURM_ARRAY_TASK_ID}p" file_list.txt)
echo "Task $SLURM_ARRAY_TASK_ID would process: $INPUT"
EOF

sbatch --array=1-4 filelist_array.sh
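
Hard-coding --array=1-4 silently breaks when the list grows or shrinks. As a minimal sketch, the range can be derived from the number of lines in the list, and the lookup can refuse task IDs that have no matching line. The function name nth_entry is hypothetical, and the sbatch command is only printed here, not submitted.

```shell
#!/bin/bash
# Print the line of a list file that belongs to a given task ID.
# Fails if the ID is not a positive integer or the line is missing/empty.
# Usage: nth_entry LIST_FILE TASK_ID
nth_entry() {
    local list=$1 id=$2 entry
    case $id in
        ''|*[!0-9]*) echo "Invalid task ID: '$id'" >&2; return 1 ;;
    esac
    entry=$(sed -n "${id}p" "$list")
    if [ -z "$entry" ]; then
        echo "No entry for task $id in $list" >&2
        return 1
    fi
    echo "$entry"
}

# Size the array from the list instead of hard-coding the range.
if [ -f file_list.txt ]; then
    N=$(awk 'END { print NR }' file_list.txt)
    echo "sbatch --array=1-${N} filelist_array.sh"
fi
```

Inside a job script, the same lookup would read INPUT=$(nth_entry file_list.txt "$SLURM_ARRAY_TASK_ID") || exit 1, so a task with no input fails instead of running on an empty path.
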

References