# Schrodinger Suite & Slurm -- End-User Guide

## Overview

Schrodinger Suite is an integrated computational chemistry platform for drug discovery, materials science, and molecular modeling. When running on an HPC cluster, Schrodinger uses Slurm behind the scenes -- you submit to Schrodinger, and Schrodinger submits to Slurm.

This module covers what you need to know as an end-user: how to submit jobs, choose resources, monitor progress, and troubleshoot failures.

### How It Works

```
You (Maestro or CLI)
      │
      ▼
Schrodinger Job Layer
      │  Translates your -HOST and -NPROC into sbatch options
      ▼
Slurm (sbatch)
      │  Schedules and runs the job on compute nodes
      ▼
Compute Node (runs your Glide/Desmond/FEP+ job)
```

The key concept: your admin has configured **host entries** (like `cpu`, `gpu`, `cpu_highmem`) that map to Slurm partitions and resource settings. You select a host entry when submitting a job, and Schrodinger handles the rest.

---

## Submitting Jobs via Maestro GUI

1. Set up your calculation (e.g., Glide docking, Desmond MD)
2. In the **Start** dialog, select a **Host** from the dropdown
   - `cpu` -- standard CPU jobs (Glide, LigPrep, QikProp)
   - `gpu` -- GPU-accelerated jobs (Desmond, FEP+)
   - `cpu_highmem` -- high-memory jobs (large Prime, conformational search)
   - `driver` -- workflow orchestration (usually auto-selected)
3. Set the **Number of Processors** (maps to `-NPROC`)
4. Click **Start**

Maestro shows the host entries configured by your admin. If you do not see the expected entries, contact your system administrator.
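The entries in that Host dropdown come from the same hosts file the CLI reads. As a rough sketch, you can pull out just the entry names from a Job Control-style hosts file -- the sample file written below is a made-up illustration, not a real site configuration:

```shell
#!/bin/sh
# List the entry names in a Job Control-style hosts file.
# Entries in schrodinger.hosts begin with a "name:" line; the keys and
# values in this sample are illustrative, not an actual site config.
list_host_entries() {
    awk -F': *' '$1 == "name" {print $2}' "$1"
}

cat > sample.hosts <<'EOF'
name: cpu
queue: SLURM2.1
qargs: --partition=cpu
name: gpu
queue: SLURM2.1
qargs: --partition=gpu
EOF

list_host_entries sample.hosts   # prints: cpu gpu (one per line)
```

On a real system you would point the helper at `$SCHRODINGER/schrodinger.hosts` instead of the sample file.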
---

## Submitting Jobs via Command Line

### Common Command Structure

```bash
$SCHRODINGER/<program> [options] -HOST <host-entry> -NPROC <n> <input_file>
```

### Examples

```bash
# Glide docking (CPU)
$SCHRODINGER/glide glide_dock.in -HOST cpu -NPROC 8

# Desmond molecular dynamics (GPU)
$SCHRODINGER/desmond -HOST gpu -NPROC 4 desmond_md.msj

# LigPrep (CPU, quick)
$SCHRODINGER/ligprep -HOST cpu -NPROC 4 -i 0 input.sdf -o output.maegz

# Prime homology modeling (high memory)
$SCHRODINGER/prime prime_input.inp -HOST cpu_highmem -NPROC 16
```

### Key Flags

| Flag | Purpose | Example |
|------|---------|---------|
| `-HOST` | Select the host entry (maps to Slurm partition) | `-HOST gpu` |
| `-NPROC` | Number of processors/GPUs to request | `-NPROC 8` |
| `-WAIT` | Block until the job completes | `-WAIT` |
| `-LOCAL` | Run locally (not on the cluster) | `-LOCAL` |
| `-OVERWRITE` | Overwrite existing output files | `-OVERWRITE` |

### The -HOST Flag

The `-HOST` flag selects a pre-configured host entry. Each entry maps to a Slurm partition with specific resources:

```bash
# See available host entries
# Job Control (legacy):
cat $SCHRODINGER/schrodinger.hosts

# Job Server (modern):
$SCHRODINGER/jsc list-hosts
```

Common patterns:

- `-HOST cpu` -- general CPU compute
- `-HOST gpu` -- GPU-accelerated
- `-HOST cpu_highmem` -- high-memory nodes
- `-HOST driver` -- workflow orchestration (lightweight)

### The -NPROC Flag

`-NPROC` controls how many processors (or GPUs for GPU host entries) are requested:

```bash
# Request 16 CPU cores for a Glide job
$SCHRODINGER/glide dock.in -HOST cpu -NPROC 16

# Request 4 GPUs for a Desmond job
$SCHRODINGER/desmond md.msj -HOST gpu -NPROC 4
```

Behind the scenes, `-NPROC 16` on the `cpu` host entry becomes `--ntasks-per-node=16` in the sbatch command. On the `gpu` entry, `-NPROC 4` becomes `--gres=gpu:4`.
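That translation can be pictured as a small lookup. The sketch below is illustrative only -- the partition names and sbatch options are hypothetical stand-ins for whatever your admin actually configured in the host entries:

```shell
#!/bin/sh
# Illustrative sketch of the -HOST/-NPROC to sbatch translation.
# Partition names and option mappings here are hypothetical; the real
# mapping is defined by your site's host entries.
translate_host() {
    host=$1; nproc=$2
    case $host in
        cpu)         echo "sbatch --partition=cpu --ntasks-per-node=$nproc" ;;
        gpu)         echo "sbatch --partition=gpu --gres=gpu:$nproc" ;;
        cpu_highmem) echo "sbatch --partition=highmem --ntasks-per-node=$nproc" ;;
        *)           echo "unknown host entry: $host" >&2; return 1 ;;
    esac
}

translate_host cpu 16   # -> sbatch --partition=cpu --ntasks-per-node=16
translate_host gpu 4    # -> sbatch --partition=gpu --gres=gpu:4
```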
---

## Monitoring Jobs

### Schrodinger Job Monitor (Maestro)

In Maestro: **Tasks > Job Monitor** shows all your active and completed Schrodinger jobs with status, host entry, and elapsed time.

### Command-Line Monitoring

```bash
# Job Control (legacy)
$SCHRODINGER/jobcontrol -list
$SCHRODINGER/jobcontrol -list -all        # Include completed jobs
$SCHRODINGER/jobcontrol -list -j JOBNAME  # Specific job

# Job Server (modern)
$SCHRODINGER/jsc list
$SCHRODINGER/jsc list --all
```

### Seeing Your Jobs in Slurm

Since Schrodinger jobs run through Slurm, you can also use standard Slurm commands:

```bash
# See your running/pending Slurm jobs
squeue -u $USER

# Detailed info on a specific Slurm job
scontrol show job <jobid>

# Completed job accounting
sacct -j <jobid> --format=JobID,JobName,Elapsed,MaxRSS,ExitCode
```

### Mapping Schrodinger Job IDs to Slurm Job IDs

Schrodinger assigns its own job IDs. To find the corresponding Slurm job:

```bash
# Job Control -- check the job's log file
$SCHRODINGER/jobcontrol -list -j JOBNAME
# Look for the Slurm job ID in the log output

# Or search squeue by job name pattern
squeue -u $USER -o "%.10i %.30j %.8T %.10M"
```

---

## Understanding Job Failures

When a job fails, you need to determine whether the failure is on the Schrodinger side or the Slurm side.
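A first triage pass can be scripted as a lookup on the Slurm job State from `sacct` -- a minimal sketch in which the State names are real Slurm values, but the advice strings are just this guide's suggestions, not anything Slurm or Schrodinger prints:

```shell
#!/bin/sh
# First-pass triage: map a Slurm job State (as reported by
# `sacct -j <jobid> --format=State`) to a suggested next step.
# The State names are real Slurm values; the advice text is ours.
diagnose_state() {
    case $1 in
        OUT_OF_MEMORY) echo "OOM: rerun on a higher-memory host entry" ;;
        TIMEOUT)       echo "wall time exceeded: split the job or ask about partition limits" ;;
        FAILED)        echo "check the Schrodinger .log for application or license errors" ;;
        COMPLETED)     echo "Slurm side is clean: verify the Schrodinger output itself" ;;
        *)             echo "state $1: check both the Schrodinger .log and sacct" ;;
    esac
}

diagnose_state OUT_OF_MEMORY
diagnose_state TIMEOUT
```

The sections below walk through each of these failure modes in detail.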
### Where to Look

| Log Type | Location | What It Tells You |
|----------|----------|-------------------|
| **Schrodinger log** | `<jobname>.log` in the working directory | Application errors, license issues |
| **Schrodinger job record** | `$SCHRODINGER/jobcontrol -list -j <jobname>` | Exit status, host used |
| **Slurm output** | `slurm-<jobid>.out` or job-specific `.out`/`.err` | Slurm errors, resource limits |
| **sacct** | `sacct -j <jobid>` | Exit code, memory usage, time elapsed |

### Common Failure Modes

**License unavailable:**

```
Error: Unable to check out license for GLIDE
```

- The Schrodinger License Manager (SLM) server is unreachable or all licenses are in use
- Wait and retry, or check with your admin about license availability
- Note: Schrodinger uses SLM licensing (FlexNet/FLEXlm has been discontinued as of release 2025-1). License issues may be caused by network/firewall changes blocking port 53000 to the SLM server

**Scratch directory issues:**

```
Error: Cannot write to tmpdir /scr/...
```

- The scratch directory is full or not mounted
- Contact your admin about the scratch filesystem

**Out of memory (OOM):**

```bash
# Check sacct for OOM indication
sacct -j <jobid> --format=JobID,MaxRSS,ReqMem,State
# State = OUT_OF_MEMORY
```

- Your job needed more memory than Slurm allocated
- Use a host entry with more memory (`cpu_highmem`) or ask your admin to adjust limits

**Wall time exceeded:**

```bash
sacct -j <jobid> --format=JobID,Elapsed,Timelimit,State
# State = TIMEOUT
```

- The job ran longer than the Slurm partition allows
- Ask your admin about partition time limits, or break the job into smaller steps

**PATH or environment issues:**

```
Error: $SCHRODINGER/mmshare-v... not found
```

- The Schrodinger installation is not accessible on the compute node
- Contact your admin -- likely a shared filesystem issue

### When to Check Schrodinger Logs vs. Slurm Logs

- **Job never started** → check `squeue` (is it pending?) and Slurm reason codes
- **Job started but failed immediately** → check the Schrodinger `.log` file
- **Job ran for a while then failed** → check both the Schrodinger `.log` and `sacct` for OOM/TIMEOUT
- **Job completed but results are wrong** → this is a Schrodinger application issue, not Slurm

---

## Best Practices

### Choose the Right Host Entry

| Workload | Host Entry | Typical -NPROC |
|----------|-----------|----------------|
| Glide docking (HTVS/SP) | `cpu` | 4-8 |
| Glide docking (XP) | `cpu` | 8-16 |
| LigPrep | `cpu` | 4 |
| QikProp | `cpu` | 1-4 |
| Prime (small) | `cpu` | 4-8 |
| Prime (large, loop modeling) | `cpu_highmem` | 16-32 |
| Desmond MD | `gpu` | 1-4 (GPUs) |
| FEP+ | `gpu` | 1-8 (GPUs) |
| Workflow drivers | `driver` | 1-4 |

### Resource Estimation Tips

- **Glide:** Scales well to 8-16 cores per job. Beyond that, diminishing returns.
- **Desmond:** GPU-bound. 1 GPU for small systems, 4 GPUs for large membrane systems.
- **FEP+:** Each lambda window uses 1 GPU. Total GPUs = number of concurrent lambda windows.
- **Prime:** Memory-intensive for loop modeling. Use `cpu_highmem` for large proteins.
- **LigPrep:** Lightweight. 4 cores is usually sufficient.

### Scratch Space

- Jobs write temporary files to the scratch directory (`tmpdir` in the host entry)
- Large Desmond trajectories can generate tens of GB of scratch data
- If scratch is on a shared filesystem, be mindful of quota limits
- Clean up completed job directories if scratch is not auto-purged

### Don't Run on Login Nodes

Never run compute-intensive Schrodinger jobs directly on the login node. Always use `-HOST` to submit through Slurm. The login node is for job submission, file management, and light interactive work only.
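To act on the scratch-space advice, here is a quick sketch for spotting the largest job directories under a scratch root. The `/scr/$USER` path is a placeholder -- your real scratch location is whatever `tmpdir` your host entry defines:

```shell
#!/bin/sh
# Show the five largest entries under a scratch root, biggest first,
# so stale job directories are easy to spot and clean up.
# (sort -h requires GNU coreutils, standard on most Linux clusters.)
top_scratch() {
    du -sh "$1"/* 2>/dev/null | sort -rh | head -5
}

# Usage (path is a placeholder for your site's scratch directory):
#   top_scratch "/scr/$USER"
```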
---

## Related Modules

- **Schrodinger Admin Setup** -- how your admin configures Schrodinger for Slurm
- **Submitting Jobs** -- general Slurm job submission
- **Monitoring Jobs** -- squeue, sacct, and job monitoring
- **GPU Jobs** -- Slurm GPU job concepts

## References

- Schrodinger: Running Jobs
- Schrodinger: Command-Line Reference
- Schrodinger: Maestro Job Monitor
- Slurm: squeue
- Slurm: sacct