# Schrodinger Suite & Slurm -- End-User Guide

## Overview

Schrodinger Suite is an integrated computational chemistry platform for drug discovery, materials science, and molecular modeling. When running on an HPC cluster, Schrodinger uses Slurm behind the scenes -- you submit to Schrodinger, and Schrodinger submits to Slurm.

This module covers what you need to know as an end-user: how to submit jobs, choose resources, monitor progress, and troubleshoot failures.

### How It Works

```
You (Maestro or CLI)
      │
      ▼
Schrodinger Job Layer
      │  Translates your -HOST and -NPROC into sbatch options
      ▼
Slurm (sbatch)
      │  Schedules and runs the job on compute nodes
      ▼
Compute Node (runs your Glide/Desmond/FEP+ job)
```

The key concept: your admin has configured **host entries** (like `cpu`, `gpu`, `cpu_highmem`) that map to Slurm partitions and resource settings. You select a host entry when submitting a job, and Schrodinger handles the rest.

---

## Submitting Jobs via Maestro GUI

1. Set up your calculation (e.g., Glide docking, Desmond MD)
2. In the **Start** dialog, select a **Host** from the dropdown
   - `cpu` -- standard CPU jobs (Glide, LigPrep, QikProp)
   - `gpu` -- GPU-accelerated jobs (Desmond, FEP+)
   - `cpu_highmem` -- high-memory jobs (large Prime, conformational search)
   - `driver` -- workflow orchestration (usually auto-selected)
3. Set the **Number of Processors** (maps to `-NPROC`)
4. Click **Start**

Maestro shows the host entries configured by your admin. If you do not see the expected entries, contact your system administrator.
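The entries in that Host dropdown come from the same hosts file the CLI reads. As a rough sketch, you can pull out just the entry names from a Job Control-style hosts file -- the sample file written below is a made-up illustration, not a real site configuration:

```shell
#!/bin/sh
# List the entry names in a Job Control-style hosts file.
# Entries in schrodinger.hosts begin with a "name:" line; the keys and
# values in this sample are illustrative, not an actual site config.
list_host_entries() {
    awk -F': *' '$1 == "name" {print $2}' "$1"
}

cat > sample.hosts <<'EOF'
name: cpu
queue: SLURM2.1
qargs: --partition=cpu
name: gpu
queue: SLURM2.1
qargs: --partition=gpu
EOF

list_host_entries sample.hosts   # prints: cpu gpu (one per line)
```

On a real system you would point the helper at `$SCHRODINGER/schrodinger.hosts` instead of the sample file.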
---

## Submitting Jobs via Command Line

### Common Command Structure

```bash
$SCHRODINGER/<program> [options] -HOST <host-entry> -NPROC <n> <input_file>
```

### Examples

```bash
# Glide docking (CPU)
$SCHRODINGER/glide glide_dock.in -HOST cpu -NPROC 8

# Desmond molecular dynamics (GPU)
$SCHRODINGER/desmond -HOST gpu -NPROC 4 desmond_md.msj

# LigPrep (CPU, quick)
$SCHRODINGER/ligprep -HOST cpu -NPROC 4 -i 0 input.sdf -o output.maegz

# Prime homology modeling (high memory)
$SCHRODINGER/prime prime_input.inp -HOST cpu_highmem -NPROC 16
```

### Key Flags

| Flag | Purpose | Example |
|------|---------|---------|
| `-HOST` | Select the host entry (maps to Slurm partition) | `-HOST gpu` |
| `-NPROC` | Number of processors/GPUs to request | `-NPROC 8` |
| `-WAIT` | Block until the job completes | `-WAIT` |
| `-LOCAL` | Run locally (not on the cluster) | `-LOCAL` |
| `-OVERWRITE` | Overwrite existing output files | `-OVERWRITE` |

### The -HOST Flag

The `-HOST` flag selects a pre-configured host entry. Each entry maps to a Slurm partition with specific resources:

```bash
# See available host entries
# Job Control (legacy):
cat $SCHRODINGER/schrodinger.hosts

# Job Server (modern):
$SCHRODINGER/jsc list-hosts
```

Common patterns:

- `-HOST cpu` -- general CPU compute
- `-HOST gpu` -- GPU-accelerated
- `-HOST cpu_highmem` -- high-memory nodes
- `-HOST driver` -- workflow orchestration (lightweight)

### The -NPROC Flag

`-NPROC` controls how many processors (or GPUs for GPU host entries) are requested:

```bash
# Request 16 CPU cores for a Glide job
$SCHRODINGER/glide dock.in -HOST cpu -NPROC 16

# Request 4 GPUs for a Desmond job
$SCHRODINGER/desmond md.msj -HOST gpu -NPROC 4
```

Behind the scenes, `-NPROC 16` on the `cpu` host entry becomes `--ntasks-per-node=16` in the sbatch command. On the `gpu` entry, `-NPROC 4` becomes `--gres=gpu:4`.
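That translation can be pictured as a small lookup. The sketch below is illustrative only -- the partition names and sbatch options are hypothetical stand-ins for whatever your admin actually configured in the host entries:

```shell
#!/bin/sh
# Illustrative sketch of the -HOST/-NPROC to sbatch translation.
# Partition names and option mappings here are hypothetical; the real
# mapping is defined by your site's host entries.
translate_host() {
    host=$1; nproc=$2
    case $host in
        cpu)         echo "sbatch --partition=cpu --ntasks-per-node=$nproc" ;;
        gpu)         echo "sbatch --partition=gpu --gres=gpu:$nproc" ;;
        cpu_highmem) echo "sbatch --partition=highmem --ntasks-per-node=$nproc" ;;
        *)           echo "unknown host entry: $host" >&2; return 1 ;;
    esac
}

translate_host cpu 16   # -> sbatch --partition=cpu --ntasks-per-node=16
translate_host gpu 4    # -> sbatch --partition=gpu --gres=gpu:4
```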
---

## Monitoring Jobs

### Schrodinger Job Monitor (Maestro)

In Maestro: **Tasks > Job Monitor** shows all your active and completed Schrodinger jobs with status, host entry, and elapsed time.

### Command-Line Monitoring

```bash
# Job Control (legacy)
$SCHRODINGER/jobcontrol -list
$SCHRODINGER/jobcontrol -list -all        # Include completed jobs
$SCHRODINGER/jobcontrol -list -j JOBNAME  # Specific job

# Job Server (modern)
$SCHRODINGER/jsc list
$SCHRODINGER/jsc list --all
```

### Seeing Your Jobs in Slurm

Since Schrodinger jobs run through Slurm, you can also use standard Slurm commands:

```bash
# See your running/pending Slurm jobs
squeue -u $USER

# Detailed info on a specific Slurm job
scontrol show job <jobid>

# Completed job accounting
sacct -j <jobid> --format=JobID,JobName,Elapsed,MaxRSS,ExitCode
```

### Mapping Schrodinger Job IDs to Slurm Job IDs

Schrodinger assigns its own job IDs. To find the corresponding Slurm job:

```bash
# Job Control -- check the job's log file
$SCHRODINGER/jobcontrol -list -j JOBNAME
# Look for the Slurm job ID in the log output

# Or search squeue by job name pattern
squeue -u $USER -o "%.10i %.30j %.8T %.10M"
```

---

## Understanding Job Failures

When a job fails, you need to determine whether the failure is on the Schrodinger side or the Slurm side.
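A first triage pass can be scripted as a lookup on the Slurm job State from `sacct` -- a minimal sketch in which the State names are real Slurm values, but the advice strings are just this guide's suggestions, not anything Slurm or Schrodinger prints:

```shell
#!/bin/sh
# First-pass triage: map a Slurm job State (as reported by
# `sacct -j <jobid> --format=State`) to a suggested next step.
# The State names are real Slurm values; the advice text is ours.
diagnose_state() {
    case $1 in
        OUT_OF_MEMORY) echo "OOM: rerun on a higher-memory host entry" ;;
        TIMEOUT)       echo "wall time exceeded: split the job or ask about partition limits" ;;
        FAILED)        echo "check the Schrodinger .log for application or license errors" ;;
        COMPLETED)     echo "Slurm side is clean: verify the Schrodinger output itself" ;;
        *)             echo "state $1: check both the Schrodinger .log and sacct" ;;
    esac
}

diagnose_state OUT_OF_MEMORY
diagnose_state TIMEOUT
```

The sections below walk through each of these failure modes in detail.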
### Where to Look

| Log Type | Location | What It Tells You |
|----------|----------|-------------------|
| **Schrodinger log** | `<jobname>.log` in the working directory | Application errors, license issues |
| **Schrodinger job record** | `$SCHRODINGER/jobcontrol -list -j <jobname>` | Exit status, host used |
| **Slurm output** | `slurm-<jobid>.out` or job-specific `.out`/`.err` | Slurm errors, resource limits |
| **sacct** | `sacct -j <jobid>` | Exit code, memory usage, time elapsed |

### Common Failure Modes

**License unavailable:**

```
Error: Unable to check out license for GLIDE
```

- The Schrodinger License Manager (SLM) server is unreachable or all licenses are in use
- Wait and retry, or check with your admin about license availability
- Note: Schrodinger uses SLM licensing (FlexNet/FLEXlm has been discontinued as of release 2025-1). License issues may be caused by network/firewall changes blocking port 53000 to the SLM server

**Scratch directory issues:**

```
Error: Cannot write to tmpdir /scr/...
```

- The scratch directory is full or not mounted
- Contact your admin about the scratch filesystem

**Out of memory (OOM):**

```bash
# Check sacct for OOM indication
sacct -j <jobid> --format=JobID,MaxRSS,ReqMem,State
# State = OUT_OF_MEMORY
```

- Your job needed more memory than Slurm allocated
- Use a host entry with more memory (`cpu_highmem`) or ask your admin to adjust limits

**Wall time exceeded:**

```bash
sacct -j <jobid> --format=JobID,Elapsed,Timelimit,State
# State = TIMEOUT
```

- The job ran longer than the Slurm partition allows
- Ask your admin about partition time limits, or break the job into smaller steps

**PATH or environment issues:**

```
Error: $SCHRODINGER/mmshare-v... not found
```

- The Schrodinger installation is not accessible on the compute node
- Contact your admin -- likely a shared filesystem issue

### When to Check Schrodinger Logs vs. Slurm Logs

- **Job never started** → check `squeue` (is it pending?) and Slurm reason codes
- **Job started but failed immediately** → check the Schrodinger `.log` file
- **Job ran for a while then failed** → check both the Schrodinger `.log` and `sacct` for OOM/TIMEOUT
- **Job completed but results are wrong** → this is a Schrodinger application issue, not Slurm

---

## Best Practices

### Choose the Right Host Entry

| Workload | Host Entry | Typical -NPROC |
|----------|-----------|----------------|
| Glide docking (HTVS/SP) | `cpu` | 4-8 |
| Glide docking (XP) | `cpu` | 8-16 |
| LigPrep | `cpu` | 4 |
| QikProp | `cpu` | 1-4 |
| Prime (small) | `cpu` | 4-8 |
| Prime (large, loop modeling) | `cpu_highmem` | 16-32 |
| Desmond MD | `gpu` | 1-4 (GPUs) |
| FEP+ | `gpu` | 1-8 (GPUs) |
| Workflow drivers | `driver` | 1-4 |

### Resource Estimation Tips

- **Glide:** Scales well to 8-16 cores per job. Beyond that, diminishing returns.
- **Desmond:** GPU-bound. 1 GPU for small systems, 4 GPUs for large membrane systems.
- **FEP+:** Each lambda window uses 1 GPU. Total GPUs = number of concurrent lambda windows.
- **Prime:** Memory-intensive for loop modeling. Use `cpu_highmem` for large proteins.
- **LigPrep:** Lightweight. 4 cores is usually sufficient.

### Scratch Space

- Jobs write temporary files to the scratch directory (`tmpdir` in the host entry)
- Large Desmond trajectories can generate tens of GB of scratch data
- If scratch is on a shared filesystem, be mindful of quota limits
- Clean up completed job directories if scratch is not auto-purged

### Don't Run on Login Nodes

Never run compute-intensive Schrodinger jobs directly on the login node. Always use `-HOST` to submit through Slurm. The login node is for job submission, file management, and light interactive work only.
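To act on the scratch-space advice, here is a quick sketch for spotting the largest job directories under a scratch root. The `/scr/$USER` path is a placeholder -- your real scratch location is whatever `tmpdir` your host entry defines:

```shell
#!/bin/sh
# Show the five largest entries under a scratch root, biggest first,
# so stale job directories are easy to spot and clean up.
# (sort -h requires GNU coreutils, standard on most Linux clusters.)
top_scratch() {
    du -sh "$1"/* 2>/dev/null | sort -rh | head -5
}

# Usage (path is a placeholder for your site's scratch directory):
#   top_scratch "/scr/$USER"
```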
---

## Related Modules

- **Schrodinger Admin Setup** -- how your admin configures Schrodinger for Slurm
- **Submitting Jobs** -- general Slurm job submission
- **Monitoring Jobs** -- squeue, sacct, and job monitoring
- **GPU Jobs** -- Slurm GPU job concepts

## References

- Schrodinger: Running Jobs
- Schrodinger: Command-Line Reference
- Schrodinger: Maestro Job Monitor
- Slurm: squeue
- Slurm: sacct