# Running on SLURM

**You will learn:** how to submit a PlayMolecule job to a SLURM cluster, poll its status, and use the resources requested by the app's manifest.

**Prerequisites:**
- [First app run](01-first-app-run.md) completed.
- SSH access to a node that can `sbatch`.
- `outdir` on a path visible to **all** SLURM nodes (a shared filesystem) — local `/tmp/` will not work for a remote worker.

## Setup

```python
from playmolecule import JobStatus
from playmolecule.apps import proteinprepare
```

## Step 1 — Set up the job in a shared directory

```python
ed = proteinprepare(outdir="/shared/scratch/me/proteinprepare-3ptb", pdbid="3ptb")
```

Where you point `outdir` matters: SLURM workers will read the run script and write outputs through that exact path. Anything under `/tmp/` or another node-local path won't be visible to the worker, and neither will `~/` unless home directories are mounted on every node.
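A quick sanity check before submitting can catch the most common mistake. The sketch below is a heuristic only: `looks_node_local` is a hypothetical helper, not part of PlayMolecule, and the list of node-local roots is an assumption to adapt to your cluster. It cannot prove that a path is shared; it only flags paths that almost certainly are not.

```python
import tempfile
from pathlib import Path

# Assumed node-local roots; adapt this tuple to your cluster's layout.
NODE_LOCAL_ROOTS = (Path("/tmp"), Path(tempfile.gettempdir()))


def looks_node_local(outdir):
    """Return True if outdir is, or sits under, a typically node-local directory."""
    path = Path(outdir).expanduser().resolve()
    roots = [root.resolve() for root in NODE_LOCAL_ROOTS]
    return any(path == root or root in path.parents for root in roots)


print(looks_node_local("/tmp/proteinprepare-3ptb"))
print(looks_node_local("/shared/scratch/me/proteinprepare-3ptb"))
```

A `False` result does not guarantee visibility from the workers. If in doubt, list the directory from a compute node.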

## Step 2 — Submit

```python
ed.run(queue="slurm", partition="normalCPU", ncpu=1, ngpu=0)
```

Passing `queue="slurm"` to {py:meth}`~playmolecule.ExecutableDirectory.run` submits through SLURM instead of running the container locally. Every other keyword (`partition`, `ncpu`, `ngpu`, `memory`, `walltime`, `nodelist`, `exclude`, `envvars`, `prerun`, …) is forwarded to the SLURM queue.

The call returns immediately. The SLURM queue object is stored on `ed`, and SLURM tracks the job as usual, so its ID shows up in `squeue` and, where accounting is enabled, in `sacct`.

## Step 3 — Poll status

```python
print(ed.status)
```

You'll see one of the four states from {py:class}`~playmolecule.JobStatus`:

- `JobStatus.WAITING_INFO` — submitted but not yet running.
- `JobStatus.RUNNING`
- `JobStatus.COMPLETED`
- `JobStatus.ERROR`

A simple polling loop:

```python
import time

while ed.status not in (JobStatus.COMPLETED, JobStatus.ERROR):
    time.sleep(30)

print("Done:", ed.status)
```

For background — what each state means and how it's detected — see [Job lifecycle](../explanation/job-lifecycle.md).
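The simple loop above never gives up, which is awkward if a job sits in the queue for hours. A minimal sketch of the same polling idea with a timeout follows. `wait_until_done` is a hypothetical helper, not a PlayMolecule function, and it takes the status getter as a callable so it can be tried without a cluster.

```python
import time


def wait_until_done(get_status, terminal_states, poll_seconds=30.0, timeout_seconds=3600.0):
    """Poll get_status() until it returns a terminal state.

    Returns the terminal status; raises TimeoutError if the deadline passes first.
    """
    deadline = time.monotonic() + timeout_seconds
    while True:
        status = get_status()
        if status in terminal_states:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job still {status!r} after {timeout_seconds} s")
        time.sleep(poll_seconds)


# With the objects from this tutorial, the call would look like:
# final = wait_until_done(lambda: ed.status, (JobStatus.COMPLETED, JobStatus.ERROR))
```

`time.monotonic()` is used for the deadline so that system clock adjustments do not shorten or extend the wait.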

## Step 4 — Use app-default resources

The manifest declares per-app resources (CPUs, GPUs) that you don't have to repeat. If you don't pass `ncpu` / `ngpu`, the app's defaults are used. So for a CPU-only app whose manifest sets, for example, `ncpu=4, ngpu=0`, this is enough:

```python
proteinprepare(outdir="/shared/scratch/me/run", pdbid="3ptb").run(queue="slurm", partition="normalCPU")
```

Override only when you need to deviate from the manifest defaults.
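The precedence rule can be pictured as a dictionary merge. This illustrates the behaviour described above and is not PlayMolecule's actual code: `resolve_resources` and the `defaults` dictionary, with its values, are hypothetical.

```python
def resolve_resources(manifest_defaults, **overrides):
    """Merge explicitly passed keyword values on top of the manifest defaults."""
    resolved = dict(manifest_defaults)
    # Treat None as "not passed", so only real overrides replace a default.
    resolved.update({key: value for key, value in overrides.items() if value is not None})
    return resolved


defaults = {"ncpu": 4, "ngpu": 0}  # hypothetical manifest values
print(resolve_resources(defaults))          # nothing passed: manifest values are used
print(resolve_resources(defaults, ncpu=8))  # only ncpu changes; ngpu keeps its default
```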

## Step 5 — Preset the queue once

When `PM_QUEUE_CONFIG` is set, `ed.run()` with no arguments takes the queue and partitions from the environment, takes the resources from the app manifest, and submits to SLURM automatically:

```bash
export PM_QUEUE_CONFIG='{"queue": "slurm", "cpu_partition": "normalCPU", "gpu_partition": "normalGPU"}'
```

```python
ed.run()    # picks gpu_partition if the manifest requests GPUs, cpu_partition otherwise
```

Useful in shared CI scripts and admin-managed environments where users shouldn't have to know which partition to use.
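When the configuration is generated from a script, building the JSON with `json.dumps` avoids shell-quoting mistakes. The sketch below also mirrors the partition choice described above. `pick_partition` is a hypothetical illustration, not PlayMolecule's implementation. This page does not say whether the library reads `PM_QUEUE_CONFIG` at import time or at `run()` time, so setting it before importing `playmolecule` is the cautious choice.

```python
import json
import os

config = {"queue": "slurm", "cpu_partition": "normalCPU", "gpu_partition": "normalGPU"}
# Visible to this process and to any child processes it starts.
os.environ["PM_QUEUE_CONFIG"] = json.dumps(config)


def pick_partition(config, ngpu):
    """Illustrative only: use the GPU partition when the job requests GPUs."""
    return config["gpu_partition"] if ngpu > 0 else config["cpu_partition"]


print(pick_partition(config, ngpu=0))
print(pick_partition(config, ngpu=1))
```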

## Recap

- Always set `outdir` to a shared-filesystem path before submitting to SLURM.
- `ed.run(queue="slurm", ...)` returns immediately; query `ed.status` to see progress.
- The app manifest provides default `ncpu` / `ngpu`; override only when needed.
- `PM_QUEUE_CONFIG` lets `ed.run()` pick the partition automatically.

## Next

- [Run many jobs on one GPU](../howto/run-many-jobs-on-one-gpu.md)
- [Check job status](../howto/check-job-status.md)
- [Job lifecycle](../explanation/job-lifecycle.md)

## Side note: `ed.slurm(...)`

`ed.slurm(partition=..., ncpu=..., ...)` is a thin alias for `ed.run(queue="slurm", ...)` that pre-dates the unified `run(queue=...)` interface. New code should use `run(queue="slurm")` — it composes with `PM_QUEUE_CONFIG`, parallels the local-run call, and avoids a second method to remember. {py:meth}`~playmolecule.ExecutableDirectory.slurm` will continue to work.