# Running on SLURM

**You will learn:** how to submit a PlayMolecule job to a SLURM cluster, poll its status, and use the resources requested by the app's manifest.

**Prerequisites:**

- [First app run](01-first-app-run.md) completed.
- SSH access to a node that can run `sbatch`.
- `outdir` on a path visible to **all** SLURM nodes (a shared filesystem). Local `/tmp/` will not work for a remote worker.

## Setup

```python
from playmolecule import JobStatus
from playmolecule.apps import proteinprepare
```

## Step 1 — Set up the job in a shared directory

```python
# outdir must live on a filesystem mounted on every SLURM node
ed = proteinprepare(outdir="/shared/scratch/me/proteinprepare-3ptb", pdbid="3ptb")
```

Where you point `outdir` matters: SLURM workers read the run script and write outputs through that exact path. `/tmp/` and other node-local paths are not visible to the worker, and `~/` only works if home directories are mounted on the shared filesystem on every node.

## Step 2 — Submit

```python
ed.run(queue="slurm", partition="normalCPU", ncpu=1, ngpu=0)
```

Passing `queue="slurm"` to {py:meth}`~playmolecule.ExecutableDirectory.run` submits through SLURM instead of running the container locally. Every other keyword (`partition`, `ncpu`, `ngpu`, `memory`, `walltime`, `nodelist`, `exclude`, `envvars`, `prerun`, …) is forwarded to the SLURM queue.

The call returns as soon as the job is submitted; it does not wait for the job to finish. The SLURM queue object is stored on `ed`, and the job appears in SLURM's usual tools (`squeue`, `sacct`) under its job ID.

## Step 3 — Poll status

```python
print(ed.status)
```

You'll see one of the four states from {py:class}`~playmolecule.JobStatus`:

- `JobStatus.WAITING_INFO`: submitted but not yet running.
- `JobStatus.RUNNING`: executing on a node.
- `JobStatus.COMPLETED`: finished successfully.
- `JobStatus.ERROR`: failed.

A simple polling loop:

```python
import time

# Re-check every 30 seconds until the job reaches a terminal state
while ed.status not in (JobStatus.COMPLETED, JobStatus.ERROR):
    time.sleep(30)

print("Done:", ed.status)
```

For background on what each state means and how it is detected, see [Job lifecycle](../explanation/job-lifecycle.md).
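The polling loop in Step 3 waits forever if a job sits in the queue indefinitely. Here is a minimal sketch of a bounded variant, assuming you want to give up after a fixed time. `wait_for_status` is a hypothetical helper, not part of `playmolecule`; it only needs a zero-argument callable that returns the current status, so you would pass it `lambda: ed.status`.

```python
import time


def wait_for_status(get_status, terminal, interval_s=30.0, timeout_s=None,
                    sleep=time.sleep, clock=time.monotonic):
    """Hypothetical helper: poll get_status() until it returns a terminal value.

    Returns the terminal status, or raises TimeoutError once timeout_s
    seconds have elapsed. `sleep` and `clock` are injectable so the
    helper can be exercised without real waiting.
    """
    start = clock()
    while True:
        status = get_status()
        if status in terminal:
            return status
        if timeout_s is not None and clock() - start >= timeout_s:
            # Only stops waiting; the SLURM job itself is left untouched.
            raise TimeoutError(f"still {status!r} after {timeout_s} s")
        sleep(interval_s)
```

With the tutorial's objects, a call would look like `wait_for_status(lambda: ed.status, (JobStatus.COMPLETED, JobStatus.ERROR), timeout_s=6 * 3600)`. Note that a `TimeoutError` here does not cancel anything: the job stays in SLURM until it finishes or you cancel it yourself.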
## Step 4 — Use app-default resources

The manifest declares per-app resources (CPUs, GPUs), so you don't have to repeat them. If you don't pass `ncpu` / `ngpu`, the app's manifest defaults are used, so this is enough:

```python
# No ncpu / ngpu: resources come from the app's manifest
proteinprepare(outdir="/shared/scratch/me/run", pdbid="3ptb").run(queue="slurm", partition="normalCPU")
```

The partition still has to match the request: for an app whose manifest sets, say, `ncpu=4, ngpu=1`, submit to a GPU partition (such as `normalGPU`) rather than `normalCPU`. Override `ncpu` / `ngpu` only when you need to deviate from the manifest defaults.

## Step 5 — Preset the queue once

When `PM_QUEUE_CONFIG` is set, `ed.run()` with no arguments picks up the queue and partition from the environment, uses the manifest's resources, and submits to SLURM automatically:

```bash
export PM_QUEUE_CONFIG='{"queue": "slurm", "cpu_partition": "normalCPU", "gpu_partition": "normalGPU"}'
```

```python
ed.run()  # picks gpu_partition if the manifest requests GPUs, cpu_partition otherwise
```

This is useful in shared CI scripts and admin-managed environments where users shouldn't need to know which partition to use.

## Recap

- Always set `outdir` to a shared-filesystem path before submitting to SLURM.
- `ed.run(queue="slurm", ...)` returns immediately; query `ed.status` to see progress.
- The app manifest provides default `ncpu` / `ngpu`; override only when needed.
- `PM_QUEUE_CONFIG` lets `ed.run()` pick the partition automatically.

## Next

- [Run many jobs on one GPU](../howto/run-many-jobs-on-one-gpu.md)
- [Check job status](../howto/check-job-status.md)
- [Job lifecycle](../explanation/job-lifecycle.md)

## Side note: `ed.slurm(...)`

`ed.slurm(partition=..., ncpu=..., ...)` is a thin alias for `ed.run(queue="slurm", ...)` that predates the unified `run(queue=...)` interface. New code should use `run(queue="slurm")`: it composes with `PM_QUEUE_CONFIG`, parallels the local-run call, and avoids a second method to remember. {py:meth}`~playmolecule.ExecutableDirectory.slurm` will continue to work.
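To tie Steps 4 and 5 and this side note together, here is a hypothetical stand-in class that only records what would be submitted. It is a sketch of the behaviour the tutorial describes, not the `playmolecule` implementation: the class name, the `"local"` fallback queue name, the manifest values, and the precedence rules (explicit keywords over manifest defaults, an explicit `partition` over `PM_QUEUE_CONFIG`) are all assumptions made for illustration.

```python
import json
import os


class SketchDirectory:
    """Hypothetical stand-in for an executable directory.

    Records the resolved submission settings instead of submitting,
    so the queue-selection logic can be inspected offline.
    """

    def __init__(self, manifest_ncpu=1, manifest_ngpu=0, env=None):
        # Assumed manifest defaults; real values come from the app manifest.
        self.defaults = {"ncpu": manifest_ncpu, "ngpu": manifest_ngpu}
        self.env = os.environ if env is None else env
        self.submitted = None

    def run(self, queue=None, **kwargs):
        raw = self.env.get("PM_QUEUE_CONFIG")
        config = json.loads(raw) if raw else {}
        settings = dict(self.defaults)  # manifest defaults first
        settings.update(kwargs)  # explicit keywords override them
        if queue is None:
            # Assumed fallback name when no queue is configured anywhere.
            queue = config.get("queue", "local")
        if "partition" not in settings:
            # GPU-requesting runs use gpu_partition, everything else cpu_partition.
            key = "gpu_partition" if settings["ngpu"] > 0 else "cpu_partition"
            if key in config:
                settings["partition"] = config[key]
        self.submitted = {"queue": queue, **settings}
        return self.submitted

    def slurm(self, **kwargs):
        # Thin alias: identical to run(queue="slurm", ...).
        return self.run(queue="slurm", **kwargs)
```

With `PM_QUEUE_CONFIG` set as in Step 5, `SketchDirectory(manifest_ncpu=4, manifest_ngpu=1).run()` resolves to the `normalGPU` partition with four CPUs and one GPU, and `slurm(...)` returns exactly what `run(queue="slurm", ...)` does, which is why keeping a separate method adds nothing.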