# Executable directory

An {py:class}`~playmolecule.ExecutableDirectory` (ED) is the on-disk artefact you get back from every app call. It is the unit PlayMolecule moves around, runs, polls, and re-uses. This page explains what's inside one, why the abstraction exists, and how it composes with SLURM and HTTP backends.

## The two-phase model

A PlayMolecule app call has two distinct phases:

1. **Setup** — `proteinprepare(outdir="out", pdbid="3ptb")` validates arguments against the manifest signature, stages input files into `out/run_<timestamp>_<uuid>/`, writes the input JSON, generates a run script, and returns an {py:class}`~playmolecule.ExecutableDirectory`. **No container has started yet.**
2. **Run** — `ed.run()` (optionally `ed.run(queue="slurm", ...)`) hands the prepared directory to an execution backend. Outputs land back in `outdir`.

The split exists because the two phases benefit from different environments:

- Setup wants to be **cheap** and **local** — you might do it in a notebook on your laptop.
- Run wants to be **wherever the resources are** — your laptop, a SLURM worker, a GPU node, the HTTP backend.

Decoupling them means you can set up hundreds of EDs in a script and then submit them in a batch, replay a single ED on a different cluster, or inspect prepared inputs before paying for compute.
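To make the split concrete, here is a toy model in plain Python. It is a minimal sketch under stated assumptions, not PlayMolecule's implementation: `ToyED`, `toy_setup` and `TOY_SIGNATURE` are hypothetical names, the timestamp format is a guess, and `run()` writes a JSON file instead of starting a container. What it does show is that setup only writes files, so you can prepare a batch of EDs and decide later when and where to run them.

```python
import json
import tempfile
import time
import uuid
from pathlib import Path

# Hypothetical manifest signature: the only parameter this toy app accepts.
TOY_SIGNATURE = {"pdbid"}


class ToyED:
    """Hypothetical stand-in for an ExecutableDirectory: it only records paths."""

    def __init__(self, outdir: Path, inputs_dir: Path):
        self.outdir = outdir
        self.inputs_dir = inputs_dir
        # Run script sits next to the run directory, same name plus ".sh".
        self.runsh = inputs_dir.with_name(inputs_dir.name + ".sh")

    def run(self) -> None:
        # Phase 2: work happens only here. A real backend would start a
        # container; this toy just echoes the staged inputs into an output file.
        params = json.loads((self.inputs_dir / "inputs.json").read_text())
        (self.outdir / "output.json").write_text(json.dumps({"echo": params}))


def toy_setup(outdir: str, **params) -> ToyED:
    # Phase 1: validate and stage inputs, but start nothing.
    unknown = set(params) - TOY_SIGNATURE
    if unknown:
        raise TypeError(f"unexpected parameters: {sorted(unknown)}")
    out = Path(outdir)
    run_id = time.strftime("%d_%m_%Y_%H_%M") + "_" + uuid.uuid4().hex[:8]
    ed = ToyED(out, out / f"run_{run_id}")
    ed.inputs_dir.mkdir(parents=True)
    (ed.inputs_dir / "inputs.json").write_text(json.dumps(params))
    ed.runsh.write_text("#!/bin/sh\n# container invocation would go here\n")
    return ed


with tempfile.TemporaryDirectory() as tmp:
    # Set up a batch first; no outputs exist yet.
    eds = [toy_setup(f"{tmp}/job{i}", pdbid=p) for i, p in enumerate(["3ptb", "1ake"])]
    before = [(ed.outdir / "output.json").exists() for ed in eds]
    # ...then run them whenever (and wherever) you like.
    for ed in eds:
        ed.run()
    after = [(ed.outdir / "output.json").exists() for ed in eds]
    scripts_written = all(ed.runsh.exists() for ed in eds)

print(before, after)  # [False, False] [True, True]
```

The point of the toy is the ordering: nothing in `toy_setup` depends on where `run()` will eventually execute, which is what lets the real setup phase stay cheap and local.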

## Layout on disk

```text
outdir/
├── output.pdb                        # produced by the run (later)
├── details.csv                       # produced by the run (later)
├── run_03_07_2026_14_22_a1b2c3d4.sh  # the rendered run script
└── run_03_07_2026_14_22_a1b2c3d4/    # the inputs dir for this run
    ├── inputs.json                   # input JSON consumed by the container
    ├── input-files-staged-here/      # copies/symlinks of file params
    ├── .pm.alive                     # heartbeat; see Job lifecycle
    └── .pm.err                       # error sentinel (only if it failed)
```

Key properties:

- The **outdir** is the user-chosen location.
- The **run directory** has a fresh timestamp + UUID per call, so you can re-run the same ED and get parallel `run_*/` siblings.
- The **run script** lives next to the run directory and shares its name with a `.sh` suffix (for example `run_03_07_2026_14_22_a1b2c3d4.sh`).
- The directory is **self-contained**. If you `tar` it up, copy it to another machine, and reconstruct the ED there, `ed.run()` will work as long as the same registry/images are available.
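As a sketch of the "tar it up and move it" workflow, the snippet below uses only the standard library's `tarfile`. The helper names `runsh_for`, `pack` and `unpack` are hypothetical. It assumes staged inputs may be symlinks (as the layout above suggests), so it archives with `dereference=True` to store the files the links point at rather than links that would dangle on the other machine.

```python
import tarfile
import tempfile
from pathlib import Path


def runsh_for(inputs_dir: Path) -> Path:
    # The run script sits next to the run directory: same name plus ".sh".
    return inputs_dir.with_name(inputs_dir.name + ".sh")


def pack(outdir: Path, archive: Path) -> None:
    # dereference=True stores the files that symlinks point at, so staged
    # inputs don't turn into dangling links on the destination machine.
    with tarfile.open(archive, "w:gz", dereference=True) as tar:
        tar.add(outdir, arcname=outdir.name)


def unpack(archive: Path, dest: Path) -> None:
    # Use the safer "data" extraction filter where this Python supports it.
    kwargs = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
    with tarfile.open(archive, "r:gz") as tar:
        tar.extractall(dest, **kwargs)


with tempfile.TemporaryDirectory() as src, tempfile.TemporaryDirectory() as dst:
    # Fake a prepared outdir with one run directory and its script.
    outdir = Path(src) / "out"
    inputs_dir = outdir / "run_03_07_2026_14_22_a1b2c3d4"
    inputs_dir.mkdir(parents=True)
    (inputs_dir / "inputs.json").write_text("{}")
    runsh_for(inputs_dir).write_text("#!/bin/sh\n")

    archive = Path(src) / "out.tar.gz"
    pack(outdir, archive)
    unpack(archive, Path(dst))

    restored = Path(dst) / "out" / inputs_dir.name
    roundtrip_ok = (restored / "inputs.json").exists() and runsh_for(restored).exists()

print(roundtrip_ok)  # True
```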

## Reconstructing an ED from disk

```python
from playmolecule import ExecutableDirectory

ed = ExecutableDirectory(dirname="/shared/scratch/me/run")
print(ed.status)
ed.run()                # resume / re-run
```

The constructor finds the most recent `run_<id>/` inside `dirname` and uses it as the inputs directory. This is what makes "submit on Monday, check status on Tuesday" work — there's no in-memory state required.
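One plausible way to pick "the most recent" run directory is by modification time. This is an assumption: the library may instead parse the timestamp embedded in the name, and `latest_run_dir` is a hypothetical helper. If the embedded timestamp starts with the day or month (as `03_07_2026` appears to), plain string sorting would not order runs correctly across months or years, which is one reason not to sort names lexicographically.

```python
import os
import tempfile
from pathlib import Path


def latest_run_dir(dirname: str) -> Path:
    """Return the newest run_* subdirectory of an ED's outdir, by mtime."""
    # glob also matches run_*.sh scripts, so keep directories only.
    candidates = [p for p in Path(dirname).glob("run_*") if p.is_dir()]
    if not candidates:
        raise FileNotFoundError(f"no run_* directory in {dirname}")
    return max(candidates, key=lambda p: p.stat().st_mtime)


with tempfile.TemporaryDirectory() as tmp:
    old = Path(tmp, "run_01_01_2026_09_00_aaaaaaaa")
    new = Path(tmp, "run_02_01_2026_09_00_bbbbbbbb")
    old.mkdir()
    new.mkdir()
    Path(tmp, new.name + ".sh").touch()  # sibling script, must be ignored
    # Pin the mtimes so the result doesn't depend on creation speed.
    os.utime(old, (1_000, 1_000))
    os.utime(new, (2_000, 2_000))
    picked = latest_run_dir(tmp).name

print(picked)  # run_02_01_2026_09_00_bbbbbbbb
```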

## Execution backend dispatch

`ed.run()` dispatches to whichever execution backend was active **when the ED was built**. That means:

- Setting up under `PM_EXECUTOR=local` and later changing `PM_EXECUTOR=http://...` does not move the job. The backend was captured at setup time.
- To switch, set up a new ED in a new process.

`ed.status` follows the same dispatch — local EDs are queried by reading the heartbeat file and the SLURM queue; HTTP EDs are queried by HTTP.
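For a local ED, a status check along these lines could be built from the sentinel files alone. The state names, the 60-second threshold and the `local_status` helper are all hypothetical; the real rules, including how the SLURM queue is consulted, are described in Job lifecycle. The sketch only illustrates why no in-memory state is needed: everything comes from files in the run directory.

```python
import os
import tempfile
import time
from pathlib import Path
from typing import Optional


def local_status(inputs_dir: Path, max_age: float = 60.0, now: Optional[float] = None) -> str:
    """Hypothetical status from sentinel files; names and threshold are made up."""
    now = time.time() if now is None else now
    if (inputs_dir / ".pm.err").exists():
        return "failed"  # error sentinel written by a failed run
    alive = inputs_dir / ".pm.alive"
    if not alive.exists():
        return "not started"  # no heartbeat yet
    age = now - alive.stat().st_mtime
    # A fresh heartbeat means the job is alive; a stale one needs more checks
    # (outputs, SLURM queue) to tell "finished" from "died".
    return "running" if age <= max_age else "stale"


with tempfile.TemporaryDirectory() as tmp:
    run_dir = Path(tmp)
    states = [local_status(run_dir)]
    (run_dir / ".pm.alive").touch()          # heartbeat just written
    states.append(local_status(run_dir))
    os.utime(run_dir / ".pm.alive", (0, 0))  # heartbeat from long ago
    states.append(local_status(run_dir))
    (run_dir / ".pm.err").touch()            # run reported a failure
    states.append(local_status(run_dir))

print(states)  # ['not started', 'running', 'stale', 'failed']
```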

## The `slurm` shortcut

`ed.run(queue="slurm", ...)` wraps the prepared run directory in a `jobqueues` SLURM submission. Resources default to the values captured from the app manifest at setup time. (`ed.slurm(...)` is a thin alias retained for backwards compatibility.)

This does not switch the execution backend to "SLURM". SLURM is a mode of the local execution backend: it ultimately invokes the same `docker run` or `apptainer run`, just on a worker node.
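To illustrate "manifest defaults plus overrides, then the same run script on a worker", here is a rough sketch of rendering a SLURM batch script. It is not what `jobqueues` generates, and the resource keys (`ncpu`, `memory`, `ngpu`, `partition`) and `render_sbatch` are illustrative assumptions; only the `#SBATCH` flags themselves are standard `sbatch` options.

```python
import shlex

# Illustrative mapping from resource keys to standard sbatch flags.
SBATCH_FLAGS = {
    "partition": "--partition={}",
    "ncpu": "--cpus-per-task={}",
    "memory": "--mem={}",  # sbatch reads a bare number as megabytes
    "ngpu": "--gres=gpu:{}",
}


def render_sbatch(runsh: str, defaults: dict, **overrides) -> str:
    # Manifest defaults first, then any caller overrides actually given.
    res = dict(defaults)
    res.update({k: v for k, v in overrides.items() if v is not None})
    lines = ["#!/bin/bash"]
    for key, template in SBATCH_FLAGS.items():
        if res.get(key):  # skips missing keys and zero values (e.g. ngpu=0)
            lines.append("#SBATCH " + template.format(res[key]))
    # The worker runs the very same script the local backend would run.
    lines.append(f"bash {shlex.quote(runsh)}")
    return "\n".join(lines) + "\n"


manifest_defaults = {"ncpu": 4, "memory": 8000, "ngpu": 0}
script = render_sbatch("/scratch/out/run_x.sh", manifest_defaults, partition="gpu", ngpu=1)
print(script)
```

The last line is the design point: SLURM only changes *where* the run script executes, not what it does.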

## Batched MPS submission

{py:func}`~playmolecule.slurm_mps` takes a list of EDs and submits them as a single SLURM job that holds one GPU and shares it between the EDs via NVIDIA MPS. The EDs are still independent on disk (each one writes to its own `outdir`), but SLURM sees a single job, so queueing and accounting treat the whole batch as one unit. Resource defaults are taken from the **first** ED's `execution_resources`, not the union of all of them.
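The "first ED, not the union" rule matters when EDs in a batch have different needs. This sketch contrasts the two policies and shows the general shape of an MPS batch: start the MPS control daemon, launch each run script in the background, wait, then shut the daemon down. `FakeED`, `batch_resources`, `union_resources` and `mps_body` are hypothetical, and this is not the script `slurm_mps` actually writes; `nvidia-cuda-mps-control -d` and `echo quit | nvidia-cuda-mps-control` are the standard commands for starting and stopping the MPS daemon.

```python
from dataclasses import dataclass, field


@dataclass
class FakeED:
    runsh: str
    execution_resources: dict = field(default_factory=dict)


def batch_resources(eds):
    # The rule described above: defaults come from the FIRST ED only.
    if not eds:
        raise ValueError("need at least one ED")
    return dict(eds[0].execution_resources)


def union_resources(eds):
    # What you might have expected instead: the max of each request.
    out = {}
    for ed in eds:
        for key, value in ed.execution_resources.items():
            out[key] = max(out.get(key, value), value)
    return out


def mps_body(eds):
    # One GPU, one MPS daemon, every run script started in the background.
    lines = ["nvidia-cuda-mps-control -d"]
    lines += [f"bash {ed.runsh} &" for ed in eds]
    lines += ["wait", "echo quit | nvidia-cuda-mps-control"]
    return lines


eds = [
    FakeED("a.sh", {"ncpu": 2, "memory": 4000}),
    FakeED("b.sh", {"ncpu": 8, "memory": 16000}),
]
print(batch_resources(eds))  # {'ncpu': 2, 'memory': 4000}
print(union_resources(eds))  # {'ncpu': 8, 'memory': 16000}
```

If you rely on the defaults, a practical consequence is to put the most demanding ED first in the list.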

## Why not just return a dict?

You could imagine PlayMolecule returning `{"runsh": "...", "inputs_dir": "...", ...}` and dropping the class. The reasons it doesn't:

- `.status` needs to dispatch by execution backend. A dict can't do that without a wrapper.
- HTTP-backend jobs need to track their server-side job id between calls. A dict can't carry that.
- Polling code reads more naturally as `ed.status` than `ed["status"]`.

The ED is intentionally thin: almost everything it knows is in fields, and its methods (`run`, `slurm`) and the `status` property are dispatch shims to the backend captured at setup time.
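A minimal sketch of what "thin" means here, assuming nothing about the real class beyond what this page states: a dataclass whose fields carry the state (including a server-side job id for HTTP jobs), a backend read from `PM_EXECUTOR` once at construction, and a `status` property that only dispatches. `ThinED`, `_capture_backend` and the returned strings are hypothetical.

```python
import os
from dataclasses import dataclass, field
from typing import Optional


def _capture_backend() -> str:
    # Read once, at construction; later changes to PM_EXECUTOR are ignored.
    return os.environ.get("PM_EXECUTOR", "local")


@dataclass
class ThinED:
    dirname: str
    backend: str = field(default_factory=_capture_backend)
    job_id: Optional[str] = None  # server-side id, only meaningful for HTTP

    @property
    def status(self) -> str:
        # Dispatch shim: the object holds no logic of its own.
        if self.backend.startswith("http"):
            return self._http_status()
        return self._local_status()

    def _http_status(self) -> str:
        # Placeholder: a real client would query the server by self.job_id.
        return f"http: ask server about job {self.job_id}"

    def _local_status(self) -> str:
        # Placeholder: a real client would read heartbeat files / SLURM.
        return f"local: inspect {self.dirname}"


os.environ["PM_EXECUTOR"] = "local"
ed = ThinED("/tmp/out")
os.environ["PM_EXECUTOR"] = "http://pm.example.org"  # too late for `ed`
ed2 = ThinED("/tmp/out2", job_id="42")
print(ed.status)   # local: inspect /tmp/out
print(ed2.status)  # http: ask server about job 42
```

The first ED keeps the backend it was built with even after the environment variable changes, which is the behaviour described under "Execution backend dispatch".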

## See also

- [Architecture](architecture.md)
- [Job lifecycle](job-lifecycle.md)
- [Check job status](../howto/check-job-status.md)
- {py:class}`~playmolecule.ExecutableDirectory`