# Run from a PDB (OpenMM XML and OpenFF)

**You will learn:** how to set up an ACEMD simulation directly from a prepared PDB file, using OpenMM XML force fields for the canonical residues and OpenFF / GAFF / Espaloma for non-canonical small molecules, with no HTMD build step beforehand.

**Prerequisites:**

- ACEMD installed.
- A PDB file containing the simulation system (protein, solvent, ions, ligands — whatever you want to simulate).
- One or more OpenMM XML force-field files covering the canonical residues and water model.
- For non-canonical residues: the SMILES string of each species and ACEMD's optional OpenFF extras, installed with `pip install "acemd[openff]"`.

```{note}
For systems that need solvation, ionisation, multi-chain assembly, pKa-based protonation, mutations, gap modelling, or covalent modifications, use the [HTMD builder](https://software.acellera.com/htmd/) upstream and feed ACEMD the build output. See the [Caveats](#caveats) below for the full list.
```

## Just an OpenMM XML force field

If every residue in the PDB is covered by the chosen XML force field, the input file is minimal:

```{code-block} yaml
:caption: input.yaml

structure: structure.pdb
coordinates: structure.pdb
parameters: ["amber14-all.xml", "amber14/tip3pfb.xml"]
boxsize: [60.0, 60.0, 60.0]
thermostat: true
run: 100ns
```

`structure` and `coordinates` point at the same PDB. The XML files supply the canonical-residue parameters and the water model. The PDB carries atom positions and the topology read by OpenMM.

```{note}
ACEMD also accepts an **mmCIF** (`.cif`/`.bcif`) structure in place of the PDB, and prefers it. mmCIF preserves the full bond graph — including inter-residue bonds such as disulfides — and has no 99,999-atom serial limit, whereas PDB `CONECT` records overflow on large systems. Just point `structure`/`coordinates` at the `.cif`; everything else is identical. This is the format the HTMD `openmm.build` handoff emits, alongside a `system.yaml` that {py:func}`~acemd.protocols.setup_equilibration` consumes to assemble the input file automatically.
```
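To decide whether a system still fits the PDB serial range mentioned in the note, a quick atom count is enough. The following is a minimal sketch using only the Python standard library; `count_atoms` and `fits_pdb_serials` are hypothetical helper names chosen for illustration and are not part of ACEMD.

```python
PDB_MAX_SERIAL = 99_999  # atom serial numbers occupy five columns in a PDB file


def count_atoms(pdb_text):
    """Count ATOM and HETATM records in the text of a PDB file."""
    return sum(
        1
        for line in pdb_text.splitlines()
        if line.startswith(("ATOM", "HETATM"))
    )


def fits_pdb_serials(n_atoms):
    """Return True if n_atoms can be numbered within the PDB serial range."""
    return n_atoms <= PDB_MAX_SERIAL
```

For example, `fits_pdb_serials(count_atoms(open("structure.pdb").read()))` returning `False` suggests switching the input to mmCIF.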

## Add a small-molecule ligand via SMILES

For non-canonical residues (drug-like ligands, cofactors), ACEMD can build parameters on the fly using the [openff-toolkit](https://docs.openforcefield.org) + [openmmforcefields](https://github.com/openmm/openmmforcefields) stack. Declare each species under the top-level `molecules` block:

```{code-block} yaml
:caption: input.yaml

structure: structure.pdb
coordinates: structure.pdb
parameters: ["amber14-all.xml", "amber14/tip3pfb.xml"]
boxsize: [60.0, 60.0, 60.0]
thermostat: true
run: 100ns
molecules:
  smiles:
    BEN: "[NH2+]=C(N)c1ccccc1"
  forcefield: gaff-2.2.20
```

`molecules.smiles` maps each **residue name** (as it appears in the PDB) to a SMILES string. ACEMD parameterises each listed species with the selected small-molecule force field and combines the result with the XML force field used for the rest of the system.
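Because the keys of `molecules.smiles` must match the residue names in the PDB, it can be useful to list the residue names the file actually contains. The sketch below reads the residue-name columns (18 to 20) of `ATOM`/`HETATM` records and reports names outside a small reference set. The `KNOWN` set and both function names are illustrative assumptions; the residues a given XML force field really covers are defined by that force field's templates, not by this list.

```python
from collections import Counter

# Illustrative reference set: the 20 standard amino acids plus common water
# and ion names. This is an assumption, not the template list of any XML file.
KNOWN = {
    "ALA", "ARG", "ASN", "ASP", "CYS", "GLN", "GLU", "GLY", "HIS", "ILE",
    "LEU", "LYS", "MET", "PHE", "PRO", "SER", "THR", "TRP", "TYR", "VAL",
    "HOH", "WAT", "NA", "CL",
}


def residue_atom_counts(pdb_text):
    """Count atoms per residue name in ATOM/HETATM records."""
    counts = Counter()
    for line in pdb_text.splitlines():
        if line.startswith(("ATOM", "HETATM")):
            counts[line[17:20].strip()] += 1  # PDB columns 18-20
    return counts


def residues_needing_smiles(pdb_text, known=KNOWN):
    """Return sorted residue names that are not in the reference set."""
    return sorted(name for name in residue_atom_counts(pdb_text) if name not in known)
```

Each name returned by `residues_needing_smiles` is a candidate key for `molecules.smiles`.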

## Pick the small-molecule force field

The `forcefield` key under `molecules` selects which generator runs. The type is auto-detected from the prefix; override with `forcefield_type` if your name doesn't follow the convention.

| Force field family | Example `forcefield` value | `forcefield_type` |
|--------------------|----------------------------|-------------------|
| GAFF               | `gaff-2.2.20`              | `gaff`            |
| OpenFF (Sage etc.) | `openff-2.0.0`             | `openff`          |
| Espaloma           | `espaloma-0.3.2`           | `espaloma`        |
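The prefix convention in the table can be sketched as a small lookup. This is an illustration of the idea only, not ACEMD's implementation; `detect_forcefield_type` is a hypothetical function, and the error raised for unrecognised names is an assumption.

```python
KNOWN_TYPES = ("gaff", "openff", "espaloma")


def detect_forcefield_type(forcefield, forcefield_type=None):
    """Return the generator family for a small-molecule force-field name.

    An explicit forcefield_type always wins; otherwise the family is
    inferred from the prefix of the name.
    """
    if forcefield_type is not None:
        return forcefield_type
    name = forcefield.lower()
    for family in KNOWN_TYPES:
        if name.startswith(family):
            return family
    raise ValueError(
        f"cannot infer the force-field type from {forcefield!r}; "
        "set forcefield_type explicitly"
    )
```

A name such as `gaff-2.2.20` resolves to `gaff` from its prefix, while a name that does not follow the convention needs the explicit `forcefield_type` override.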

For OpenFF specifically:

```{code-block} yaml
:caption: input.yaml

molecules:
  smiles:
    BEN: "[NH2+]=C(N)c1ccccc1"
  forcefield: openff-2.0.0
```

`charge_model` defaults to `am1bcc`. Set it to `gasteiger` for a faster but less accurate alternative when the molecule is large.

## Add hydrogens to a heavy-atom PDB

If the PDB is missing hydrogens, set `protonate`. The flag accepts either `true` (default pH 7.4) or an explicit pH value:

```{code-block} yaml
:caption: input.yaml

structure: structure.pdb
coordinates: structure.pdb
parameters: ["amber14-all.xml", "amber14/tip3pfb.xml"]
boxsize: [60.0, 60.0, 60.0]
thermostat: true
run: 100ns
protonate: true     # add hydrogens at pH 7.4
```

To use a different pH:

```yaml
protonate: 6.5      # add hydrogens at pH 6.5
```

ACEMD invokes OpenMM's `Modeller.addHydrogens(forcefield, pH=...)`, which selects template variants for each titratable residue based on the chosen pH and the XML force field's residue definitions.
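Before setting `protonate`, it can help to check whether the PDB already contains hydrogens. The sketch below counts hydrogen atoms using the element columns (77 to 78) and falls back to the atom name when the element field is blank. `count_hydrogens` is a hypothetical helper, and the atom-name fallback is a heuristic that can misclassify unusual atom names.

```python
def count_hydrogens(pdb_text):
    """Return (n_hydrogens, n_atoms) for the ATOM/HETATM records of a PDB file."""
    hydrogens = 0
    total = 0
    for line in pdb_text.splitlines():
        if not line.startswith(("ATOM", "HETATM")):
            continue
        total += 1
        element = line[76:78].strip().upper()  # PDB columns 77-78
        if not element:
            # Heuristic fallback: drop leading digits from the atom name
            # (e.g. "1HB" -> "HB") and use its first letter.
            name = line[12:16].strip().lstrip("0123456789")
            element = name[:1].upper()
        if element in ("H", "D"):
            hydrogens += 1
    return hydrogens, total
```

A result such as `(0, 2500)` indicates a heavy-atom-only file, which is the case `protonate` is meant for.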

## Caveats

The PDB-direct path is convenient for small systems and well-prepared inputs, but it lacks several capabilities of the HTMD builder. If any of the following apply, build the system upstream and pass ACEMD the build output instead:

- **Solvation and ionisation** — the PDB must already include the water box and counter-ions. ACEMD does not add solvent.
- **Multi-chain assembly** — chains in the PDB are taken as-is; no merging, splitting, or gap modelling.
- **pKa prediction** — `protonate` selects template variants at a fixed user-given pH (default 7.4) but does not predict per-residue pKa values from structure.
- **Mutations** — apply upstream.
- **Custom RTF/PRM residue templates** — small-molecule parameterisation goes through SMILES only. For fully custom templates, use a CHARMM PSF/PRM build.
- **Covalent ligands or modified residues** — `molecules.smiles` covers only small molecules that are not covalently bonded to the rest of the system.
- **Membrane assembly, lipid packing, layered systems** — build in HTMD; ACEMD reads only the final coordinates.

## See also

- [Input options reference](../reference/input-options.md) — `molecules` and `protonate` reference.
- {py:func}`~acemd.protocols.setup_equilibration`, {py:func}`~acemd.protocols.setup_production` — Python helpers that prepare ACEMD input directories from an HTMD builder output.
- [HTMD documentation](https://software.acellera.com/htmd/) — the upstream system-preparation pipeline.