System building#

System building is the work of turning a structure - a PDB, a homology model, a docked complex - into a simulation-ready system: protonated, segmented, solvated, ionised, parameterized against a force field, and written to a topology + coordinates pair that an MD engine accepts. HTMD treats this as a first-class workflow, not a script you string together by hand, and supports the full range of biomolecular systems through a single API.

The canonical pipeline#

Every HTMD build goes through the same five stages:

        flowchart LR
    A[Structure input<br/>PDB / CIF / model] --> B[Preparation<br/>protonation, gaps, mutations]
    B --> C[Segmentation<br/>protein / lipid / ligand /<br/>water / ion segments]
    C --> D[Solvation &<br/>ionisation]
    D --> E[Force-field build<br/>AMBER / CHARMM / OpenFF]
    E --> F[Topology + coords<br/>ready for MD]

Segmentation (autoSegment()) labels each contiguous protein / nucleic chain in its own segment (decided by backbone-atom geometry, not by resid numbering), bundles all waters into one segment and all ions into one, and puts every other connected non-polymer component (a ligand, a cofactor, a lipid) in its own segment. The builder then treats the components independently and doesn’t extend protein chains through HETATM ligands.
Non-standard residue detection (detectNonStandardResidues()) inspects the molecule and returns one spec per residue the force field doesn’t natively know (ligands, NCAAs, modified residues, crosslinkers).
SMILES templating (templateResidueFromSmiles()) sets correct bond orders, formal charges, and hydrogen counts on each non-standard residue from its reference SMILES - a prerequisite for clean parameterization.
Preparation (systemPrepare()) does pKa-aware protonation, missing-sidechain repair and partial backbone repair, and mutation; passed the detect_specs= list it preserves the templated non-standard residues across the PDB2PQR pass.
Parameterization (parameterizeFromSpecs()) runs antechamber on each non-standard residue, emits per-residue topology files (topo_paths), parameter files (frcmod_paths), and a custombonds list the builder feeds straight to tLeap.
Force-field build (htmd.builder.amber.build() - the most fully featured backend - or htmd.builder.charmm.build() / htmd.builder.openmm.build()) consumes the prepared molecule plus the parameterization outputs and writes a simulation-ready topology + coordinates pair.

Ionisation runs as the builder’s ionize=True default in all three backends. Solvation is only built-in for htmd.builder.openmm.build() (default solvate=True); for htmd.builder.amber.build() and htmd.builder.charmm.build() you call solvate() explicitly before the build.

The eight system-building tutorials walk this pipeline for protein-only, protein-ligand (AMBER and OpenFF), stapled, cyclic, and bicyclic peptides, protein-in-membrane, and protein-RNA systems. The same pipeline handles every case below.

Beyond canonical proteins#

HTMD’s building stack handles cases that “build a protein in CHARMM” tools usually don’t:

Non-canonical amino acids (NCAAs)#

NCAAs - phosphorylated residues, fluorinated analogues, modified backbones, peptide-drug warheads - flow through the standard three-step pipeline:

detectNonStandardResidues() inspects mol and returns one spec per non-standard residue, grouping connected non-canonicals (and their canonical anchors) into clusters automatically.
templateResidueFromSmiles() is called once per unique non-standard residue with its reference SMILES, fixing bond orders and adding hydrogens.
systemPrepare() is called with detect_specs=specs so PDB2PQR protonates the canonical residues while preserving the templated non-standard ones, then parameterizeFromSpecs() produces the GAFF2 topologies and AMBER ff14SB-anchored parameters that amber.build consumes.

parameterizeFromSpecs() defaults to AM1-BCC as its function-level charge model (GAFF-quality electrostatics, slow). For tutorials and rapid iteration we recommend charge_method="gasteiger" — much faster and good enough when the priority is the topology / connectivity rather than millihartree-accurate charges.

Stapled peptides, isopeptides, and cyclic peptides#

Stapled peptides (hydrocarbon, lactam, thioether staples), isopeptide bonds (N-to-side-chain crosslinks, e.g. ubiquitin-conjugation chemistry), and head-to-tail or side-chain cyclic peptides are all crosslinked NCAA constructs. They use the same three-step flow as standalone NCAAs - detectNonStandardResidues groups the bridging residue together with its anchor residues (or, for cyclic peptides, the N- and C-terminal anchors) into a single cluster, and the resulting custombonds entries carry the crosslink across the chain.

The same parameterizeFromSpecs() call handles staples, isopeptides, and cycles; you don’t have to hand-edit a tLeap script or a CHARMM PSF.

Disulfide bonds#

Disulfide handling is automatic by default: detectDisulfideBonds() scans the prepared structure for CY*-resname residues’ SG atoms within thresh Å (default 3 Å) — segids must be set first — and returns the inferred bridges. Both htmd.builder.amber.build() and htmd.builder.charmm.build() accept a disulfide= argument; pass None to auto-detect, or a list of (sel1, sel2) atom-selection pairs to override.

For pathological cases - ambiguous pairings, designed mutants where the disulfide must be at a specific Cys pair - call detectDisulfideBonds(mol)() directly on the prepared molecule to inspect what auto-detection would produce, then pass the curated list back as disulfide=.

Membrane systems#

The membrane stack lives in htmd.membranebuilder:

buildMembrane() assembles a bilayer at a given XY size from per-leaflet ratioupper and ratiolower dicts ({lipid_name: ratio} — see listLipids() for the supported set); asymmetric leaflet compositions are a first-class feature.
A short LJ-fluid relaxation step (ljfluid) packs the lipids without ring penetrations (ringpenetration).
For protein-in-membrane systems, the membrane is generated separately, then the protein is embedded via embed() after stripping clashing lipids (removeLipidsInProtein()).

See the asymmetric bilayer and pre-equilibrated-membrane embedding how-tos for the per-recipe shape.

Force-field choice#

The same prepared molecule can be built under multiple force fields without redoing preparation:

AMBER (htmd.builder.amber.build()) via tLeap - the most fully featured backend, the one the NCAA flow is parameterized against (ff14SB anchors, GAFF2 atom types), and the recommended default for protein, protein-ligand, and crosslinked-NCAA systems.
CHARMM (htmd.builder.charmm.build()) via VMD/psfgen - for projects standardised on CHARMM force fields and CHARMM-GUI-style membrane setups.
OpenMM (htmd.builder.openmm.build()) by default loads OpenMM’s bundled amber14 XMLs (protein.ff14SB + lipid17 + nucleic + tip3p) — i.e. an AMBER-equivalent protein stack assembled through OpenMM’s ForceField machinery. The differentiator versus the AMBER tLeap backend is opt-in OpenFF / SMIRNOFF parameters for small molecules: pass small_molecule_ff="openff-2.3.0" (or "gaff-2.2.20") plus an OFFMolecule list via molecules= to bring in SMIRNOFF/GAFF templating for ligands. All three builders produce systems any common engine (ACEMD, OpenMM, GROMACS) can run.

Force-field-specific tooling lives in the same module as the build entry point; common topology surgery (insertions, mutations, cis-peptide detection, disulfide detection) is in htmd.builder.builder.

What to read next#

The eight system-building tutorials for guided walk-throughs.
Asymmetric bilayer how-to for per-leaflet membrane composition.
htmd.builder and htmd.membranebuilder in the API reference for full signatures.