System building#

System building is the work of turning a structure - a PDB, a homology model, a docked complex - into a simulation-ready system: protonated, segmented, solvated, ionised, parameterized against a force field, and written to a topology + coordinates pair that an MD engine accepts. HTMD treats this as a first-class workflow, not a script you string together by hand, and supports the full range of biomolecular systems through a single API.

The canonical pipeline#

Every HTMD build goes through the same five stages:

        flowchart LR
    A[Structure input<br/>PDB / CIF / model] --> B[Preparation<br/>protonation, gaps, mutations]
    B --> C[Segmentation<br/>protein / lipid / ligand /<br/>water / ion segments]
    C --> D[Solvation &<br/>ionisation]
    D --> E[Force-field build<br/>AMBER / CHARMM / OpenFF]
    E --> F[Topology + coords<br/>ready for MD]
    
  • Segmentation (autoSegment()) labels each contiguous protein / nucleic chain in its own segment (decided by backbone-atom geometry, not by resid numbering), bundles all waters into one segment and all ions into one, and puts every other connected non-polymer component (a ligand, a cofactor, a lipid) in its own segment. The builder then treats the components independently and doesn’t extend protein chains through HETATM ligands.

  • Non-standard residue detection (detectNonStandardResidues()) inspects the molecule and returns one spec per residue the force field doesn’t natively know (ligands, NCAAs, modified residues, crosslinkers).

  • SMILES templating (templateResidueFromSmiles()) sets correct bond orders, formal charges, and hydrogen counts on each non-standard residue from its reference SMILES - a prerequisite for clean parameterization.

  • Preparation (systemPrepare()) does pKa-aware protonation, missing-sidechain repair and partial backbone repair, and mutation; passed the detect_specs= list it preserves the templated non-standard residues across the PDB2PQR pass.

  • Parameterization (parameterizeFromSpecs()) runs antechamber on each non-standard residue, emits per-residue topology files (topo_paths), parameter files (frcmod_paths), and a custombonds list the builder feeds straight to tLeap.

  • Force-field build (htmd.builder.amber.build() - the most fully featured backend - or htmd.builder.charmm.build() / htmd.builder.openmm.build()) consumes the prepared molecule plus the parameterization outputs and writes a simulation-ready topology + coordinates pair.

Ionisation runs as the builder’s ionize=True default in all three backends. Solvation is only built-in for htmd.builder.openmm.build() (default solvate=True); for htmd.builder.amber.build() and htmd.builder.charmm.build() you call solvate() explicitly before the build.

The eight system-building tutorials walk this pipeline for protein-only, protein-ligand (AMBER and OpenFF), stapled, cyclic, and bicyclic peptides, protein-in-membrane, and protein-RNA systems. The same pipeline handles every case below.

Beyond canonical proteins#

HTMD’s building stack handles cases that “build a protein in CHARMM” tools usually don’t:

Non-canonical amino acids (NCAAs)#

NCAAs - phosphorylated residues, fluorinated analogues, modified backbones, peptide-drug warheads - flow through the standard three-step pipeline:

  1. detectNonStandardResidues() inspects mol and returns one spec per non-standard residue, grouping connected non-canonicals (and their canonical anchors) into clusters automatically.

  2. templateResidueFromSmiles() is called once per unique non-standard residue with its reference SMILES, fixing bond orders and adding hydrogens.

  3. systemPrepare() is called with detect_specs=specs so PDB2PQR protonates the canonical residues while preserving the templated non-standard ones, then parameterizeFromSpecs() produces the GAFF2 topologies and AMBER ff14SB-anchored parameters that amber.build consumes.

parameterizeFromSpecs() defaults to AM1-BCC as its function-level charge model (GAFF-quality electrostatics, slow). For tutorials and rapid iteration we recommend charge_method="gasteiger" — much faster and good enough when the priority is the topology / connectivity rather than millihartree-accurate charges.

Stapled peptides, isopeptides, and cyclic peptides#

Stapled peptides (hydrocarbon, lactam, thioether staples), isopeptide bonds (N-to-side-chain crosslinks, e.g. ubiquitin-conjugation chemistry), and head-to-tail or side-chain cyclic peptides are all crosslinked NCAA constructs. They use the same three-step flow as standalone NCAAs - detectNonStandardResidues groups the bridging residue together with its anchor residues (or, for cyclic peptides, the N- and C-terminal anchors) into a single cluster, and the resulting custombonds entries carry the crosslink across the chain.

The same parameterizeFromSpecs() call handles staples, isopeptides, and cycles; you don’t have to hand-edit a tLeap script or a CHARMM PSF.

Disulfide bonds#

Disulfide handling is automatic by default: detectDisulfideBonds() scans the prepared structure for CY*-resname residues’ SG atoms within thresh Å (default 3 Å) — segids must be set first — and returns the inferred bridges. Both htmd.builder.amber.build() and htmd.builder.charmm.build() accept a disulfide= argument; pass None to auto-detect, or a list of (sel1, sel2) atom-selection pairs to override.

For pathological cases - ambiguous pairings, designed mutants where the disulfide must be at a specific Cys pair - call detectDisulfideBonds(mol)() directly on the prepared molecule to inspect what auto-detection would produce, then pass the curated list back as disulfide=.

Membrane systems#

The membrane stack lives in htmd.membranebuilder:

  • buildMembrane() assembles a bilayer at a given XY size from per-leaflet ratioupper and ratiolower dicts ({lipid_name: ratio} — see listLipids() for the supported set); asymmetric leaflet compositions are a first-class feature.

  • A short LJ-fluid relaxation step (ljfluid) packs the lipids without ring penetrations (ringpenetration).

  • For protein-in-membrane systems, the membrane is generated separately, then the protein is embedded via embed() after stripping clashing lipids (removeLipidsInProtein()).

See the asymmetric bilayer and pre-equilibrated-membrane embedding how-tos for the per-recipe shape.

Force-field choice#

The same prepared molecule can be built under multiple force fields without redoing preparation:

  • AMBER (htmd.builder.amber.build()) via tLeap - the most fully featured backend, the one the NCAA flow is parameterized against (ff14SB anchors, GAFF2 atom types), and the recommended default for protein, protein-ligand, and crosslinked-NCAA systems.

  • CHARMM (htmd.builder.charmm.build()) via VMD/psfgen - for projects standardised on CHARMM force fields and CHARMM-GUI-style membrane setups.

  • OpenMM (htmd.builder.openmm.build()) by default loads OpenMM’s bundled amber14 XMLs (protein.ff14SB + lipid17 + nucleic + tip3p) — i.e. an AMBER-equivalent protein stack assembled through OpenMM’s ForceField machinery. The differentiator versus the AMBER tLeap backend is opt-in OpenFF / SMIRNOFF parameters for small molecules: pass small_molecule_ff="openff-2.3.0" (or "gaff-2.2.20") plus an OFFMolecule list via molecules= to bring in SMIRNOFF/GAFF templating for ligands. All three builders produce systems any common engine (ACEMD, OpenMM, GROMACS) can run.

Force-field-specific tooling lives in the same module as the build entry point; common topology surgery (insertions, mutations, cis-peptide detection, disulfide detection) is in htmd.builder.builder.