Build a protein with a ligand using OpenFF#

You will learn: how to parameterize the benzamidine ligand with the OpenFF Sage small-molecule force field (instead of the default GAFF2) and build the resulting trypsin + benzamidine system through htmd.builder.openmm.build().

Prerequisites:

  • HTMD installed with the OpenFF extra: pip install acellera-htmd[openff,nagl].

  • You’ve worked through Build a protein with a ligand - this tutorial reuses the same five prep steps and only changes the small-molecule force field.

Why a different small-molecule force field#

Tutorial 02 parameterizes BEN with GAFF2 through antechamber. GAFF2 is a solid generic default and is what the AMBER ecosystem has long used for small molecules. The OpenFF / SMIRNOFF family (“Sage”) is a more recent alternative:

  • Direct SMARTS-based parameter assignment via the SMIRNOFF format: a single ~300-line force-field XML replaces GAFF2’s many-line atom-typing rules, with chemistry-aware specificity (sulfonamides, phosphates, and bridgehead nitrogens are typed correctly from Sage 2.1 onwards).

  • vdW parameters retrained against experimental mass density and enthalpy-of-mixing data from NIST.

  • Valence parameters (bonds, angles, torsions) fit against quantum-chemical optimised geometries and torsion profiles, with torsions split per central-bond multiplicity from Sage 2.3.

  • Charges via NAGL, an AM1-BCC graph-neural-network surrogate introduced with Sage 2.3 - much faster than running AM1-BCC per-residue, and what the Sage release was validated against.

  • Native OpenMM XML output, no tLeap round-trip. Training inputs, scripts, and results are public on the OpenFF GitHub org.

For any ligand where you want the latest, externally-validated force-field assignment, swap GAFF2 for an OpenFF release. The parameterizeFromSpecs() interface is unchanged - you flip two arguments:

  • forcefield="openff_unconstrained-2.3.0.offxml" (Sage 2.3, the current production line).

  • charge_method="nagl" (the GNN surrogate for AM1-BCC that Sage was fit against).

The OpenMM builder (htmd.builder.openmm.build()) consumes the emitted force-field XML directly - no separate prepi/frcmod step.

Warning

Parameterizing any residue under OpenFF binds the whole system to the OpenMM builder. The .frcmod format that tLeap consumes can’t represent the full SMIRNOFF feature set, so parameterizeFromSpecs() emits only the OpenMM force-field XML on the SMIRNOFF branch and htmd.builder.amber.build() has nothing to consume. If you need an AMBER prmtop produced by tLeap, stay on the GAFF2 path in tutorial 02. The OpenMM builder still writes a prmtop + pdb pair, so the downstream MD engine (ACEMD, OpenMM, GROMACS, …) doesn’t care.

Setup#

from moleculekit.molecule import Molecule
from moleculekit.tools.autosegment import autoSegment
from moleculekit.tools.nonstandard_residues import detectNonStandardResidues
from moleculekit.tools.preparation import systemPrepare
from htmd.builder import openmm
from htmd.builder.nonstandard import parameterizeFromSpecs
Copyright by Acellera Ltd. By executing you are accepting the License. In order to register, run htmd_register on your terminal.
The registration information must be valid so that it might be verified.
rdkit - INFO - Enabling RDKit 2026.03.3 jupyter extensions

Steps 1-4 - Same as the GAFF2 tutorial#

The prep pipeline is unchanged from tutorial 02 - see that tutorial for the per-step prose. Code-only recap:

mol = Molecule("3PTB")
mol = autoSegment(mol, fields=("segid", "chain"))
specs = detectNonStandardResidues(mol)

BEN_SMILES = "[NH2+]=C(N)c1ccccc1"
mol.templateResidueFromSmiles('resname "BEN"', BEN_SMILES, addHs=True)

prepared, specs = systemPrepare(mol, pH=7.4, detect_specs=specs)
---- Molecule chain report ----
Chain A:
    First residue:  ILE    16  
    Final residue:  ASN   245  
Chain B:
    First residue:  HOH   401  
    Final residue:  HOH   809  
Chain C:
          residue:  CA    480  
Chain D:
          residue:  BEN     1  
---- End of chain report ----
moleculekit.tools.preparation - WARNING - The following residues have not been optimized: CA, BEN
moleculekit.tools.preparation - INFO - Modified residue CYS    22 A to CYX
moleculekit.tools.preparation - INFO - Modified residue HIS    40 A to HIE
moleculekit.tools.preparation - INFO - Modified residue CYS    42 A to CYX
moleculekit.tools.preparation - INFO - Modified residue HIS    57 A to HIP
moleculekit.tools.preparation - INFO - Modified residue CYS    58 A to CYX
moleculekit.tools.preparation - INFO - Modified residue HIS    91 A to HID
moleculekit.tools.preparation - INFO - Modified residue CYS   128 A to CYX
moleculekit.tools.preparation - INFO - Modified residue CYS   136 A to CYX
moleculekit.tools.preparation - INFO - Modified residue CYS   157 A to CYX
moleculekit.tools.preparation - INFO - Modified residue CYS   168 A to CYX
moleculekit.tools.preparation - INFO - Modified residue CYS   182 A to CYX
moleculekit.tools.preparation - INFO - Modified residue CYS   191 A to CYX
moleculekit.tools.preparation - INFO - Modified residue CYS   201 A to CYX
moleculekit.tools.preparation - INFO - Modified residue CYS   220 A to CYX
moleculekit.tools.preparation - INFO - Modified residue CYS   232 A to CYX
moleculekit.tools.preparation - WARNING - Dubious protonation state: the pKa of 2 residues is within 1.0 units of pH 7.4.
moleculekit.tools.preparation - WARNING - Dubious protonation state:    TYR    39 A (pKa= 8.24)
moleculekit.tools.preparation - WARNING - Dubious protonation state:    HIS    57 A (pKa= 7.44)

Step 5 - Parameterize with OpenFF Sage#

out = parameterizeFromSpecs(
    specs,
    prepared,
    outdir="./params",
    forcefield="openff_unconstrained-2.3.0.offxml",
    charge_method="nagl",
)
print(out)
ClusterOutputs(topo_paths=[], frcmod_paths=[], custombonds=[], xml_paths=['./params/ligand_BEN/BEN.xml'])
htmd.builder.nonstandard - INFO - SMIRNOFF (openff) forcefield in use; the per-cluster XML outputs are consumable by htmd.builder.openmm.build only (via extra_xml=). Use htmd.builder.amber.build only when every NCAA is parameterised with GAFF/GAFF2.
pint.util - WARNING - Redefining '[electric_potential]' (<class 'pint.delegates.txt_defparser.plain.DerivedDimensionDefinition'>)
matplotlib.font_manager - INFO - Failed to extract font properties from /usr/share/fonts/truetype/noto/NotoColorEmoji.ttf: Can not load face (unknown file format; error code 0x2)
matplotlib.font_manager - INFO - generated new fontManager
openff.nagl.nn._models - INFO - Could not find property in lookup table: 'Could not find property value for molecule with InChI InChI=1/C7H8N2/c8-7(9)6-4-2-1-3-5-6/h1-5H,(H3,8,9)/p+1/fC7H9N2/h8-9H2/q+1'
htmd.builder.openmm - INFO - Parameterised ligand 'BEN' with openff_unconstrained-2.3.0.offxml (charge_method='nagl'): 18 atoms, 18 bonds
htmd.builder.openmm - INFO - Wrote OpenMM XML fragment to ./params/ligand_BEN/BEN.xml - 1 residue(s) [BEN], 18/18 atoms kept (0 cap atoms dropped)

forcefield strings that don’t start with gaff are treated as SMIRNOFF offxml filenames and dispatched through OpenFF Interchange. The output ClusterOutputs carries .xml files in xml_paths - one <resname>.xml per free ligand (like the benzamidine here), and one cluster_<idx>.xml per cluster of covalently-bonded NCAAs. topo_paths and frcmod_paths may be empty for the SMIRNOFF branch, since the XML is self-contained.

charge_method="nagl" runs the OpenFF NAGL GNN surrogate for the charge model Sage’s vdW and torsion parameters were fit against (AM1-BCC for Sage 2.0–2.2, AshGC for Sage 2.3). Passing any non-"nagl" charge_method ("am1-bcc", "gasteiger", "resp", …) emits a mismatched-charges warning - it works, but you lose some of the consistency the Sage release was validated for.

Step 6 - Build under OpenMM#

# 3PTB ships with a handful of crystallographic waters that systemPrepare keeps.
# openmm.build's internal solvation step sees them and treats the mol as already-
# solvated, so it skips adding bulk water - which then leaves too few waters for
# the ioniser to swap into Na+/Cl- and still reach 0.15 M. Strip the crystal
# waters first so the builder solvates from scratch.
prepared.remove("water", _logger=False)

built, system = openmm.build(
    prepared,
    outdir="./build",
    extra_xml=list(out.xml_paths),
    custombonds=out.custombonds,
    solvate=True,
    padding=15.0,
    ionize=True,
    saltconc=0.15,
)
Disulfide Bond between: UniqueResidueID<resname: 'CYX', chain: 'A', resid: 42, insertion: '', segid: 'P0'>
                   and: UniqueResidueID<resname: 'CYX', chain: 'A', resid: 58, insertion: '', segid: 'P0'>

Disulfide Bond between: UniqueResidueID<resname: 'CYX', chain: 'A', resid: 136, insertion: '', segid: 'P0'>
                   and: UniqueResidueID<resname: 'CYX', chain: 'A', resid: 201, insertion: '', segid: 'P0'>

Disulfide Bond between: UniqueResidueID<resname: 'CYX', chain: 'A', resid: 191, insertion: '', segid: 'P0'>
                   and: UniqueResidueID<resname: 'CYX', chain: 'A', resid: 220, insertion: '', segid: 'P0'>

Disulfide Bond between: UniqueResidueID<resname: 'CYX', chain: 'A', resid: 168, insertion: '', segid: 'P0'>
                   and: UniqueResidueID<resname: 'CYX', chain: 'A', resid: 182, insertion: '', segid: 'P0'>

Disulfide Bond between: UniqueResidueID<resname: 'CYX', chain: 'A', resid: 22, insertion: '', segid: 'P0'>
                   and: UniqueResidueID<resname: 'CYX', chain: 'A', resid: 157, insertion: '', segid: 'P0'>

Disulfide Bond between: UniqueResidueID<resname: 'CYX', chain: 'A', resid: 128, insertion: '', segid: 'P0'>
                   and: UniqueResidueID<resname: 'CYX', chain: 'A', resid: 232, insertion: '', segid: 'P0'>
htmd.builder.openmm - INFO - Detecting disulfide bonds.
htmd.builder.builder - INFO - 6 disulfide bonds were added
htmd.builder.openmm - INFO - Solvating system with Modeller.addSolvent()...
htmd.builder.openmm - INFO - Creating OpenMM System...
htmd.builder.openmm - INFO - Wrote ./build/structure.prmtop and ./build/structure.inpcrd via ParmEd
moleculekit.writers - WARNING - Field "resid" of PDB overflows. Your data will be truncated to 4 characters.
htmd.builder.openmm - INFO - Wrote ForceField-XML handoff to ./build
htmd.builder.openmm - INFO - Finished building.

Argument breakdown:

Argument

Effect

extra_xml

Additional XML files describing the non-canonical residues - here, the per-cluster Sage XML emitted by parameterizeFromSpecs.

custombonds

Inter-residue bonds (same shape as in amber.build).

solvate=True

The OpenMM builder solvates internally; no separate solvate() call is needed. If you’d rather pre-solvate (with your own padding, water model, or centersel), call solvate() on prepared first, then pass solvate=False here and the builder will skip the internal step.

padding

Solvation padding in Å.

ionize + saltconc

Same meaning as in amber.build.

openmm.build returns a (built_mol, openmm_system) tuple. The built_mol is a Molecule of the final system; openmm_system is the OpenMM System object you can hand straight to an OpenMM Simulation. The output directory still contains structure.prmtop + structure.pdb so the result is consumable by any downstream MD driver, including ACEMD via acemd.protocols.setup_equilibration().

Gotchas#

  • frcmod (tLeap’s input format) can’t express the full SMIRNOFF feature set, so the OpenFF path emits OpenMM XML only - amber.build cannot consume it. Picking OpenFF for one residue commits the whole build to htmd.builder.openmm.build().

  • Pair forcefield="openff_*.offxml" with charge_method="nagl". Sage’s parameters were fit alongside AM1-BCC charges, and NAGL is the recommended fast surrogate; other charge models (Gasteiger, antechamber’s AM1-BCC) work but emit a mismatched-charges warning and lose some of the consistency the Sage release was validated for.

  • openmm.build does its own solvation when solvate=True. If you want to pre-solvate (e.g. to use a non-default padding or centersel), call solvate() first and pass solvate=False to the builder. Don’t pre-solvate and leave solvate=True - you’d end up with a doubly-solvated box.

  • For pure-canonical protein systems (no ligand, no NCAA) there’s nothing for OpenFF to do at the small-molecule level; you don’t need Sage and the GAFF2 path in tutorial 02 is enough. This tutorial pays off when the ligand is the focus.

See also#