htmd.builder.nonstandard module

End-to-end parameterization pipeline for non-canonical residues under AMBER, driven by the spec list returned from moleculekit.tools.nonstandard_residues.detectNonStandardResidues().

parameterizeFromSpecs() is the user-facing entry point. It walks mol.bonds to recover cluster grouping (residues sharing non-peptide inter-residue bonds), builds a combined antechamber model compound per cluster (full residues + ACE/NME-style backbone caps), runs antechamber + parmchk2 once per cluster, and splits the output into per-residue CIF / frcmod pairs. Free residues (no cluster bonds) are parameterized standalone via htmd.builder._ambertools._fftype_antechamber().
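The grouping step can be pictured as a union-find over residues. The sketch below is illustrative pure Python, not the module's implementation. It assumes bonds arrive as (i, j) atom-index pairs like the rows of mol.bonds, that residues are identified by plain string keys, and that peptide bonds are passed in explicitly rather than detected; find_clusters is a hypothetical helper name.

```python
from collections import defaultdict


def find_clusters(bonds, atom_residue, peptide_bonds):
    """Group residues linked by non-peptide inter-residue bonds (sketch).

    bonds: iterable of (i, j) atom-index pairs.
    atom_residue: sequence mapping atom index -> residue key.
    peptide_bonds: set of frozenset({i, j}) backbone links to ignore.
    Returns (clusters, free): sorted member lists, and residues in no cluster.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    for i, j in bonds:
        ri, rj = atom_residue[i], atom_residue[j]
        if ri == rj or frozenset((i, j)) in peptide_bonds:
            continue  # intra-residue bond or ordinary peptide link
        union(ri, rj)

    groups = defaultdict(list)
    for res in parent:
        groups[find(res)].append(res)
    clusters = [sorted(members) for members in groups.values()]
    free = sorted(set(atom_residue) - set(parent))
    return clusters, free
```

Only residues touched by a cluster bond ever enter parent, so every returned cluster has at least two members and everything else lands in free.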

The resulting ClusterOutputs carries the topology paths, frcmod paths and custombonds list in the form that htmd.builder.amber.build() expects.

For canonical residues that the detector renamed (CYS bonded to a scaffold, ASN glycosylated by a sugar, …), the per-residue CIF carries ff14SB atom types taken from the matching AMBER residue template (mid-chain CYX, N-terminal NCYX or C-terminal CCYX, and the analogous forms for LYS/HIS/ASN/…), so that backbone bonds resolve against ff14SB. Per-atom charges come from the charge computation on the combined model, with one exception: by default (pin_backbone_charges=True) the backbone atoms of chain-resident residues are pinned to ff14SB, using the full ff14SB library backbone for canonical residues and the charge-class amide charges for NCAAs (see _backbone_charge_map()). The frcmod carries the cross-FF junction terms (bond, angle and dihedral entries spanning a canonical-residue atom and a non-canonical one), with the canonical-side atom types rewritten from antechamber's GAFF2 types to ff14SB.
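Two parts of this paragraph are mechanical enough to sketch: choosing the terminal variant of a residue template, and rewriting the canonical-side types in a junction-term key. The helpers below are hypothetical, not the module's API. MID_CHAIN contains only the CYS and ASN mappings named in these docs; the N/C prefix for terminal variants follows the NCYX / CCYX pattern and is an assumption for other residues such as NLN; the ff14SB types 2C and S for CYX's CB and SG are used for illustration.

```python
# Mid-chain templates named in these docs; other canonical residues omitted.
MID_CHAIN = {"CYS": "CYX", "ASN": "NLN"}


def template_name(canonical_resname, is_n_term=False, is_c_term=False):
    """Pick the mid-chain, N-terminal or C-terminal template name (sketch)."""
    base = MID_CHAIN[canonical_resname]
    if is_n_term:
        return "N" + base  # e.g. NCYX
    if is_c_term:
        return "C" + base  # e.g. CCYX
    return base


def junction_key(atoms, gaff_type, ff14sb_type, canonical_atoms):
    """Build a frcmod-style key for a bond/angle/dihedral spanning both sides."""
    types = []
    for atom in atoms:
        # Canonical-side atoms take their ff14SB type; the rest keep GAFF2.
        t = ff14sb_type[atom] if atom in canonical_atoms else gaff_type[atom]
        types.append(t.ljust(2))  # frcmod atom-type fields are two characters wide
    return "-".join(types)
```

For a CB-SG-C1 angle across a cysteine-scaffold thioether, the canonical CB and SG pick up ff14SB types while the scaffold carbon keeps its GAFF2 type, giving the key "2C-S -c3".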

class htmd.builder.nonstandard.ClusterBond(atom_a, atom_b)

Bases: object

One non-peptide covalent bond between two atoms in a cluster. The bond is symmetric (there is no canonical-side / scaffold-side distinction), so the same type covers NCAA-NCAA crosslinks, canonical-AA-anchored scaffolds, and everything in between.

atom_a: UniqueAtomID
atom_b: UniqueAtomID
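Whether ClusterBond's own equality is order-insensitive is not stated here. If your code collects such bonds (for example to deduplicate custombonds), one way to honour the documented symmetry is to compare them as unordered pairs. SymmetricBond is a hypothetical stand-in, and UniqueAtomID is modelled as a plain (segid, resid, name) tuple purely for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True, eq=False)
class SymmetricBond:
    # Hypothetical stand-in; atoms are (segid, resid, name) tuples here.
    atom_a: tuple
    atom_b: tuple

    def _pair(self):
        # Unordered view of the two endpoints.
        return frozenset((self.atom_a, self.atom_b))

    def __eq__(self, other):
        if not isinstance(other, SymmetricBond):
            return NotImplemented
        return self._pair() == other._pair()

    def __hash__(self):
        return hash(self._pair())
```

With eq=False the dataclass leaves the hand-written __eq__ and __hash__ alone, so (a, b) and (b, a) collapse to one entry in a set.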
class htmd.builder.nonstandard.ClusterModel(spec, cif_path, atom_map, atom_to_residue, atom_to_orig_name, canonical_renames)

Bases: object

Result of buildClusterModel(). Carries everything prepareClusterResidues() needs to split antechamber’s output back into per-residue topology files.

atom_map: dict
atom_to_orig_name: dict
atom_to_residue: dict
canonical_renames: dict
cif_path: str
spec: ClusterSpec
class htmd.builder.nonstandard.ClusterOutputs(topo_paths=<factory>, frcmod_paths=<factory>, custombonds=<factory>, xml_paths=<factory>)

Bases: object

Aggregated result of parameterizeFromSpecs() / prepareClusterResidues(). Carries the topology, parameter and custombond inputs that the user feeds back into htmd.builder.amber.build().

custombonds: list
frcmod_paths: list
topo_paths: list
xml_paths: list
class htmd.builder.nonstandard.ClusterSpec(subtype, residues, is_chain_resident, is_canonical, roles, bonds, canonical_resnames=<factory>, canonical_terminus=<factory>, is_n_term=<factory>, is_c_term=<factory>)

Bases: object

A connected covalent cluster of residues that share non-peptide bonds and need combined parameterization. residues lists every cluster member; the parallel lists (is_chain_resident, is_canonical, roles, canonical_resnames, canonical_terminus, is_n_term, is_c_term) carry per-residue metadata: chain residency, canonical/non-canonical status, role tag, the original canonical resname for renamed anchors, and chain-terminus information.

bonds: list
canonical_resnames: list
canonical_terminus: list
is_c_term: list
is_canonical: list
is_chain_resident: list
is_n_term: list
residues: list
roles: list
subtype: str
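Code that consumes a ClusterSpec reads index i of every parallel list as metadata for residues[i]. A minimal sketch of that layout with a length check follows; SpecSketch, the subtype string and the role values "anchor" / "scaffold" are hypothetical, and only the required lists are mirrored.

```python
from dataclasses import dataclass


@dataclass
class SpecSketch:
    # Hypothetical mirror of ClusterSpec's required fields.
    subtype: str
    residues: list
    is_chain_resident: list
    is_canonical: list
    roles: list
    bonds: list

    def __post_init__(self):
        # Every parallel list must carry exactly one entry per residue.
        n = len(self.residues)
        for name in ("is_chain_resident", "is_canonical", "roles"):
            if len(getattr(self, name)) != n:
                raise ValueError(f"{name} must have one entry per residue ({n})")

    def per_residue(self):
        # Index i of every parallel list describes residues[i].
        return list(zip(self.residues, self.is_chain_resident, self.is_canonical, self.roles))
```

Validating the lengths at construction time turns a silent misalignment into an immediate error.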
class htmd.builder.nonstandard.ModelAtom(role, ff_type=None)

Bases: object

Per-atom record in a cluster model compound. role is one of "residue" (atom is part of a cluster residue) or "cap" (an ACE/NME-style backbone cap atom that is dropped at split time). ff_type is unused in the current pipeline and kept for forward compatibility.

ff_type: str | None = None
role: str
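Dropping caps at split time amounts to filtering on the role field and bucketing the remaining atoms by residue. The sketch assumes a hypothetical Atom record that carries its owning residue directly (ClusterModel keeps that mapping in atom_to_residue instead) and is not the module's own split code.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass
class Atom:
    # Hypothetical per-atom record; role mirrors ModelAtom.role.
    name: str
    role: str               # "residue" or "cap"
    residue: Optional[str]  # owning cluster residue; None for cap atoms
    charge: float


def split_by_residue(atoms):
    """Drop cap atoms and group the remaining atoms by owning residue."""
    buckets = defaultdict(list)
    for atom in atoms:
        if atom.role == "cap":
            continue  # caps exist only in the model compound
        buckets[atom.residue].append(atom)
    return dict(buckets)
```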
htmd.builder.nonstandard.buildClusterModel(mol, spec, outdir)

Build the combined model compound for a ClusterSpec: full residues + ACE/NME-style backbone caps derived from the live mol’s chain neighbours, written as a CIF ready to feed to antechamber.
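As an illustration of where caps go, the hypothetical plan_caps helper below assumes each residue knows its previous and next chain neighbour, and that a peptide bond between two residues that are both in the cluster stays intact in the model compound, so only chain links leaving the cluster get capped. It returns cap placements only; building cap coordinates is out of scope.

```python
def plan_caps(cluster, prev_res, next_res):
    """Return (residue, cap) pairs for a cluster model compound (sketch).

    cluster: set of residue keys in the cluster.
    prev_res / next_res: residue key -> chain neighbour key (absent if none).
    """
    caps = []
    for res in sorted(cluster):
        before = prev_res.get(res)
        if before is not None and before not in cluster:
            caps.append((res, "ACE"))  # chain continues on the N side
        after = next_res.get(res)
        if after is not None and after not in cluster:
            caps.append((res, "NME"))  # chain continues on the C side
    return caps
```

A scaffold residue with no chain neighbours gets no caps, and a C-terminal anchor gets only an ACE-style cap.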

htmd.builder.nonstandard.parameterizeFromSpecs(specs, mol, outdir, forcefield='gaff2', charge_method='am1-bcc', am1_path_length=15, pin_backbone_charges=True, normalize='cluster', use_pyodide=None)

Parameterize every non-canonical residue in specs and return paths plus custombonds ready to feed htmd.builder.amber.build().

The function recovers cluster grouping by walking mol.bonds for non-peptide inter-residue bonds and unioning the touching residues. Per cluster it builds a combined model compound (full residues + ACE/NME-style backbone caps), runs antechamber + parmchk2 once, and splits the output into per-residue CIF / frcmod pairs. Free residues (no cluster bonds) are parameterized standalone.

Parameters:
  • specs (list) – Per-residue specs from moleculekit.tools.nonstandard_residues.detectNonStandardResidues().

  • mol (moleculekit.molecule.Molecule) – The molecule the specs describe. Must already carry covalent bonds (typically the post-systemPrepare molecule).

  • outdir (str) – Output directory for all generated CIF / frcmod / XML files.

  • forcefield (str or dict, optional) – Force field for the non-canonical atoms. Default "gaff2". A name starting with "gaff" dispatches through antechamber + parmchk2 and emits prepi + frcmod (consumable by amber.build()) plus a combined OpenMM XML; any other string is treated as a SMIRNOFF offxml filename (e.g. "openff_unconstrained-2.3.0.offxml") and dispatches through OpenFF Interchange, emitting only per-cluster OpenMM XML (consumable by openmm.build()). A dict {resname: ff_name, "default": ff_name} lets different residues use different force fields; mixing force fields within a single cluster is not supported, because the cluster compound is parameterized as one molecule.

  • charge_method (str, optional) – Charge model for the non-canonical atoms. Largely orthogonal to forcefield: apart from "abcg2" (see below), every model works with both GAFF and SMIRNOFF typing (the externally fitted methods pre-compute charges, and the engine then only assigns types). "am1-bcc" (default) is accurate and honours the net charge. "gasteiger" is faster; it is computed via RDKit, so it also honours the net charge, and it is the automatic fallback under Pyodide, where AM1-BCC's SQM backend is unavailable. "nagl" uses the OpenFF NAGL graph neural network as an AM1-BCC surrogate; it is much faster on medium-to-large molecules and requires PyTorch. "resp" / "resp-multiconf" fit RESP charges to a Psi4-computed QM ESP; this is the most accurate option but requires the private Acellera parameterize package and Psi4. "resp-multiconf" averages over up to 10 conformers for free ligands only; the cluster path downgrades it to single-conformer RESP because RDKit's ETKDG is not appropriate for clusters with ACE/NME caps. "abcg2" is AM1-BCC v2 and is only meaningful with GAFF.

  • am1_path_length (int or None, optional) – Maximum path length for AM1-BCC charge-equivalence determination, passed to antechamber's -pl flag. Capping antechamber's atom-equivalence search keeps it from hanging on cyclic or large molecules. Only used for charge_method="am1-bcc" / "abcg2"; ignored for all other charge methods. None keeps antechamber's own default.

  • pin_backbone_charges (bool, optional) – If True (default), the backbone partial charges of every chain-resident residue are pinned to ff14SB (residue-specific for canonical residues, charge-class fallback for NCAAs). Matches the Robin Betz / R.E.D. / Carlos Ramos tutorial convention. Set False to keep the cluster-computed backbone charges (the Forcefield_PTM / Khoury et al. 2014 convention, which argues backbone freezing can hurt fit quality).

  • normalize ({"cluster", "per_residue", None}, optional) – How to absorb the small per-residue drift left by slicing one residue out of a jointly charged cluster (RDKit Gasteiger PEOE or antechamber AM1-BCC), together with any shift the backbone pin introduces in the cluster total. "cluster" (default): only the cluster total is normalized to an integer; per-residue totals keep their natural (fractional) values, preserving the per-atom charges the charge method computed. "per_residue": each emitted unit is integer-charged; this is AMBER's tLeap convention, used by Betz / R.E.D. / Ramos, and the safer choice if the same residue might recur in different bonding contexts. None: no rebalancing at all (charges are exactly what the charge method produced, apart from the backbone pin).

  • use_pyodide (bool or None, optional) – Force the AmberTools dispatch path (True -> dispatch via antechamber_pyodide.run; False -> native subprocess). None (default) auto-detects Pyodide via sys.platform.
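The dispatch rules in the parameter list above (dict force fields with a "default" entry, the "gaff" prefix test, the Pyodide fallback to Gasteiger, and the cluster downgrade of resp-multiconf) can be summarized as three small resolvers. This is a sketch of the documented rules, not the module's code; the helper names and engine labels are hypothetical, and the dict form is assumed to always carry a "default" key. Under Pyodide, sys.platform is "emscripten", which is what the auto-detection here checks.

```python
import sys


def resolve_use_pyodide(use_pyodide=None):
    # None -> auto-detect: Pyodide runs on Emscripten.
    if use_pyodide is None:
        return sys.platform == "emscripten"
    return bool(use_pyodide)


def resolve_forcefield(forcefield, cluster_resnames):
    # A dict maps resname -> ff name, with "default" for everything else.
    if isinstance(forcefield, dict):
        names = {forcefield.get(r, forcefield["default"]) for r in cluster_resnames}
        if len(names) != 1:
            # A cluster is parameterized as one molecule, so it needs one force field.
            raise ValueError(f"mixed force fields in one cluster: {sorted(names)}")
        (name,) = names
    else:
        name = forcefield
    engine = "antechamber" if name.startswith("gaff") else "openff-interchange"
    return name, engine


def effective_charge_method(method, engine, in_pyodide, is_cluster):
    if method == "abcg2" and engine != "antechamber":
        raise ValueError("abcg2 is only meaningful with GAFF")
    if method == "am1-bcc" and in_pyodide:
        return "gasteiger"  # AM1-BCC's SQM backend is unavailable under Pyodide
    if method == "resp-multiconf" and is_cluster:
        return "resp"  # capped cluster compounds get single-conformer RESP
    return method
```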

Returns:

Aggregated topology / parameter files and custombonds for the whole system. The forcefield choice determines what is populated:

  • GAFF forcefield: out.topo_paths (prepi) + out.frcmod_paths (frcmod, for amber.build()) plus one combined OpenMM XML (gaff_combined.xml) appended to out.xml_paths.

  • SMIRNOFF forcefield: per-cluster XML fragments appended to out.xml_paths only; topo_paths / frcmod_paths stay empty.

  • Mixed: both of the above contribute. Synthetic atom-type names are globally unique by construction, so the XMLs load together via openmm.app.ForceField(*defaultFf(), *out.xml_paths).

out.custombonds is populated in every case and matches the custombonds= argument of amber.build() / openmm.build(). To tell whether GAFF was involved (and therefore whether amber.build() is also viable), check frcmod_paths: it is non-empty iff at least one residue went through the GAFF path.
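The paragraph above ties amber.build() viability to frcmod_paths. A hedged reading of the Returns section suggests that in the mixed case the SMIRNOFF-typed residues still lack prepi/frcmod files, so this sketch checks both conditions. OutputsSketch is a hypothetical stand-in for ClusterOutputs, and recognizing GAFF output by the gaff_combined.xml filename is an assumption.

```python
import os
from dataclasses import dataclass, field


@dataclass
class OutputsSketch:
    # Hypothetical stand-in with the same four list fields as ClusterOutputs.
    topo_paths: list = field(default_factory=list)
    frcmod_paths: list = field(default_factory=list)
    custombonds: list = field(default_factory=list)
    xml_paths: list = field(default_factory=list)


def gaff_was_used(out):
    # Documented rule: frcmod_paths is non-empty iff a residue took the GAFF path.
    return bool(out.frcmod_paths)


def smirnoff_was_used(out):
    # Assumption: the GAFF path contributes only gaff_combined.xml, so any
    # other XML fragment came from the SMIRNOFF path.
    return any(os.path.basename(p) != "gaff_combined.xml" for p in out.xml_paths)


def amber_build_possible(out):
    # amber.build() needs prepi + frcmod for every non-canonical residue,
    # which SMIRNOFF-typed residues do not get.
    return gaff_was_used(out) and not smirnoff_was_used(out)
```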

Return type:

ClusterOutputs

Examples

Build a scaffolded cyclic peptide (3 cysteines thioether-bonded to a triazinane scaffold LFI):

from moleculekit.molecule import Molecule
from moleculekit.tools.nonstandard_residues import detectNonStandardResidues
from moleculekit.tools.preparation import systemPrepare
from htmd.builder.nonstandard import parameterizeFromSpecs
from htmd.builder import amber

mol = Molecule("8QFZ.pdb")
mol.filter("chain B")
mol.segid[:] = "P"
mol.segid[mol.resname == "LFI"] = "L"

# 1. Inspect the molecule and decide what needs custom params.
specs = detectNonStandardResidues(mol)

# 2. Template each non-canonical residue from a SMILES string.
mol.templateResidueFromSmiles(
    "resname LFI",
    "C1N(CN(CN1C(=O)CCBr)C(=O)CCBr)C(=O)CCBr",
    addHs=True,
)

# 3. Protonate the canonical part and apply the spec renames /
#    displaced-H drops in one step.
pmol, _ = systemPrepare(mol, detect_specs=specs)

# 4. Run antechamber per cluster and split per-residue.
out = parameterizeFromSpecs(specs, pmol, outdir="./params")

# 5. Build.
built = amber.build(
    pmol,
    outdir="./build",
    custombonds=out.custombonds,
    topo=out.topo_paths,
    param=out.frcmod_paths,
)
htmd.builder.nonstandard.prepareClusterResidues(typed_path, frcmod_path, model, outdir=None, use_pyodide=None, residue_templates=None, parameter_sets=None, pin_backbone_charges=True, normalize='cluster')

Split antechamber output for a cluster model compound into per-residue topology files and emit the matching custombonds list.

For each non-canonical cluster residue the function writes a CIF using antechamber's GAFF2 types and the per-atom charges from the cluster charge computation. For each canonical anchor the CIF uses the ff14SB atom types of the appropriate AMBER residue template (mid-chain CYX / NLN / …, or the matching N- or C-terminal variant NCYX / CCYX / … when the residue sits at a chain terminus), with per-atom charges from the antechamber compute on the combined model. For chain-resident residues the backbone charges are pinned to ff14SB by default (see _backbone_charge_map()); set pin_backbone_charges=False to skip the pin and keep the cluster-computed backbone charges. How the resulting totals are rebalanced is controlled by normalize, as in parameterizeFromSpecs(): with the default "cluster" only the cluster total is made integer, while "per_residue" rebalances every per-residue file (chain-resident and scaffold) to its integer formal charge. Each canonical residue's bucket resname (assigned by detectNonStandardResidues(), e.g. CY1) keeps the residue out of tLeap's built-in libraries, so the generated prepi is loaded instead of the standard template.
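The pin-then-rebalance step can be sketched as follows. Spreading the residual uniformly over the non-pinned atoms is an assumption made for illustration (the module's actual redistribution scheme is not described here); the backbone values are typical ff14SB backbone charges used only as example numbers; and pinned atom names are applied to every residue as a simplification.

```python
def pin_backbone(charges, reference):
    """Replace backbone charges with reference values; return (charges, total shift)."""
    pinned = dict(charges)
    shift = 0.0
    for atom, ref in reference.items():
        if atom in pinned:
            shift += ref - pinned[atom]
            pinned[atom] = ref
    return pinned, shift


def spread(charges, target, frozen):
    """Shift the non-frozen atoms uniformly so the charges sum to target."""
    free = [k for k in charges if k not in frozen]
    if not free:
        raise ValueError("no free atoms to absorb the residual")
    delta = (target - sum(charges.values())) / len(free)
    return {k: q if k in frozen else q + delta for k, q in charges.items()}


def normalize(residues, formal, mode, frozen_names=frozenset()):
    """residues: resname -> {atom: charge}; formal: resname -> integer formal charge."""
    if mode is None:
        return residues  # no rebalancing at all
    if mode == "per_residue":
        return {r: spread(q, formal[r], frozen_names) for r, q in residues.items()}
    if mode == "cluster":
        # Only the cluster total is forced to an integer.
        flat = {(r, a): q for r, qs in residues.items() for a, q in qs.items()}
        frozen = {key for key in flat if key[1] in frozen_names}
        flat = spread(flat, sum(formal.values()), frozen)
        out = {r: {} for r in residues}
        for (r, a), q in flat.items():
            out[r][a] = q
        return out
    raise ValueError(f"unknown normalize mode: {mode!r}")
```

In "cluster" mode the per-residue totals stay fractional because the residual is spread across the whole cluster; in "per_residue" mode each residue absorbs its own residual.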

Parameters:
  • typed_path (str) – Antechamber-typed mol2 of the cluster model compound.

  • frcmod_path (str) – parmchk2 output for the same model compound.

  • model (ClusterModel) – Cluster model returned by buildClusterModel().

  • outdir (str or None) – Output directory; created if missing. If None, a fresh tempdir is used.

  • use_pyodide (bool or None, optional) – As in parameterizeFromSpecs().

  • residue_templates (list or None) – If provided, every per-residue typed-mol slice this function writes is appended as a _ResidueTemplateData for the downstream OpenMM XML emitter.

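  • parameter_sets (list or None) – If provided, the cluster's final AmberParameterSet (post junction-term injection and backbone-rename duplication, pre clean-up) is appended for the downstream XML emitter.

  • pin_backbone_charges (bool, optional) – As in parameterizeFromSpecs(). Default True.

  • normalize ({"cluster", "per_residue", None}, optional) – As in parameterizeFromSpecs(). Default "cluster".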
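The typed_path and frcmod_path inputs are the usual antechamber and parmchk2 outputs. The sketch below only assembles the command lines and runs nothing. It assumes mol2 input and GAFF2 typing, maps "am1-bcc" / "abcg2" to antechamber's -c bcc / -c abcg2 (abcg2 needs a recent AmberTools release), and passes am1_path_length through -pl; it is not necessarily how parameterizeFromSpecs() calls the tools.

```python
def antechamber_cmd(in_mol2, out_mol2, net_charge, method="am1-bcc", path_length=15):
    """Assemble (but do not run) an antechamber GAFF2 typing + charge command."""
    charge_flag = {"am1-bcc": "bcc", "abcg2": "abcg2"}[method]
    cmd = ["antechamber", "-i", in_mol2, "-fi", "mol2", "-o", out_mol2, "-fo", "mol2",
           "-at", "gaff2", "-c", charge_flag, "-nc", str(net_charge)]
    if path_length is not None:
        cmd += ["-pl", str(path_length)]  # cap the charge-equivalence path search
    return cmd


def parmchk2_cmd(typed_mol2, frcmod):
    """Assemble a parmchk2 command that writes missing GAFF2 terms to a frcmod."""
    return ["parmchk2", "-i", typed_mol2, "-f", "mol2", "-o", frcmod, "-s", "gaff2"]
```

shlex.join(antechamber_cmd("model.mol2", "typed.mol2", 0)) prints a shell-ready line, and subprocess.run(cmd, check=True) would execute it where AmberTools is installed.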

Return type:

ClusterOutputs