moleculekit.tools.nonstandard_residues module

Discovery helper for non-standard residues that need user-driven AMBER parameterization before building.

A “non-standard residue” is any residue whose resname is not in moleculekit’s canonical amino-acid / nucleic / water / ion sets. detectNonStandardResidues() inspects the molecule (without mutating it) and returns one spec per non-standard residue, plus one per canonical residue whose sidechain takes part in a non-peptide inter-residue bond (for example a disulfide, or a bond to a non-canonical residue):

  • ChainResidueSpec - one spec per chain-resident residue that needs special handling during AMBER parameterization: a non-canonical amino acid embedded in a polypeptide chain (selenomethionine, norleucine, a stapled NCAA, etc.) OR a canonical amino acid whose sidechain is covalently bonded to something other than its peptide neighbours (a Cys forming a thioether or disulfide, an Asn N-glycosylated by a sugar, a Glu CD - Lys NZ isopeptide).

  • ScaffoldSpec - free non-canonical residue with two or more non-peptide bonds to other residues (the central scaffold of a bicyclic peptide, a multi-anchor covalent inhibitor).

  • CovalentLigandSpec - free non-canonical residue with exactly one non-peptide bond to another residue (single-anchor covalent inhibitor, NAG-Asn glycan stem, single-Cys heme).

  • LigandSpec - free non-canonical residue with no covalent bonds (small-molecule binding-pocket ligand, fatty acid).
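The four-way split above comes down to two questions: is the residue part of a polypeptide chain, and how many non-peptide inter-residue bonds does it have? A minimal sketch of that decision follows. It assumes both answers have already been computed; classify_residue and its arguments are hypothetical names for illustration, not part of moleculekit.

```python
def classify_residue(is_chain_resident: bool, n_nonpeptide_bonds: int) -> str:
    """Return the name of the spec type implied by the rules above.

    Hypothetical helper. Only meaningful for residues that already qualify
    for a spec (non-canonical, or canonical at a non-peptide junction).
    """
    if n_nonpeptide_bonds < 0:
        raise ValueError("bond count cannot be negative")
    if is_chain_resident:
        # Chain residents get a ChainResidueSpec regardless of bond count.
        return "ChainResidueSpec"
    if n_nonpeptide_bonds >= 2:
        return "ScaffoldSpec"
    if n_nonpeptide_bonds == 1:
        return "CovalentLigandSpec"
    return "LigandSpec"
```

The real detector derives both inputs from mol.bonds; this sketch only mirrors the documented decision order.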

Pass the spec list to moleculekit.tools.preparation.systemPrepare() via detect_specs=specs to apply the proposed renames + H-drops on the prepared molecule.

class moleculekit.tools.nonstandard_residues.ChainResidueSpec(resname, residue, new_resname=None, anchor_atom=None, is_n_term=False, is_c_term=False)

Bases: object

One spec per chain-resident residue that needs special handling during AMBER parameterization: a non-canonical amino acid embedded in a polypeptide chain (selenomethionine, norleucine, a stapled NCAA, etc.) OR a canonical amino acid whose sidechain is covalently bonded to something other than its peptide neighbours (a Cys forming a thioether or disulfide, an Asn N-glycosylated by a sugar, a Glu CD - Lys NZ isopeptide).

Fields:

  • resname: the residue’s resname in the input Molecule ("GLU", "NLE", "CYS", …).

  • residue: UniqueResidueID for the residue (segid / chain / resid / insertion).

  • new_resname: the resname to rename to before downstream parameterization. Set whenever a rename is needed:

    • Canonical AA at a junction: "CYX" for both ends of a CYS-SG <-> CYS-SG disulfide; an auto-generated 3-char XX# name otherwise, shared across residues sharing the bucket (resname, anchor_atom, partner_resname, n_term, c_term) so antechamber runs once per unique chemistry.

    • NCAA appearing with multiple terminus configurations: the existing _disambiguate_terminus_resnames() prefixes "N"/"C"/"B" so each terminus form gets its own prepi (otherwise tLeap’s second loadAmberPrep would clobber the first).

    • None when no rename is needed (plain mid-chain NCAA, single-terminus-config NCAA, etc.).

  • anchor_atom: the name of the residue’s sidechain atom that participates in a non-peptide inter-residue bond ("SG" for a Cys thioether, "CD" for the Glu end of a Glu CD - Lys NZ isopeptide, "NZ" for the Lys end of the same isopeptide, "CE" for an NLE staple, …). None when the residue has no non-peptide bond (plain chain-resident NCAA). The test "anchor_atom is not None" is the single source of truth for “is this residue at a non-peptide junction?”, which is what the old CrosslinkedNCAASpec/NCAASpec distinction encoded. For residues with multiple non-peptide partners the detector picks the deterministically first partner (sorted by partner residue index, then anchor atom name). For renamed canonical-AA entries this is also the partner used in the bucket key; for NCAA entries, which have no bucket key, the same deterministic order applies.

  • is_n_term / is_c_term: chain termini flags.
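The anchor_atom rule for residues with several non-peptide partners (sort by partner residue index, then by anchor atom name, and take the first) can be sketched as below. The function pick_anchor and its input layout are hypothetical; the real detector works on mol.bonds and may differ in detail.

```python
from __future__ import annotations


def pick_anchor(partners: list[tuple[int, str]]) -> str | None:
    """Pick the anchor atom for one residue.

    partners holds one (partner_residue_index, anchor_atom_name) tuple per
    non-peptide bond the residue makes. Hypothetical input layout.
    """
    if not partners:
        # Plain chain-resident NCAA: no non-peptide junction.
        return None
    # Tuples compare element-wise, so min() orders by partner residue
    # index first and by anchor atom name second.
    _, anchor = min(partners)
    return anchor
```

With this layout, a residue at a junction is simply one for which pick_anchor(...) is not None.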

anchor_atom: str | None = None
is_c_term: bool = False
is_n_term: bool = False
new_resname: str | None = None
residue: UniqueResidueID
resname: str

class moleculekit.tools.nonstandard_residues.CovalentLigandSpec(resname, residue)

Bases: object

A non-canonical residue that is not peptide-bonded into a chain and has exactly one non-peptide bond going out to another residue. Examples: a single-anchor covalent inhibitor, a NAG-Asn glycan stem, a single-Cys heme.

residue: UniqueResidueID
resname: str

class moleculekit.tools.nonstandard_residues.LigandSpec(resname, residue)

Bases: object

A non-canonical residue with no covalent bonds to any other residue (a free, non-covalently bound ligand). Examples: small-molecule drug ligands in binding pockets, fatty acids, lipid head-groups. The parameterizer treats it standalone with no caps.

residue: UniqueResidueID
resname: str

class moleculekit.tools.nonstandard_residues.ScaffoldSpec(resname, residue)

Bases: object

A non-canonical residue that is not peptide-bonded into a chain and has two or more non-peptide bonds going out to other residues. Examples: the central scaffold of a bicyclic / tricyclic peptide, a multi-anchor covalent inhibitor.

residue: UniqueResidueID
resname: str

moleculekit.tools.nonstandard_residues.detectNonStandardResidues(mol)

Walk mol and emit one spec per residue that needs special handling by a downstream parameterizer / builder.

Inspects mol.bonds (without mutating the molecule) and classifies every non-canonical residue plus every canonical residue at a non-peptide junction into one of four spec types:

  • ChainResidueSpec — chain-resident residue that needs special parameterization: a non-canonical amino acid embedded in a polypeptide chain (selenomethionine, norleucine, stapled-peptide residues, …) OR a canonical AA whose sidechain is covalently bonded to anything other than its peptide neighbours (Cys-Cys disulfide, Cys thioether to a heme, Asn N-glycan, Glu-Lys isopeptide, Tyr coordinating a metal, …). Canonical AAs at a junction always receive new_resname: "CYX" for both ends of a CYS-SG <-> CYS-SG disulfide; an auto-generated 3-char XX# name otherwise. Residues that share the same (canonical_resname, anchor_atom, partner_resname, is_n_term, is_c_term) bucket key collapse to the same XX# so the parameterizer emits one prepi shared across them (e.g. all mid-chain ASN-ND2-bonded-to-NAG residues land on one bucket). Chain-terminal forms get their own buckets because they carry extra atoms (OXT on the C-terminus, H1/H2/H3 on the N-terminus) and different charges.

  • ScaffoldSpec — non-chain-resident residue with 2+ non-peptide bonds (bicyclic-peptide central scaffold, multi-anchor covalent inhibitor).

  • CovalentLigandSpec — non-chain-resident residue with exactly one non-peptide bond (single-anchor covalent inhibitor, NAG-Asn glycan stem, single-Cys heme).

  • LigandSpec — non-chain-resident residue with no covalent bonds (free small-molecule ligand, fatty acid).
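The bucket rule for canonical residues at a junction can be illustrated with a small sketch. Everything here is an assumption made for illustration: the Junction record, the assign_new_resnames helper, and in particular the "XX0", "XX1", … naming, which only stands in for the auto-generated 3-character names (the real generator may choose different names).

```python
from __future__ import annotations

from typing import NamedTuple


class Junction(NamedTuple):
    """Hypothetical record for one canonical residue at a junction."""
    resname: str
    anchor_atom: str
    partner_resname: str
    partner_atom: str
    is_n_term: bool = False
    is_c_term: bool = False


def assign_new_resnames(junctions: list[Junction]) -> list[str]:
    """Return one new_resname per junction, sharing names within a bucket."""
    buckets: dict[tuple, str] = {}
    names: list[str] = []
    for j in junctions:
        # Both ends of a CYS-SG <-> CYS-SG disulfide map to the built-in CYX.
        if (j.resname, j.anchor_atom, j.partner_resname, j.partner_atom) == (
            "CYS", "SG", "CYS", "SG"
        ):
            names.append("CYX")
            continue
        # Bucket key as documented: terminal forms land in their own buckets.
        key = (j.resname, j.anchor_atom, j.partner_resname, j.is_n_term, j.is_c_term)
        if key not in buckets:
            if len(buckets) >= 10:
                raise RuntimeError("placeholder naming supports at most 10 buckets")
            buckets[key] = f"XX{len(buckets)}"  # placeholder 3-char name
        names.append(buckets[key])
    return names
```

Two mid-chain ASN residues bonded through ND2 to NAG share one name under this scheme, while a C-terminal ASN with the same chemistry gets a separate one.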

Chain-resident NCAAs that appear with more than one terminus configuration in the same molecule are disambiguated post hoc by setting new_resname on the terminal specs ("N"+resname for N-term, "C"+resname for C-term, "B"+resname for a single-residue chain). When every instance of an NCAA shares the same terminus configuration, new_resname stays None.
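A minimal sketch of this post hoc disambiguation, under two assumptions: a single-residue chain is represented as both N- and C-terminal, and residues arrive as plain (resname, is_n_term, is_c_term) tuples. The function name below is hypothetical and is not the library's _disambiguate_terminus_resnames().

```python
from __future__ import annotations

from collections import defaultdict


def sketch_disambiguate(residues: list[tuple[str, bool, bool]]) -> list[str | None]:
    """Return the new_resname (or None) for each (resname, is_n_term, is_c_term)."""
    # Collect the distinct terminus configurations seen for each resname.
    configs: dict[str, set[tuple[bool, bool]]] = defaultdict(set)
    for resname, n_term, c_term in residues:
        configs[resname].add((n_term, c_term))

    result: list[str | None] = []
    for resname, n_term, c_term in residues:
        if n_term and c_term:
            prefix = "B"  # assumed encoding of a single-residue chain
        elif n_term:
            prefix = "N"
        elif c_term:
            prefix = "C"
        else:
            prefix = ""
        # Rename only terminal forms, and only when configurations differ.
        if len(configs[resname]) < 2 or not prefix:
            result.append(None)
            continue
        if len(resname) >= 4:
            # Prefixing would exceed the 4-character unit-name limit.
            raise RuntimeError(f"cannot prefix {resname!r}: name too long")
        result.append(prefix + resname)
    return result
```

For example, a mid-chain NLE plus an N-terminal NLE yields [None, "NNLE"], while a molecule whose NLE residues are all mid-chain yields only None values.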

Metal-coordination contacts where the metal is a standalone ion residue (e.g. PDB LINK records between a Zn²⁺ residue and a Zn-chelating inhibitor, or 3PTB’s Ca²⁺ coordinated by GLU/ASN/VAL oxygens) are skipped — the inhibitor stays a free LigandSpec, and the protein residues are left alone. Coordinations where the metal lives inside a cofactor (e.g. Fe inside HEM coordinated to a Tyr-OH or Cys-SG) are kept: the cofactor becomes a CovalentLigandSpec and the donating canonical AA becomes a ChainResidueSpec, because the donor’s protonation state changes (Tyr-O⁻, Cys-S⁻) and needs a custom prepi. Bonds touching water are always skipped.
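The skip rules above can be read as a filter over inter-residue bonds. The sketch below is an assumption-laden illustration: the resname sets are small hypothetical subsets, not moleculekit’s canonical water and ion sets, and keep_bond is not a library function.

```python
# Hypothetical, deliberately incomplete resname sets for illustration only.
WATER_RESNAMES = {"HOH", "WAT"}
ION_RESNAMES = {"ZN", "CA", "MG", "NA", "CL"}


def keep_bond(resname_a: str, resname_b: str) -> bool:
    """Decide whether an inter-residue bond counts as a junction."""
    pair = {resname_a, resname_b}
    if pair & WATER_RESNAMES:
        return False  # bonds touching water are always skipped
    if pair & ION_RESNAMES:
        return False  # coordination to a standalone ion residue is skipped
    # Anything else is kept, including a metal that sits inside a cofactor
    # such as HEM, because the bond is then to the cofactor residue.
    return True
```

Under these assumptions a Zn²⁺ residue bonded to an inhibitor is dropped, while a HEM bonded to a Cys is kept.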

Note

Plain Cys–Cys disulfides need no action from the caller. Both Cys residues are renamed to CYX inside moleculekit.tools.preparation.systemPrepare() (which calls this function internally). The ChainResidueSpec entries for disulfide-bonded cysteines exist only to carry the new_resname="CYX" rename; the S–S bond is parameterized by AMBER’s built-in CYX template, so no user intervention is required.

Parameters:

mol (moleculekit.molecule.Molecule) – Input molecule. Should already carry covalent bonds (read from a PDB CONECT block, a CIF _struct_conn block, or set up via Molecule.templateResidueFromSmiles()). If mol.bonds is empty, the detector falls back to distance-based bond guessing via mol._guessBonds() and logs a warning. The molecule is not mutated.

Returns:

Flat list mixing ChainResidueSpec, ScaffoldSpec, CovalentLigandSpec, and LigandSpec entries. Ordered by residue index in mol. Empty when the molecule has no non-standard residues and no sidechain crosslinks.

Return type:

list[PerResidueSpec]

Raises:

RuntimeError – If a canonical residue is bonded at an anchor atom that is not listed in moleculekit.tools._anchor_variants.ANCHOR_TABLE (the anchor needs to be registered there before the residue can be re-templated). Also raised if an NCAA resname is 4+ characters long and requires terminus-disambiguation prefixing, which would exceed AMBER’s 4-character prepi unit-name limit.

Examples

Detect non-standard residues, template them with SMILES, then prepare:

>>> from moleculekit.molecule import Molecule
>>> from moleculekit.tools.nonstandard_residues import detectNonStandardResidues
>>> from moleculekit.tools.preparation import systemPrepare
>>> mol = Molecule("3ptb")
>>> specs = detectNonStandardResidues(mol)

For a molecule that has a non-canonical residue (e.g. “LIG”) that needs SMILES-based templating before preparation:

>>> # Strip hydrogens first so the mask below matches the final atom count
>>> mol.remove("hydrogen")
>>> # Template the non-canonical residue with its SMILES
>>> lig_mask = mol.resname == "LIG"
>>> mol.templateResidueFromSmiles(lig_mask, smiles="...", addHs=True)
>>> # Pass the pre-detected specs so systemPrepare does not re-detect
>>> pmol, specs = systemPrepare(mol, detect_specs=specs)

When no non-standard-residue handling is needed, omit detect_specs; systemPrepare() then runs the detection internally:

>>> pmol, specs, df = systemPrepare(mol, return_details=True)
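Because the returned list mixes four spec types, a caller will often want to split it by type before deciding what to parameterize. The sketch below shows one way to do that. The two dataclasses are stand-ins that copy only the documented (resname, residue) fields so the example runs without moleculekit; with the library installed, the list would come from detectNonStandardResidues(mol) instead. group_specs_by_type is a hypothetical helper.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class LigandSpec:  # stand-in, not the moleculekit class
    resname: str
    residue: object


@dataclass
class CovalentLigandSpec:  # stand-in, not the moleculekit class
    resname: str
    residue: object


def group_specs_by_type(specs):
    """Group a flat spec list into {type name: [specs]}, keeping list order."""
    grouped = defaultdict(list)
    for spec in specs:
        grouped[type(spec).__name__].append(spec)
    return dict(grouped)


# Hand-built stand-in list; residue identifiers are placeholder strings.
specs = [
    LigandSpec("BEN", "A:1"),
    CovalentLigandSpec("NAG", "A:2"),
    LigandSpec("PLM", "A:3"),
]
grouped = group_specs_by_type(specs)
ligand_resnames = [s.resname for s in grouped.get("LigandSpec", [])]
```

Grouping on the class name keeps the helper independent of how the spec classes are imported.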