moleculekit.tools.nonstandard_residues module

Discovery helper for non-standard residues that need user-driven AMBER parameterization before building.

A “non-standard residue” is any residue whose resname is not in moleculekit’s canonical amino-acid / nucleic / water / ion sets. detectNonStandardResidues() inspects the molecule (without mutating it) and returns one spec per non-standard residue, plus one per canonical residue whose sidechain forms a non-peptide bond to another residue, whether that partner is canonical or not (e.g. both Cys of a disulfide):

ChainResidueSpec - one spec per chain-resident residue that needs special handling during AMBER parameterization: a non-canonical amino acid embedded in a polypeptide chain (selenomethionine, norleucine, a stapled NCAA, etc.) OR a canonical amino acid whose sidechain is covalently bonded to something other than its peptide neighbours (a Cys forming a thioether or disulfide, an Asn N-glycosylated by a sugar, a Glu CD - Lys NZ isopeptide).

ScaffoldSpec - non-canonical residue outside any polypeptide chain with two or more non-peptide bonds to other residues (the central scaffold of a bicyclic peptide, a multi-anchor covalent inhibitor).

CovalentLigandSpec - non-canonical residue outside any polypeptide chain with exactly one non-peptide bond to another residue (single-anchor covalent inhibitor, NAG-Asn glycan stem, single-Cys heme).

LigandSpec - free non-canonical residue with no covalent bonds to other residues (small-molecule binding-pocket ligand, fatty acid).
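The four-way split above can be read as a small decision procedure. The sketch below is a minimal illustration, not moleculekit's implementation: the function name classify_residue and its keyword arguments are hypothetical, and it assumes chain membership and the non-peptide bond count have already been computed for the residue.

```python
def classify_residue(*, canonical: bool, chain_resident: bool, n_nonpeptide_bonds: int):
    """Hypothetical helper: return the spec class name a residue maps to, or None.

    canonical: resname is in the canonical amino-acid / nucleic / water / ion sets.
    chain_resident: residue is peptide-bonded into a polypeptide chain.
    n_nonpeptide_bonds: number of non-peptide bonds to other residues.
    """
    if canonical:
        # Canonical residues only get a spec at a non-peptide junction.
        # Assumption: such junction residues are chain-resident amino acids.
        return "ChainResidueSpec" if n_nonpeptide_bonds > 0 else None
    if chain_resident:
        return "ChainResidueSpec"  # NCAA embedded in a chain
    if n_nonpeptide_bonds >= 2:
        return "ScaffoldSpec"
    if n_nonpeptide_bonds == 1:
        return "CovalentLigandSpec"
    return "LigandSpec"


# A non-chain NCAA with three outgoing bonds, e.g. a bicyclic-peptide scaffold
print(classify_residue(canonical=False, chain_resident=False, n_nonpeptide_bonds=3))  # ScaffoldSpec
```

Note that chain membership is checked before the bond count, so a stapled NCAA inside a chain is a ChainResidueSpec even if it has several non-peptide bonds.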

Pass the spec list to moleculekit.tools.preparation.systemPrepare() via detect_specs=specs to apply the proposed renames and hydrogen removals to the prepared molecule.

class moleculekit.tools.nonstandard_residues.ChainResidueSpec(resname: str, residue: UniqueResidueID, new_resname: str | None = None, anchor_atom: str | None = None, is_n_term: bool = False, is_c_term: bool = False)

Bases: object

One spec per chain-resident residue that needs special handling during AMBER parameterization: a non-canonical amino acid embedded in a polypeptide chain (selenomethionine, norleucine, a stapled NCAA, etc.) OR a canonical amino acid whose sidechain is covalently bonded to something other than its peptide neighbours (a Cys forming a thioether or disulfide, an Asn N-glycosylated by a sugar, a Glu CD - Lys NZ isopeptide).

Fields:

resname: the residue’s resname in the input Molecule ("GLU", "NLE", "CYS", …).
residue: UniqueResidueID for the residue (segid / chain / resid / insertion).
new_resname: the resname to rename to before downstream parameterization. Set whenever a rename is needed:
- Canonical AA at a junction: "CYX" for both ends of a CYS-SG <-> CYS-SG disulfide; otherwise an auto-generated 3-char XX# name, shared by every residue with the same bucket key (resname, anchor_atom, partner_resname, n_term, c_term) so antechamber runs only once per unique chemistry.
- NCAA appearing with multiple terminus configurations: _disambiguate_terminus_resnames() prefixes the resname with "N", "C", or "B" so each terminus form gets its own prepi (otherwise tLeap’s second loadAmberPrep call would overwrite the first).
- None when no rename is needed (plain mid-chain NCAA, NCAA with a single terminus configuration, etc.).
anchor_atom: the name of the residue’s sidechain atom that participates in a non-peptide inter-residue bond ("SG" for a Cys thioether, "CD" for the Glu end of a Glu CD - Lys NZ isopeptide, "NZ" for the Lys end of the same isopeptide, "CE" for an NLE staple, …). None when the residue has no non-peptide bond (plain chain-resident NCAA). The test anchor_atom is not None is the single source of truth for “is this residue at a non-peptide junction?”, replacing the distinction the old CrosslinkedNCAASpec / NCAASpec classes encoded. For residues with multiple non-peptide partners, the detector picks the deterministically first partner (sorted by partner residue index, then anchor atom name). For renamed canonical-AA entries this is also the partner used in the bucket key; NCAA entries have no bucket key but follow the same deterministic order.
is_n_term / is_c_term: chain termini flags.
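The new_resname bucketing and the deterministic partner choice described in the field list can be sketched in plain Python. Everything here is hypothetical: the Junction record, pick_partner, assign_new_resnames, and especially the X00 / X01 / ... naming scheme, which only stands in for moleculekit's auto-generated 3-char XX# names whose exact format is not specified here.

```python
from dataclasses import dataclass


@dataclass
class Junction:
    """Hypothetical record: a canonical residue at a non-peptide junction."""
    residue_index: int
    resname: str
    anchor_atom: str
    partner_resname: str
    partner_anchor: str
    is_n_term: bool = False
    is_c_term: bool = False


def pick_partner(candidates):
    """candidates: (partner_residue_index, anchor_atom) pairs for one residue.
    Pick the first by partner residue index, then by anchor atom name."""
    return min(candidates, key=lambda c: (c[0], c[1]))


def assign_new_resnames(junctions):
    """Map residue_index -> new_resname, one shared name per bucket key."""
    bucket_names = {}
    result = {}
    for jn in sorted(junctions, key=lambda j: j.residue_index):
        if (jn.resname, jn.anchor_atom, jn.partner_resname, jn.partner_anchor) == ("CYS", "SG", "CYS", "SG"):
            result[jn.residue_index] = "CYX"  # both ends of a disulfide
            continue
        key = (jn.resname, jn.anchor_atom, jn.partner_resname, jn.is_n_term, jn.is_c_term)
        if key not in bucket_names:
            # Hypothetical 3-char scheme (X00, X01, ...); the real XX# format may differ
            bucket_names[key] = f"X{len(bucket_names):02d}"
        result[jn.residue_index] = bucket_names[key]
    return result


junctions = [
    Junction(10, "ASN", "ND2", "NAG", "C1"),
    Junction(55, "ASN", "ND2", "NAG", "C1"),
    Junction(30, "CYS", "SG", "CYS", "SG"),
    Junction(80, "CYS", "SG", "CYS", "SG"),
]
print(assign_new_resnames(junctions))  # {10: 'X00', 30: 'CYX', 55: 'X00', 80: 'CYX'}
print(pick_partner([(12, "SG"), (5, "SG"), (5, "CB")]))  # (5, 'CB')
```

Because the terminus flags are part of the key, an N-terminal Asn-NAG residue would land in a different bucket from the mid-chain ones, matching the note that terminal forms carry extra atoms and different charges.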

anchor_atom: str | None = None

is_c_term: bool = False

is_n_term: bool = False

new_resname: str | None = None

residue: UniqueResidueID

resname: str

class moleculekit.tools.nonstandard_residues.CovalentLigandSpec(resname: str, residue: UniqueResidueID)

Bases: object

A non-canonical residue that is not peptide-bonded into a chain and has exactly one non-peptide bond going out to another residue. Examples: a single-anchor covalent inhibitor, a NAG-Asn glycan stem, a single-Cys heme.

residue: UniqueResidueID

resname: str

class moleculekit.tools.nonstandard_residues.LigandSpec(resname: str, residue: UniqueResidueID)

Bases: object

A non-canonical residue with no covalent bonds to any other residue (a free, non-covalently bound ligand). Examples: small-molecule drug ligands in binding pockets, fatty acids, lipid head-groups. The parameterizer treats it as a standalone molecule with no caps.

residue: UniqueResidueID

resname: str

class moleculekit.tools.nonstandard_residues.ScaffoldSpec(resname: str, residue: UniqueResidueID)

Bases: object

A non-canonical residue that is not peptide-bonded into a chain and has two or more non-peptide bonds going out to other residues. Examples: the central scaffold of a bicyclic / tricyclic peptide, a multi-anchor covalent inhibitor.

residue: UniqueResidueID

resname: str

moleculekit.tools.nonstandard_residues.detectNonStandardResidues(mol)

Walk mol and emit one spec per residue that needs special handling by a downstream parameterizer / builder.

Inspects mol.bonds (without mutating the molecule) and classifies every non-canonical residue plus every canonical residue at a non-peptide junction into one of four spec types:

ChainResidueSpec - chain-resident residue that needs special parameterization: a non-canonical amino acid embedded in a polypeptide chain (selenomethionine, norleucine, stapled-peptide residues, …) OR a canonical AA whose sidechain is covalently bonded to anything other than its peptide neighbours (Cys-Cys disulfide, Cys thioether to a heme, Asn N-glycan, Glu-Lys isopeptide, Tyr coordinating a metal, …). Canonical AAs at a junction always receive a new_resname: "CYX" for both ends of a CYS-SG <-> CYS-SG disulfide, and an auto-generated 3-char XX# name otherwise. Residues that share the same (canonical_resname, anchor_atom, partner_resname, is_n_term, is_c_term) bucket key collapse to the same XX# name, so the parameterizer emits a single prepi shared across them (e.g. all mid-chain ASN residues whose ND2 is bonded to a NAG land in one bucket). Chain-terminal forms get their own buckets because they carry extra atoms (OXT on the C-terminus, H1/H2/H3 on the N-terminus) and different charges.
ScaffoldSpec - non-chain-resident residue with 2+ non-peptide bonds (bicyclic-peptide central scaffold, multi-anchor covalent inhibitor).
CovalentLigandSpec - non-chain-resident residue with exactly one non-peptide bond (single-anchor covalent inhibitor, NAG-Asn glycan stem, single-Cys heme).
LigandSpec - non-chain-resident residue with no covalent bonds (free small-molecule ligand, fatty acid).
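Telling peptide bonds apart from the non-peptide bonds that drive this classification could look roughly like the sketch below. It assumes atoms are described by parallel lists of residue indices and atom names, and that bonds are (i, j) atom-index pairs like the rows of mol.bonds. The rule that any inter-residue bond between atoms named C and N is a backbone peptide bond is a simplifying assumption, not necessarily the detector's exact test; the function names are hypothetical.

```python
from collections import Counter


def nonpeptide_bonds(resindex, names, bonds):
    """Return inter-residue bonds that are not backbone peptide bonds.

    resindex: residue index per atom; names: atom name per atom;
    bonds: iterable of (i, j) atom-index pairs, like the rows of mol.bonds.
    Simplifying assumption: an inter-residue C-N bond is a peptide bond.
    """
    out = []
    for i, j in bonds:
        if resindex[i] == resindex[j]:
            continue  # intra-residue bond
        if {names[i], names[j]} == {"C", "N"}:
            continue  # backbone peptide bond (or head-to-tail closure)
        out.append((i, j))
    return out


def count_nonpeptide_bonds(resindex, names, bonds):
    """Number of non-peptide bonds per residue index."""
    counts = Counter()
    for i, j in nonpeptide_bonds(resindex, names, bonds):
        counts[resindex[i]] += 1
        counts[resindex[j]] += 1
    return counts


# Residue 0 (ALA) peptide-bonded to residue 1 (CYS), whose SG binds ligand residue 2
resindex = [0, 0, 0, 1, 1, 1, 1, 1, 2, 2]
names = ["N", "CA", "C", "N", "CA", "C", "CB", "SG", "C1", "C2"]
bonds = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (4, 6), (6, 7), (7, 8), (8, 9)]
print(nonpeptide_bonds(resindex, names, bonds))  # [(7, 8)]
```

Here the Cys would be a junction residue with anchor atom SG, and the ligand (one non-peptide bond, not chain-resident) would map to CovalentLigandSpec.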

Chain-resident NCAAs that appear with more than one terminus configuration in the same molecule are disambiguated post hoc by setting new_resname on the terminal specs ("N"+resname for N-term, "C"+resname for C-term, "B"+resname for a single-residue chain). When every instance of an NCAA shares the same terminus configuration, new_resname stays None.
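The terminus disambiguation rule, including the 4-character check documented under Raises, can be expressed as a short function. This is an illustrative sketch with a hypothetical disambiguate_termini name that takes (resname, is_n_term, is_c_term) tuples; it is not the module's _disambiguate_terminus_resnames().

```python
from collections import defaultdict


def terminus_config(is_n_term, is_c_term):
    """'B' for a single-residue chain, 'N' / 'C' for a terminus, 'M' for mid-chain."""
    if is_n_term and is_c_term:
        return "B"
    if is_n_term:
        return "N"
    if is_c_term:
        return "C"
    return "M"


def disambiguate_termini(entries):
    """Hypothetical sketch: entries are (resname, is_n_term, is_c_term) tuples for
    chain-resident NCAAs; returns a parallel list of new_resname values."""
    configs = defaultdict(set)
    for resname, n_term, c_term in entries:
        configs[resname].add(terminus_config(n_term, c_term))

    new_names = []
    for resname, n_term, c_term in entries:
        cfg = terminus_config(n_term, c_term)
        if len(configs[resname]) == 1 or cfg == "M":
            new_names.append(None)  # one configuration only, or the mid-chain form
            continue
        if len(resname) >= 4:
            # A prefixed name would exceed AMBER's 4-character prepi unit-name limit
            raise RuntimeError(f"cannot prefix {resname!r}: name would exceed 4 characters")
        new_names.append(cfg + resname)
    return new_names


print(disambiguate_termini([("NLE", True, False), ("NLE", False, False), ("NLE", False, True)]))
# ['NNLE', None, 'CNLE']
```

The mid-chain form keeps its original resname, so only the terminal variants need their own prepi units.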

Metal-coordination contacts (e.g. PDB LINK records between a Zn ion and a Zn-chelating inhibitor) are stored as bonds in mol.bonds but are not covalent for parameterization purposes, so they’re skipped — such inhibitors stay classified as free LigandSpec entries.
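Skipping metal-coordination contacts amounts to filtering the bond list before counting non-peptide bonds. The sketch below assumes that any bond touching a metal atom is a coordination contact and uses a deliberately short, hypothetical METALS set of element symbols; the real detector's criterion may be narrower, and drop_metal_contacts is a hypothetical name.

```python
# Hypothetical, intentionally incomplete set of metal element symbols.
METALS = {"LI", "NA", "K", "MG", "CA", "MN", "FE", "CO", "NI", "CU", "ZN", "CD", "HG"}


def drop_metal_contacts(elements, bonds):
    """Return the bonds in which neither atom is a metal.

    elements: element symbol per atom (case-insensitive, e.g. "Zn" or "ZN").
    bonds: iterable of (i, j) atom-index pairs.
    """
    return [
        (i, j)
        for i, j in bonds
        if elements[i].upper() not in METALS and elements[j].upper() not in METALS
    ]


# Zn (atom 0) chelated by an inhibitor nitrogen (atom 1): that contact is dropped
elements = ["Zn", "N", "C", "S"]
print(drop_metal_contacts(elements, [(0, 1), (1, 2), (2, 3)]))  # [(1, 2), (2, 3)]
```

Filtering on element symbols rather than atom names avoids confusing calcium with an alpha carbon named CA.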

Parameters: mol (moleculekit.molecule.Molecule) – Input molecule. Should already carry covalent bonds (read from a PDB CONECT block, a CIF _struct_conn block, or set up via Molecule.templateResidueFromSmiles()). If mol.bonds is empty, the detector falls back to distance-based bond guessing via mol._guessBonds() and logs a warning. The molecule is not mutated.
Returns: Flat list mixing ChainResidueSpec, ScaffoldSpec, CovalentLigandSpec, and LigandSpec entries. Ordered by residue index in mol. Empty when the molecule has no non-standard residues and no sidechain crosslinks.
Return type: list[PerResidueSpec]
Raises: RuntimeError – If a canonical residue is bonded at an anchor atom that is not listed in moleculekit.tools._anchor_variants.ANCHOR_TABLE (the anchor needs to be registered there before the residue can be re-templated). Also raised if an NCAA resname is 4+ characters long and requires terminus-disambiguation prefixing, which would exceed AMBER’s 4-character prepi unit-name limit.

Examples

>>> from moleculekit.molecule import Molecule
>>> from moleculekit.tools.nonstandard_residues import detectNonStandardResidues
>>> mol = Molecule("3ptb")  
>>> specs = detectNonStandardResidues(mol)  

The returned specs can be forwarded to moleculekit.tools.preparation.systemPrepare() via detect_specs=specs to apply the planned renames and re-templating on the prepared molecule, or to a downstream builder for parameterization.
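Before forwarding the list, a quick summary of what was found and what will be renamed can be handy. The helper below is a hypothetical sketch that relies only on the spec class names and the new_resname field documented above; with moleculekit installed it would be called as summarize_specs(detectNonStandardResidues(mol)).

```python
from collections import Counter


def summarize_specs(specs):
    """Hypothetical helper: count specs by type and list the planned renames.

    Only ChainResidueSpec carries new_resname, so getattr() with a default
    lets the other spec types pass through untouched.
    """
    counts = Counter(type(spec).__name__ for spec in specs)
    renames = [
        (spec.resname, spec.new_resname)
        for spec in specs
        if getattr(spec, "new_resname", None) is not None
    ]
    return counts, renames
```

Because it dispatches on class names and attributes rather than importing the spec classes, the helper also works on any stand-in objects used in tests.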