moleculekit.tools.nonstandard_residues module
Discovery helper for non-standard residues that need user-driven AMBER parameterization before building.
A “non-standard residue” is any residue whose resname is not in
moleculekit’s canonical amino-acid / nucleic / water / ion sets.
detectNonStandardResidues() inspects the molecule (without
mutating it) and returns one spec per non-standard residue, plus one
per canonical residue whose sidechain takes part in a non-peptide
covalent bond (to a non-canonical residue or, as in a disulfide, to
another canonical one):
- ChainResidueSpec: one spec per chain-resident residue that needs special handling during AMBER parameterization: a non-canonical amino acid embedded in a polypeptide chain (selenomethionine, norleucine, a stapled NCAA, etc.) OR a canonical amino acid whose sidechain is covalently bonded to something other than its peptide neighbours (a Cys forming a thioether or disulfide, an Asn N-glycosylated by a sugar, a Glu CD - Lys NZ isopeptide).
- ScaffoldSpec: free non-canonical residue with two or more non-peptide bonds to other residues (the central scaffold of a bicyclic peptide, a multi-anchor covalent inhibitor).
- CovalentLigandSpec: free non-canonical residue with exactly one non-peptide bond to another residue (single-anchor covalent inhibitor, NAG-Asn glycan stem, single-Cys heme).
- LigandSpec: free non-canonical residue with no covalent bonds (small-molecule binding-pocket ligand, fatty acid).
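The four categories above amount to a small decision rule. The following is a minimal sketch of that rule only, not moleculekit’s implementation: the function name and its three boolean/integer inputs are hypothetical, and the real detector derives the equivalent facts from the molecule’s bonds.

```python
def classify_residue(is_canonical, chain_resident, n_nonpeptide_bonds):
    """Return the spec type name for one residue, or None if no spec is needed.

    Hypothetical helper for illustration; all three arguments are assumed
    to have been computed elsewhere.
    """
    if is_canonical:
        # A canonical residue only gets a spec when a non-peptide
        # (sidechain) bond makes it a junction.
        return "ChainResidueSpec" if n_nonpeptide_bonds > 0 else None
    if chain_resident:
        # Non-canonical amino acid embedded in a polypeptide chain.
        return "ChainResidueSpec"
    if n_nonpeptide_bonds >= 2:
        return "ScaffoldSpec"
    if n_nonpeptide_bonds == 1:
        return "CovalentLigandSpec"
    return "LigandSpec"


# A free residue with two anchors is a scaffold; with none it is a plain ligand.
kinds = [classify_residue(False, False, n) for n in (2, 1, 0)]
```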
Pass the spec list to moleculekit.tools.preparation.systemPrepare()
via detect_specs=specs to apply the proposed renames + H-drops on
the prepared molecule.
- class moleculekit.tools.nonstandard_residues.ChainResidueSpec(resname: str, residue: UniqueResidueID, new_resname: str | None = None, anchor_atom: str | None = None, is_n_term: bool = False, is_c_term: bool = False)
Bases: object
One spec per chain-resident residue that needs special handling during AMBER parameterization: a non-canonical amino acid embedded in a polypeptide chain (selenomethionine, norleucine, a stapled NCAA, etc.) OR a canonical amino acid whose sidechain is covalently bonded to something other than its peptide neighbours (a Cys forming a thioether or disulfide, an Asn N-glycosylated by a sugar, a Glu CD - Lys NZ isopeptide).
Fields:
- resname: the residue’s resname in the input Molecule ("GLU", "NLE", "CYS", …).
- residue: UniqueResidueID for the residue (segid / chain / resid / insertion).
- new_resname: the resname to rename to before downstream parameterization. Set whenever a rename is needed:
  - Canonical AA at a junction: "CYX" for both ends of a CYS-SG <-> CYS-SG disulfide; an auto-generated 3-char XX# name otherwise, shared across residues sharing the bucket (resname, anchor_atom, partner_resname, n_term, c_term) so antechamber runs once per unique chemistry.
  - NCAA appearing with multiple terminus configurations: the existing _disambiguate_terminus_resnames() prefixes "N" / "C" / "B" so each terminus form gets its own prepi (otherwise tLeap’s second loadAmberPrep would clobber the first).
  - None when no rename is needed (plain mid-chain NCAA, single-terminus-config NCAA, etc.).
- anchor_atom: the name of the residue’s sidechain atom that participates in a non-peptide inter-residue bond ("SG" for a Cys-thioether, "CD" for a Glu CD-LYS NZ isopeptide, "NZ" for the Lys end of the same isopeptide, "CE" for an NLE staple, …). None when the residue has no non-peptide bond (plain chain-resident NCAA); anchor_atom is not None is the single source of truth for “is this residue at a non-peptide junction?”, which is what the old CrosslinkedNCAASpec / NCAASpec distinction encoded. For residues with multiple non-peptide partners the detector picks the deterministically-first partner (sorted by partner residue index, then anchor atom name). For canonical-AA renamed entries this is also the partner used as the bucket key; for NCAA entries (where there’s no bucket key) the same deterministic order applies.
- is_n_term / is_c_term: chain termini flags.
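The “deterministically-first partner” rule for anchor_atom can be sketched as a sort on (partner residue index, anchor atom name). This is an illustration under stated assumptions, not the library’s code: the Partner record, the pick_partner helper, and the "SCF" resname are hypothetical, and the sketch assumes the tie-breaking atom name is the one on the residue being described.

```python
from typing import NamedTuple, Optional


class Partner(NamedTuple):
    """Hypothetical record of one non-peptide bond, seen from one residue."""
    partner_index: int    # residue index of the bonded partner
    anchor_atom: str      # this residue's atom taking part in the bond
    partner_resname: str


def pick_partner(partners: list) -> Optional[Partner]:
    """Pick the first partner by (partner residue index, anchor atom name)."""
    if not partners:
        return None
    return min(partners, key=lambda p: (p.partner_index, p.anchor_atom))


# Two bonds to residue 30 and one to residue 41: index wins, then atom name.
bonds = [Partner(41, "CB", "SCF"), Partner(30, "CE", "SCF"), Partner(30, "CD", "SCF")]
chosen = pick_partner(bonds)
anchor_atom = chosen.anchor_atom if chosen else None

# anchor_atom doubles as the junction flag described above.
is_junction = anchor_atom is not None
```

Because the choice depends only on sorted keys, the same input bonds always yield the same anchor, regardless of the order in which bonds are stored.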
- residue: UniqueResidueID
- class moleculekit.tools.nonstandard_residues.CovalentLigandSpec(resname: str, residue: UniqueResidueID)
Bases: object
A non-canonical residue that is not peptide-bonded into a chain and has exactly one non-peptide bond going out to another residue. Examples: a single-anchor covalent inhibitor, a NAG-Asn glycan stem, a single-Cys heme.
- residue: UniqueResidueID
- class moleculekit.tools.nonstandard_residues.LigandSpec(resname: str, residue: UniqueResidueID)
Bases: object
A non-canonical residue with no covalent bonds to any other residue (a free, non-covalently bound ligand). Examples: small-molecule drug ligands in binding pockets, fatty acids, lipid head-groups. The parameterizer treats it standalone with no caps.
- residue: UniqueResidueID
- class moleculekit.tools.nonstandard_residues.ScaffoldSpec(resname: str, residue: UniqueResidueID)
Bases: object
A non-canonical residue that is not peptide-bonded into a chain and has two or more non-peptide bonds going out to other residues. Examples: the central scaffold of a bicyclic / tricyclic peptide, a multi-anchor covalent inhibitor.
- residue: UniqueResidueID
- moleculekit.tools.nonstandard_residues.detectNonStandardResidues(mol)
Walk mol and emit one spec per residue that needs special handling by a downstream parameterizer / builder.
Inspects mol.bonds (without mutating the molecule) and classifies every non-canonical residue plus every canonical residue at a non-peptide junction into one of four spec types:
- ChainResidueSpec: chain-resident residue that needs special parameterization: a non-canonical amino acid embedded in a polypeptide chain (selenomethionine, norleucine, stapled-peptide residues, …) OR a canonical AA whose sidechain is covalently bonded to anything other than its peptide neighbours (Cys-Cys disulfide, Cys thioether to a heme, Asn N-glycan, Glu-Lys isopeptide, Tyr coordinating a metal, …). Canonical AAs at a junction always receive new_resname: "CYX" for both ends of a CYS-SG <-> CYS-SG disulfide; an auto-generated 3-char XX# name otherwise. Residues that share the same (canonical_resname, anchor_atom, partner_resname, is_n_term, is_c_term) bucket key collapse to the same XX# so the parameterizer emits one prepi shared across them (e.g. all mid-chain ASN-ND2-bonded-to-NAG residues land on one bucket). Chain-terminal forms get their own buckets because they carry extra atoms (OXT on the C-terminus, H1 / H2 / H3 on the N-terminus) and different charges.
- ScaffoldSpec: non-chain-resident residue with 2+ non-peptide bonds (bicyclic-peptide central scaffold, multi-anchor covalent inhibitor).
- CovalentLigandSpec: non-chain-resident residue with exactly one non-peptide bond (single-anchor covalent inhibitor, NAG-Asn glycan stem, single-Cys heme).
- LigandSpec: non-chain-resident residue with no covalent bonds (free small-molecule ligand, fatty acid).
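The bucket-key collapsing described for canonical residues at a junction can be pictured as a dictionary keyed on the five-element tuple. The sketch below is illustrative only: the Junction record, its partner_atom field, and the "bucket-N" placeholder labels are assumptions made for this example, and it deliberately does not reproduce how the real XX# names are generated.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Junction:
    """Hypothetical stand-in for a canonical residue at a non-peptide junction."""
    resname: str
    anchor_atom: str
    partner_resname: str
    partner_atom: str          # assumed field, used only to spot disulfides
    is_n_term: bool = False
    is_c_term: bool = False


def bucket_key(j):
    # Same tuple shape as the bucket key described above.
    return (j.resname, j.anchor_atom, j.partner_resname, j.is_n_term, j.is_c_term)


def assign_buckets(junctions):
    """Return "CYX" for disulfide ends, otherwise one placeholder label per bucket."""
    labels = {}
    out = []
    for j in junctions:
        if (j.resname, j.anchor_atom, j.partner_resname, j.partner_atom) == (
            "CYS", "SG", "CYS", "SG",
        ):
            out.append("CYX")
            continue
        key = bucket_key(j)
        if key not in labels:
            # Placeholder label; the real detector emits a 3-char XX# name.
            labels[key] = f"bucket-{len(labels)}"
        out.append(labels[key])
    return out


junctions = [
    Junction("ASN", "ND2", "NAG", "C1"),
    Junction("CYS", "SG", "CYS", "SG"),
    Junction("ASN", "ND2", "NAG", "C1"),
    Junction("ASN", "ND2", "NAG", "C1", is_c_term=True),
]
labels = assign_buckets(junctions)
```

The two mid-chain Asn entries share one label, while the C-terminal Asn lands in its own bucket because the terminus flags are part of the key.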
Chain-resident NCAAs that appear with more than one terminus configuration in the same molecule are disambiguated post hoc by setting new_resname on the terminal specs ("N" + resname for N-term, "C" + resname for C-term, "B" + resname for a single-residue chain). When every instance of an NCAA shares the same terminus configuration, new_resname stays None.
Metal-coordination contacts (e.g. PDB LINK records between a Zn ion and a Zn-chelating inhibitor) are stored as bonds in mol.bonds but are not covalent for parameterization purposes, so they’re skipped; such inhibitors stay classified as free LigandSpec entries.
- Parameters:
mol (moleculekit.molecule.Molecule) – Input molecule. Should already carry covalent bonds (read from a PDB CONECT block, a CIF _struct_conn block, or set up via Molecule.templateResidueFromSmiles()). If mol.bonds is empty, the detector falls back to distance-based bond guessing via mol._guessBonds() and logs a warning. The molecule is not mutated.
- Returns:
Flat list mixing ChainResidueSpec, ScaffoldSpec, CovalentLigandSpec, and LigandSpec entries. Ordered by residue index in mol. Empty when the molecule has no non-standard residues and no sidechain crosslinks.
- Return type:
list[PerResidueSpec]
- Raises:
RuntimeError – If a canonical residue is bonded at an anchor atom that is not listed in
moleculekit.tools._anchor_variants.ANCHOR_TABLE(the anchor needs to be registered there before the residue can be re-templated). Also raised if an NCAA resname is 4+ characters long and requires terminus-disambiguation prefixing, which would exceed AMBER’s 4-character prepi unit-name limit.
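The terminus-disambiguation rule and the length check that triggers the second RuntimeError case can be sketched as follows. This is a simplified illustration, not the body of _disambiguate_terminus_resnames(): the function names, the (resname, is_n_term, is_c_term) tuple input, and the error message are hypothetical.

```python
from collections import defaultdict


def terminus_prefix(is_n_term, is_c_term):
    """Return "B" for a single-residue chain, "N" or "C" for a terminus, else ""."""
    if is_n_term and is_c_term:
        return "B"
    if is_n_term:
        return "N"
    if is_c_term:
        return "C"
    return ""


def plan_renames(instances):
    """instances: list of (resname, is_n_term, is_c_term) tuples, one per NCAA."""
    configs = defaultdict(set)
    for resname, n_term, c_term in instances:
        configs[resname].add((n_term, c_term))

    renames = []
    for resname, n_term, c_term in instances:
        prefix = terminus_prefix(n_term, c_term)
        # Only terminal forms of an NCAA seen in 2+ configurations are renamed.
        if len(configs[resname]) < 2 or not prefix:
            renames.append(None)
            continue
        if len(resname) >= 4:
            # Prefixing would exceed the 4-character unit-name limit.
            raise RuntimeError(f"cannot prefix {resname!r}: name would exceed 4 characters")
        renames.append(prefix + resname)
    return renames


instances = [
    ("NLE", True, False),   # N-terminal
    ("NLE", False, False),  # mid-chain
    ("NLE", False, True),   # C-terminal
    ("MSE", False, False),  # only one configuration, so no rename
]
renames = plan_renames(instances)
```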
Examples
>>> from moleculekit.molecule import Molecule
>>> from moleculekit.tools.nonstandard_residues import detectNonStandardResidues
>>> mol = Molecule("3ptb")
>>> specs = detectNonStandardResidues(mol)
The returned specs can be forwarded to moleculekit.tools.preparation.systemPrepare() via detect_specs=specs to apply the planned renames and re-templating on the prepared molecule, or to a downstream builder for parameterization.
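Because the return value is a flat list mixing several spec types, downstream code will often want to split it by type first. The sketch below shows one way to do that; the two dataclasses are stand-ins defined here so the example runs without moleculekit, and the resnames and integer residue identifiers are made up.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class LigandSpec:            # stand-in, not the moleculekit class
    resname: str
    residue: object


@dataclass
class CovalentLigandSpec:    # stand-in, not the moleculekit class
    resname: str
    residue: object


def group_by_type(specs):
    """Group a flat spec list by class name, preserving the input order."""
    grouped = defaultdict(list)
    for spec in specs:
        grouped[type(spec).__name__].append(spec)
    return dict(grouped)


# Hypothetical detector output, already ordered by residue index.
specs = [LigandSpec("LG1", 1), CovalentLigandSpec("LG2", 2), LigandSpec("LG3", 3)]
by_type = group_by_type(specs)
free_ligands = [s.resname for s in by_type.get("LigandSpec", [])]
```

With the real classes the same grouping applies unchanged, since it relies only on each object’s type.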