moleculekit.tools.nonstandard_residues module
Discovery helper for non-standard residues that need user-driven AMBER parameterization before building.
A “non-standard residue” is any residue whose resname is not in
moleculekit’s canonical amino-acid / nucleic / water / ion sets.
detectNonStandardResidues() inspects the molecule (without
mutating it) and returns one spec per non-standard residue, plus one
per canonical residue at a non-peptide junction (for example a
disulfide-bonded Cys or a glycosylated Asn):
- ChainResidueSpec: one spec per chain-resident residue that needs special handling during AMBER parameterization. This is either a non-canonical amino acid (NCAA) embedded in a polypeptide chain (selenomethionine, norleucine, a stapled NCAA, etc.) or a canonical amino acid whose sidechain is covalently bonded to something other than its peptide neighbours (a Cys forming a thioether or disulfide, an Asn N-glycosylated by a sugar, a Glu CD - Lys NZ isopeptide).
- ScaffoldSpec: free (not peptide-bonded into a chain) non-canonical residue with two or more non-peptide bonds to other residues (the central scaffold of a bicyclic peptide, a multi-anchor covalent inhibitor).
- CovalentLigandSpec: free non-canonical residue with exactly one non-peptide bond to another residue (single-anchor covalent inhibitor, NAG-Asn glycan stem, single-Cys heme).
- LigandSpec: free non-canonical residue with no covalent bonds to other residues (small-molecule binding-pocket ligand, fatty acid).
Pass the spec list to moleculekit.tools.preparation.systemPrepare()
via detect_specs=specs to apply the proposed renames + H-drops on
the prepared molecule.
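The four-way split above can be read as a small decision rule. The following is a minimal sketch of that rule only, not moleculekit's implementation: the `SpecKind` enum, the `classify` function, and its three boolean/integer inputs are hypothetical names introduced for illustration, and the real detector derives this information from `mol.bonds`.

```python
from enum import Enum


class SpecKind(Enum):
    """Hypothetical labels mirroring the four spec classes described above."""
    CHAIN_RESIDUE = "ChainResidueSpec"
    SCAFFOLD = "ScaffoldSpec"
    COVALENT_LIGAND = "CovalentLigandSpec"
    LIGAND = "LigandSpec"


def classify(is_canonical, in_chain, n_nonpeptide_bonds):
    """Return the spec kind for one residue, or None if it needs no spec.

    is_canonical       -- resname is in one of the canonical sets
    in_chain           -- residue is peptide-bonded into a polypeptide chain
    n_nonpeptide_bonds -- covalent bonds to other residues, peptide bonds excluded
    """
    if is_canonical:
        # Canonical residues only get a spec when they sit at a junction.
        if in_chain and n_nonpeptide_bonds > 0:
            return SpecKind.CHAIN_RESIDUE
        return None
    if in_chain:
        # Any non-canonical amino acid embedded in a chain.
        return SpecKind.CHAIN_RESIDUE
    if n_nonpeptide_bonds >= 2:
        return SpecKind.SCAFFOLD
    if n_nonpeptide_bonds == 1:
        return SpecKind.COVALENT_LIGAND
    return SpecKind.LIGAND
```

For instance, under these assumptions a mid-chain selenomethionine is `classify(False, True, 0)` and a bicyclic-peptide scaffold with three anchors is `classify(False, False, 3)`.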
- class moleculekit.tools.nonstandard_residues.ChainResidueSpec(resname, residue, new_resname=None, anchor_atom=None, is_n_term=False, is_c_term=False)
Bases: object
One spec per chain-resident residue that needs special handling during AMBER parameterization: a non-canonical amino acid embedded in a polypeptide chain (selenomethionine, norleucine, a stapled NCAA, etc.) or a canonical amino acid whose sidechain is covalently bonded to something other than its peptide neighbours (a Cys forming a thioether or disulfide, an Asn N-glycosylated by a sugar, a Glu CD - Lys NZ isopeptide).
Fields:
- resname: the residue’s resname in the input Molecule ("GLU", "NLE", "CYS", …).
- residue: UniqueResidueID for the residue (segid / chain / resid / insertion).
- new_resname: the resname to rename to before downstream parameterization. Set whenever a rename is needed:
  - Canonical AA at a junction: "CYX" for both ends of a CYS-SG <-> CYS-SG disulfide; an auto-generated 3-char XX# name otherwise, shared across residues sharing the bucket (resname, anchor_atom, partner_resname, n_term, c_term) so antechamber runs once per unique chemistry.
  - NCAA appearing with multiple terminus configurations: the existing _disambiguate_terminus_resnames() prefixes "N" / "C" / "B" so each terminus form gets its own prepi (otherwise tLeap’s second loadAmberPrep would clobber the first).
  - None when no rename is needed (plain mid-chain NCAA, single-terminus-config NCAA, etc.).
- anchor_atom: the name of the residue’s sidechain atom that participates in a non-peptide inter-residue bond ("SG" for a Cys thioether, "CD" for the Glu end of a Glu CD - Lys NZ isopeptide, "NZ" for the Lys end of the same isopeptide, "CE" for an NLE staple, …). None when the residue has no non-peptide bond (plain chain-resident NCAA). The test anchor_atom is not None is the single source of truth for “is this residue at a non-peptide junction?”, which is what the old CrosslinkedNCAASpec / NCAASpec distinction encoded. For residues with multiple non-peptide partners the detector picks the deterministically-first partner (sorted by partner residue index, then anchor atom name). For canonical-AA renamed entries this is also the partner used as the bucket key; for NCAA entries (where there’s no bucket key) the same deterministic order applies.
- is_n_term / is_c_term: chain termini flags.
- anchor_atom: str | None = None
- is_c_term: bool = False
- is_n_term: bool = False
- new_resname: str | None = None
- residue: UniqueResidueID
- resname: str
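The bucket rule for new_resname described in the Fields list can be sketched as follows. This is an illustrative sketch under stated assumptions, not the library's code: `Junction`, its `partner_atom` field, and `assign_new_resnames` are hypothetical, and the `XX0`, `XX1`, … names are only a placeholder reading of the "XX#" pattern, whose exact scheme the documentation does not spell out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Junction:
    """Hypothetical record of one canonical residue at a non-peptide junction."""
    resname: str
    anchor_atom: str
    partner_resname: str
    partner_atom: str
    is_n_term: bool = False
    is_c_term: bool = False


def assign_new_resnames(junctions):
    """Return one new resname per junction, sharing names within a bucket."""
    bucket_names = {}
    result = []
    for j in junctions:
        is_disulfide = (
            (j.resname, j.anchor_atom) == ("CYS", "SG")
            and (j.partner_resname, j.partner_atom) == ("CYS", "SG")
        )
        if is_disulfide:
            # Both ends of a CYS-SG <-> CYS-SG disulfide become CYX.
            result.append("CYX")
            continue
        # Bucket key as listed in the documentation above.
        key = (j.resname, j.anchor_atom, j.partner_resname,
               j.is_n_term, j.is_c_term)
        if key not in bucket_names:
            # Placeholder naming (assumption): first bucket XX0, next XX1, ...
            bucket_names[key] = f"XX{len(bucket_names)}"
        result.append(bucket_names[key])
    return result
```

In this sketch, two mid-chain Asn residues bonded through ND2 to NAG share one name, while a C-terminal Asn with the same chemistry lands in its own bucket because the terminus flags are part of the key.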
- class moleculekit.tools.nonstandard_residues.CovalentLigandSpec(resname, residue)
Bases: object
A non-canonical residue that is not peptide-bonded into a chain and has exactly one non-peptide bond going out to another residue. Examples: a single-anchor covalent inhibitor, a NAG-Asn glycan stem, a single-Cys heme.
- residue: UniqueResidueID
- resname: str
- class moleculekit.tools.nonstandard_residues.LigandSpec(resname, residue)
Bases: object
A non-canonical residue with no covalent bonds to any other residue (a free, non-covalently bound ligand). Examples: small-molecule drug ligands in binding pockets, fatty acids, lipid head-groups. The parameterizer treats it standalone with no caps.
- residue: UniqueResidueID
- resname: str
- class moleculekit.tools.nonstandard_residues.ScaffoldSpec(resname, residue)
Bases: object
A non-canonical residue that is not peptide-bonded into a chain and has two or more non-peptide bonds going out to other residues. Examples: the central scaffold of a bicyclic / tricyclic peptide, a multi-anchor covalent inhibitor.
- residue: UniqueResidueID
- resname: str
- moleculekit.tools.nonstandard_residues.detectNonStandardResidues(mol)
Walk mol and emit one spec per residue that needs special handling by a downstream parameterizer / builder.
Inspects mol.bonds (without mutating the molecule) and classifies every non-canonical residue plus every canonical residue at a non-peptide junction into one of four spec types:
- ChainResidueSpec: chain-resident residue that needs special parameterization. This is either a non-canonical amino acid embedded in a polypeptide chain (selenomethionine, norleucine, stapled-peptide residues, …) or a canonical AA whose sidechain is covalently bonded to anything other than its peptide neighbours (Cys-Cys disulfide, Cys thioether to a heme, Asn N-glycan, Glu-Lys isopeptide, Tyr coordinating a metal, …). Canonical AAs at a junction always receive a new_resname: "CYX" for both ends of a CYS-SG <-> CYS-SG disulfide; an auto-generated 3-char XX# name otherwise. Residues that share the same (canonical_resname, anchor_atom, partner_resname, is_n_term, is_c_term) bucket key collapse to the same XX# so the parameterizer emits one prepi shared across them (e.g. all mid-chain ASN-ND2-bonded-to-NAG residues land on one bucket). Chain-terminal forms get their own buckets because they carry extra atoms (OXT on the C-terminus, H1/H2/H3 on the N-terminus) and different charges.
- ScaffoldSpec: non-chain-resident residue with 2+ non-peptide bonds (bicyclic-peptide central scaffold, multi-anchor covalent inhibitor).
- CovalentLigandSpec: non-chain-resident residue with exactly one non-peptide bond (single-anchor covalent inhibitor, NAG-Asn glycan stem, single-Cys heme).
- LigandSpec: non-chain-resident residue with no covalent bonds (free small-molecule ligand, fatty acid).
Chain-resident NCAAs that appear with more than one terminus configuration in the same molecule are disambiguated post hoc by setting new_resname on the terminal specs ("N" + resname for N-term, "C" + resname for C-term, "B" + resname for a single-residue chain). When every instance of an NCAA shares the same terminus configuration, new_resname stays None.
Metal-coordination contacts where the metal is a standalone ion residue (e.g. PDB LINK records between a Zn²⁺ residue and a Zn-chelating inhibitor, or 3PTB’s Ca²⁺ coordinated by GLU/ASN/VAL oxygens) are skipped: the inhibitor stays a free LigandSpec, and the protein residues are left alone. Coordinations where the metal lives inside a cofactor (e.g. Fe inside HEM coordinated to a Tyr-OH or Cys-SG) are kept: the cofactor becomes a CovalentLigandSpec and the donating canonical AA becomes a ChainResidueSpec, because the donor’s protonation state changes (Tyr-O⁻, Cys-S⁻) and needs a custom prepi. Bonds touching water are always skipped.
Note
Plain Cys–Cys disulfides need no separate processing by the caller. Both Cys residues are renamed to CYX inside moleculekit.tools.preparation.systemPrepare() (which calls this function internally). The ChainResidueSpec entries for disulfide-bonded cysteines exist only to carry the new_resname="CYX" rename; the S–S bond is parameterized by AMBER’s built-in CYX template, so no user intervention is required.
- Parameters:
mol (moleculekit.molecule.Molecule) – Input molecule. Should already carry covalent bonds (read from a PDB CONECT block, a CIF _struct_conn block, or set up via Molecule.templateResidueFromSmiles()). If mol.bonds is empty, the detector falls back to distance-based bond guessing via mol._guessBonds() and logs a warning. The molecule is not mutated.
- Returns:
Flat list mixing ChainResidueSpec, ScaffoldSpec, CovalentLigandSpec, and LigandSpec entries. Ordered by residue index in mol. Empty when the molecule has no non-standard residues and no sidechain crosslinks.
- Return type:
list[PerResidueSpec]
- Raises:
RuntimeError – If a canonical residue is bonded at an anchor atom that is not listed in moleculekit.tools._anchor_variants.ANCHOR_TABLE (the anchor needs to be registered there before the residue can be re-templated). Also raised if an NCAA resname is 4+ characters long and requires terminus-disambiguation prefixing, which would exceed AMBER’s 4-character prepi unit-name limit.
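The terminus-disambiguation rule and the length-related RuntimeError described above can be illustrated with a short sketch. It is not the body of _disambiguate_terminus_resnames(): the function names, the tuple-based input, and the reading of "single-residue chain" as "both terminus flags set" are assumptions made for this example.

```python
from collections import defaultdict


def terminus_prefix(is_n_term, is_c_term):
    """Prefix for a terminal residue; empty string for a mid-chain one."""
    if is_n_term and is_c_term:
        return "B"  # assumed: a single-residue chain is both termini at once
    if is_n_term:
        return "N"
    if is_c_term:
        return "C"
    return ""


def disambiguate(instances):
    """Compute new_resname for chain-resident NCAA instances.

    instances is a list of (resname, is_n_term, is_c_term) tuples.
    Returns a parallel list holding the prefixed name or None.
    """
    configs = defaultdict(set)
    for resname, n_term, c_term in instances:
        configs[resname].add((n_term, c_term))

    new_resnames = []
    for resname, n_term, c_term in instances:
        prefix = terminus_prefix(n_term, c_term)
        # Rename only terminal forms, and only when the same resname
        # occurs with more than one terminus configuration.
        if len(configs[resname]) < 2 or not prefix:
            new_resnames.append(None)
            continue
        candidate = prefix + resname
        if len(candidate) > 4:
            raise RuntimeError(
                f"{candidate!r} exceeds the 4-character prepi unit-name limit"
            )
        new_resnames.append(candidate)
    return new_resnames
```

With these assumptions, NLE appearing at an N-terminus, mid-chain, and at a C-terminus yields "NNLE", None, and "CNLE", while an NCAA that only ever appears mid-chain keeps None throughout.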
Examples
Detect non-standard residues, template them with SMILES, then prepare:
>>> from moleculekit.molecule import Molecule
>>> from moleculekit.tools.nonstandard_residues import detectNonStandardResidues
>>> from moleculekit.tools.preparation import systemPrepare
>>> mol = Molecule("3ptb")
>>> specs = detectNonStandardResidues(mol)
For a molecule that has a non-canonical residue (e.g. “LIG”) that needs SMILES-based templating before preparation:
>>> # Drop existing hydrogens first so the mask matches the remaining atoms
>>> mol.remove("hydrogen")
>>> lig_mask = mol.resname == "LIG"
>>> # Template the non-canonical residue with its SMILES
>>> mol.templateResidueFromSmiles(lig_mask, smiles="...", addHs=True)
>>> # Now pass the specs so systemPrepare does not re-detect
>>> pmol, specs = systemPrepare(mol, detect_specs=specs)
When no non-standard-residue handling is needed, call systemPrepare() directly and let it run the detection internally:
>>> pmol, specs, df = systemPrepare(mol, return_details=True)
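Because the returned list mixes four spec types, callers typically branch on the type of each entry. The sketch below shows one way to summarize such a list. To stay runnable without moleculekit it defines stand-in dataclasses with the documented field names and uses plain strings in place of UniqueResidueID; the `summarize` helper is hypothetical. With moleculekit installed, the real classes from moleculekit.tools.nonstandard_residues would take the place of the stand-ins.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass
class ChainResidueSpec:
    """Stand-in with the documented fields; residue is a plain string here."""
    resname: str
    residue: str
    new_resname: Optional[str] = None
    anchor_atom: Optional[str] = None
    is_n_term: bool = False
    is_c_term: bool = False


@dataclass
class LigandSpec:
    """Stand-in for a free, non-covalently bound ligand spec."""
    resname: str
    residue: str


def summarize(specs):
    """Count specs per type and list resnames at non-peptide junctions."""
    counts = Counter(type(spec).__name__ for spec in specs)
    junctions = [
        spec.resname
        for spec in specs
        # anchor_atom is not None marks a residue at a non-peptide junction.
        if isinstance(spec, ChainResidueSpec) and spec.anchor_atom is not None
    ]
    return counts, junctions


example = [
    ChainResidueSpec("CYS", "A:22", new_resname="CYX", anchor_atom="SG"),
    ChainResidueSpec("MSE", "A:40"),
    LigandSpec("BEN", "A:301"),
]
counts, junctions = summarize(example)
```

The residue labels in `example` are made up for the illustration and do not come from any particular structure.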