moleculekit.tools.nonstandard_residues module

Discovery helper for non-standard residues that need user-driven AMBER parameterization before building.

A “non-standard residue” is any residue whose resname is not in moleculekit’s canonical amino-acid / nucleic / water / ion sets. detectNonStandardResidues() inspects the molecule (without mutating it) and returns one spec per non-standard residue, plus one per canonical residue whose sidechain forms a non-peptide covalent bond to another residue (a disulfide partner, a glycan, an isopeptide partner, …):

ChainResidueSpec - one spec per chain-resident residue that needs special handling during AMBER parameterization: a non-canonical amino acid embedded in a polypeptide chain (selenomethionine, norleucine, a stapled NCAA, etc.) OR a canonical amino acid whose sidechain is covalently bonded to something other than its peptide neighbours (a Cys forming a thioether or disulfide, an Asn N-glycosylated by a sugar, a Glu CD - Lys NZ isopeptide).

ScaffoldSpec - non-canonical residue that is not peptide-bonded into a chain and has two or more non-peptide bonds to other residues (the central scaffold of a bicyclic peptide, a multi-anchor covalent inhibitor).

CovalentLigandSpec - non-canonical residue that is not peptide-bonded into a chain and has exactly one non-peptide bond to another residue (single-anchor covalent inhibitor, NAG-Asn glycan stem, single-Cys heme).

LigandSpec - non-canonical residue with no covalent bonds to other residues (small-molecule binding-pocket ligand, fatty acid).
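The four-way split above can be read as a small decision rule. The sketch below is a hypothetical illustration, not moleculekit's code: classify_residue and its three inputs (whether the resname is canonical, whether the residue is peptide-bonded into a chain, and how many non-peptide inter-residue bonds it has) are assumptions, whereas the real detector derives this information from mol.bonds.

```python
from typing import Optional


def classify_residue(is_canonical: bool,
                     chain_resident: bool,
                     n_nonpeptide_bonds: int) -> Optional[str]:
    """Return the name of the spec class a residue would map to (hypothetical helper)."""
    if is_canonical:
        # Canonical residues are reported only when they sit at a non-peptide junction.
        return "ChainResidueSpec" if n_nonpeptide_bonds > 0 else None
    if chain_resident:
        # Non-canonical amino acid embedded in a polypeptide chain.
        return "ChainResidueSpec"
    if n_nonpeptide_bonds >= 2:
        return "ScaffoldSpec"
    if n_nonpeptide_bonds == 1:
        return "CovalentLigandSpec"
    return "LigandSpec"
```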

Pass the spec list to moleculekit.tools.preparation.systemPrepare() via detect_specs=specs to apply the proposed renames and hydrogen removals to the prepared molecule.

class moleculekit.tools.nonstandard_residues.ChainResidueSpec(resname, residue, new_resname=None, anchor_atom=None, is_n_term=False, is_c_term=False)

Bases: object

One spec per chain-resident residue that needs special handling during AMBER parameterization: a non-canonical amino acid embedded in a polypeptide chain (selenomethionine, norleucine, a stapled NCAA, etc.) OR a canonical amino acid whose sidechain is covalently bonded to something other than its peptide neighbours (a Cys forming a thioether or disulfide, an Asn N-glycosylated by a sugar, a Glu CD - Lys NZ isopeptide).

Fields:

resname: the residue’s resname in the input Molecule ("GLU", "NLE", "CYS", …).
residue: UniqueResidueID for the residue (segid / chain / resid / insertion).
new_resname: the resname to rename to before downstream parameterization. Set whenever a rename is needed:
- Canonical AA at a junction: "CYX" for both ends of a CYS-SG <-> CYS-SG disulfide; otherwise an auto-generated 3-character XX# name, shared by all residues with the same bucket key (resname, anchor_atom, partner_resname, is_n_term, is_c_term) so antechamber runs once per unique chemistry.
- NCAA appearing with multiple terminus configurations: _disambiguate_terminus_resnames() prefixes "N"/"C"/"B" so each terminus form gets its own prepi (otherwise tLeap’s second loadAmberPrep would overwrite the first).
- None when no rename is needed (plain mid-chain NCAA, single-terminus-config NCAA, etc.).
anchor_atom: the name of the residue’s sidechain atom that participates in a non-peptide inter-residue bond ("SG" for a Cys thioether, "CD" for the Glu end of a Glu CD - Lys NZ isopeptide, "NZ" for the Lys end of the same isopeptide, "CE" for an NLE staple, …). None when the residue has no non-peptide bond (plain chain-resident NCAA). The test anchor_atom is not None is the single source of truth for “is this residue at a non-peptide junction?”, the distinction formerly encoded by the separate CrosslinkedNCAASpec / NCAASpec classes. For residues with multiple non-peptide partners, the detector picks the deterministically first partner (sorted by partner residue index, then anchor atom name). For renamed canonical-AA entries this is also the partner used in the bucket key; NCAA entries have no bucket key, but the same deterministic order applies.
is_n_term / is_c_term: chain termini flags.
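To make the bucket-key and partner-ordering rules concrete, here is a hypothetical sketch. Junction, first_partner and assign_names are illustrative names, and the "XX" + single-character suffix only approximates the XX# pattern; the names moleculekit actually generates may differ.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# (partner residue index, own anchor atom, partner resname, partner atom)
Partner = Tuple[int, str, str, str]
# Illustrative suffix alphabet that keeps generated names at 3 characters.
_SUFFIXES = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"


@dataclass
class Junction:
    """Hypothetical record for one canonical residue at a non-peptide junction."""
    resname: str
    partners: List[Partner]
    is_n_term: bool = False
    is_c_term: bool = False


def first_partner(j: Junction) -> Partner:
    # Deterministic pick: lowest partner residue index, then anchor atom name.
    return min(j.partners, key=lambda p: (p[0], p[1]))


def assign_names(junctions: List[Junction]) -> List[str]:
    """Return a new resname per junction residue (hypothetical naming scheme)."""
    buckets: Dict[tuple, str] = {}
    names: List[str] = []
    for j in junctions:
        _, anchor, partner_resname, partner_atom = first_partner(j)
        if (j.resname, anchor, partner_resname, partner_atom) == ("CYS", "SG", "CYS", "SG"):
            names.append("CYX")  # plain disulfide: AMBER's built-in CYX template
            continue
        # Residues with identical chemistry share one bucket, hence one prepi.
        key = (j.resname, anchor, partner_resname, j.is_n_term, j.is_c_term)
        if key not in buckets:
            if len(buckets) >= len(_SUFFIXES):
                raise RuntimeError("ran out of 3-character bucket names")
            buckets[key] = "XX" + _SUFFIXES[len(buckets)]
        names.append(buckets[key])
    return names
```

Because is_n_term and is_c_term are part of the key, a C-terminal Asn-NAG junction gets a different name from the mid-chain ones, matching the need for separate terminal templates.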

anchor_atom: str | None = None

is_c_term: bool = False

is_n_term: bool = False

new_resname: str | None = None

residue: UniqueResidueID

resname: str

class moleculekit.tools.nonstandard_residues.CovalentLigandSpec(resname, residue)

Bases: object

A non-canonical residue that is not peptide-bonded into a chain and has exactly one non-peptide bond going out to another residue. Examples: a single-anchor covalent inhibitor, a NAG-Asn glycan stem, a single-Cys heme.

residue: UniqueResidueID

resname: str

class moleculekit.tools.nonstandard_residues.LigandSpec(resname, residue)

Bases: object

A non-canonical residue with no covalent bonds to any other residue (a free, non-covalently bound ligand). Examples: small-molecule drug ligands in binding pockets, fatty acids, lipid head-groups. The parameterizer treats it as a standalone molecule, with no caps.

residue: UniqueResidueID

resname: str

class moleculekit.tools.nonstandard_residues.ScaffoldSpec(resname, residue)

Bases: object

A non-canonical residue that is not peptide-bonded into a chain and has two or more non-peptide bonds going out to other residues. Examples: the central scaffold of a bicyclic / tricyclic peptide, a multi-anchor covalent inhibitor.

residue: UniqueResidueID

resname: str

moleculekit.tools.nonstandard_residues.detectNonStandardResidues(mol, guess_bonds=True)

Walk mol and emit one spec per residue that needs special handling by a downstream parameterizer / builder.

Inspects mol.bonds (without mutating the molecule) and classifies every non-canonical residue plus every canonical residue at a non-peptide junction into one of four spec types:

ChainResidueSpec — chain-resident residue that needs special parameterization: a non-canonical amino acid embedded in a polypeptide chain (selenomethionine, norleucine, stapled-peptide residues, …) OR a canonical AA whose sidechain is covalently bonded to anything other than its peptide neighbours (Cys-Cys disulfide, Cys thioether to a heme, Asn N-glycan, Glu-Lys isopeptide, Tyr coordinating a cofactor-bound metal, …). Canonical AAs at a junction always receive a new_resname: "CYX" for both ends of a CYS-SG <-> CYS-SG disulfide, an auto-generated 3-character XX# name otherwise. Residues that share the same (canonical_resname, anchor_atom, partner_resname, is_n_term, is_c_term) bucket key collapse to the same XX# name, so the parameterizer emits one prepi shared across them (e.g. all mid-chain ASN residues bonded to NAG through ND2 fall into one bucket). Chain-terminal forms get their own buckets because they carry extra atoms (OXT on the C-terminus, H1/H2/H3 on the N-terminus) and different charges.
ScaffoldSpec — non-chain-resident residue with 2+ non-peptide bonds (bicyclic-peptide central scaffold, multi-anchor covalent inhibitor).
CovalentLigandSpec — non-chain-resident residue with exactly one non-peptide bond (single-anchor covalent inhibitor, NAG-Asn glycan stem, single-Cys heme).
LigandSpec — non-chain-resident residue with no covalent bonds (free small-molecule ligand, fatty acid).

Chain-resident NCAAs that appear with more than one terminus configuration in the same molecule are disambiguated post hoc by setting new_resname on the terminal specs ("N"+resname for the N-terminus, "C"+resname for the C-terminus, "B"+resname for a single-residue chain). When every instance of an NCAA shares the same terminus configuration, new_resname stays None.
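The terminus-disambiguation rule can be sketched as follows. disambiguate is a hypothetical helper that operates on (resname, is_n_term, is_c_term) tuples for chain-resident NCAAs only; it also reproduces the 4-character check listed under Raises.

```python
from typing import Dict, List, Optional, Set, Tuple

# Prefix per (is_n_term, is_c_term); mid-chain instances are never prefixed.
PREFIX = {(True, False): "N", (False, True): "C", (True, True): "B"}


def disambiguate(instances: List[Tuple[str, bool, bool]]) -> List[Optional[str]]:
    """Return the new_resname for each (resname, is_n_term, is_c_term) NCAA instance."""
    configs: Dict[str, Set[Tuple[bool, bool]]] = {}
    for resname, n_term, c_term in instances:
        configs.setdefault(resname, set()).add((n_term, c_term))

    result: List[Optional[str]] = []
    for resname, n_term, c_term in instances:
        prefix = PREFIX.get((n_term, c_term))
        if len(configs[resname]) < 2 or prefix is None:
            # One terminus configuration overall, or a mid-chain instance: no rename.
            result.append(None)
            continue
        if len(resname) >= 4:
            # A prefixed 4+ character name would exceed AMBER's 4-character limit.
            raise RuntimeError(f"cannot prefix {resname!r} for terminus disambiguation")
        result.append(prefix + resname)
    return result
```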

Metal-coordination contacts where the metal is a standalone ion residue (e.g. PDB LINK records between a Zn²⁺ residue and a Zn-chelating inhibitor, or 3PTB’s Ca²⁺ coordinated by GLU/ASN/VAL oxygens) are skipped — the inhibitor stays a free LigandSpec, and the protein residues are left alone. Coordinations where the metal lives inside a cofactor (e.g. Fe inside HEM coordinated to a Tyr-OH or Cys-SG) are kept: the cofactor becomes a CovalentLigandSpec and the donating canonical AA becomes a ChainResidueSpec, because the donor’s protonation state changes (Tyr-O⁻, Cys-S⁻) and needs a custom prepi. Bonds touching water are always skipped.
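The bond-skipping rules above, together with the guessed-bond artifact rule described for guess_bonds, can be summarised as a per-bond filter. This is a hypothetical sketch: BondEnd, keep_bond and the category labels ("water", "ion", "cofactor", "canonical_aa", "ligand") are assumptions, not moleculekit types.

```python
from dataclasses import dataclass


@dataclass
class BondEnd:
    """One end of an inter-residue bond (hypothetical record)."""
    resname: str
    atom: str
    category: str  # assumed labels: "water", "ion", "cofactor", "canonical_aa", "ligand"


def keep_bond(a: BondEnd, b: BondEnd, guessed: bool) -> bool:
    """Return True if the bond should count towards junction detection."""
    if "water" in (a.category, b.category):
        return False  # bonds touching water are always skipped
    if "ion" in (a.category, b.category):
        return False  # metal is a standalone ion residue: not treated as a junction
    if guessed:
        for end in (a, b):
            # Guessed bonds onto a canonical backbone O / CA are treated as artifacts.
            if end.category == "canonical_aa" and end.atom in ("O", "CA"):
                return False
    # Everything else, including a metal inside a cofactor, is kept.
    return True
```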

Note

Plain Cys–Cys disulfides need no action from the caller. A ChainResidueSpec is still returned for each disulfide-bonded cysteine, but those entries exist only to carry the new_resname="CYX" rename, which moleculekit.tools.preparation.systemPrepare() (which calls this function internally) applies automatically. The S–S bond itself is parameterized by AMBER’s built-in CYX template, so no user intervention is required.

Parameters:

mol (moleculekit.molecule.Molecule) – Input molecule. Should already carry covalent bonds (read from a PDB CONECT block, a CIF _struct_conn block, or set up via Molecule.templateResidueFromSmiles()). If mol.bonds is empty and guess_bonds is True, the detector falls back to distance-based bond guessing via mol._guessBonds() and logs a warning. The molecule is not mutated.
guess_bonds (bool) – When mol.bonds is empty, guess bonds from atom coordinates so crosslinks (disulfides, glycosidic bonds, …) can still be found. Set to False to skip guessing and rely only on explicit input bonds: useful for modelled structures whose slightly-off geometry produces spurious close contacts that would otherwise be flagged as bonds. When guessing is on, non-peptide bonds landing on a canonical amino acid’s backbone O / CA (atoms that never form a real crosslink) are treated as guessing artifacts and ignored; explicit input bonds are always trusted.

Returns:

Flat list mixing ChainResidueSpec, ScaffoldSpec, CovalentLigandSpec, and LigandSpec entries. Ordered by residue index in mol. Empty when the molecule has no non-standard residues and no sidechain crosslinks.

Return type:

list[PerResidueSpec]

Raises:

RuntimeError – If a canonical residue is bonded at an anchor atom that is not listed in moleculekit.tools._anchor_variants.ANCHOR_TABLE (the anchor needs to be registered there before the residue can be re-templated). Also raised if an NCAA resname is 4+ characters long and requires terminus-disambiguation prefixing, which would exceed AMBER’s 4-character prepi unit-name limit.

Examples

Detect non-standard residues, template them with SMILES, then prepare:

>>> from moleculekit.molecule import Molecule
>>> from moleculekit.tools.nonstandard_residues import detectNonStandardResidues
>>> from moleculekit.tools.preparation import systemPrepare
>>> mol = Molecule("3ptb")
>>> specs = detectNonStandardResidues(mol)

For a molecule that has a non-canonical residue (e.g. “LIG”) that needs SMILES-based templating before preparation:

>>> # Strip hydrogens first so the mask below matches the final atom count
>>> _ = mol.remove("hydrogen")
>>> lig_mask = mol.resname == "LIG"
>>> # Template the non-canonical residue with its SMILES
>>> mol.templateResidueFromSmiles(lig_mask, smiles="...", addHs=True)
>>> # Pass the specs detected above so systemPrepare does not re-detect them
>>> pmol, specs = systemPrepare(mol, detect_specs=specs)

When no non-standard-residue handling is needed, call systemPrepare() directly; it runs the detector internally:

>>> pmol, specs, df = systemPrepare(mol, return_details=True)

moleculekit.tools.nonstandard_residues.geometric_interresidue_links(mol, atoms_a, atoms_b, frame=None, amide_dist=None, phosphodiester_dist=None)

Return the geometric inter-residue covalent links between two residues as a list of (idx_a, idx_b, kind) tuples, where idx_a is an atom of atoms_a, idx_b an atom of atoms_b, and kind is one of:

"peptide": a backbone C of one residue within amide_dist of the other’s backbone N (the standard main-chain amide).
"isopeptide": an amide where one partner is a backbone N or C and the other is a SIDE-CHAIN carbon or nitrogen - a side-chain carbonyl acylating a backbone N (gamma-glutamyl / beta-aspartyl, e.g. microcystin’s ACB.CG->N) or a backbone carboxyl acylating a side-chain amino (epsilon-poly-lysine, C->NZ).
"phosphodiester": an O3'/C3' within phosphodiester_dist of a P.

This is the single shared definition of inter-residue geometry consulted by autoSegment (segment grouping), residue templating (boundary-atom H reduction), infer_nonstandard_junction_bonds() and systemPrepare (terminus assignment), so all of them agree on the same geometry. It is a pure geometric read: mol.bonds is NOT consulted and mol is not modified. Deposited bonds are handled separately by each caller because they carry a different level of trust: a deposited backbone bond is honored unconditionally (it is real and present in the file), whereas a geometric isopeptide is accepted only at non-canonical junctions (proximity alone must not invent a crosslink between two standard residues).

Parameters:

mol (Molecule) – The molecule. Not modified.
atoms_a (numpy.ndarray) – Atom-index array of the first residue to test.
atoms_b (numpy.ndarray) – Atom-index array of the second residue to test.
frame (int) – Coordinate frame to use; defaults to mol.frame.
amide_dist (float) – Max C-N separation for a peptide / isopeptide link. Defaults to AMIDE_LINK_DIST.
phosphodiester_dist (float) – Max O3’/C3’-P separation for a phosphodiester link. Defaults to PHOSPHODIESTER_LINK_DIST.

Returns:

links – (idx_a, idx_b, kind) tuples, one per detected link.

Return type:

list
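A minimal pure-Python sketch of the three link kinds, assuming each residue is given as a list of (index, name, element, xyz) tuples instead of a Molecule. The cutoffs are placeholders rather than the real AMIDE_LINK_DIST / PHOSPHODIESTER_LINK_DIST values, and geometric_links_sketch is a hypothetical name.

```python
import math
from typing import List, Optional, Sequence, Tuple

Atom = Tuple[int, str, str, Tuple[float, float, float]]  # (index, name, element, xyz)

# Placeholder cutoffs in Angstrom, not moleculekit's actual constants.
AMIDE_DIST = 1.8
PHOSPHO_DIST = 2.0
BACKBONE = {"N", "CA", "C", "O"}
AMIDE_BACKBONE = {"N", "C"}


def _amide_kind(name_a: str, elem_a: str, name_b: str, elem_b: str) -> Optional[str]:
    """Classify an atom pair as a peptide / isopeptide amide, or return None."""
    if {name_a, name_b} == {"C", "N"}:
        return "peptide"  # backbone C to backbone N
    for bb_name, sc_name, sc_elem in ((name_a, name_b, elem_b), (name_b, name_a, elem_a)):
        if bb_name in AMIDE_BACKBONE and sc_name not in BACKBONE:
            # Backbone N pairs with a side-chain C; backbone C pairs with a side-chain N.
            if sc_elem == ("C" if bb_name == "N" else "N"):
                return "isopeptide"
    return None


def geometric_links_sketch(atoms_a: Sequence[Atom], atoms_b: Sequence[Atom],
                           amide_dist: float = AMIDE_DIST,
                           phospho_dist: float = PHOSPHO_DIST) -> List[Tuple[int, int, str]]:
    """Return (idx_a, idx_b, kind) for every close pair matching a link pattern."""
    links = []
    for ia, name_a, elem_a, xyz_a in atoms_a:
        for ib, name_b, elem_b, xyz_b in atoms_b:
            d = math.dist(xyz_a, xyz_b)
            kind = _amide_kind(name_a, elem_a, name_b, elem_b)
            if kind is not None and d <= amide_dist:
                links.append((ia, ib, kind))
            elif d <= phospho_dist and (
                (name_a in ("O3'", "C3'") and name_b == "P")
                or (name_b in ("O3'", "C3'") and name_a == "P")
            ):
                links.append((ia, ib, "phosphodiester"))
    return links
```

Like the documented function, this reads coordinates only; it never looks at existing bonds.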

moleculekit.tools.nonstandard_residues.getResidueMask(mol, spec)

Boolean mask over mol selecting the atoms of spec’s residue.

Matches on the residue identity (segid, chain, resid, insertion) and its resname, accepting either the original spec.resname or, when set, the renamed spec.new_resname. The same call therefore works whether mol is the structure the spec was detected in (its resname is spec.resname) or one that has already been through the detect-spec renames (its resname is spec.new_resname).

Parameters:

mol (Molecule) – The molecule the spec was detected in, before or after renaming.
spec (ChainResidueSpec or ScaffoldSpec or CovalentLigandSpec or LigandSpec) – A spec returned by detectNonStandardResidues().

Returns:

mask – Boolean mask, True on the atoms of spec’s residue.

Return type:

numpy.ndarray
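The matching rule amounts to an AND of the four identity fields plus a resname test that accepts either name. This hypothetical sketch works on plain per-atom NumPy arrays rather than a Molecule; Spec and residue_mask are illustrative names.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

import numpy as np


@dataclass
class Spec:
    """Hypothetical stand-in for a detected spec."""
    resname: str
    residue: Tuple[str, str, int, str]  # (segid, chain, resid, insertion)
    new_resname: Optional[str] = None


def residue_mask(segid: np.ndarray, chain: np.ndarray, resid: np.ndarray,
                 insertion: np.ndarray, resname: np.ndarray, spec: Spec) -> np.ndarray:
    """True on atoms of spec's residue, whether or not the rename was applied."""
    seg, ch, rid, ins = spec.residue
    accepted = [spec.resname] if spec.new_resname is None else [spec.resname, spec.new_resname]
    return ((segid == seg) & (chain == ch) & (resid == rid)
            & (insertion == ins) & np.isin(resname, accepted))
```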

moleculekit.tools.nonstandard_residues.infer_nonstandard_junction_bonds(mol, max_dist=1.8)

Infer inter-residue backbone-continuation bonds that the input connectivity omits, at junctions involving a non-canonical residue.

Some deposited structures carry a non-standard residue whose backbone is continued by an undeposited amide bond - a side-chain isopeptide (microcystin’s beta-methyl-Asp CG acylating the next residue’s backbone N) or the reverse, a backbone carboxyl acylating the next residue’s side-chain amino (as in epsilon-poly-lysine, alpha-C -> Lys NZ). Without that bond autoSegment splits the chain and detectNonStandardResidues cannot find the anchor. This recovers it from geometry WITHOUT modifying mol; callers fold the result into their own connectivity analysis transiently.

For each pair of consecutive residues (file order, same chain) that carry NO inter-residue bond between them and where at least one residue is non-canonical, a single amide C-N bond is inferred when a backbone atom of one residue lies within max_dist of a complementary heavy atom of the other:

the later residue’s backbone N to the nearest carbon of the earlier residue (side-chain or backbone carboxyl -> backbone amino), or
the earlier residue’s backbone C to the nearest nitrogen of the later residue (backbone carboxyl -> side-chain or backbone amino).

Two canonical residues with a missing bond are left untouched (a real chain gap is never invented). Requiring a backbone N/C endpoint excludes pure side-chain crosslinks (disulfides, staples), which are not chain continuations.

Parameters:

mol (Molecule) – The molecule to analyse. It is not modified.
max_dist (float) – Maximum heavy-atom separation, in Angstrom, treated as a bond.

Returns:

bonds – A list of (atom_index_i, atom_index_j) tuples, one per inferred bond.

Return type:

list
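A simplified sketch of the consecutive-residue scan, assuming residues arrive in file order as Residue records and that the caller has precomputed which consecutive pairs already share a bond. All names are hypothetical, and trying the backbone-N rule before the backbone-C rule is an assumption of this sketch rather than documented behaviour.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Set, Tuple

Atom = Tuple[int, str, str, Tuple[float, float, float]]  # (index, name, element, xyz)


@dataclass
class Residue:
    """Hypothetical per-residue record, in file order."""
    chain: str
    canonical: bool
    atoms: List[Atom]


def _nearest(anchor: Atom, atoms: List[Atom], element: str, max_dist: float) -> Optional[Atom]:
    """Closest atom of the given element within max_dist of anchor, or None."""
    close = [a for a in atoms if a[2] == element and math.dist(a[3], anchor[3]) <= max_dist]
    return min(close, key=lambda a: math.dist(a[3], anchor[3]), default=None)


def infer_junction_bonds_sketch(residues: List[Residue], bonded: Set[Tuple[int, int]],
                                max_dist: float = 1.8) -> List[Tuple[int, int]]:
    """Return (carbon_index, nitrogen_index) amide bonds inferred between consecutive residues."""
    inferred = []
    for i in range(len(residues) - 1):
        prev, nxt = residues[i], residues[i + 1]
        if prev.chain != nxt.chain or (i, i + 1) in bonded:
            continue
        if prev.canonical and nxt.canonical:
            continue  # never close a gap between two canonical residues
        bond = None
        # Later residue's backbone N -> nearest carbon of the earlier residue.
        for n in (a for a in nxt.atoms if a[1] == "N"):
            c = _nearest(n, prev.atoms, "C", max_dist)
            if c is not None:
                bond = (c[0], n[0])
                break
        # Otherwise: earlier residue's backbone C -> nearest nitrogen of the later residue.
        if bond is None:
            for c in (a for a in prev.atoms if a[1] == "C"):
                n = _nearest(c, nxt.atoms, "N", max_dist)
                if n is not None:
                    bond = (c[0], n[0])
                    break
        if bond is not None:
            inferred.append(bond)  # at most one amide per residue pair
    return inferred
```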

moleculekit.tools.nonstandard_residues.requiresTemplate(spec)

Whether a detected residue spec needs a user-supplied template.

True for genuinely non-standard residues (free ligands, non-canonical amino acids, scaffolds), i.e. those needing a SMILES or CIF template to add bonds, bond orders and hydrogens before parameterization. False for a canonical residue the detector reports only because it was renamed at a covalent junction (its resname stays canonical) and for force-field-shipped modified residues.

Parameters:

spec (ChainResidueSpec or ScaffoldSpec or CovalentLigandSpec or LigandSpec) – A spec returned by detectNonStandardResidues().

Returns:

True if spec requires a supplied template.

Return type:

bool

moleculekit.tools.nonstandard_residues.residuesRequiringTemplate(mol, guess_bonds=True)

Return the resnames in mol that need a user-supplied template.

Runs detectNonStandardResidues() and keeps the residues that are genuinely non-standard (free ligands, non-canonical amino acids and scaffolds), i.e. the ones for which a SMILES or CIF template must be supplied to add bonds, bond orders and hydrogens before they can be parameterized. Canonical residues that the detector reports only because they were renamed at a covalent junction (a disulfide CYS -> CYX, a glycosylated ASN, …) keep their canonical resname and are excluded, since the force field already provides templates for them. Modified residues the force field ships (MSE, SEP, …) are canonical here too and are likewise excluded.

Parameters:

mol (Molecule) – The molecule to inspect.
guess_bonds (bool) – Passed through to detectNonStandardResidues().

Returns:

resnames – The sorted, unique resnames that require a supplied template.

Return type:

list of str
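The filtering performed by requiresTemplate() and residuesRequiringTemplate() can be sketched as a set-membership test over the spec resnames. CANONICAL and FF_SHIPPED are hypothetical, heavily abbreviated stand-ins for moleculekit's real residue sets, and the function names are illustrative.

```python
from dataclasses import dataclass
from typing import Iterable, List

# Hypothetical, abbreviated stand-ins for the canonical and force-field-shipped sets.
CANONICAL = {"ALA", "ASN", "CYS", "GLU", "LYS", "TYR", "HOH", "NA", "CL"}
FF_SHIPPED = {"MSE", "SEP"}


@dataclass
class Spec:
    resname: str


def requires_template(spec: Spec) -> bool:
    """Junction renames keep a canonical resname, so they fail this test."""
    return spec.resname not in CANONICAL and spec.resname not in FF_SHIPPED


def resnames_requiring_template(specs: Iterable[Spec]) -> List[str]:
    """Sorted, unique resnames that still need a SMILES or CIF template."""
    return sorted({s.resname for s in specs if requires_template(s)})
```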