Non-standard residues and covalent modifications
You will learn: how to detect non-standard residues, covalent modifications,
and free ligands in a structure, and how to pass that information into
systemPrepare() so it preserves the right bonds and renames residues correctly
for the force field.
Prerequisites:
The Basic protonation tutorial.
Setup
from moleculekit.molecule import Molecule
from moleculekit.tools.preparation import systemPrepare
from moleculekit.tools.nonstandard_residues import (
    detectNonStandardResidues,
    ChainResidueSpec,
    ScaffoldSpec,
    CovalentLigandSpec,
    LigandSpec,
)
Step 1 — Detect non-standard residues on a representative structure
We use 1R1J, a thermolysin-like protease that carries three N-glycosylation sites (NAG sugars covalently attached to Asn residues) and a non-covalent zinc-chelating inhibitor (OIR). This gives us examples of all the important spec types in a single structure.
mol = Molecule("1R1J")
specs = detectNonStandardResidues(mol)
print(specs)
[ChainResidueSpec(resname='ASN', residue=<moleculekit.molecule.UniqueResidueID object at 0x7feeecbaa870>
UniqueResidueID<resname: 'ASN', chain: 'A', resid: 144, insertion: '', segid: '1'>, new_resname='XX1', anchor_atom='ND2', is_n_term=False, is_c_term=False), ChainResidueSpec(resname='ASN', residue=<moleculekit.molecule.UniqueResidueID object at 0x7feeecba90d0>
UniqueResidueID<resname: 'ASN', chain: 'A', resid: 324, insertion: '', segid: '1'>, new_resname='XX1', anchor_atom='ND2', is_n_term=False, is_c_term=False), ChainResidueSpec(resname='ASN', residue=<moleculekit.molecule.UniqueResidueID object at 0x7feeecb36fc0>
UniqueResidueID<resname: 'ASN', chain: 'A', resid: 627, insertion: '', segid: '1'>, new_resname='XX1', anchor_atom='ND2', is_n_term=False, is_c_term=False), CovalentLigandSpec(resname='NAG', residue=<moleculekit.molecule.UniqueResidueID object at 0x7feeec749100>
UniqueResidueID<resname: 'NAG', chain: 'A', resid: 752, insertion: '', segid: '2'>), CovalentLigandSpec(resname='NAG', residue=<moleculekit.molecule.UniqueResidueID object at 0x7feeec749130>
UniqueResidueID<resname: 'NAG', chain: 'A', resid: 753, insertion: '', segid: '2'>), CovalentLigandSpec(resname='NAG', residue=<moleculekit.molecule.UniqueResidueID object at 0x7feeec749160>
UniqueResidueID<resname: 'NAG', chain: 'A', resid: 754, insertion: '', segid: '2'>), LigandSpec(resname='OIR', residue=<moleculekit.molecule.UniqueResidueID object at 0x7feeec7491c0>
UniqueResidueID<resname: 'OIR', chain: 'A', resid: 2001, insertion: '', segid: '4'>)]
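The raw list repr above is hard to scan. As a minimal sketch, the helper below tallies specs by class name and resname. The three dataclasses are simplified stand-ins that carry only resname (an assumption for illustration, so the snippet runs without moleculekit), and summarize_specs is a hypothetical helper name, not part of the library. With the real list you would pass specs to it directly.

```python
from collections import Counter
from dataclasses import dataclass


# Stand-in classes for illustration only. The real ones come from
# moleculekit.tools.nonstandard_residues (see Setup) and carry more fields.
@dataclass
class ChainResidueSpec:
    resname: str


@dataclass
class CovalentLigandSpec:
    resname: str


@dataclass
class LigandSpec:
    resname: str


def summarize_specs(specs):
    """Count specs by (class name, resname) for a compact overview."""
    return Counter((type(s).__name__, s.resname) for s in specs)


# Toy list mirroring the composition of the 1R1J output printed above.
specs_demo = (
    [ChainResidueSpec("ASN")] * 3
    + [CovalentLigandSpec("NAG")] * 3
    + [LigandSpec("OIR")]
)

summary = summarize_specs(specs_demo)
for (kind, resname), count in sorted(summary.items()):
    print(f"{kind:<20} {resname:<4} x{count}")
```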
detectNonStandardResidues() does not mutate mol — it just walks the bond
graph and returns a list of spec objects (ChainResidueSpec, CovalentLigandSpec, LigandSpec, or ScaffoldSpec) describing every residue
that needs special handling.
Note: Plain Cys–Cys disulfide bonds are not in this list; systemPrepare() handles those internally by renaming Cys to CYX. detectNonStandardResidues() targets non-canonical residues, sidechain crosslinks such as N-glycosylation or isopeptide bonds, and covalent or free ligands.
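To make "walks the bond graph" concrete, here is a rough sketch of how cross-residue bonds can be picked out of a bond list. It is not the detector's actual implementation. The function name, the plain-list inputs, and the name-based rules for skipping peptide and disulfide bonds are all simplifying assumptions, and the toy atoms below are invented for the example.

```python
def cross_residue_bonds(atom_residue, atom_name, bonds):
    """Return bonds whose two atoms belong to different residues.

    atom_residue: list mapping atom index -> residue key, e.g. ("A", 144)
    atom_name:    list mapping atom index -> atom name, e.g. "ND2"
    bonds:        iterable of (i, j) atom-index pairs
    """
    found = []
    for i, j in bonds:
        if atom_residue[i] == atom_residue[j]:
            continue  # intra-residue bond
        names = {atom_name[i], atom_name[j]}
        if names == {"C", "N"}:
            continue  # assumed to be an ordinary backbone peptide bond
        if names == {"SG"}:
            continue  # Cys-Cys disulfide, left to systemPrepare()
        found.append(
            (atom_residue[i], atom_name[i], atom_residue[j], atom_name[j])
        )
    return found


# Toy system (hypothetical atoms): one peptide bond, one intra-residue
# bond, one Asn-NAG glycosidic link, and one disulfide.
atom_residue = [
    ("A", 143), ("A", 144), ("A", 144), ("A", 144),
    ("A", 752), ("A", 10), ("A", 20),
]
atom_name = ["C", "N", "CG", "ND2", "C1", "SG", "SG"]
bonds = [(0, 1), (2, 3), (3, 4), (5, 6)]

links = cross_residue_bonds(atom_residue, atom_name, bonds)
print(links)
```

Only the ND2 to C1 link survives the filters, which matches the kind of bond the note describes as being in scope.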
Step 2 — Walk through each spec subclass
ChainResidueSpec — chain-resident residue needing special handling
A ChainResidueSpec is emitted for every residue that sits inside a polypeptide
chain and needs special parameterization. This includes:
Non-canonical amino acids embedded in a peptide chain (no inter-residue non-peptide bond).
Canonical amino acids whose sidechain is covalently bonded to something outside the peptide backbone — an Asn N-glycosylated by a sugar, a Glu–Lys isopeptide bond, a Cys thioether to a scaffold.
The 1R1J structure has three Asn residues each bonded to a NAG sugar at their
ND2 atom. The detector emits a ChainResidueSpec for each, proposing a shared
renamed resname so the parameterizer generates one set of AMBER parameters for
all three:
chain_specs = [s for s in specs if isinstance(s, ChainResidueSpec)]
for s in chain_specs:
    print(
        f"resname={s.resname!r:4s} chain={s.residue.chain!r} "
        f"resid={s.residue.resid:<6} anchor_atom={s.anchor_atom!r} "
        f"new_resname={s.new_resname!r}"
    )
resname='ASN' chain='A' resid=144 anchor_atom='ND2' new_resname='XX1'
resname='ASN' chain='A' resid=324 anchor_atom='ND2' new_resname='XX1'
resname='ASN' chain='A' resid=627 anchor_atom='ND2' new_resname='XX1'
Each ChainResidueSpec exposes:
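| Attribute | Meaning |
|---|---|
| resname | Residue name in the input structure |
| residue | UniqueResidueID locating the residue (resname, chain, resid, insertion, segid) |
| new_resname | Name to rename to before parameterization (XX1 in the output above) |
| anchor_atom | Atom involved in the non-peptide bond (ND2 in the output above) |
| is_n_term, is_c_term | Whether the residue is at the N- or C-terminus of a chain |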
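A quick way to see which residues will share one parameter set is to group the chain specs by new_resname. The sketch below uses a flattened stand-in tuple (ToyChainSpec, a hypothetical type with chain and resid stored directly) so it runs on its own. With real ChainResidueSpec objects those two values live under s.residue.chain and s.residue.resid instead.

```python
from collections import defaultdict
from typing import NamedTuple


class ToyChainSpec(NamedTuple):
    """Flattened stand-in for ChainResidueSpec (illustration only)."""
    resname: str
    chain: str
    resid: int
    anchor_atom: str
    new_resname: str


def group_by_new_resname(chain_specs):
    """Map each proposed new_resname to the residues that will carry it."""
    groups = defaultdict(list)
    for s in chain_specs:
        groups[s.new_resname].append((s.chain, s.resid, s.anchor_atom))
    return dict(groups)


# Values copied from the 1R1J output above.
toy_specs = [
    ToyChainSpec("ASN", "A", 144, "ND2", "XX1"),
    ToyChainSpec("ASN", "A", 324, "ND2", "XX1"),
    ToyChainSpec("ASN", "A", 627, "ND2", "XX1"),
]

groups = group_by_new_resname(toy_specs)
for new_name, members in groups.items():
    print(new_name, members)
```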
Canonical amino acids that participate in a non-peptide bond get renamed too — the parameterizer needs different atom names and missing-H counts than the standard residue. A cross-residue covalent bond between two canonical amino acids therefore produces two ChainResidueSpec entries, one per side of the bond.
5VBL’s bound peptide is cyclized through an isopeptide bond. Loading it and filtering for canonical amino-acid ChainResidueSpec entries surfaces exactly the two endpoints — each with its own new_resname and its own anchor_atom:
mol_5vbl = Molecule("5VBL")
specs_5vbl = detectNonStandardResidues(mol_5vbl)
CANONICAL_AAS = {
    "ALA", "ARG", "ASN", "ASP", "CYS", "GLN", "GLU", "GLY", "HIS", "ILE",
    "LEU", "LYS", "MET", "PHE", "PRO", "SER", "THR", "TRP", "TYR", "VAL",
}
isopeptide_endpoints = [
    s for s in specs_5vbl
    if isinstance(s, ChainResidueSpec) and s.resname in CANONICAL_AAS
]
for s in isopeptide_endpoints:
    print(
        f"resname={s.resname!r:4s} chain={s.residue.chain!r} "
        f"resid={s.residue.resid:<6} anchor_atom={s.anchor_atom!r} "
        f"new_resname={s.new_resname!r}"
    )
resname='GLU' chain='A' resid=10 anchor_atom='CD' new_resname='XX1'
resname='LYS' chain='A' resid=13 anchor_atom='NZ' new_resname='XX2'
Both partners have new_resname set; the unique names tell antechamber to build a separate prepi for each side.
CovalentLigandSpec — single-anchor covalent ligand
A CovalentLigandSpec is emitted for a free (non-chain-resident) residue with
exactly one covalent bond to the rest of the structure. In 1R1J, each NAG
(N-acetylglucosamine) sugar attaches to one Asn via a single C1-ND2 glycosidic
bond:
cov_specs = [s for s in specs if isinstance(s, CovalentLigandSpec)]
for s in cov_specs:
    print(
        f"resname={s.resname!r} chain={s.residue.chain!r} "
        f"resid={s.residue.resid}"
    )
resname='NAG' chain='A' resid=752
resname='NAG' chain='A' resid=753
resname='NAG' chain='A' resid=754
CovalentLigandSpec has two public attributes: resname and residue.
LigandSpec — free non-covalent ligand
A LigandSpec covers non-chain-resident residues with no covalent bonds to
any other residue. In 1R1J, the thiorphan-class inhibitor OIR coordinates the
active-site zinc ion via O19 and S26, but those are metal-coordination contacts
(not covalent bonds), so the detector correctly classifies it as a free ligand:
lig_specs = [s for s in specs if isinstance(s, LigandSpec)]
for s in lig_specs:
    print(
        f"resname={s.resname!r} chain={s.residue.chain!r} "
        f"resid={s.residue.resid}"
    )
resname='OIR' chain='A' resid=2001
LigandSpec also has two public attributes: resname and residue.
ScaffoldSpec — multi-anchor covalent scaffold
A ScaffoldSpec is emitted for a non-chain-resident residue with two or more
covalent bonds going out to the polypeptide chain — typical of bicyclic peptide
scaffolds or multi-anchor covalent inhibitors.
For a live example we load 8QFZ chain B, a lasso-peptide scaffold (LFI) thioether-bonded to three CYS residues:
mol_8qfz = Molecule("8QFZ")
mol_8qfz.filter("chain B", _logger=False)
specs_8qfz = detectNonStandardResidues(mol_8qfz)
scaffold_specs = [s for s in specs_8qfz if isinstance(s, ScaffoldSpec)]
for s in scaffold_specs:
    print(f"resname={s.resname!r} chain={s.residue.chain!r} resid={s.residue.resid}")
resname='LFI' chain='B' resid=101
The LFI scaffold appears as a ScaffoldSpec because it bonds covalently to three
chain-resident CYS residues. Each of those CYS residues appears as a
ChainResidueSpec with a unique auto-generated rename target, because they sit
at different chain positions (N-terminal, mid-chain, C-terminal) and therefore
carry different capping atoms in solution.
ScaffoldSpec has two public attributes: resname and residue.
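The four descriptions above boil down to a small decision rule, sketched here as a plain function. This is a simplified reading of the text, not the detector's real logic: classify_residue and its three inputs are hypothetical, and the actual detector evidently applies further rules (for example, the zinc ion in 1R1J does not appear in the spec list).

```python
from typing import Optional


def classify_residue(
    in_chain: bool, is_canonical: bool, n_nonpeptide_bonds: int
) -> Optional[str]:
    """Return the spec class name suggested by the descriptions above.

    in_chain:           residue sits inside a polypeptide chain
    is_canonical:       residue is one of the 20 standard amino acids
    n_nonpeptide_bonds: covalent bonds to other residues, excluding
                        backbone peptide bonds
    """
    if in_chain:
        # Chain residues need a spec only if non-canonical or crosslinked.
        if not is_canonical or n_nonpeptide_bonds > 0:
            return "ChainResidueSpec"
        return None
    if n_nonpeptide_bonds == 0:
        return "LigandSpec"
    if n_nonpeptide_bonds == 1:
        return "CovalentLigandSpec"
    return "ScaffoldSpec"


# Cases modelled on the structures used in this tutorial.
print(classify_residue(True, True, 1))    # glycosylated Asn in 1R1J
print(classify_residue(False, False, 1))  # NAG in 1R1J
print(classify_residue(False, False, 0))  # OIR in 1R1J
print(classify_residue(False, False, 3))  # LFI in 8QFZ
```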
Step 3 — Apply specs through systemPrepare
Pass the spec list to systemPrepare() via detect_specs= to apply the proposed
renames and preserve the cross-residue bonds that protonation would otherwise
drop:
pmol, applied_specs = systemPrepare(mol, detect_specs=specs, verbose=False)
rdkit - INFO - Enabling RDKit 2026.03.2 jupyter extensions
moleculekit.rdkittools - INFO - Converted Molecule to RDKit mol with SMILES: NC(O)CC(N)CO
moleculekit.rdkittools - INFO - Stripped unmatched terminal heavy atoms from SMILES template (e.g. leaving group displaced by a covalent link, or carboxyl -OH on a non-terminal amino acid). Modified SMILES: 'NC(=O)C[C@H](N)C=O'
moleculekit.rdkittools - INFO - Converted Molecule to RDKit mol with SMILES: NC(O)CC(N)CO
moleculekit.rdkittools - INFO - Stripped unmatched terminal heavy atoms from SMILES template (e.g. leaving group displaced by a covalent link, or carboxyl -OH on a non-terminal amino acid). Modified SMILES: 'NC(=O)C[C@H](N)C=O'
moleculekit.rdkittools - INFO - Converted Molecule to RDKit mol with SMILES: NC(O)CC(N)CO
moleculekit.rdkittools - INFO - Stripped unmatched terminal heavy atoms from SMILES template (e.g. leaving group displaced by a covalent link, or carboxyl -OH on a non-terminal amino acid). Modified SMILES: 'NC(=O)C[C@H](N)C=O'
moleculekit.tools.preparation - WARNING - Both chains and segments are defined in Molecule.chain / Molecule.segid, however they are inconsistent. Protein preparation will use the chain information.
moleculekit.tools.preparation - WARNING - The following residues have not been optimized: NAG, OIR, ZN
moleculekit.tools.preparation - WARNING - Dubious protonation state: the pKa of 5 residues is within 1.0 units of pH 7.4.
moleculekit.tools.preparation - WARNING - Dubious protonation state: HIS 437 A (pKa= 6.77)
moleculekit.tools.preparation - WARNING - Dubious protonation state: LYS 471 A (pKa= 6.77)
moleculekit.tools.preparation - WARNING - Dubious protonation state: ASP 591 A (pKa= 7.75)
moleculekit.tools.preparation - WARNING - Dubious protonation state: HIS 637 A (pKa= 6.61)
moleculekit.tools.preparation - WARNING - Dubious protonation state: HIS 733 A (pKa= 6.40)
detect_specs=specs tells systemPrepare() to rename force-field-relevant
residues (Asn → shared auto-name so antechamber builds one prepi) and preserve
the glycosidic C1-ND2 bonds that PDB2PQR’s hydrogenation step would otherwise
sever. pmol is a new Molecule; mol is unchanged.
Step 4 — Suppress a specific spec
You can filter the spec list before passing it in. For example, to skip
preparation of the covalent NAG sugars (perhaps you will handle them in a
separate glycan-parameterization step) you can drop all CovalentLigandSpec
entries:
specs_no_nag = [s for s in specs if not isinstance(s, CovalentLigandSpec)]
pmol_no_nag, _ = systemPrepare(mol, detect_specs=specs_no_nag, verbose=False)
You can also filter on a spec’s public attributes. For instance, to keep only the ASN entries and drop both the NAG and OIR specs:
specs_asn_only = [s for s in specs if s.resname == "ASN"]
pmol_asn, _ = systemPrepare(mol, detect_specs=specs_asn_only, verbose=False)
Any spec you remove is simply ignored by systemPrepare(); it uses only the
entries you provide.
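If you filter often, the two patterns above can be folded into one small helper. This is a sketch under stated assumptions: select_specs is a hypothetical name, and the dataclasses are minimal stand-ins with only a resname field so the example runs without moleculekit. The real spec classes should work the same way, since the helper relies only on isinstance and resname.

```python
from dataclasses import dataclass


# Stand-ins for illustration; import the real classes from
# moleculekit.tools.nonstandard_residues in actual use.
@dataclass
class ChainResidueSpec:
    resname: str


@dataclass
class CovalentLigandSpec:
    resname: str


@dataclass
class LigandSpec:
    resname: str


def select_specs(specs, drop_types=(), keep_resnames=None):
    """Return specs not of any type in drop_types and, if keep_resnames
    is given, whose resname is in that collection."""
    selected = []
    for s in specs:
        if isinstance(s, tuple(drop_types)):
            continue
        if keep_resnames is not None and s.resname not in keep_resnames:
            continue
        selected.append(s)
    return selected


demo = (
    [ChainResidueSpec("ASN") for _ in range(3)]
    + [CovalentLigandSpec("NAG") for _ in range(3)]
    + [LigandSpec("OIR")]
)

no_nag = select_specs(demo, drop_types=[CovalentLigandSpec])
asn_only = select_specs(demo, keep_resnames={"ASN"})
print(len(no_nag), len(asn_only))
```

The resulting list can then be passed to systemPrepare() through detect_specs= exactly as in the examples above.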
Recap
detectNonStandardResidues() enumerates non-standard residues and covalent modifications without mutating mol.
Cys–Cys disulfides are not returned by it; systemPrepare() handles those internally.
Four spec subclasses cover chain-resident residues that are non-canonical or crosslinked (ChainResidueSpec), multi-anchor scaffolds (ScaffoldSpec), single-anchor covalent ligands (CovalentLigandSpec), and free ligands (LigandSpec).
Pass the spec list (or a filtered subset) into systemPrepare() with detect_specs=... to control renaming and bond preservation.