The system-preparation pipeline#
systemPrepare() is moleculekit’s all-in-one function for taking a raw PDB
structure and producing a properly protonated, hydrogen-optimized molecule
ready for MD parameterization. This page builds a mental model of what happens
inside the function, why each step exists, and how the parameters map onto that
process. For step-by-step worked examples, see the system-preparation tutorials
linked at the bottom.
The pipeline at a glance#
flowchart TD
A[Input Molecule] --> B[autoSegment]
B --> C[Molecule.mutateResidue]
C --> D[model_gaps]
D --> E[Molecule.templateResidueFromSmiles]
E --> F[detectNonStandardResidues]
F --> G[systemPrepare]
G --> H[Prepared Molecule + details DataFrame]
Every node above is an entry point you can call directly. Each step is optional — skip any that your input does not need. The internals of systemPrepare itself (rename, PDB2PQR, PROPKA, debump, bond restore, …) are covered step-by-step below; you do not invoke them directly.
Step 1 — Detect non-standard residues#
detectNonStandardResidues() inspects mol.bonds (or guesses them by
distance if mol.bonds is empty) and walks every residue. It emits one
spec per residue that needs special handling:
ChainResidueSpec — a non-canonical amino acid embedded in a polypeptide, or a canonical AA whose sidechain makes a non-peptide covalent bond. Examples: selenomethionine; an ASN-ND2 glycosylated with NAG; a TYR coordinating a heme iron.
CovalentLigandSpec — a non-chain residue with exactly one non-peptide bond (single-anchor covalent inhibitor, NAG stem of a glycan).
ScaffoldSpec — a non-chain residue with two or more non-peptide bonds (bicyclic-peptide scaffold, multi-anchor inhibitor).
LigandSpec — a non-chain residue with no covalent bonds to the protein (free small-molecule, solvent molecule other than water, ion).
Note: plain Cys–Cys disulfides are NOT returned by detectNonStandardResidues().
They are handled internally by systemPrepare() in step 2: the participating
cysteines are renamed CYX and the disulfide bonds are preserved across the
PDB2PQR roundtrip. You don’t need a spec to make this work.
Metal-coordination bonds involving standalone metal ions (e.g. a Ca²⁺ ion
coordinated by protein oxygens) are skipped — the protein residues are left
unmodified. Coordinations where the metal lives inside a cofactor (e.g. Fe
in a heme coordinated by a Cys-SG) are kept: the cofactor gets a
CovalentLigandSpec and the donating residue becomes a ChainResidueSpec so
its protonation state (Tyr-O⁻, Cys-S⁻) is handled correctly.
Step 2 — Rename residues for force-field compatibility#
Based on the specs from step 1, canonical AAs at non-peptide junctions are renamed:
Original |
Renamed to |
Reason |
|---|---|---|
CYS (disulfide) |
CYX |
Thiol H absent; PDB2PQR needs this name |
CYS (metal coord.) |
XX# bucket |
Custom prepi shared across identical junctions |
HIS |
HID / HIE / HIP |
Tautomer / charge state from PROPKA |
ASP (neutral) |
ASH |
After titration |
GLU (neutral) |
GLH |
After titration |
LYS (neutral) |
LYN |
After titration |
TYR (negative) |
TYM |
After titration |
ARG (neutral) |
AR0 |
After titration |
Non-canonical AAs (NCAAs) that have been pre-templated with
templateResidueFromSmiles() are also handled here so that PDB2PQR’s
templates apply cleanly to their atoms.
Step 3 — Add hydrogens via PDB2PQR#
PDB2PQR adds missing heavy atoms and all polar hydrogens using its internal force-field templates. The result is a fully hydrogenated structure at the default protonation for each residue’s name.
Residues in no_prot are excluded from hydrogen addition — useful when a
known-good H-placement already exists and you want PDB2PQR to leave it alone.
Step 4 — Predict pKa and titrate (optional)#
When titration=True (the default), PROPKA estimates the pKa of every
titratable residue in the context of the folded structure. At the target pH,
each titratable group is set to its dominant protonation state:
ASP/GLU: neutral if
pKa > pHHIS: HID (Nδ), HIE (Nε), or HIP (both) depending on H-bond geometry and pKa
LYS, ARG, TYR: neutral if
pKa < pH
This step is skipped when titration=False. In that case, all residues keep
their default protonation state (standard charge at pH 7 for canonical AAs).
Step 5 — Flip and debump#
PDB2PQR flips the amide groups of Asn and Gln, and the imidazole of His, to find the orientation that forms the best hydrogen-bond network. It then debumps any steric clashes introduced by the added hydrogens.
Residues in no_opt are held fixed and not flipped. Use this for residues in
a metal site or a known crystal-water network where the flip would break the
geometry.
Step 6 — Restore non-peptidic bonds#
The PDB2PQR roundtrip discards bonds that are not part of the canonical
peptide topology — disulfides, glycosidic bonds, metal coordinations, stapled
sidechain bonds, intra-ligand bonds, etc. When hold_nonpeptidic_bonds=True
(the default), systemPrepare captures those bonds before the PDB2PQR call
and restores them in the output molecule.
Setting hold_nonpeptidic_bonds=False disables the capture-and-restore
mechanism. The canonical atoms still get protonated and optimized, but any
bond that PDB2PQR drops is permanently lost from the output. This option is
rarely needed and should be used with care.
Step 7 — Restore formal charges and termini#
After the PDB2PQR roundtrip, the formal charges on non-standard residues and
termini can be incorrect because PDB2PQR assigns charges via its own tables
rather than propagating the values from the input. systemPrepare recaptures
the formal charges from the input before the call and restores them on the
corresponding atoms in the output.
This ensures that, for example, an N-terminal ammonium (formal charge +1) and a deprotonated Tyr-O⁻ in a metal site (formal charge -1) survive the pipeline with the correct charges for downstream force-field parameterization.
Step 8 — Restore missing sidechains (optional)#
When restore_missing_sidechains=True, systemPrepare() uses moleculekit’s
Dunbrack-rotamer mutator to template back any canonical residues whose entire
sidechain is absent from the input structure (e.g. truncated crystal structures
or homology models). This runs before the PDB2PQR call so the reconstructed
atoms participate in all downstream protonation and optimization steps.
This option is off by default because mutating residues is a significant structural change that should be made consciously.
How parameters map onto the pipeline#
Parameter |
Step |
Effect |
|---|---|---|
|
4 |
PROPKA target pH for titration |
|
4 |
Skip PROPKA; use default protonation |
|
4 |
Patch specific residues after PROPKA |
|
5 |
Exclude residues from H-bond flip and debump |
|
3 |
Exclude residues from hydrogen addition |
|
4 |
Exclude residues from titration |
|
6 |
Capture and restore non-peptidic bonds |
|
8 |
Template back absent canonical sidechains |
|
4 |
Warn on buried titratable residues (membrane context) |
|
1 |
Supply a pre-computed spec list; bypass auto-detection |
|
— |
Return per-residue pKa / protonation DataFrame |
The larger pipeline#
systemPrepare is one piece of a broader system-preparation workflow. Several
preparation steps must happen before calling it:
autoSegment before systemPrepare#
PDB2PQR requires non-empty, consistent segment identifiers. Plain PDB files
typically have empty segid fields. Run autoSegment() first:
from moleculekit.molecule import Molecule
from moleculekit.tools.autosegment import autoSegment
from moleculekit.tools.preparation import systemPrepare
mol = Molecule("structure.pdb")
mol_seg = autoSegment(mol)
mol_out, specs = systemPrepare(mol_seg)
templateResidueFromSmiles before systemPrepare#
Non-canonical residues (covalent ligands, modified AAs, NCAAs) need their bond
topology, element assignments, and formal charges established before
systemPrepare() runs. Call templateResidueFromSmiles() with sel, smiles, and addHs=True for each such residue:
lig_mask = mol.resname == "LIG"
mol.templateResidueFromSmiles(lig_mask, "CC(=O)Nc1ccc(O)cc1", addHs=True)
mol_out, specs = systemPrepare(mol)
systemPrepare() calls detectNonStandardResidues() internally, so it will pick
up the ligand spec automatically.
mutateResidue before systemPrepare#
When you want to swap canonical residues (e.g. a point mutation), call
mutateResidue() with sel and new_resname first. The mutator uses Dunbrack rotamers
to place the new sidechain, and systemPrepare() then protonates and optimizes
the mutated residue along with the rest of the protein.
model_gaps before systemPrepare#
Missing loops or residues in a crystal structure can be modelled with
model_gaps() (from moleculekit.tools.modelling), which uses the ProMod3
Singularity image. Run gap-filling before systemPrepare() so the modelled
residues are also protonated correctly.
Passing a pre-computed spec list#
detectNonStandardResidues() is called internally every time systemPrepare()
runs unless you supply a spec list via detect_specs. This is useful when:
You want to inspect or filter the spec list before preparation (e.g. to remove a spec for a ligand you want to leave unparameterized).
You are running
systemPreparein a batch workflow and want to cache the spec list.
from moleculekit.tools.nonstandard_residues import detectNonStandardResidues
from moleculekit.tools.preparation import systemPrepare
mol = Molecule("structure.pdb")
specs = detectNonStandardResidues(mol)
# Inspect specs, filter if needed
ligand_specs = [s for s in specs if s.resname == "LIG"]
mol_out, applied_specs = systemPrepare(mol, detect_specs=ligand_specs)
Pass detect_specs=[] to skip non-standard residue handling entirely (all
residues treated as canonical).
Further reading#
Tutorial: Basic protonation
Tutorial: Non-standard residues
Tutorial: Custom residues from SMILES
Tutorial: Mutations, gap closing, and segmentation