# The system-preparation pipeline {py:func}`~moleculekit.tools.preparation.systemPrepare` is moleculekit's all-in-one function for taking a raw PDB structure and producing a properly protonated, hydrogen-optimized molecule ready for MD parameterization. This page builds a mental model of what happens inside the function, why each step exists, and how the parameters map onto that process. For step-by-step worked examples, see the system-preparation tutorials linked at the bottom. ## The pipeline at a glance ```{mermaid} flowchart TD A[Input Molecule] --> B[autoSegment] B --> C[Molecule.mutateResidue] C --> D[model_gaps] D --> E[Molecule.templateResidueFromSmiles] E --> F[detectNonStandardResidues] F --> G[systemPrepare] G --> H[Prepared Molecule + details DataFrame] ``` Every node above is an entry point you can call directly. Each step is optional — skip any that your input does not need. The internals of `systemPrepare` itself (rename, PDB2PQR, PROPKA, debump, bond restore, …) are covered step-by-step below; you do not invoke them directly. ## Step 1 — Detect non-standard residues {py:func}`~moleculekit.tools.nonstandard_residues.detectNonStandardResidues` inspects `mol.bonds` (or guesses them by distance if `mol.bonds` is empty) and walks every residue. It emits one **spec** per residue that needs special handling: - **ChainResidueSpec** — a non-canonical amino acid embedded in a polypeptide, or a canonical AA whose sidechain makes a non-peptide covalent bond. Examples: selenomethionine; an ASN-ND2 glycosylated with NAG; a TYR coordinating a heme iron. - **CovalentLigandSpec** — a non-chain residue with exactly one non-peptide bond (single-anchor covalent inhibitor, NAG stem of a glycan). - **ScaffoldSpec** — a non-chain residue with two or more non-peptide bonds (bicyclic-peptide scaffold, multi-anchor inhibitor). - **LigandSpec** — a non-chain residue with no covalent bonds to the protein (free small-molecule, solvent molecule other than water, ion). Note: plain Cys–Cys disulfides are NOT returned by {py:func}`~moleculekit.tools.nonstandard_residues.detectNonStandardResidues`. They are handled internally by {py:func}`~moleculekit.tools.preparation.systemPrepare` in step 2: the participating cysteines are renamed `CYX` and the disulfide bonds are preserved across the PDB2PQR roundtrip. You don't need a spec to make this work. Metal-coordination bonds involving standalone metal ions (e.g. a Ca²⁺ ion coordinated by protein oxygens) are skipped — the protein residues are left unmodified. Coordinations where the metal lives *inside* a cofactor (e.g. Fe in a heme coordinated by a Cys-SG) are kept: the cofactor gets a `CovalentLigandSpec` and the donating residue becomes a `ChainResidueSpec` so its protonation state (Tyr-O⁻, Cys-S⁻) is handled correctly. ## Step 2 — Rename residues for force-field compatibility Based on the specs from step 1, canonical AAs at non-peptide junctions are renamed: | Original | Renamed to | Reason | |---|---|---| | CYS (disulfide) | CYX | Thiol H absent; PDB2PQR needs this name | | CYS (metal coord.) | XX# bucket | Custom prepi shared across identical junctions | | HIS | HID / HIE / HIP | Tautomer / charge state from PROPKA | | ASP (neutral) | ASH | After titration | | GLU (neutral) | GLH | After titration | | LYS (neutral) | LYN | After titration | | TYR (negative) | TYM | After titration | | ARG (neutral) | AR0 | After titration | Non-canonical AAs (NCAAs) that have been pre-templated with {py:meth}`~moleculekit.molecule.Molecule.templateResidueFromSmiles` are also handled here so that PDB2PQR's templates apply cleanly to their atoms. ## Step 3 — Add hydrogens via PDB2PQR PDB2PQR adds missing heavy atoms and all polar hydrogens using its internal force-field templates. The result is a fully hydrogenated structure at the default protonation for each residue's name. Residues in `no_prot` are excluded from hydrogen addition — useful when a known-good H-placement already exists and you want PDB2PQR to leave it alone. ## Step 4 — Predict pKa and titrate (optional) When `titration=True` (the default), PROPKA estimates the pKa of every titratable residue in the context of the folded structure. At the target `pH`, each titratable group is set to its dominant protonation state: - ASP/GLU: neutral if `pKa > pH` - HIS: HID (Nδ), HIE (Nε), or HIP (both) depending on H-bond geometry and pKa - LYS, ARG, TYR: neutral if `pKa < pH` This step is skipped when `titration=False`. In that case, all residues keep their default protonation state (standard charge at pH 7 for canonical AAs). ## Step 5 — Flip and debump PDB2PQR flips the amide groups of Asn and Gln, and the imidazole of His, to find the orientation that forms the best hydrogen-bond network. It then debumps any steric clashes introduced by the added hydrogens. Residues in `no_opt` are held fixed and not flipped. Use this for residues in a metal site or a known crystal-water network where the flip would break the geometry. ## Step 6 — Restore non-peptidic bonds The PDB2PQR roundtrip discards bonds that are not part of the canonical peptide topology — disulfides, glycosidic bonds, metal coordinations, stapled sidechain bonds, intra-ligand bonds, etc. When `hold_nonpeptidic_bonds=True` (the default), `systemPrepare` captures those bonds before the PDB2PQR call and restores them in the output molecule. Setting `hold_nonpeptidic_bonds=False` disables the capture-and-restore mechanism. The canonical atoms still get protonated and optimized, but any bond that PDB2PQR drops is permanently lost from the output. This option is rarely needed and should be used with care. ## Step 7 — Restore formal charges and termini After the PDB2PQR roundtrip, the formal charges on non-standard residues and termini can be incorrect because PDB2PQR assigns charges via its own tables rather than propagating the values from the input. `systemPrepare` recaptures the formal charges from the input before the call and restores them on the corresponding atoms in the output. This ensures that, for example, an N-terminal ammonium (formal charge +1) and a deprotonated Tyr-O⁻ in a metal site (formal charge -1) survive the pipeline with the correct charges for downstream force-field parameterization. ## Step 8 — Restore missing sidechains (optional) When `restore_missing_sidechains=True`, {py:func}`~moleculekit.tools.preparation.systemPrepare` uses moleculekit's Dunbrack-rotamer mutator to template back any canonical residues whose entire sidechain is absent from the input structure (e.g. truncated crystal structures or homology models). This runs **before** the PDB2PQR call so the reconstructed atoms participate in all downstream protonation and optimization steps. This option is off by default because mutating residues is a significant structural change that should be made consciously. ## How parameters map onto the pipeline | Parameter | Step | Effect | |---|---|---| | `pH` | 4 | PROPKA target pH for titration | | `titration=False` | 4 | Skip PROPKA; use default protonation | | `force_protonation=[(sel, "HID"), ...]` | 4 | Patch specific residues after PROPKA | | `no_opt` | 5 | Exclude residues from H-bond flip and debump | | `no_prot` | 3 | Exclude residues from hydrogen addition | | `no_titr` | 4 | Exclude residues from titration | | `hold_nonpeptidic_bonds=True` | 6 | Capture and restore non-peptidic bonds | | `restore_missing_sidechains=True` | 8 | Template back absent canonical sidechains | | `hydrophobic_thickness` | 4 | Warn on buried titratable residues (membrane context) | | `detect_specs` | 1 | Supply a pre-computed spec list; bypass auto-detection | | `return_details=True` | — | Return per-residue pKa / protonation DataFrame | ## The larger pipeline `systemPrepare` is one piece of a broader system-preparation workflow. Several preparation steps must happen **before** calling it: ### autoSegment before systemPrepare PDB2PQR requires non-empty, consistent segment identifiers. Plain PDB files typically have empty `segid` fields. Run {py:func}`~moleculekit.tools.autosegment.autoSegment` first: ```python from moleculekit.molecule import Molecule from moleculekit.tools.autosegment import autoSegment from moleculekit.tools.preparation import systemPrepare mol = Molecule("structure.pdb") mol_seg = autoSegment(mol) mol_out, specs = systemPrepare(mol_seg) ``` ### templateResidueFromSmiles before systemPrepare Non-canonical residues (covalent ligands, modified AAs, NCAAs) need their bond topology, element assignments, and formal charges established before {py:func}`~moleculekit.tools.preparation.systemPrepare` runs. Call {py:meth}`~moleculekit.molecule.Molecule.templateResidueFromSmiles` with `sel`, `smiles`, and `addHs=True` for each such residue: ```python lig_mask = mol.resname == "LIG" mol.templateResidueFromSmiles(lig_mask, "CC(=O)Nc1ccc(O)cc1", addHs=True) mol_out, specs = systemPrepare(mol) ``` {py:func}`~moleculekit.tools.preparation.systemPrepare` calls {py:func}`~moleculekit.tools.nonstandard_residues.detectNonStandardResidues` internally, so it will pick up the ligand spec automatically. ### mutateResidue before systemPrepare When you want to swap canonical residues (e.g. a point mutation), call {py:meth}`~moleculekit.molecule.Molecule.mutateResidue` with `sel` and `new_resname` first. The mutator uses Dunbrack rotamers to place the new sidechain, and {py:func}`~moleculekit.tools.preparation.systemPrepare` then protonates and optimizes the mutated residue along with the rest of the protein. ### model_gaps before systemPrepare Missing loops or residues in a crystal structure can be modelled with {py:func}`~moleculekit.tools.modelling.model_gaps` (from `moleculekit.tools.modelling`), which uses the ProMod3 Singularity image. Run gap-filling before {py:func}`~moleculekit.tools.preparation.systemPrepare` so the modelled residues are also protonated correctly. ## Passing a pre-computed spec list {py:func}`~moleculekit.tools.nonstandard_residues.detectNonStandardResidues` is called internally every time {py:func}`~moleculekit.tools.preparation.systemPrepare` runs unless you supply a spec list via `detect_specs`. This is useful when: - You want to inspect or filter the spec list before preparation (e.g. to remove a spec for a ligand you want to leave unparameterized). - You are running `systemPrepare` in a batch workflow and want to cache the spec list. ```python from moleculekit.tools.nonstandard_residues import detectNonStandardResidues from moleculekit.tools.preparation import systemPrepare mol = Molecule("structure.pdb") specs = detectNonStandardResidues(mol) # Inspect specs, filter if needed ligand_specs = [s for s in specs if s.resname == "LIG"] mol_out, applied_specs = systemPrepare(mol, detect_specs=ligand_specs) ``` Pass `detect_specs=[]` to skip non-standard residue handling entirely (all residues treated as canonical). ## Further reading - Tutorial: [Basic protonation](../tutorials/system-prep/01-basic-protonation.md) - Tutorial: [Non-standard residues](../tutorials/system-prep/02-non-standard-residues.md) - Tutorial: [Custom residues from SMILES](../tutorials/system-prep/03-custom-residues-from-smiles.md) - Tutorial: [Mutations, gap closing, and segmentation](../tutorials/system-prep/04-mutation-gap-closing-segmentation.md)