The system-preparation pipeline#

systemPrepare() is moleculekit’s all-in-one function for taking a raw PDB structure and producing a properly protonated, hydrogen-optimized molecule ready for MD parameterization. This page builds a mental model of what happens inside the function, why each step exists, and how the parameters map onto that process. For step-by-step worked examples, see the system-preparation tutorials linked at the bottom.

The pipeline at a glance#

        flowchart TD
    A[Input Molecule] --> B[autoSegment]
    B --> C[Molecule.mutateResidue]
    C --> D[model_gaps]
    D --> E[Molecule.templateResidueFromSmiles]
    E --> F[detectNonStandardResidues]
    F --> G[systemPrepare]
    G --> H[Prepared Molecule + details DataFrame]
    

Every node above is an entry point you can call directly. Each step is optional — skip any that your input does not need. The internals of systemPrepare itself (rename, PDB2PQR, PROPKA, debump, bond restore, …) are covered step-by-step below; you do not invoke them directly.

Step 1 — Detect non-standard residues#

detectNonStandardResidues() inspects mol.bonds (or guesses them by distance if mol.bonds is empty) and walks every residue. It emits one spec per residue that needs special handling:

  • ChainResidueSpec — a non-canonical amino acid embedded in a polypeptide, or a canonical AA whose sidechain makes a non-peptide covalent bond. Examples: selenomethionine; an ASN-ND2 glycosylated with NAG; a TYR coordinating a heme iron.

  • CovalentLigandSpec — a non-chain residue with exactly one non-peptide bond (single-anchor covalent inhibitor, NAG stem of a glycan).

  • ScaffoldSpec — a non-chain residue with two or more non-peptide bonds (bicyclic-peptide scaffold, multi-anchor inhibitor).

  • LigandSpec — a non-chain residue with no covalent bonds to the protein (free small-molecule, solvent molecule other than water, ion).

Note: plain Cys–Cys disulfides are NOT returned by detectNonStandardResidues(). They are handled internally by systemPrepare() in step 2: the participating cysteines are renamed CYX and the disulfide bonds are preserved across the PDB2PQR roundtrip. You don’t need a spec to make this work.

Metal-coordination bonds involving standalone metal ions (e.g. a Ca²⁺ ion coordinated by protein oxygens) are skipped — the protein residues are left unmodified. Coordinations where the metal lives inside a cofactor (e.g. Fe in a heme coordinated by a Cys-SG) are kept: the cofactor gets a CovalentLigandSpec and the donating residue becomes a ChainResidueSpec so its protonation state (Tyr-O⁻, Cys-S⁻) is handled correctly.

Step 2 — Rename residues for force-field compatibility#

Based on the specs from step 1, canonical AAs at non-peptide junctions are renamed:

Original

Renamed to

Reason

CYS (disulfide)

CYX

Thiol H absent; PDB2PQR needs this name

CYS (metal coord.)

XX# bucket

Custom prepi shared across identical junctions

HIS

HID / HIE / HIP

Tautomer / charge state from PROPKA

ASP (neutral)

ASH

After titration

GLU (neutral)

GLH

After titration

LYS (neutral)

LYN

After titration

TYR (negative)

TYM

After titration

ARG (neutral)

AR0

After titration

Non-canonical AAs (NCAAs) that have been pre-templated with templateResidueFromSmiles() are also handled here so that PDB2PQR’s templates apply cleanly to their atoms.

Step 3 — Add hydrogens via PDB2PQR#

PDB2PQR adds missing heavy atoms and all polar hydrogens using its internal force-field templates. The result is a fully hydrogenated structure at the default protonation for each residue’s name.

Residues in no_prot are excluded from hydrogen addition — useful when a known-good H-placement already exists and you want PDB2PQR to leave it alone.

Step 4 — Predict pKa and titrate (optional)#

When titration=True (the default), PROPKA estimates the pKa of every titratable residue in the context of the folded structure. At the target pH, each titratable group is set to its dominant protonation state:

  • ASP/GLU: neutral if pKa > pH

  • HIS: HID (Nδ), HIE (Nε), or HIP (both) depending on H-bond geometry and pKa

  • LYS, ARG, TYR: neutral if pKa < pH

This step is skipped when titration=False. In that case, all residues keep their default protonation state (standard charge at pH 7 for canonical AAs).

Step 5 — Flip and debump#

PDB2PQR flips the amide groups of Asn and Gln, and the imidazole of His, to find the orientation that forms the best hydrogen-bond network. It then debumps any steric clashes introduced by the added hydrogens.

Residues in no_opt are held fixed and not flipped. Use this for residues in a metal site or a known crystal-water network where the flip would break the geometry.

Step 6 — Restore non-peptidic bonds#

The PDB2PQR roundtrip discards bonds that are not part of the canonical peptide topology — disulfides, glycosidic bonds, metal coordinations, stapled sidechain bonds, intra-ligand bonds, etc. When hold_nonpeptidic_bonds=True (the default), systemPrepare captures those bonds before the PDB2PQR call and restores them in the output molecule.

Setting hold_nonpeptidic_bonds=False disables the capture-and-restore mechanism. The canonical atoms still get protonated and optimized, but any bond that PDB2PQR drops is permanently lost from the output. This option is rarely needed and should be used with care.

Step 7 — Restore formal charges and termini#

After the PDB2PQR roundtrip, the formal charges on non-standard residues and termini can be incorrect because PDB2PQR assigns charges via its own tables rather than propagating the values from the input. systemPrepare recaptures the formal charges from the input before the call and restores them on the corresponding atoms in the output.

This ensures that, for example, an N-terminal ammonium (formal charge +1) and a deprotonated Tyr-O⁻ in a metal site (formal charge -1) survive the pipeline with the correct charges for downstream force-field parameterization.

Step 8 — Restore missing sidechains (optional)#

When restore_missing_sidechains=True, systemPrepare() uses moleculekit’s Dunbrack-rotamer mutator to template back any canonical residues whose entire sidechain is absent from the input structure (e.g. truncated crystal structures or homology models). This runs before the PDB2PQR call so the reconstructed atoms participate in all downstream protonation and optimization steps.

This option is off by default because mutating residues is a significant structural change that should be made consciously.

How parameters map onto the pipeline#

Parameter

Step

Effect

pH

4

PROPKA target pH for titration

titration=False

4

Skip PROPKA; use default protonation

force_protonation=[(sel, "HID"), ...]

4

Patch specific residues after PROPKA

no_opt

5

Exclude residues from H-bond flip and debump

no_prot

3

Exclude residues from hydrogen addition

no_titr

4

Exclude residues from titration

hold_nonpeptidic_bonds=True

6

Capture and restore non-peptidic bonds

restore_missing_sidechains=True

8

Template back absent canonical sidechains

hydrophobic_thickness

4

Warn on buried titratable residues (membrane context)

detect_specs

1

Supply a pre-computed spec list; bypass auto-detection

return_details=True

Return per-residue pKa / protonation DataFrame

The larger pipeline#

systemPrepare is one piece of a broader system-preparation workflow. Several preparation steps must happen before calling it:

autoSegment before systemPrepare#

PDB2PQR requires non-empty, consistent segment identifiers. Plain PDB files typically have empty segid fields. Run autoSegment() first:

from moleculekit.molecule import Molecule
from moleculekit.tools.autosegment import autoSegment
from moleculekit.tools.preparation import systemPrepare

mol = Molecule("structure.pdb")
mol_seg = autoSegment(mol)
mol_out, specs = systemPrepare(mol_seg)

templateResidueFromSmiles before systemPrepare#

Non-canonical residues (covalent ligands, modified AAs, NCAAs) need their bond topology, element assignments, and formal charges established before systemPrepare() runs. Call templateResidueFromSmiles() with sel, smiles, and addHs=True for each such residue:

lig_mask = mol.resname == "LIG"
mol.templateResidueFromSmiles(lig_mask, "CC(=O)Nc1ccc(O)cc1", addHs=True)
mol_out, specs = systemPrepare(mol)

systemPrepare() calls detectNonStandardResidues() internally, so it will pick up the ligand spec automatically.

mutateResidue before systemPrepare#

When you want to swap canonical residues (e.g. a point mutation), call mutateResidue() with sel and new_resname first. The mutator uses Dunbrack rotamers to place the new sidechain, and systemPrepare() then protonates and optimizes the mutated residue along with the rest of the protein.

model_gaps before systemPrepare#

Missing loops or residues in a crystal structure can be modelled with model_gaps() (from moleculekit.tools.modelling), which uses the ProMod3 Singularity image. Run gap-filling before systemPrepare() so the modelled residues are also protonated correctly.

Passing a pre-computed spec list#

detectNonStandardResidues() is called internally every time systemPrepare() runs unless you supply a spec list via detect_specs. This is useful when:

  • You want to inspect or filter the spec list before preparation (e.g. to remove a spec for a ligand you want to leave unparameterized).

  • You are running systemPrepare in a batch workflow and want to cache the spec list.

from moleculekit.tools.nonstandard_residues import detectNonStandardResidues
from moleculekit.tools.preparation import systemPrepare

mol = Molecule("structure.pdb")
specs = detectNonStandardResidues(mol)

# Inspect specs, filter if needed
ligand_specs = [s for s in specs if s.resname == "LIG"]

mol_out, applied_specs = systemPrepare(mol, detect_specs=ligand_specs)

Pass detect_specs=[] to skip non-standard residue handling entirely (all residues treated as canonical).

Further reading#