moleculekit.tools.preparation module#

exception moleculekit.tools.preparation.MissingTopologyError#: Bases: Exception

moleculekit.tools.preparation.systemPrepare(mol_in, titration=True, pH=7.4, force_protonation=None, no_opt=None, no_prot=None, no_titr=None, hold_nonpeptidic_bonds=True, verbose=True, return_details=False, hydrophobic_thickness=None, plot_pka=None, _logger_level='ERROR', titrate=None, detect_specs=None, restore_missing_sidechains=False, guess_bonds=True)#

Prepare a molecular system by adding hydrogens, assigning protonation states, and optimizing the H-bond network.

Wraps PDB2PQR + PROPKA. Protein and nucleic residues are protonated and optimized; non-protein, non-nucleic residues contribute to the pKa calculation but are not themselves protonated or optimized here. The input molecule is not mutated — a new Molecule is returned.

The returned molecule uses the following protonation-aware resnames:

ASH	Neutral ASP
CYX	SS-bonded CYS
CYM	Negative CYS
GLH	Neutral GLU
HIP	Positive HIS
HID	Neutral HIS, proton HD1 present
HIE	Neutral HIS, proton HE2 present
LYN	Neutral LYS
TYM	Negative TYR
AR0	Neutral ARG

Charge +1	Neutral	Charge -1
	ASH	ASP
	CYS	CYM
	GLH	GLU
HIP	HID/HIE
LYS	LYN
	TYR	TYM
ARG	AR0

Notes

What this function does:

Assigns protonation states via PROPKA.
Flips Asn/Gln/His sidechains to optimize the H-bond network.
Debumps clashes introduced by added hydrogens.
Adds missing heavy atoms and hydrogens via PDB2PQR’s force-field templates.
Detects non-standard residues and sidechain crosslinks (disulfides, metal-coordinating Cys/His/Tyr, isopeptides, glycosylations, …) and renames the affected residues to canonical variants (CYX, XX# buckets) so PDB2PQR’s templates apply and the bonds are preserved. See detect_specs below.

If hydrophobic_thickness is set to a positive value 2*h, a warning is produced for titratable residues with -h < z < h that are buried by less than 75%. The heuristic is crude (it assumes the protein is aligned with the membrane centered at z=0) and the “buried fraction” estimate (from PROPKA) is approximate, so cavity- facing residues may appear solvent-exposed regardless of z.

Parameters:

mol_in (Molecule) – Input molecule. Not mutated — an internal copy is taken on the first frame only.
titration (bool) – Master switch for titration. If True, run PROPKA to assign titration states. If False, just add and optimize hydrogens at the input resnames’ default protonation, and the titrate argument has no effect.
pH (float) – Solution pH used by PROPKA to pick titration states. Default 7.4.
force_protonation (list | None) – Force specific protonation states on individual residues. Each entry is (atomselection, resname), e.g. [("protein and resid 40", "HID")] forces residue 40 to HID. Atom selections use VMD syntax.
no_opt (list | None) – Atom selections of residues to exclude from H-bond / sidechain flip optimization. Use this when an optimizer flip would degrade a known-good structure (e.g. a residue in a metal site).
no_prot (list | None) – Atom selections of residues to exclude from hydrogen addition.
no_titr (list | None) – Atom selections of residues to exclude from titration.
hold_nonpeptidic_bonds (bool) – When True (default), protein residues that are covalently bonded to non-protein partners are automatically added to no_opt, no_prot and no_titr so their bonds aren’t broken and their boundary atoms aren’t over-protonated. Set to False to let PDB2PQR/PROPKA process them ignoring the non-peptide bond.
verbose (bool) – If False, demote this module’s logger to WARNING for the call.
return_details (bool) – If True, additionally return a pandas DataFrame with per-residue protonation / pKa info.
hydrophobic_thickness (float | None) – Membrane thickness 2*h in Angstrom (None for globular proteins). Triggers a warning for titratable residues that fall inside the bilayer and are not well-buried (see Notes).
plot_pka (str | None) – Path to a .png file. When set, writes a titration diagram for the system’s titratable residues.
titrate (list | None) – When titration is True, restrict titration to a subset of the canonical AAs in the table above (e.g. ["HIS", "ARG"]). All other AAs in the table are added to no_titr. When None (default), every AA in the table is titratable. Ignored when titration is False.
detect_specs (list | None) – Per-residue specs from moleculekit.tools.nonstandard_residues.detectNonStandardResidues(). When left as None (the default), systemPrepare runs detectNonStandardResidues() itself so canonical residues at non-peptide junctions are renamed (e.g. CYS-Cys disulfide -> CYX, a TYR coordinating a heme Fe -> a TY# bucket) and their sidechains re-templated before PDB2PQR runs. Pass an explicit list to bypass the auto-detection; pass detect_specs=[] to skip non-standard residue handling entirely. The applied specs are always returned so they can be forwarded to the downstream parameterizer / builder.
restore_missing_sidechains (bool) – If True, sidechain atoms removed by PDB2PQR are templated back in after preparation. Specifically, canonical protein residues whose entire heavy-atom sidechain is absent (only backbone + CB remain) are reconstructed using the Dunbrack rotamer library before PDB2PQR runs, so PDB2PQR’s 10 %-missing-atom threshold does not reject the structure on a partial input. Default False.
guess_bonds (bool) – Forwarded to detectNonStandardResidues() when detect_specs is None. When the input molecule has no bonds, controls whether they are guessed from atom coordinates to find crosslinks. Set to False to rely only on explicit input bonds, avoiding spurious crosslinks from slightly-off modelled geometry. Ignored when detect_specs is passed explicitly.

Returns:

mol_out (moleculekit.molecule.Molecule) – The protonated and optimized molecule.
detect_specs (list[PerResidueSpec]) – The non-standard-residue specs that were applied (the caller’s detect_specs argument if supplied, otherwise the auto-detected list). Empty when detect_specs=[] was passed in.
details (pandas.DataFrame) – Per-residue protonation states, pKas and buried fractions. Returned only when return_details=True.

Raises:

ImportError – If pdb2pqr is not installed.

Examples

>>> tryp = Molecule('3PTB')
>>> tryp_op, specs, df = systemPrepare(tryp, return_details=True)
>>> tryp_op.write('/tmp/3PTB_prepared.pdb')
>>> df.to_excel("/tmp/tryp-report.csv")
>>> df
resname protonation  resid insertion chain segid       pKa    buried
0       ILE         ILE     16               A     0  7.413075  0.839286
1       VAL         VAL     17               A     0       NaN       NaN
2       GLY         GLY     18               A     0       NaN       NaN
3       GLY         GLY     19               A     0       NaN       NaN
4       TYR         TYR     20               A     0  9.590845  0.146429
..      ...         ...    ...       ...   ...   ...       ...       ...
282     HOH         WAT    804               A     1       NaN       NaN
283     HOH         WAT    805               A     1       NaN       NaN
284     HOH         WAT    807               A     1       NaN       NaN
285     HOH         WAT    808               A     1       NaN       NaN
286     HOH         WAT    809               A     1       NaN       NaN

[287 rows x 8 columns]

Acidic pH:

>>> tryp_op, specs = systemPrepare(tryp, pH=1.0)
>>> tryp_op.write('/tmp/3PTB_pH1.pdb')

Freeze residues 36 and 49 in place:

>>> tryp_op, specs = systemPrepare(tryp, no_opt=["protein and resid 36", "chain A and resid 49"])

Disable protonation on residue 32:

>>> tryp_op, specs = systemPrepare(tryp, no_prot=["protein and resid 32",])

Disable titration and protonation on residue 32:

>>> tryp_op, specs = systemPrepare(tryp, no_titr=["protein and resid 32",], no_prot=["protein and resid 32",])

Force residue 40 to HIE and 57 to HIP:

>>> tryp_op, specs = systemPrepare(tryp, force_protonation=[("protein and resid 40", "HIE"), ("protein and resid 57", "HIP")])

Skip non-standard residue detection entirely (legacy behavior):

>>> tryp_op, specs = systemPrepare(tryp, detect_specs=[])

Forward specs to a downstream builder for non-canonical residues (NCAAs, sidechain crosslinks, metal-coordinating residues):

>>> # from htmd.builder.nonstandard import parameterizeFromSpecs
>>> # parameterizeFromSpecs(specs, tryp_op, outdir="./params")