moleculekit.tools.preparation module#

exception moleculekit.tools.preparation.MissingTopologyError#

Bases: Exception

moleculekit.tools.preparation.systemPrepare(mol_in, titration=True, pH=7.4, force_protonation=None, no_opt=None, no_prot=None, no_titr=None, hold_nonpeptidic_bonds=True, verbose=True, return_details=False, hydrophobic_thickness=None, plot_pka=None, _logger_level='ERROR', titrate=None, detect_specs=None, restore_missing_sidechains=False)#

Prepare a molecular system by adding hydrogens, assigning protonation states, and optimizing the H-bond network.

Wraps PDB2PQR + PROPKA. Protein and nucleic residues are protonated and optimized; non-protein, non-nucleic residues contribute to the pKa calculation but are not themselves protonated or optimized here. The input molecule is not mutated — a new Molecule is returned.

The returned molecule uses the following protonation-aware resnames:

ASH

Neutral ASP

CYX

SS-bonded CYS

CYM

Negative CYS

GLH

Neutral GLU

HIP

Positive HIS

HID

Neutral HIS, proton HD1 present

HIE

Neutral HIS, proton HE2 present

LYN

Neutral LYS

TYM

Negative TYR

AR0

Neutral ARG

Charge +1

Neutral

Charge -1

ASH

ASP

CYS

CYM

GLH

GLU

HIP

HID/HIE

LYS

LYN

TYR

TYM

ARG

AR0

Notes

What this function does:

  • Assigns protonation states via PROPKA.

  • Flips Asn/Gln/His sidechains to optimize the H-bond network.

  • Debumps clashes introduced by added hydrogens.

  • Adds missing heavy atoms and hydrogens via PDB2PQR’s force-field templates.

  • Detects non-standard residues and sidechain crosslinks (disulfides, metal-coordinating Cys/His/Tyr, isopeptides, glycosylations, …) and renames the affected residues to canonical variants (CYX, XX# buckets) so PDB2PQR’s templates apply and the bonds are preserved. See detect_specs below.

If hydrophobic_thickness is set to a positive value 2*h, a warning is produced for titratable residues with -h < z < h that are buried by less than 75%. The heuristic is crude (it assumes the protein is aligned with the membrane centered at z=0) and the “buried fraction” estimate (from PROPKA) is approximate, so cavity- facing residues may appear solvent-exposed regardless of z.

Parameters:
  • mol_in (Molecule) – Input molecule. Not mutated — an internal copy is taken on the first frame only.

  • titration (bool) – If True, run PROPKA to assign titration states. If False, just add and optimize hydrogens at the input resnames’ default protonation.

  • pH (float) – Solution pH used by PROPKA to pick titration states. Default 7.4.

  • force_protonation (list of tuple[str, str], optional) – Force specific protonation states on individual residues. Each entry is (atomselection, resname), e.g. [("protein and resid 40", "HID")] forces residue 40 to HID. Atom selections use VMD syntax.

  • no_opt (list of str, optional) – Atom selections of residues to exclude from H-bond / sidechain flip optimization. Use this when an optimizer flip would degrade a known-good structure (e.g. a residue in a metal site).

  • no_prot (list of str, optional) – Atom selections of residues to exclude from hydrogen addition.

  • no_titr (list of str, optional) – Atom selections of residues to exclude from titration.

  • hold_nonpeptidic_bonds (bool) – When True (default), protein residues that are covalently bonded to non-protein partners are automatically added to no_opt, no_prot and no_titr so their bonds aren’t broken and their boundary atoms aren’t over-protonated. Set to False to let PDB2PQR/PROPKA process them ignoring the non-peptide bond.

  • verbose (bool) – If False, demote this module’s logger to WARNING for the call.

  • return_details (bool) – If True, additionally return a pandas DataFrame with per-residue protonation / pKa info.

  • hydrophobic_thickness (float, optional) – Membrane thickness 2*h in Angstrom (None for globular proteins). Triggers a warning for titratable residues that fall inside the bilayer and are not well-buried (see Notes).

  • plot_pka (str, optional) – Path to a .png file. When set, writes a titration diagram for the system’s titratable residues.

  • titrate (list[str], optional) – Restrict titration to a subset of the canonical AAs in the table above (e.g. ["HIS", "ARG"]). All other AAs in the table are added to no_titr. When None (default), every AA in the table is titratable.

  • detect_specs (list[PerResidueSpec], optional) – Per-residue specs from moleculekit.tools.nonstandard_residues.detectNonStandardResidues(). When left as None (the default), systemPrepare runs detectNonStandardResidues() itself so canonical residues at non-peptide junctions are renamed (e.g. CYS-Cys disulfide -> CYX, a TYR coordinating a heme Fe -> a TY# bucket) and their sidechains re-templated before PDB2PQR runs. Pass an explicit list to bypass the auto-detection; pass detect_specs=[] to skip non-standard residue handling entirely. The applied specs are always returned so they can be forwarded to the downstream parameterizer / builder.

  • restore_missing_sidechains (bool, optional) – If True, sidechain atoms removed by PDB2PQR are templated back in after preparation. Specifically, canonical protein residues whose entire heavy-atom sidechain is absent (only backbone + CB remain) are reconstructed using the Dunbrack rotamer library before PDB2PQR runs, so PDB2PQR’s 10 %-missing-atom threshold does not reject the structure on a partial input. Default False.

Returns:

  • mol_out (moleculekit.molecule.Molecule) – The protonated and optimized molecule.

  • detect_specs (list[PerResidueSpec]) – The non-standard-residue specs that were applied (the caller’s detect_specs argument if supplied, otherwise the auto-detected list). Empty when detect_specs=[] was passed in.

  • details (pandas.DataFrame) – Per-residue protonation states, pKas and buried fractions. Returned only when return_details=True.

Raises:

ImportError – If pdb2pqr is not installed.

Examples

>>> tryp = Molecule('3PTB')
>>> tryp_op, specs, df = systemPrepare(tryp, return_details=True)
>>> tryp_op.write('/tmp/3PTB_prepared.pdb')
>>> df.to_excel("/tmp/tryp-report.csv")
>>> df
resname protonation  resid insertion chain segid       pKa    buried
0       ILE         ILE     16               A     0  7.413075  0.839286
1       VAL         VAL     17               A     0       NaN       NaN
2       GLY         GLY     18               A     0       NaN       NaN
3       GLY         GLY     19               A     0       NaN       NaN
4       TYR         TYR     20               A     0  9.590845  0.146429
..      ...         ...    ...       ...   ...   ...       ...       ...
282     HOH         WAT    804               A     1       NaN       NaN
283     HOH         WAT    805               A     1       NaN       NaN
284     HOH         WAT    807               A     1       NaN       NaN
285     HOH         WAT    808               A     1       NaN       NaN
286     HOH         WAT    809               A     1       NaN       NaN

[287 rows x 8 columns]

Acidic pH:

>>> tryp_op, specs = systemPrepare(tryp, pH=1.0)
>>> tryp_op.write('/tmp/3PTB_pH1.pdb')

Freeze residues 36 and 49 in place:

>>> tryp_op, specs = systemPrepare(tryp, no_opt=["protein and resid 36", "chain A and resid 49"])

Disable protonation on residue 32:

>>> tryp_op, specs = systemPrepare(tryp, no_prot=["protein and resid 32",])

Disable titration and protonation on residue 32:

>>> tryp_op, specs = systemPrepare(tryp, no_titr=["protein and resid 32",], no_prot=["protein and resid 32",])

Force residue 40 to HIE and 57 to HIP:

>>> tryp_op, specs = systemPrepare(tryp, force_protonation=[("protein and resid 40", "HIE"), ("protein and resid 57", "HIP")])

Skip non-standard residue detection entirely (legacy behavior):

>>> tryp_op, specs = systemPrepare(tryp, detect_specs=[])

Forward specs to a downstream builder for non-canonical residues (NCAAs, sidechain crosslinks, metal-coordinating residues):

>>> # from htmd.builder.nonstandard import parameterizeFromSpecs
>>> # parameterizeFromSpecs(specs, tryp_op, outdir="./params")