Basic protonation#

You will learn: how to add hydrogens to a protein at a chosen pH, inspect the resulting protonation-state table, and write the prepared system to disk.

Prerequisites:

  • The First molecule tutorial.

  • PDB2PQR and PROPKA installed (they ship as moleculekit dependencies).

Setup#

import pandas as pd
from moleculekit.molecule import Molecule
from moleculekit.tools.preparation import systemPrepare

mol = Molecule("3PTB")

3PTB is bovine trypsin with a benzamidine ligand in the active site.

Step 1 — Run systemPrepare at pH 7.4#

pmol, specs, details = systemPrepare(mol, pH=7.4, return_details=True)
---- Molecule chain report ----
Chain A:
    First residue: ILE    16  
    Final residue: HOH   809  
---- End of chain report ----
moleculekit.tools.preparation - WARNING - Both chains and segments are defined in Molecule.chain / Molecule.segid, however they are inconsistent. Protein preparation will use the chain information.
moleculekit.tools.preparation - WARNING - The following residues have not been optimized: CA, BEN
moleculekit.tools.preparation - INFO - Modified residue CYS    22 A to CYX
moleculekit.tools.preparation - INFO - Modified residue HIS    40 A to HIE
moleculekit.tools.preparation - INFO - Modified residue CYS    42 A to CYX
moleculekit.tools.preparation - INFO - Modified residue HIS    57 A to HIP
moleculekit.tools.preparation - INFO - Modified residue CYS    58 A to CYX
moleculekit.tools.preparation - INFO - Modified residue HIS    91 A to HID
moleculekit.tools.preparation - INFO - Modified residue CYS   128 A to CYX
moleculekit.tools.preparation - INFO - Modified residue CYS   136 A to CYX
moleculekit.tools.preparation - INFO - Modified residue CYS   157 A to CYX
moleculekit.tools.preparation - INFO - Modified residue CYS   168 A to CYX
moleculekit.tools.preparation - INFO - Modified residue CYS   182 A to CYX
moleculekit.tools.preparation - INFO - Modified residue CYS   191 A to CYX
moleculekit.tools.preparation - INFO - Modified residue CYS   201 A to CYX
moleculekit.tools.preparation - INFO - Modified residue CYS   220 A to CYX
moleculekit.tools.preparation - INFO - Modified residue CYS   232 A to CYX
moleculekit.tools.preparation - WARNING - Dubious protonation state: the pKa of 2 residues is within 1.0 units of pH 7.4.
moleculekit.tools.preparation - WARNING - Dubious protonation state:    TYR    39 A (pKa= 8.24)
moleculekit.tools.preparation - WARNING - Dubious protonation state:    HIS    57 A (pKa= 7.44)

The call returns a 3-tuple: pmol, specs, details. pmol is a new Molecule — the input mol is not mutated. specs is the list of detected non-standard-residue specs that the call applied (same type as returned by detectNonStandardResidues()); pass it back to a later systemPrepare call if you need to repeat the run, or inspect it to audit which residues were renamed. details is a pandas.DataFrame with one row per titratable residue; columns include resname, resid, chain, segid, pKa, protonation, and buried. The function adds hydrogens, runs PROPKA to predict pKa values, and titrates each titratable residue accordingly.

Step 2 — Inspect protonation states#

details[["resname", "resid", "protonation", "pKa"]].head(10)
resname resid protonation pKa
0 ILE 16 ILE NaN
1 VAL 17 VAL NaN
2 GLY 18 GLY NaN
3 GLY 19 GLY NaN
4 TYR 20 TYR 9.590845
5 THR 21 THR NaN
6 CYS 22 CYX 99.990000
7 GLY 23 GLY NaN
8 ALA 24 ALA NaN
9 ASN 25 ASN NaN

Each row shows the assigned protonation form for one residue. Histidines appear as HID, HIE, or HIP depending on tautomer or charge; aspartates as ASP (deprotonated) or ASH (protonated); glutamates as GLU or GLH; cysteines as CYS or CYM; lysines as LYS or LYN.

Residues whose predicted pKa falls within 2 units of the target pH are the most sensitive to the pH choice — flipping them would change their protonation state if pH moved a unit or two:

import numpy as np

pkas = pd.to_numeric(details["pKa"], errors="coerce")
near_pka = details[np.abs(pkas - 7.4) < 2.0]
near_pka[["resname", "resid", "chain", "protonation", "pKa"]]
resname resid chain protonation pKa
21 TYR 39 A TYR 8.242255
39 HIS 57 A HIP 7.441714
51 GLU 70 A GLU 6.098253

These residues would flip protonation state if pH moved a unit or two from 7.4.

Step 3 — Skip titration entirely#

pmol_no_titr, _ = systemPrepare(mol, titration=False)
---- Molecule chain report ----
Chain A:
    First residue: ILE    16  
    Final residue: HOH   809  
---- End of chain report ----
moleculekit.tools.preparation - WARNING - Both chains and segments are defined in Molecule.chain / Molecule.segid, however they are inconsistent. Protein preparation will use the chain information.
moleculekit.tools.preparation - WARNING - The following residues have not been optimized: CA, BEN
moleculekit.tools.preparation - INFO - Modified residue CYS    22 A to CYX
moleculekit.tools.preparation - INFO - Modified residue HIS    40 A to HIE
moleculekit.tools.preparation - INFO - Modified residue CYS    42 A to CYX
moleculekit.tools.preparation - INFO - Modified residue HIS    57 A to HID
moleculekit.tools.preparation - INFO - Modified residue CYS    58 A to CYX
moleculekit.tools.preparation - INFO - Modified residue HIS    91 A to HID
moleculekit.tools.preparation - INFO - Modified residue CYS   128 A to CYX
moleculekit.tools.preparation - INFO - Modified residue CYS   136 A to CYX
moleculekit.tools.preparation - INFO - Modified residue CYS   157 A to CYX
moleculekit.tools.preparation - INFO - Modified residue CYS   168 A to CYX
moleculekit.tools.preparation - INFO - Modified residue CYS   182 A to CYX
moleculekit.tools.preparation - INFO - Modified residue CYS   191 A to CYX
moleculekit.tools.preparation - INFO - Modified residue CYS   201 A to CYX
moleculekit.tools.preparation - INFO - Modified residue CYS   220 A to CYX
moleculekit.tools.preparation - INFO - Modified residue CYS   232 A to CYX

titration=False skips PROPKA. Hydrogens are still added by PDB2PQR, but every titratable residue gets the standard protonation form at default pH with no per-residue prediction. Use this when you already know the protonation states you want, or when you will set them yourself via force_protonation.

Step 4 — Write the prepared structure#

pmol.write("trypsin_prepared.cif")

mmCIF is the recommended output format here: it preserves the bonds, bond orders, and formal charges that systemPrepare just established. Reload with Molecule(“trypsin_prepared.cif”) to round-trip.

Recap#

  • systemPrepare() adds hydrogens and assigns protonation states to mol at the chosen pH in one call.

  • return_details=True gives you a per-titratable-residue table (a pandas.DataFrame) for inspection.

  • titration=False skips PROPKA when you do not need pKa prediction.

Next#