Basic protonation#
You will learn: how to add hydrogens to a protein at a chosen pH, inspect the resulting protonation-state table, and write the prepared system to disk.
Prerequisites:
The First molecule tutorial.
PDB2PQR and PROPKA installed (they ship as moleculekit dependencies).
Setup#
import pandas as pd
from moleculekit.molecule import Molecule
from moleculekit.tools.preparation import systemPrepare
mol = Molecule("3PTB")
3PTB is bovine trypsin with a benzamidine ligand in the active site.
Step 1 — Run systemPrepare at pH 7.4#
pmol, specs, details = systemPrepare(mol, pH=7.4, return_details=True)
---- Molecule chain report ----
Chain A:
First residue: ILE 16
Final residue: HOH 809
---- End of chain report ----
moleculekit.tools.preparation - WARNING - Both chains and segments are defined in Molecule.chain / Molecule.segid, however they are inconsistent. Protein preparation will use the chain information.
moleculekit.tools.preparation - WARNING - The following residues have not been optimized: CA, BEN
moleculekit.tools.preparation - INFO - Modified residue CYS 22 A to CYX
moleculekit.tools.preparation - INFO - Modified residue HIS 40 A to HIE
moleculekit.tools.preparation - INFO - Modified residue CYS 42 A to CYX
moleculekit.tools.preparation - INFO - Modified residue HIS 57 A to HIP
moleculekit.tools.preparation - INFO - Modified residue CYS 58 A to CYX
moleculekit.tools.preparation - INFO - Modified residue HIS 91 A to HID
moleculekit.tools.preparation - INFO - Modified residue CYS 128 A to CYX
moleculekit.tools.preparation - INFO - Modified residue CYS 136 A to CYX
moleculekit.tools.preparation - INFO - Modified residue CYS 157 A to CYX
moleculekit.tools.preparation - INFO - Modified residue CYS 168 A to CYX
moleculekit.tools.preparation - INFO - Modified residue CYS 182 A to CYX
moleculekit.tools.preparation - INFO - Modified residue CYS 191 A to CYX
moleculekit.tools.preparation - INFO - Modified residue CYS 201 A to CYX
moleculekit.tools.preparation - INFO - Modified residue CYS 220 A to CYX
moleculekit.tools.preparation - INFO - Modified residue CYS 232 A to CYX
moleculekit.tools.preparation - WARNING - Dubious protonation state: the pKa of 2 residues is within 1.0 units of pH 7.4.
moleculekit.tools.preparation - WARNING - Dubious protonation state: TYR 39 A (pKa= 8.24)
moleculekit.tools.preparation - WARNING - Dubious protonation state: HIS 57 A (pKa= 7.44)
The call returns a 3-tuple: pmol, specs, details. pmol is a new Molecule — the input mol is not mutated. specs is the list of detected non-standard-residue specs that the call applied (same type as returned by detectNonStandardResidues()); pass it back to a later systemPrepare call if you need to repeat the run, or inspect it to audit which residues were renamed. details is a pandas.DataFrame with one row per titratable residue; columns include resname, resid, chain, segid, pKa, protonation, and buried. The function adds hydrogens, runs PROPKA to predict pKa values, and titrates each titratable residue accordingly.
Step 2 — Inspect protonation states#
details[["resname", "resid", "protonation", "pKa"]].head(10)
| resname | resid | protonation | pKa | |
|---|---|---|---|---|
| 0 | ILE | 16 | ILE | NaN |
| 1 | VAL | 17 | VAL | NaN |
| 2 | GLY | 18 | GLY | NaN |
| 3 | GLY | 19 | GLY | NaN |
| 4 | TYR | 20 | TYR | 9.590845 |
| 5 | THR | 21 | THR | NaN |
| 6 | CYS | 22 | CYX | 99.990000 |
| 7 | GLY | 23 | GLY | NaN |
| 8 | ALA | 24 | ALA | NaN |
| 9 | ASN | 25 | ASN | NaN |
Each row shows the assigned protonation form for one residue. Histidines appear as HID, HIE, or HIP depending on tautomer or charge; aspartates as ASP (deprotonated) or ASH (protonated); glutamates as GLU or GLH; cysteines as CYS or CYM; lysines as LYS or LYN.
Residues whose predicted pKa falls within 2 units of the target pH are the most sensitive to the pH choice — flipping them would change their protonation state if pH moved a unit or two:
import numpy as np
pkas = pd.to_numeric(details["pKa"], errors="coerce")
near_pka = details[np.abs(pkas - 7.4) < 2.0]
near_pka[["resname", "resid", "chain", "protonation", "pKa"]]
| resname | resid | chain | protonation | pKa | |
|---|---|---|---|---|---|
| 21 | TYR | 39 | A | TYR | 8.242255 |
| 39 | HIS | 57 | A | HIP | 7.441714 |
| 51 | GLU | 70 | A | GLU | 6.098253 |
These residues would flip protonation state if pH moved a unit or two from 7.4.
Step 3 — Skip titration entirely#
pmol_no_titr, _ = systemPrepare(mol, titration=False)
---- Molecule chain report ----
Chain A:
First residue: ILE 16
Final residue: HOH 809
---- End of chain report ----
moleculekit.tools.preparation - WARNING - Both chains and segments are defined in Molecule.chain / Molecule.segid, however they are inconsistent. Protein preparation will use the chain information.
moleculekit.tools.preparation - WARNING - The following residues have not been optimized: CA, BEN
moleculekit.tools.preparation - INFO - Modified residue CYS 22 A to CYX
moleculekit.tools.preparation - INFO - Modified residue HIS 40 A to HIE
moleculekit.tools.preparation - INFO - Modified residue CYS 42 A to CYX
moleculekit.tools.preparation - INFO - Modified residue HIS 57 A to HID
moleculekit.tools.preparation - INFO - Modified residue CYS 58 A to CYX
moleculekit.tools.preparation - INFO - Modified residue HIS 91 A to HID
moleculekit.tools.preparation - INFO - Modified residue CYS 128 A to CYX
moleculekit.tools.preparation - INFO - Modified residue CYS 136 A to CYX
moleculekit.tools.preparation - INFO - Modified residue CYS 157 A to CYX
moleculekit.tools.preparation - INFO - Modified residue CYS 168 A to CYX
moleculekit.tools.preparation - INFO - Modified residue CYS 182 A to CYX
moleculekit.tools.preparation - INFO - Modified residue CYS 191 A to CYX
moleculekit.tools.preparation - INFO - Modified residue CYS 201 A to CYX
moleculekit.tools.preparation - INFO - Modified residue CYS 220 A to CYX
moleculekit.tools.preparation - INFO - Modified residue CYS 232 A to CYX
titration=False skips PROPKA. Hydrogens are still added by PDB2PQR, but every titratable residue gets the standard protonation form at default pH with no per-residue prediction. Use this when you already know the protonation states you want, or when you will set them yourself via force_protonation.
Step 4 — Write the prepared structure#
pmol.write("trypsin_prepared.cif")
mmCIF is the recommended output format here: it preserves the bonds, bond orders, and formal charges that systemPrepare just established. Reload with Molecule(“trypsin_prepared.cif”) to round-trip.
Recap#
systemPrepare()adds hydrogens and assigns protonation states tomolat the chosenpHin one call.return_details=Truegives you a per-titratable-residue table (apandas.DataFrame) for inspection.titration=Falseskips PROPKA when you do not need pKa prediction.