Protein preparation

What is protein preparation?

The protein preparation phase, based on the PDB2PQR and propKa softwares, addresses e.g. the problems of assigning titration states at the user-chosen pH; flipping the side chains of HIS, ASN, and GLN residues; and optimizing the overall hydrogen bonding network.

After preparing, the build phase takes a prepared system and applies the chosen forcefield in order to obtain simulation-ready input files.

Let’s start

from htmd.ui import *
config(viewer='ngl')
Please cite HTMD: Doerr et al.(2016)JCTC,12,1845.
https://dx.doi.org/10.1021/acs.jctc.6b00049
Documentation: http://software.acellera.com/
To update: conda update htmd -c acellera -c psi4

You are on the latest HTMD version (unpackaged : /home/joao/maindisk/software/repos/Acellera/htmd/htmd).

Protein Preparation in HTMD

The system preparation phase is based on the PDB2PQR software. It includes the following steps (from the PDB2PQR algorithm description):

  • Compute empirical pKa values for the residues’ local environment (propKa)
  • Assign titration states at the user-chosen pH;
  • Flipping the side chains of HIS (including user defined HIS states), ASN, and GLN residues;
  • Rotating the sidechain hydrogen on SER, THR, TYR, and CYS (if available);
  • Determining the best placement for the sidechain hydrogen on neutral HIS, protonated GLU, and protonated ASP;
  • Optimizing all water hydrogens.

The hydrogen bonding network calculations are performed by the PDB2PQR software package. The pKa calculations are performed by the PROPKA 3.1 software packages. Please see the copyright, license and citation terms distributed with each.

Note that this version was modified in order to use an externally-supplied propKa 3.1 (installed automatically via dependencies), whereas the original had propKa 3.0 embedded!

The results of the function should be roughly equivalent of the system preparation wizard’s preprocessing and optimization steps of Schrodinger’s Maestro software.

Protein residue pKas in water

image0

Modified residue names

The molecule produced by the preparation modifies residue names according to their protonation. Later system-building functions assume these residue naming conventions.

Charge +1 Neutral Charge -1
ASH ASP
CYS CYM
GLH GLU
HIP HID/HIE
LYS LYN
TYR TYM
ARG AR0

Note: support for alternative charge states varies between the forcefields.

Limitations

  • PDB2PQR: returns one solution consistent with its restraints, not the optimal one;
  • Membrane proteins: propKa ignores lipid exposure (more on this later);
  • Large conformational changes: local environment changes may be large enough that pKa decisions are not transferable.

proteinPrepare function

The proteinPrepare function requires a Molecule object, the protein to be prepared, as an argument, and returns the prepared system, also as a Molecule. Logging messages will provide information and warnings about the process.

def proteinPrepare(mol_in,
                   pH=7.0,
                   verbose=0,
                   returnDetails=False,
                   hydrophobicThickness=None,
                   holdSelection=None):

Returns a Molecule object, where residues have been renamed to follow internal conventions on protonation (below). Coordinates are changed to optimize the H-bonding network. This should be roughly comparable to Schroedinger Maestro’s preparation wizard.

Parameters

mol_in : htmd.Molecule
    the object to be optimized
pH : float
    pH to decide titration
verbose : int
    verbosity
returnDetails : bool
    whether to return just the prepared Molecule (False, default) or a molecule *and* a ResidueInfo
    object including computed properties
hydrophobicThickness : float
    the thickness of the membrane in which the protein is embedded, or None if globular protein.
    Used to provide a warning about membrane-exposed residues.
holdSelection : str
    Atom selection to be excluded from optimization.
    Only the carbon-alpha atom will be considered for the corresponding residue.

proteinPrepare() is a convenience function. Using it is not mandatory. You can manipulate the input molecule with your custom functions. In particular,

  • Addition of hydrogen atoms is not required
  • Protonation states are set by renaming residues
  • HIS and other residues can be edited as coordinates

Example

Prepare trypsin (PDB: 3PTB) at pH 7.

tryp = Molecule("3PTB")
tryp_op = proteinPrepare(tryp)
2018-03-16 16:45:27,014 - htmd.molecule.readers - INFO - Using local copy for 3PTB: /home/joao/maindisk/software/repos/Acellera/htmd/htmd/data/pdb/3ptb.pdb
2018-03-16 16:45:28,394 - htmd.molecule.molecule - WARNING - Residue insertions were detected in the Molecule. It is recommended to renumber the residues using the Molecule.renumberResidues() method.
2018-03-16 16:45:28,556 - propka - INFO - No pdbfile provided
2018-03-16 16:45:31,188 - htmd.builder.preparation - WARNING - The following residue has not been optimized: CA
2018-03-16 16:45:31,189 - htmd.builder.preparation - WARNING - The following residue has not been optimized: BEN
2018-03-16 16:45:39,499 - htmd.builder.preparationdata - INFO - The following residues are in a non-standard state: CYS    22  A (CYX), HIS    40  A (HIE), CYS    42  A (CYX), HIS    57  A (HIP), CYS    58  A (CYX), HIS    91  A (HID), CYS   128  A (CYX), CYS   136  A (CYX), CYS   157  A (CYX), CYS   168  A (CYX), CYS   182  A (CYX), CYS   191  A (CYX), CYS   201  A (CYX), CYS   220  A (CYX), CYS   232  A (CYX)
2018-03-16 16:45:39,935 - htmd.builder.preparationdata - WARNING - Dubious protonation state: the pKa of 3 residues is within 1.0 units of pH 7.0.
2018-03-16 16:45:39,939 - htmd.builder.preparationdata - WARNING - Dubious protonation state:    HIS    57  A (pKa= 7.44)
2018-03-16 16:45:39,940 - htmd.builder.preparationdata - WARNING - Dubious protonation state:    GLU    70  A (pKa= 6.10)
2018-03-16 16:45:39,942 - htmd.builder.preparationdata - WARNING - Dubious protonation state:    N+     16T A (pKa= 7.41)
2018-03-16 16:45:40,042 - htmd.builder.preparationdata - WARNING - Found N-terminus 83.9% buried (> 50.0% threshold)
2018-03-16 16:45:40,043 - htmd.builder.preparationdata - WARNING - Found C-terminus involved in H bonds

Visualize protonation of residue 40

tryp_op.view(style="Licorice",sel="resid 40",hold=True)
tryp_op.view(style="Lines",sel="same residue as exwithin 4 of resid 40")
A Jupyter Widget

Preparation report

If the returnDetails argument is set, an object of type ResidueData is returned as a second return value. It carries a wealth of information on the preparation results.

tryp_op, prepData = proteinPrepare(tryp, returnDetails=True)
prepData
2018-03-16 16:46:21,332 - htmd.builder.preparation - WARNING - The following residue has not been optimized: CA
2018-03-16 16:46:21,334 - htmd.builder.preparation - WARNING - The following residue has not been optimized: BEN
2018-03-16 16:46:29,334 - htmd.builder.preparationdata - INFO - The following residues are in a non-standard state: CYS    22  A (CYX), HIS    40  A (HIE), CYS    42  A (CYX), HIS    57  A (HIP), CYS    58  A (CYX), HIS    91  A (HID), CYS   128  A (CYX), CYS   136  A (CYX), CYS   157  A (CYX), CYS   168  A (CYX), CYS   182  A (CYX), CYS   191  A (CYX), CYS   201  A (CYX), CYS   220  A (CYX), CYS   232  A (CYX)
2018-03-16 16:46:29,337 - htmd.builder.preparationdata - WARNING - Dubious protonation state: the pKa of 3 residues is within 1.0 units of pH 7.0.
2018-03-16 16:46:29,339 - htmd.builder.preparationdata - WARNING - Dubious protonation state:    HIS    57  A (pKa= 7.44)
2018-03-16 16:46:29,340 - htmd.builder.preparationdata - WARNING - Dubious protonation state:    GLU    70  A (pKa= 6.10)
2018-03-16 16:46:29,341 - htmd.builder.preparationdata - WARNING - Dubious protonation state:    N+     16T A (pKa= 7.41)
2018-03-16 16:46:29,433 - htmd.builder.preparationdata - WARNING - Found N-terminus 83.9% buried (> 50.0% threshold)
2018-03-16 16:46:29,434 - htmd.builder.preparationdata - WARNING - Found C-terminus involved in H bonds
PreparationData object about 290 residues.
Unparametrized residue names: CA, BEN
Please find the full info in the .data property, e.g.:
  resname  resid insertion chain       pKa protonation flipped     buried
0     ILE     16               A       NaN         ILE     NaN        NaN
1     VAL     17               A       NaN         VAL     NaN        NaN
2     GLY     18               A       NaN         GLY     NaN        NaN
3     GLY     19               A       NaN         GLY     NaN        NaN
4     TYR     20               A  9.590845         TYR     NaN  14.642857
 . . .

Most of it is accessible in the data property, as a pandas DataFrame.

prepData.data.columns
Index(['resname', 'resid', 'insertion', 'chain', 'pKa', 'protonation',
       'flipped', 'patches', 'buried', 'z', 'membraneExposed',
       'forced_protonation', 'default_protonation', 'pka_group_id',
       'pka_residue_type', 'pka_type', 'pka_charge', 'pka_atom_name',
       'pka_atom_sybyl_type', 'pdb2pqr_idx', 'guessedAtoms'],
      dtype='object')
prepData.data.loc[:,['resname','resid','pKa','protonation']].head(10)
resname resid pKa protonation
0 ILE 16 NaN ILE
1 VAL 17 NaN VAL
2 GLY 18 NaN GLY
3 GLY 19 NaN GLY
4 TYR 20 9.590845 TYR
5 THR 21 NaN THR
6 CYS 22 99.990000 CYX
7 GLY 23 NaN GLY
8 ALA 24 NaN ALA
9 ASN 25 NaN ASN

As such, it can be easily queried and written as a spreadsheet in Excel or CSV format.

prepData.data.to_excel("./tryp_data.xlsx")

Special case: Membrane proteins

Membrane-embedded proteins are in contact with an hydrophobic region which may alter pKa values for membrane-exposed residues (image taken from Teixeira et al., J. Chem. Theory Comput., 2016, 12 (3), pp 930–934).

image0

Although the effect is not currently taken into account quantitatively, if a hydrophobicThickness argument is provided, warnings will be generated for residues exposed to the lipid region.

The following example shows the preparation of the mu opioid receptor, 4DKL. The pre-oriented structure is retrieved from the OPM database.

from htmd.util import opm
mor, thickness = opm("4dkl")
print(thickness)
mor.filter("protein and noh")
mor_opt, mor_data = proteinPrepare(mor, returnDetails=True,
                                   hydrophobicThickness=thickness)

exposedRes = mor_data.data.membraneExposed
mor_data.data[exposedRes]
mor_data.data[exposedRes].to_excel("./mor_exposed_residues.xlsx")
2018-03-16 16:47:01,783 - htmd.molecule.molecule - INFO - Removed 2546 atoms. 4836 atoms remaining in the molecule.
2018-03-16 16:47:01,876 - htmd.molecule.molecule - INFO - Removed 364 atoms. 4472 atoms remaining in the molecule.
32.0
2018-03-16 16:47:29,115 - htmd.builder.preparationdata - INFO - The following residues are in a non-standard state: ASP   114  A (ASH), CYS   140  A (CYX), HIS   171  A (HID), CYS   217  A (CYX), HIS   223  A (HID), HIS   297  A (HID), HIS   319  A (HIE), ASP   114  B (ASH), CYS   140  B (CYX), HIS   171  B (HID), CYS   217  B (CYX), HIS   223  B (HID), HIS   297  B (HID), HIS   319  B (HIE)
2018-03-16 16:47:29,118 - htmd.builder.preparationdata - WARNING - Dubious protonation state: the pKa of 6 residues is within 1.0 units of pH 7.0.
2018-03-16 16:47:29,119 - htmd.builder.preparationdata - WARNING - Dubious protonation state:    ASP   114  A (pKa= 7.85)
2018-03-16 16:47:29,121 - htmd.builder.preparationdata - WARNING - Dubious protonation state:    HIS   223  A (pKa= 6.36)
2018-03-16 16:47:29,122 - htmd.builder.preparationdata - WARNING - Dubious protonation state:    ASP   114  B (pKa= 7.85)
2018-03-16 16:47:29,123 - htmd.builder.preparationdata - WARNING - Dubious protonation state:    HIS   223  B (pKa= 6.36)
2018-03-16 16:47:29,124 - htmd.builder.preparationdata - WARNING - Dubious protonation state:    N+     65T A (pKa= 7.76)
2018-03-16 16:47:29,125 - htmd.builder.preparationdata - WARNING - Dubious protonation state:    N+     65T B (pKa= 7.76)
2018-03-16 16:47:29,342 - htmd.builder.preparationdata - WARNING - Predictions for 34 residues may be incorrect because they are exposed to the membrane (-16.0<z<16.00 and buried<75.0%).

Case 2. Building with a ligand

Coexistence and automatic placement of a ligand requires further manipulation, because:

  1. The ligand may have to be arranged in a geometrically sensible way
  2. We likely need additional parameters and topologies

See the tutorial System Building Trypsin-Benzamidine.

Case 3. Membrane proteins

Pre-equilibrated membranes can be merged with pre-oriented systems, e.g. downloaded from the OPM. See the tutorial System Building μ-opioid Receptor in Membrane.

Citations

Please acknowledge your use of PDB2PQR and PropKa by citing:

  • Dolinsky TJ, Czodrowski P, Li H, Nielsen JE, Jensen JH, Klebe G, Baker NA. PDB2PQR: Expanding and upgrading automated preparation of biomolecular structures for molecular simulations. Nucleic Acids Res, 35, W522-5, 2007.
  • Sondergaard, Chresten R., Mats HM Olsson, Michal Rostkowski, and Jan H. Jensen. “Improved Treatment of Ligands and Coupling Effects in Empirical Calculation and Rationalization of pKa Values.” Journal of Chemical Theory and Computation 7, no. 7 (2011): 2284-2295.