{ "cells": [ { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# System preparation" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## What is system preparation?" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "-" } }, "source": [ "The protonation state of a biological system is critical. Since classical MD simulations typically do not allow bonds to break or form, the\n", "initial protonation of the system must be accurate. Knowing what pH you are trying to reproduce and the protonation state of all molecules is therefore important to obtain the correct results. If you suspect changing protonation is important to your system and you still want to use classical mechanics, consider running multiple simulations with different protonation states.\n", "\n", "Histidine residues can have three different protonation states even at pH 7, so a correct protonation of this\n", "residue is particularly critical. This residue can be protonated at either the delta nitrogen (most common; HSD/HID), the epsilon nitrogen (very\n", "common also; HSE/HIE) or at both nitrogens (special situations and low pH; HSP/HIP).\n", "\n", "![histidines](img/histidines.png)\n", "\n", "The best way to determine how histidine should be protonated is to look at the structure. Typically, a histidine\n", "nitrogen is protonated if it is close enough to a hydrogen-bond acceptor (e.g. a glutamic acid), thus creating a hydrogen bond.\n", "Since histidines are frequently present at protein active sites, a correct protonation state is particularly important\n", "in ligand binding simulations.\n", "\n", "In HTMD, one can use [systemPrepare](https://software.acellera.com/moleculekit/moleculekit.tools.preparation.html#moleculekit.tools.preparation.systemPrepare) to help with protonation.\n", "\n", "The **system preparation** phase, based on the PDB2PQR and propKa software packages, addresses e.g. 
the problems of assigning titration states at the user-chosen pH; flipping the side chains of HIS, ASN, and GLN residues; and optimizing the overall hydrogen bonding network. \n", "\n", "After preparing, the **build** phase takes a prepared system and applies the chosen forcefield in order to obtain simulation-ready input files." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Let's start" ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "slideshow": { "slide_type": "-" } }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "2021-11-16 09:53:00,399 - numexpr.utils - INFO - NumExpr defaulting to 8 threads.\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "\n", "Please cite HTMD: Doerr et al.(2016)JCTC,12,1845. https://dx.doi.org/10.1021/acs.jctc.6b00049\n", "\n", "HTMD Documentation at: https://www.htmd.org/docs/latest/\n", "\n", "You are on the latest HTMD version (unpackaged : /home/sdoerr/Work/htmd/htmd).\n", "\n" ] } ], "source": [ "from htmd.ui import *\n", "config(viewer='ngl')" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## System Preparation in HTMD" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The system preparation phase is based on the PDB2PQR software. 
It\n", "includes the following steps (from the\n", "[PDB2PQR algorithm\n", "description](https://pdb2pqr.readthedocs.io/en/latest/using/algorithms.html)):\n", "\n", " * Compute empirical pKa values for the residues' local environment (propKa);\n", " * Assign titration states at the user-chosen pH;\n", " * Flip the side chains of HIS (including user-defined HIS states), ASN, and GLN residues;\n", " * Rotate the sidechain hydrogen on SER, THR, TYR, and CYS (if available);\n", " * Determine the best placement for the sidechain hydrogen on neutral HIS, protonated GLU, and protonated ASP;\n", " * Optimize all water hydrogens." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "The hydrogen bonding network calculations are performed by the\n", "[PDB2PQR](http://www.poissonboltzmann.org/) software package. The pKa\n", "calculations are performed by the [PROPKA\n", "3.1](https://github.com/jensengroup/propka-3.1) software package.\n", "Please see the copyright, license and citation terms distributed with each.\n", "\n", "Note that the PDB2PQR version used here was modified to use an\n", "externally supplied propKa **3.1** (installed automatically via dependencies), whereas\n", "the original had propKa 3.0 *embedded*.\n", "\n", "The results of the function should be roughly equivalent to the preprocessing\n", "and optimization steps of the system preparation wizard\n", "in Schrödinger's Maestro software." 
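, "\n", "As a rough illustration of the titration-state step listed earlier, the Henderson-Hasselbalch relation gives the protonated fraction of a site from its pKa and the chosen pH. This is only a sketch: `protonated_fraction` and `is_protonated` are hypothetical helpers written for this tutorial, not part of PDB2PQR, propKa or HTMD, and the real tools apply further rules.\n", "\n", "
```python
def protonated_fraction(pka, ph):
    # Henderson-Hasselbalch: fraction of sites that carry the titratable proton
    return 1.0 / (1.0 + 10.0 ** (ph - pka))


def is_protonated(pka, ph):
    # Simple majority rule: the site is protonated when its pKa is above the pH
    return pka > ph


# pKa values taken from the trypsin (3PTB) preparation shown later in this tutorial
for name, pka in [('HIS 57', 7.44), ('TYR 20', 9.59)]:
    frac = protonated_fraction(pka, ph=7.4)
    print(f'{name}: protonated={is_protonated(pka, 7.4)}, fraction={frac:.2f}')
```
", "\n", "With a pKa of 7.44 at pH 7.4, HIS 57 is only about 52% protonated, which is why values this close to the pH are reported as dubious in the preparation log."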
] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### Protein residue pKas in water" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "-" } }, "source": [ "![residue_naming](img/naming.svg)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### Modified residue names" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "-" } }, "source": [ "In the molecule returned by the preparation, residues are renamed\n", "according to their protonation state.\n", "The system-building functions used later assume these naming conventions." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "-" } }, "source": [ "Charge +1 | Neutral | Charge -1\n", "-------------|------------|----------\n", " - | ASH | ASP\n", " - | CYS | CYM\n", " - | GLH | GLU\n", "HIP | HID/HIE | -\n", "LYS | LYN | -\n", " - | TYR | TYM\n", "ARG | AR0 | -" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Note**: support for alternative charge states varies between forcefields." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### Limitations" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "-" } }, "source": [ " * *PDB2PQR*: returns **one** solution consistent with its restraints, not necessarily the optimal one;\n", " * *Membrane proteins*: propKa ignores **lipid exposure** (more on this later);\n", " * *Large conformational changes*: local environment changes may be large enough that pKa decisions are **not transferable**." 
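, "\n", "The residue naming table shown before these limitations can also be written as a lookup, which is handy for a quick sanity check of the side-chain charge implied by a list of residue names. This is a minimal sketch: `RESNAME_CHARGE` and `net_sidechain_charge` are hypothetical names introduced here, and charged termini, ions and ligands are deliberately ignored.\n", "\n", "
```python
# Formal side-chain charge implied by each residue name in the naming table
RESNAME_CHARGE = {
    'ASH': 0, 'ASP': -1,
    'CYS': 0, 'CYM': -1,
    'GLH': 0, 'GLU': -1,
    'HIP': 1, 'HID': 0, 'HIE': 0,
    'LYS': 1, 'LYN': 0,
    'TYR': 0, 'TYM': -1,
    'ARG': 1, 'AR0': 0,
}


def net_sidechain_charge(resnames):
    # Residue names that are not in the table are treated as neutral here
    return sum(RESNAME_CHARGE.get(name, 0) for name in resnames)


# One residue name per residue (not per atom)
print(net_sidechain_charge(['ASP', 'HIP', 'LYS', 'GLY', 'GLH']))
```
"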
] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## `systemPrepare` function" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "-" } }, "source": [ "The `systemPrepare` function requires a `Molecule` object, the protein/DNA/RNA to be prepared, as an argument, and returns the prepared system, also as a `Molecule`. Logging messages will provide information and warnings about the process.\n", "\n", "The most commonly used arguments and their defaults are (the full signature is printed by `help(systemPrepare)`):\n", "\n", "```python\n", "def systemPrepare(mol_in,\n", "                  pH=7.4,\n", "                  verbose=True,\n", "                  return_details=False,\n", "                  hydrophobic_thickness=None):\n", "```\n", "\n", "Returns a `Molecule` object, where residues have been renamed to follow internal conventions on protonation (below). Coordinates are changed to optimize the H-bonding network." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### Parameters" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "slideshow": { "slide_type": "-" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Help on function systemPrepare in module moleculekit.tools.preparation:\n", "\n", "systemPrepare(mol_in, titration=True, pH=7.4, force_protonation=None, no_opt=None, no_prot=None, no_titr=None, hold_nonpeptidic_bonds=True, verbose=True, return_details=False, hydrophobic_thickness=None, plot_pka=None, _logger_level='ERROR', _molkit_ff=True)\n", " Prepare molecular systems through protonation and h-bond optimization.\n", " \n", " The preparation routine protonates and optimizes protein and nucleic residues.\n", " It will also take into account any non-protein, non-nucleic molecules for the pKa calculation\n", " but will not attempt to protonate or optimize those.\n", " \n", " Returns a Molecule object, where residues have been renamed to follow\n", " internal conventions on protonation (below). 
Coordinates are changed to\n", " optimize the H-bonding network.\n", " \n", " The following residue names are used in the returned molecule:\n", " \n", " === ===============================\n", " ASH Neutral ASP\n", " CYX SS-bonded CYS\n", " CYM Negative CYS\n", " GLH Neutral GLU\n", " HIP Positive HIS\n", " HID Neutral HIS, proton HD1 present\n", " HIE Neutral HIS, proton HE2 present\n", " LYN Neutral LYS\n", " TYM Negative TYR\n", " AR0 Neutral ARG\n", " === ===============================\n", " \n", " ========= ======= =========\n", " Charge +1 Neutral Charge -1\n", " ========= ======= =========\n", " - ASH ASP\n", " - CYS CYM\n", " - GLH GLU\n", " HIP HID/HIE -\n", " LYS LYN -\n", " - TYR TYM\n", " ARG AR0 -\n", " ========= ======= =========\n", " \n", " A detailed table about the residues modified is returned (as a second return value) when\n", " return_details is True .\n", " \n", " If hydrophobic_thickness is set to a positive value 2*h, a warning is produced for titratable residues\n", " having -h>> tryp = Molecule('3PTB')\n", " >>> tryp_op, df = systemPrepare(tryp, return_details=True)\n", " >>> tryp_op.write('/tmp/3PTB_prepared.pdb')\n", " >>> df.to_excel(\"/tmp/tryp-report.csv\")\n", " >>> df # doctest: +NORMALIZE_WHITESPACE\n", " resname protonation resid insertion chain segid pKa buried\n", " 0 ILE ILE 16 A 0 7.413075 0.839286\n", " 1 VAL VAL 17 A 0 NaN NaN\n", " 2 GLY GLY 18 A 0 NaN NaN\n", " 3 GLY GLY 19 A 0 NaN NaN\n", " 4 TYR TYR 20 A 0 9.590845 0.146429\n", " .. ... ... ... ... ... ... ... 
...\n", " 282 HOH WAT 804 A 1 NaN NaN\n", " 283 HOH WAT 805 A 1 NaN NaN\n", " 284 HOH WAT 807 A 1 NaN NaN\n", " 285 HOH WAT 808 A 1 NaN NaN\n", " 286 HOH WAT 809 A 1 NaN NaN\n", " \n", " [287 rows x 8 columns]\n", " \n", " >>> tryp_op = systemPrepare(tryp, pH=1.0)\n", " >>> tryp_op.write('/tmp/3PTB_pH1.pdb')\n", " \n", " The following will force the preparation to freeze residues 36 and 49 in place\n", " >>> tryp_op = systemPrepare(tryp, no_opt=[\"protein and resid 36\", \"chain A and resid 49\"])\n", " \n", " The following will disable protonation on residue 32 of the protein\n", " >>> tryp_op = systemPrepare(tryp, no_prot=[\"protein and resid 32\",])\n", " \n", " The following will disable titration and protonation on residue 32\n", " >>> tryp_op = systemPrepare(tryp, no_titr=[\"protein and resid 32\",], no_prot=[\"protein and resid 32\",])\n", " \n", " The following will force residue 40 protonation to HIE and 57 to HIP\n", " >>> tryp_op = systemPrepare(tryp, force_protonation=[(\"protein and resid 40\", \"HIE\"), (\"protein and resid 57\", \"HIP\")])\n", "\n" ] } ], "source": [ "help(systemPrepare)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "`systemPrepare()` is a convenience function. Using it\n", "is **not** mandatory. You can\n", "manipulate the input molecule with your own functions instead.\n", "In particular,\n", "\n", "* Addition of hydrogen atoms is not required\n", "* Protonation states are set by renaming residues\n", "* HIS and other residues can be edited as coordinates\n", "\n" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### Example" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "-" } }, "source": [ "Prepare trypsin (PDB: 3PTB) at the default pH of 7.4." 
] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "2021-11-16 09:54:22,642 - moleculekit.readers - INFO - Using local copy for 3PTB: /home/sdoerr/Work/moleculekit/moleculekit/test-data/pdb/3ptb.pdb\n", "2021-11-16 09:54:22,818 - moleculekit.tools.preparation - WARNING - Both chains and segments are defined in Molecule.chain / Molecule.segid, however they are inconsistent. Protein preparation will use the chain information.\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "\n", "---- Molecule chain report ----\n", "Chain A:\n", " First residue: ILE:16:\n", " Final residue: HOH:809:\n", "---- End of chain report ----\n", "\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "2021-11-16 09:54:24,870 - moleculekit.tools.preparation - WARNING - The following residues have not been optimized: BEN, CA\n", "2021-11-16 09:54:24,964 - moleculekit.tools.preparation - INFO - Modified residue CYS:A:22 to CYX\n", "2021-11-16 09:54:24,965 - moleculekit.tools.preparation - INFO - Modified residue HIS:A:40 to HIE\n", "2021-11-16 09:54:24,965 - moleculekit.tools.preparation - INFO - Modified residue CYS:A:42 to CYX\n", "2021-11-16 09:54:24,965 - moleculekit.tools.preparation - INFO - Modified residue HIS:A:57 to HIP\n", "2021-11-16 09:54:24,966 - moleculekit.tools.preparation - INFO - Modified residue CYS:A:58 to CYX\n", "2021-11-16 09:54:24,966 - moleculekit.tools.preparation - INFO - Modified residue HIS:A:91 to HID\n", "2021-11-16 09:54:24,966 - moleculekit.tools.preparation - INFO - Modified residue CYS:A:128 to CYX\n", "2021-11-16 09:54:24,967 - moleculekit.tools.preparation - INFO - Modified residue CYS:A:136 to CYX\n", "2021-11-16 09:54:24,967 - moleculekit.tools.preparation - INFO - Modified residue CYS:A:157 to CYX\n", "2021-11-16 09:54:24,968 - moleculekit.tools.preparation - INFO - Modified residue CYS:A:168 to CYX\n", "2021-11-16 09:54:24,968 - 
moleculekit.tools.preparation - INFO - Modified residue CYS:A:182 to CYX\n", "2021-11-16 09:54:24,968 - moleculekit.tools.preparation - INFO - Modified residue CYS:A:191 to CYX\n", "2021-11-16 09:54:24,969 - moleculekit.tools.preparation - INFO - Modified residue CYS:A:201 to CYX\n", "2021-11-16 09:54:24,969 - moleculekit.tools.preparation - INFO - Modified residue CYS:A:220 to CYX\n", "2021-11-16 09:54:24,969 - moleculekit.tools.preparation - INFO - Modified residue CYS:A:232 to CYX\n", "2021-11-16 09:54:24,972 - moleculekit.tools.preparation - WARNING - Dubious protonation state: the pKa of 3 residues is within 1.0 units of pH 7.4.\n", "2021-11-16 09:54:24,973 - moleculekit.tools.preparation - WARNING - Dubious protonation state: ILE 16 A (pKa= 7.41)\n", "2021-11-16 09:54:24,973 - moleculekit.tools.preparation - WARNING - Dubious protonation state: TYR 39 A (pKa= 8.24)\n", "2021-11-16 09:54:24,974 - moleculekit.tools.preparation - WARNING - Dubious protonation state: HIS 57 A (pKa= 7.44)\n" ] } ], "source": [ "tryp = Molecule(\"3PTB\")\n", "tryp_op = systemPrepare(tryp)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### Visualize protonation of residue 40" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "slideshow": { "slide_type": "-" } }, "outputs": [ { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "cfa19ea4a2074c09b0dcfd45f37b8475", "version_major": 2, "version_minor": 0 }, "text/plain": [ "A Jupyter Widget" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "tryp_op.view(style=\"Licorice\",sel=\"resid 40\",hold=True)\n", "tryp_op.view(style=\"Lines\",sel=\"same residue as exwithin 4 of resid 40\")" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Preparation report" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "-" } }, "source": [ "If the `return_details` argument is 
set, an object of type `pandas.DataFrame` is returned as a **second** return value. It carries a wealth of information on the preparation results. " ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "2021-11-16 09:54:27,991 - moleculekit.tools.preparation - WARNING - Both chains and segments are defined in Molecule.chain / Molecule.segid, however they are inconsistent. Protein preparation will use the chain information.\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "\n", "---- Molecule chain report ----\n", "Chain A:\n", " First residue: ILE:16:\n", " Final residue: HOH:809:\n", "---- End of chain report ----\n", "\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "2021-11-16 09:54:30,079 - moleculekit.tools.preparation - WARNING - The following residues have not been optimized: BEN, CA\n", "2021-11-16 09:54:30,177 - moleculekit.tools.preparation - INFO - Modified residue CYS:A:22 to CYX\n", "2021-11-16 09:54:30,177 - moleculekit.tools.preparation - INFO - Modified residue HIS:A:40 to HIE\n", "2021-11-16 09:54:30,178 - moleculekit.tools.preparation - INFO - Modified residue CYS:A:42 to CYX\n", "2021-11-16 09:54:30,178 - moleculekit.tools.preparation - INFO - Modified residue HIS:A:57 to HIP\n", "2021-11-16 09:54:30,178 - moleculekit.tools.preparation - INFO - Modified residue CYS:A:58 to CYX\n", "2021-11-16 09:54:30,179 - moleculekit.tools.preparation - INFO - Modified residue HIS:A:91 to HID\n", "2021-11-16 09:54:30,179 - moleculekit.tools.preparation - INFO - Modified residue CYS:A:128 to CYX\n", "2021-11-16 09:54:30,179 - moleculekit.tools.preparation - INFO - Modified residue CYS:A:136 to CYX\n", "2021-11-16 09:54:30,180 - moleculekit.tools.preparation - INFO - Modified residue CYS:A:157 to CYX\n", "2021-11-16 09:54:30,180 - moleculekit.tools.preparation - INFO - Modified residue CYS:A:168 to CYX\n", "2021-11-16 09:54:30,181 - 
moleculekit.tools.preparation - INFO - Modified residue CYS:A:182 to CYX\n", "2021-11-16 09:54:30,182 - moleculekit.tools.preparation - INFO - Modified residue CYS:A:191 to CYX\n", "2021-11-16 09:54:30,182 - moleculekit.tools.preparation - INFO - Modified residue CYS:A:201 to CYX\n", "2021-11-16 09:54:30,183 - moleculekit.tools.preparation - INFO - Modified residue CYS:A:220 to CYX\n", "2021-11-16 09:54:30,183 - moleculekit.tools.preparation - INFO - Modified residue CYS:A:232 to CYX\n", "2021-11-16 09:54:30,188 - moleculekit.tools.preparation - WARNING - Dubious protonation state: the pKa of 3 residues is within 1.0 units of pH 7.4.\n", "2021-11-16 09:54:30,188 - moleculekit.tools.preparation - WARNING - Dubious protonation state: ILE 16 A (pKa= 7.41)\n", "2021-11-16 09:54:30,189 - moleculekit.tools.preparation - WARNING - Dubious protonation state: TYR 39 A (pKa= 8.24)\n", "2021-11-16 09:54:30,189 - moleculekit.tools.preparation - WARNING - Dubious protonation state: HIS 57 A (pKa= 7.44)\n" ] }, { "data": { "text/html": [ "
<div>\n",
"<table>\n",
" <thead>\n",
" <tr><th></th><th>resname</th><th>protonation</th><th>resid</th><th>insertion</th><th>chain</th><th>segid</th><th>pKa</th><th>buried</th></tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr><th>0</th><td>ILE</td><td>ILE</td><td>16</td><td></td><td>A</td><td>0</td><td>7.413075</td><td>0.839286</td></tr>\n",
" <tr><th>1</th><td>VAL</td><td>VAL</td><td>17</td><td></td><td>A</td><td>0</td><td>NaN</td><td>NaN</td></tr>\n",
" <tr><th>2</th><td>GLY</td><td>GLY</td><td>18</td><td></td><td>A</td><td>0</td><td>NaN</td><td>NaN</td></tr>\n",
" <tr><th>3</th><td>GLY</td><td>GLY</td><td>19</td><td></td><td>A</td><td>0</td><td>NaN</td><td>NaN</td></tr>\n",
" <tr><th>4</th><td>TYR</td><td>TYR</td><td>20</td><td></td><td>A</td><td>0</td><td>9.590845</td><td>0.146429</td></tr>\n",
" <tr><th>...</th><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td></tr>\n",
" <tr><th>282</th><td>HOH</td><td>WAT</td><td>804</td><td></td><td>A</td><td>1</td><td>NaN</td><td>NaN</td></tr>\n",
" <tr><th>283</th><td>HOH</td><td>WAT</td><td>805</td><td></td><td>A</td><td>1</td><td>NaN</td><td>NaN</td></tr>\n",
" <tr><th>284</th><td>HOH</td><td>WAT</td><td>807</td><td></td><td>A</td><td>1</td><td>NaN</td><td>NaN</td></tr>\n",
" <tr><th>285</th><td>HOH</td><td>WAT</td><td>808</td><td></td><td>A</td><td>1</td><td>NaN</td><td>NaN</td></tr>\n",
" <tr><th>286</th><td>HOH</td><td>WAT</td><td>809</td><td></td><td>A</td><td>1</td><td>NaN</td><td>NaN</td></tr>\n",
" </tbody>\n",
"</table>\n",
"<p>287 rows × 8 columns</p>\n",
"</div>" ], "text/plain": [ " resname protonation resid insertion chain segid pKa buried\n", "0 ILE ILE 16 A 0 7.413075 0.839286\n", "1 VAL VAL 17 A 0 NaN NaN\n", "2 GLY GLY 18 A 0 NaN NaN\n", "3 GLY GLY 19 A 0 NaN NaN\n", "4 TYR TYR 20 A 0 9.590845 0.146429\n", ".. ... ... ... ... ... ... ... ...\n", "282 HOH WAT 804 A 1 NaN NaN\n", "283 HOH WAT 805 A 1 NaN NaN\n", "284 HOH WAT 807 A 1 NaN NaN\n", "285 HOH WAT 808 A 1 NaN NaN\n", "286 HOH WAT 809 A 1 NaN NaN\n", "\n", "[287 rows x 8 columns]" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "tryp_op, df = systemPrepare(tryp, return_details=True)\n", "df" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "The report is a [pandas DataFrame](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.html); its columns can be listed with `df.columns`." ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Index(['resname', 'protonation', 'resid', 'insertion', 'chain', 'segid', 'pKa',\n", " 'buried'],\n", " dtype='object')" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.columns" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
<div>\n",
"<table>\n",
" <thead>\n",
" <tr><th></th><th>resname</th><th>resid</th><th>pKa</th><th>protonation</th></tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr><th>0</th><td>ILE</td><td>16</td><td>7.413075</td><td>ILE</td></tr>\n",
" <tr><th>1</th><td>VAL</td><td>17</td><td>NaN</td><td>VAL</td></tr>\n",
" <tr><th>2</th><td>GLY</td><td>18</td><td>NaN</td><td>GLY</td></tr>\n",
" <tr><th>3</th><td>GLY</td><td>19</td><td>NaN</td><td>GLY</td></tr>\n",
" <tr><th>4</th><td>TYR</td><td>20</td><td>9.590845</td><td>TYR</td></tr>\n",
" <tr><th>5</th><td>THR</td><td>21</td><td>NaN</td><td>THR</td></tr>\n",
" <tr><th>6</th><td>CYS</td><td>22</td><td>99.990000</td><td>CYX</td></tr>\n",
" <tr><th>7</th><td>GLY</td><td>23</td><td>NaN</td><td>GLY</td></tr>\n",
" <tr><th>8</th><td>ALA</td><td>24</td><td>NaN</td><td>ALA</td></tr>\n",
" <tr><th>9</th><td>ASN</td><td>25</td><td>NaN</td><td>ASN</td></tr>\n",
" </tbody>\n",
"</table>\n",
"</div>" ], "text/plain": [ " resname resid pKa protonation\n", "0 ILE 16 7.413075 ILE\n", "1 VAL 17 NaN VAL\n", "2 GLY 18 NaN GLY\n", "3 GLY 19 NaN GLY\n", "4 TYR 20 9.590845 TYR\n", "5 THR 21 NaN THR\n", "6 CYS 22 99.990000 CYX\n", "7 GLY 23 NaN GLY\n", "8 ALA 24 NaN ALA\n", "9 ASN 25 NaN ASN" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.loc[:,['resname','resid','pKa','protonation']].head(10)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "Being a DataFrame, the report can be queried easily and written out as a spreadsheet in Excel or CSV format." ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [], "source": [ "df.to_csv(\"./tryp_data.csv\")" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Special case: Membrane proteins" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "-" } }, "source": [ "Membrane-embedded proteins are in contact with a hydrophobic region, which may alter the pKa values of membrane-exposed residues (image taken from [Teixeira et al., J. Chem. Theory Comput., 2016, 12 (3), pp 930–934](http://dx.doi.org/10.1021/acs.jctc.5b01114))." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "-" } }, "source": [ "![pka_membranes](img/pka_membranes.png)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Although the effect is not currently taken into account quantitatively, if a `hydrophobic_thickness` argument is provided, warnings will be generated for residues exposed to the lipid region." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "The following example shows the preparation of the mu opioid receptor, 4DKL.\n", "The **pre-oriented** structure is retrieved from the OPM database." 
] }, { "cell_type": "code", "execution_count": 8, "metadata": { "slideshow": { "slide_type": "-" } }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "2021-11-16 09:55:02,479 - moleculekit.molecule - INFO - Removed 2546 atoms. 4836 atoms remaining in the molecule.\n", "2021-11-16 09:55:02,510 - moleculekit.molecule - INFO - Removed 364 atoms. 4472 atoms remaining in the molecule.\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "32.0\n", "\n", "---- Molecule chain report ----\n", "Chain A:\n", " First residue: MET:65:\n", " Final residue: ILE:352:\n", "Chain B:\n", " First residue: MET:65:\n", " Final residue: ILE:352:\n", "---- End of chain report ----\n", "\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "2021-11-16 09:55:07,022 - moleculekit.tools.preparation - INFO - Modified residue ASP:A:114 to ASH\n", "2021-11-16 09:55:07,023 - moleculekit.tools.preparation - INFO - Modified residue CYS:A:140 to CYX\n", "2021-11-16 09:55:07,024 - moleculekit.tools.preparation - INFO - Modified residue HIS:A:171 to HID\n", "2021-11-16 09:55:07,024 - moleculekit.tools.preparation - INFO - Modified residue CYS:A:217 to CYX\n", "2021-11-16 09:55:07,025 - moleculekit.tools.preparation - INFO - Modified residue HIS:A:223 to HID\n", "2021-11-16 09:55:07,025 - moleculekit.tools.preparation - INFO - Modified residue HIS:A:297 to HID\n", "2021-11-16 09:55:07,025 - moleculekit.tools.preparation - INFO - Modified residue HIS:A:319 to HIE\n", "2021-11-16 09:55:07,026 - moleculekit.tools.preparation - INFO - Modified residue ASP:B:114 to ASH\n", "2021-11-16 09:55:07,026 - moleculekit.tools.preparation - INFO - Modified residue CYS:B:140 to CYX\n", "2021-11-16 09:55:07,027 - moleculekit.tools.preparation - INFO - Modified residue HIS:B:171 to HID\n", "2021-11-16 09:55:07,027 - moleculekit.tools.preparation - INFO - Modified residue CYS:B:217 to CYX\n", "2021-11-16 09:55:07,027 - moleculekit.tools.preparation - INFO - Modified residue 
HIS:B:223 to HID\n", "2021-11-16 09:55:07,027 - moleculekit.tools.preparation - INFO - Modified residue HIS:B:297 to HID\n", "2021-11-16 09:55:07,028 - moleculekit.tools.preparation - INFO - Modified residue HIS:B:319 to HIE\n", "2021-11-16 09:55:07,028 - moleculekit.tools.preparation - WARNING - Dubious protonation state: the pKa of 4 residues is within 1.0 units of pH 7.4.\n", "2021-11-16 09:55:07,030 - moleculekit.tools.preparation - WARNING - Dubious protonation state: MET 65 A (pKa= 7.76)\n", "2021-11-16 09:55:07,030 - moleculekit.tools.preparation - WARNING - Dubious protonation state: ASP 114 A (pKa= 7.85)\n", "2021-11-16 09:55:07,030 - moleculekit.tools.preparation - WARNING - Dubious protonation state: MET 65 B (pKa= 7.76)\n", "2021-11-16 09:55:07,031 - moleculekit.tools.preparation - WARNING - Dubious protonation state: ASP 114 B (pKa= 7.85)\n", "2021-11-16 09:55:07,037 - moleculekit.tools.preparation - WARNING - Predictions for 18 residues may be incorrect because they are exposed to the membrane (-16.0