This tutorial will introduce you to how to obtain voxel descriptors of pharmacophoric-like properties of your protein and ligands, ready for machine learning applications such as the ones used in KDeep, DeepSite and more. For more information and applications read the following references:
This tutorial will only work with the VMD viewer.
from moleculekit.molecule import Molecule from moleculekit.tools.voxeldescriptors import getVoxelDescriptors, viewVoxelFeatures from moleculekit.tools.atomtyper import prepareProteinForAtomtyping from moleculekit.smallmol.smallmol import SmallMol from moleculekit.home import home import os
# First let's read our two favourite protein and ligand structures tut_data = home(dataDir='test-voxeldescriptors') prot = Molecule(os.path.join(tut_data, '3PTB.pdb')) slig = SmallMol(os.path.join(tut_data, 'benzamidine.mol2'))
Atom typing for the protein requires the protein to be protonated and to
include the atom bond information. Currently the Molecule contains does
not contain all bond information as you can see by checking prot.bonds.
prepareProteinForAtomtyping will perform most necessary operations
such as removing non-protein atoms, adding hydrogens, guessing bonds and
guessing protein chains but you can perform those manually as well. Take
care however as the protonation will a) move atoms to optimize hydrogen
networks b) add missing sidechains and c) the bond guessing can go wrong
if atoms are very close to each other which can happen when adding
# If your structure is fully protonated and contains all bond information in prot.bonds skip this step! prot = prepareProteinForAtomtyping(prot) # We would suggest visualizing the results with prot.view(guessBonds=False) to show only the bonds # already stored in the Molecule. prot.view(guessBonds=False)
Now we can calculate the voxel information for the protein. By default
getVoxelDescriptors will calculate the bounding box of the molecule
and grid it into voxels. As we don’t use point properties but smooth
them out over space, we will add 1 A buffer space around the protein for
the voxelization grid so that the properties don’t cut off at the edge
of the grid.
prot_vox, prot_centers, prot_N = getVoxelDescriptors(prot, buffer=1)
Now let’s visualize the voxels. This will visualize each voxel channel as a separate VMD molecule so that you can inspect each individually. You can play around with the IsoValue in the IsoSurface representation of each channel to inspect it in more detail.
prot.view(guessBonds=False) viewVoxelFeatures(prot_vox, prot_centers, prot_N)
# For the ligand since it's small we could increase the voxel resolution if we so desire to 0.5 A instead of the default 1 A. lig_vox, lig_centers, lig_N = getVoxelDescriptors(slig, voxelsize=0.5, buffer=1) slig.view(guessBonds=False) viewVoxelFeatures(lig_vox, lig_centers, lig_N)
import torch import torch.nn as nn import torch.nn.functional as F import numpy as np
To perform 3D convolutions we need to reshape the data to
(nsamples, nchannels, d, h, w) with the last three dimensions
corresponding to the 3D elements. Our data is in
format so we first transpose and then reshape it. In this case it’s a
single sample so we just add a first dimension to the array.
nchannels = prot_vox.shape prot_vox_t = prot_vox.transpose().reshape([1, nchannels, prot_N, prot_N, prot_N]) prot_vox_t = torch.tensor(prot_vox_t.astype(np.float32)) lig_vox_t = lig_vox.transpose().reshape([1, nchannels, lig_N, lig_N, lig_N]) lig_vox_t = torch.tensor(lig_vox_t.astype(np.float32))
Let’s just test is on a simple pytorch model with 3D convolutions
class Model(nn.Module): def __init__(self): super(Model, self).__init__() self.conv1 = nn.Conv3d(nchannels, 20, 5) # (in_channels, out_channels, kernel_size) self.conv2 = nn.Conv3d(20, 20, 5) def forward(self, x): x = F.relu(self.conv1(x)) return F.relu(self.conv2(x))
model = Model() results = model.forward(lig_vox_t) print(results.shape)
That’s all for this tutorial. Time to go ahead and create the next coolest voxel-based predictor!