HTMD Molecules#
Molecule getters and setters#
The
Molecule
objects can be accessed and/or manipulated.This is principally done through a pair of methods which are known as “getter/setter” methods in the object-oriented jargon.
Molecule.get()
to access propertiesMolecule.set()
to change properties
The following sections will show the usage of property getters and setters in a number of real-world tasks.
First, let’s import HTMD and load 3PTB into a Molecule
object:
from htmd.ui import *
mol = Molecule('3PTB')
2018-02-24 04:26:31,141 - htmd.molecule.readers - INFO - Using local copy for 3PTB: /home/joao/maindisk/software/repos/Acellera/htmd/htmd/data/pdb/3ptb.pdb
2018-02-24 04:26:31,269 - htmd.molecule.molecule - WARNING - Residue insertions were detected in the Molecule. It is recommended to renumber the residues using the Molecule.renumberResidues() method.
Please cite HTMD: Doerr et al.(2016)JCTC,12,1845.
https://dx.doi.org/10.1021/acs.jctc.6b00049
Documentation: http://software.acellera.com/
To update: conda update htmd -c acellera -c psi4
You are on the latest HTMD version (unpackaged : /home/joao/maindisk/software/repos/Acellera/htmd/htmd).
Now let’s show-case some examples.
Check the residue IDs of Cysteines in the Molecule
#
In order to get the residue IDs of cysteines in the Molecule
, one
can do:
mol.get('resid', sel='resname CYS')
array([ 22, 22, 22, 22, 22, 22, 42, 42, 42, 42, 42, 42, 58,
58, 58, 58, 58, 58, 128, 128, 128, 128, 128, 128, 136, 136,
136, 136, 136, 136, 157, 157, 157, 157, 157, 157, 168, 168, 168,
168, 168, 168, 182, 182, 182, 182, 182, 182, 191, 191, 191, 191,
191, 191, 201, 201, 201, 201, 201, 201, 220, 220, 220, 220, 220,
220, 232, 232, 232, 232, 232, 232])
Note: residue IDs are outputted multiple times. This is due to the fact that one value is returned per matched atom, and there are 6 atoms resolved per Cys residue.
The names of the 6 atoms of cysteine residue 58 can be checked with:
mol.get('name','resname CYS and resid 58')
array(['N', 'CA', 'C', 'O', 'CB', 'SG'], dtype=object)
To obtain one residue ID per residue, one can either,
further restrict the selection to carbon α atoms (for example):
mol.get('resid',sel='name CA and resname CYS')
array([ 22, 42, 58, 128, 136, 157, 168, 182, 191, 201, 220, 232])
or,
take advantage of
python
and use thenumpy.unique
function to remove repeated entries:
np.unique(mol.get('resid',sel='resname CYS'))
array([ 22, 42, 58, 128, 136, 157, 168, 182, 191, 201, 220, 232])
Note: numpy
is imported as np
when from htmd.ui import *
is ran.
Retrieve the coordinates of atoms#
This is done by accessing the Molecule.coords
property. It is a
special property, since it returns a 3-column vector (for the x, y, z
coordinates).
mol.get('coords','resname CYS and resid 58 and name CA')
array([[ 4.23999977, 16.49500084, 27.98600006]], dtype=float32)
Note: the precision is restricted to the one in the PDB file.
If more than one atom (say, N>1) are selected, an N x 3
matrix is
returned:
mol.get('coords','resname CYS and resid 58')
array([[ 5.12200022, 16.71899986, 26.86300087],
[ 4.23999977, 16.49500084, 27.98600006],
[ 4.87400007, 16.95800018, 29.29999924],
[ 4.23799992, 16.76399994, 30.36199951],
[ 3.94099998, 14.9989996 , 28.07099915],
[ 2.79200006, 14.45199966, 26.72200012]], dtype=float32)
Know the chains or segments present in the Molecule
#
The chains present in the Molecule
can be known using:
np.unique(mol.get('chain'))
array(['A'], dtype=object)
which means that every atom is assigned to the same chain (i.e., chain A).
The same can be done for segments:
np.unique(mol.get('segid'))
array(['0', '1'], dtype=object)
These segment IDs may need to be changed for system building.
Set properties of the Molecule
#
Molecule.set
can be used to set or change specific fields of the
molecule or a selection.
For example, it can create a segment ID called ‘P’ for all protein atoms:
mol.set('segid','P','protein')
np.unique(mol.get('segid', 'protein'))
array(['P'], dtype=object)
Another useful task for Molecule.set
is to rename residues.
For example, to change the residue name of all histidine residues to ‘HSN’:
mol.set('resname','HSN','resname HIS')
mol.get('resid',sel='resname HSN')
array([40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 57, 57, 57, 57, 57, 57, 57,
57, 57, 57, 91, 91, 91, 91, 91, 91, 91, 91, 91, 91])
Count the number of waters in the Molecule
#
Get the indices of atoms that were recognized as water:
mol.get('serial',sel='water')
array([1641, 1642, 1643, 1644, 1645, 1646, 1647, 1648, 1649, 1650, 1651,
1652, 1653, 1654, 1655, 1656, 1657, 1658, 1659, 1660, 1661, 1662,
1663, 1664, 1665, 1666, 1667, 1668, 1669, 1670, 1671, 1672, 1673,
1674, 1675, 1676, 1677, 1678, 1679, 1680, 1681, 1682, 1683, 1684,
1685, 1686, 1687, 1688, 1689, 1690, 1691, 1692, 1693, 1694, 1695,
1696, 1697, 1698, 1699, 1700, 1701, 1702])
Note: in this case, the hydrogens of waters are not present, so we
only get one index per water without using the np.unique
function.
Then, the number of waters present in the Molecule
can be obtained
with:
len(mol.get('serial',sel='water and name O'))
62
Alternatively, the Molecule.atomselect
method can be used. It
returns a vector of boolean values:
mol.atomselect('water and name O')
array([False, False, False, ..., True, True, True], dtype=bool)
The fact that True
counts as 1 in the sum
function can be used
to obtain, through a different way, the number of waters:
selection = mol.atomselect('water and name O')
print(selection)
sum(selection)
[False False False ..., True True True]
62
Duplicate, modify and write Molecule
objects#
Use Molecule.copy
to duplicate the molecule into a new object:
newmol = mol.copy()
Molecule.write
can be used to output a PDB file of your whole
molecule (or just a selection). The following command uses the above
copied molecule newmol
to write out a PDB file of the benzamidine
ligand atoms present in the fetched PDB file, except for hydrogen atoms:
newmol.write('/tmp/ligand.pdb','resname BEN and noh')
Molecule.filter
can be used to select specific parts of the molecule
(e.g. chains, segments) and clean/remove all the rest.
For example, clean all except for protein atoms in chain A:
mol.filter('chain A and protein')
mol.get('serial', sel='not chain A or not protein')
2018-02-24 04:27:52,480 - htmd.molecule.molecule - INFO - Removed 72 atoms. 1629 atoms remaining in the molecule.
array([], dtype=int64)
Joining molecules/segments#
Molecule.append
appends a Molecule object (e.g. ligand, water or ion
segments, etc.) to another molecule object (e.g. protein, membrane).
For example, to append the benzamidine ligand (saved above as a PDB file) to the molecule we are working with (which is now only the protein chain A), simply do:
ligand = Molecule('/tmp/ligand.pdb')
print(mol.get('name', 'resname BEN'))
mol.append(ligand)
print(mol.get('name', 'resname BEN'))
[]
['C1' 'C2' 'C3' 'C4' 'C5' 'C6' 'C' 'N1' 'N2']
You can also add an atom using Molecule.insert
. This method allows
to choose the index at which to insert the Molecule
object:
atom = Molecule()
atom.empty(1)
atom.record = 'ATOM'
atom.name = 'CA'
atom.resid = 0
atom.resname = 'XXX'
atom.chain = 'X'
atom.coords = [6, 3, 2]
newligand = ligand.copy()
newligand.insert(atom, 0)
for field in ['name', 'resid', 'resname', 'chain', 'coords']:
print(field, newligand.get(field, sel='index 0'))
name ['CA']
resid [0]
resname ['XXX']
chain ['X']
coords [[ 6. 3. 2.]]
Playing with coordinates#
Coordinates can be used to perform geometric tasks on your molecule, such as translations or rotations.
For example, calculate the geometric center of your molecule:
coo = mol.get('coords')
c = np.mean(coo, axis=0)
Then, Molecule.moveBy
can be used to translate the molecule and
center it on the origin [0, 0, 0] using the previosly calculated center
c
:
mol.moveBy(-c)
Finally, check the new center (which is [0, 0, 0] within the precision):
np.mean(mol.get('coords'),axis=0)
array([ 2.15420940e-07, -2.36497249e-06, 4.63504330e-06], dtype=float32)
Rotations#
Another common task is rotations (e.g. to build a protein - ligand system). For example, load up your ligand and visualize its orientation:
ligand = Molecule('3PTB')
ligand.filter('resname BEN')
ligand_orig = ligand.copy()
2018-02-24 04:28:23,209 - htmd.molecule.readers - INFO - Using local copy for 3PTB: /home/joao/maindisk/software/repos/Acellera/htmd/htmd/data/pdb/3ptb.pdb
2018-02-24 04:28:23,332 - htmd.molecule.molecule - WARNING - Residue insertions were detected in the Molecule. It is recommended to renumber the residues using the Molecule.renumberResidues() method.
2018-02-24 04:28:23,367 - htmd.molecule.molecule - INFO - Removed 1692 atoms. 9 atoms remaining in the molecule.
Then, calculate its geometric center and use Molecule.rotateBy
to
rotate your ligand with the use of a rotation matrix M
:
ligcenter = np.mean(ligand.get('coords'),axis=0)
M = uniformRandomRotation()
ligand.rotateBy(M, center=ligcenter)
One can use the transpose of rotation matrix M
to rotate the ligand
back to the original position and verify its coordinates are the same
within precision:
ligand.rotateBy(np.transpose(M), center=ligcenter)
np.allclose(ligand.get('coords'), ligand_orig.get('coords'))
True
Another example is to rotate around an axis
ligand.rotateBy(rotationMatrix([1, 0, 0], math.pi),center=[0, 0, 0])
Note: the uniformRandomRotation()
function provides the
transformation matrix for uniformly distributed random rotation.
Working with MD trajectories#
Molecule
also allows to: * read MD trajectories:
from htmd.home import home
molTraj = Molecule(os.path.join(home(dataDir='dhfr'), 'dhfr.psf'))
molTraj.read(os.path.join(home(dataDir='dhfr'), 'dhfr.xtc'))
molTraj.numFrames
2018-02-24 04:28:37,143 - htmd.molecule.readers - WARNING - No time information read from /home/joao/maindisk/software/repos/Acellera/htmd/htmd/data/dhfr/dhfr.xtc. Defaulting to 0.1ns framestep.
100
and manipulate them (e.g. drop/keep frames):
frame1 = molTraj.copy()
frame1.dropFrames(keep=0)
frame1.numFrames
1
last10frames = molTraj.copy()
last10frames.dropFrames(drop=range(molTraj.numFrames)[-10:])
last10frames.numFrames
90
Biochemistry of HTMD Molecules#
Print the protein sequence:
mol = Molecule('3PTB')
mol.sequence()
2018-02-24 04:28:49,658 - htmd.molecule.readers - INFO - Using local copy for 3PTB: /home/joao/maindisk/software/repos/Acellera/htmd/htmd/data/pdb/3ptb.pdb
2018-02-24 04:28:49,793 - htmd.molecule.molecule - WARNING - Residue insertions were detected in the Molecule. It is recommended to renumber the residues using the Molecule.renumberResidues() method.
{'0': 'IVGGYTCGANTVPYQVSLNSGYHFCGGSLINSQWVVSAAHCYKSGIQVRLGEDNINVVEGNEQFISASKSIVHPSYNSNTLNNDIMLIKLKSAASLNSRVASISLPTSCASAGTQCLISGWGNTKSSGTSYPDVLKCLKAPILSDSSCKSAYPGQITSNMFCAGYLEGGKDSCQGDSGGPVVCSGKLQGIVSWGSGCAQKNKPGVYTKVCNYVSWIKQTIASN'}
Create mutant proteins by mutating residues:
mutant = mol.copy()
print(mol.get('resname', 'resid 158'))
mutant.mutateResidue('resid 158', 'ARG')
print(mutant.get('resname', 'resid 158'))
['LEU' 'LEU' 'LEU' 'LEU' 'LEU' 'LEU' 'LEU' 'LEU']
['ARG' 'ARG' 'ARG' 'ARG']
Exercise 1: find residues in contact with a ligand#
#Load the 'clean' molecule with 3PTB once again
mol = Molecule('3PTB')
#Identify residues in contact with the benzamidine ligand.
#We exploit the fact that there is exactly one carbon alpha (`name CA`) atom per residue.
mol.get('resid', sel='name CA and same residue as protein within 4 of resname BEN')
Exercise 2: find duplicate residues#
#Load the 'clean' 3PTB molecule once again:
mol = Molecule('3PTB')
Solution 1.#
Identify duplicate residues, relying on PDB’s insertion attribute:
# Quick and dirty
np.unique(mol.get('resid', sel='insertion A'))
# Or equivantly, more explicit and pretty-printed
ia = mol.copy()
ia.filter("insertion A and name CA")
rid = ia.get('resid') # ia.resid also works!
rn = ia.get('resname')
for f, b in zip(rn, rid):
print(f, b)
Solution 2.#
If one doesn’t want to rely on the insertion property, we can process the list of gaps between residues.
#Let's try to use the power of the return_counts property of numpy.unique
dups = mol.copy() # Make a working copy
dups.filter("name CA and protein") # Keep only C-alphas
rid = dups.get('resid') # Get list of residue ids
nrid, count= np.unique(rid,return_counts=True)
# Get how many times each residue id appeared
nrid[count>1] # Show duplicates
Exercise 3: find gaps in the sequence of residue numbers#
#We can exploit the numpy.diff() function to compute deltas between array elements.
# array with the "holes":
# - 0 means duplicate residues
# - >1 means segments of protein missing
deltas = np.diff(rid)
print(deltas)
#Print residue IDs at which the holes (including duplicate residues) occur
new_rid = rid[:np.size(rid)-1] # no delta for last residue
new_rid[deltas!=1]
# Try printing a prettier and more explicit output
# Iterate over all residues (excluding last one)
rn = dups.get('resname')
for i in range(np.size(rid)-1):
# If there is a break...
if(deltas[i]>1):
# Remember that deltas[i]=rid[i+1]-rid[i]
print(rid[i],rn[i],' followed by ',rid[i+1],rn[i+1])