HTMD Molecules

Molecule getters and setters

  • The Molecule objects can be accessed and/or manipulated.
  • This is principally done through a pair of methods which are known as “getter/setter” methods in the object-oriented jargon.
  • Molecule.get() to access properties
  • Molecule.set() to change properties
  • The following sections will show the usage of property getters and setters in a number of real-world tasks.

First, let’s import HTMD and load 3PTB into a Molecule object:

from htmd.ui import *
mol = Molecule('3PTB')
2018-02-24 04:26:31,141 - htmd.molecule.readers - INFO - Using local copy for 3PTB: /home/joao/maindisk/software/repos/Acellera/htmd/htmd/data/pdb/3ptb.pdb
2018-02-24 04:26:31,269 - htmd.molecule.molecule - WARNING - Residue insertions were detected in the Molecule. It is recommended to renumber the residues using the Molecule.renumberResidues() method.
Please cite HTMD: Doerr et al.(2016)JCTC,12,1845.
https://dx.doi.org/10.1021/acs.jctc.6b00049
Documentation: http://software.acellera.com/
To update: conda update htmd -c acellera -c psi4

You are on the latest HTMD version (unpackaged : /home/joao/maindisk/software/repos/Acellera/htmd/htmd).

Now let’s show-case some examples.

Check the residue IDs of Cysteines in the Molecule

In order to get the residue IDs of cysteines in the Molecule, one can do:

mol.get('resid', sel='resname CYS')
array([ 22,  22,  22,  22,  22,  22,  42,  42,  42,  42,  42,  42,  58,
        58,  58,  58,  58,  58, 128, 128, 128, 128, 128, 128, 136, 136,
       136, 136, 136, 136, 157, 157, 157, 157, 157, 157, 168, 168, 168,
       168, 168, 168, 182, 182, 182, 182, 182, 182, 191, 191, 191, 191,
       191, 191, 201, 201, 201, 201, 201, 201, 220, 220, 220, 220, 220,
       220, 232, 232, 232, 232, 232, 232])

Note: residue IDs are outputted multiple times. This is due to the fact that one value is returned per matched atom, and there are 6 atoms resolved per Cys residue.

The names of the 6 atoms of cysteine residue 58 can be checked with:

mol.get('name','resname CYS and resid 58')
array(['N', 'CA', 'C', 'O', 'CB', 'SG'], dtype=object)

To obtain one residue ID per residue, one can either,

  • further restrict the selection to carbon α atoms (for example):
mol.get('resid',sel='name CA and resname CYS')
array([ 22,  42,  58, 128, 136, 157, 168, 182, 191, 201, 220, 232])

or,

  • take advantage of python and use the numpy.unique function to remove repeated entries:
np.unique(mol.get('resid',sel='resname CYS'))
array([ 22,  42,  58, 128, 136, 157, 168, 182, 191, 201, 220, 232])

Note: numpy is imported as np when from htmd.ui import * is ran.

Retrieve the coordinates of atoms

This is done by accessing the Molecule.coords property. It is a special property, since it returns a 3-column vector (for the x, y, z coordinates).

mol.get('coords','resname CYS and resid 58 and name CA')
array([[  4.23999977,  16.49500084,  27.98600006]], dtype=float32)

Note: the precision is restricted to the one in the PDB file.

If more than one atom (say, N>1) are selected, an N x 3 matrix is returned:

mol.get('coords','resname CYS and resid 58')
array([[  5.12200022,  16.71899986,  26.86300087],
       [  4.23999977,  16.49500084,  27.98600006],
       [  4.87400007,  16.95800018,  29.29999924],
       [  4.23799992,  16.76399994,  30.36199951],
       [  3.94099998,  14.9989996 ,  28.07099915],
       [  2.79200006,  14.45199966,  26.72200012]], dtype=float32)

Know the chains or segments present in the Molecule

The chains present in the Molecule can be known using:

np.unique(mol.get('chain'))
array(['A'], dtype=object)

which means that every atom is assigned to the same chain (i.e., chain A).

The same can be done for segments:

np.unique(mol.get('segid'))
array(['0', '1'], dtype=object)

These segment IDs may need to be changed for system building.

Set properties of the Molecule

Molecule.set can be used to set or change specific fields of the molecule or a selection.

For example, it can create a segment ID called ‘P’ for all protein atoms:

mol.set('segid','P','protein')
np.unique(mol.get('segid', 'protein'))
array(['P'], dtype=object)

Another useful task for Molecule.set is to rename residues.

For example, to change the residue name of all histidine residues to ‘HSN’:

mol.set('resname','HSN','resname HIS')
mol.get('resid',sel='resname HSN')
array([40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 57, 57, 57, 57, 57, 57, 57,
       57, 57, 57, 91, 91, 91, 91, 91, 91, 91, 91, 91, 91])

Count the number of waters in the Molecule

Get the indices of atoms that were recognized as water:

mol.get('serial',sel='water')
array([1641, 1642, 1643, 1644, 1645, 1646, 1647, 1648, 1649, 1650, 1651,
       1652, 1653, 1654, 1655, 1656, 1657, 1658, 1659, 1660, 1661, 1662,
       1663, 1664, 1665, 1666, 1667, 1668, 1669, 1670, 1671, 1672, 1673,
       1674, 1675, 1676, 1677, 1678, 1679, 1680, 1681, 1682, 1683, 1684,
       1685, 1686, 1687, 1688, 1689, 1690, 1691, 1692, 1693, 1694, 1695,
       1696, 1697, 1698, 1699, 1700, 1701, 1702])

Note: in this case, the hydrogens of waters are not present, so we only get one index per water without using the np.unique function.

Then, the number of waters present in the Molecule can be obtained with:

len(mol.get('serial',sel='water and name O'))
62

Alternatively, the Molecule.atomselect method can be used. It returns a vector of boolean values:

mol.atomselect('water and name O')
array([False, False, False, ...,  True,  True,  True], dtype=bool)

The fact that True counts as 1 in the sum function can be used to obtain, through a different way, the number of waters:

selection = mol.atomselect('water and name O')
print(selection)
sum(selection)
[False False False ...,  True  True  True]
62

Duplicate, modify and write Molecule objects

Use Molecule.copy to duplicate the molecule into a new object:

newmol = mol.copy()

Molecule.write can be used to output a PDB file of your whole molecule (or just a selection). The following command uses the above copied molecule newmol to write out a PDB file of the benzamidine ligand atoms present in the fetched PDB file, except for hydrogen atoms:

newmol.write('/tmp/ligand.pdb','resname BEN and noh')

Molecule.filter can be used to select specific parts of the molecule (e.g. chains, segments) and clean/remove all the rest.

For example, clean all except for protein atoms in chain A:

mol.filter('chain A and protein')
mol.get('serial', sel='not chain A or not protein')
2018-02-24 04:27:52,480 - htmd.molecule.molecule - INFO - Removed 72 atoms. 1629 atoms remaining in the molecule.
array([], dtype=int64)

Joining molecules/segments

Molecule.append appends a Molecule object (e.g. ligand, water or ion segments, etc.) to another molecule object (e.g. protein, membrane).

For example, to append the benzamidine ligand (saved above as a PDB file) to the molecule we are working with (which is now only the protein chain A), simply do:

ligand = Molecule('/tmp/ligand.pdb')
print(mol.get('name', 'resname BEN'))
mol.append(ligand)
print(mol.get('name', 'resname BEN'))
[]
['C1' 'C2' 'C3' 'C4' 'C5' 'C6' 'C' 'N1' 'N2']

You can also add an atom using Molecule.insert. This method allows to choose the index at which to insert the Molecule object:

atom = Molecule()
atom.empty(1)
atom.record = 'ATOM'
atom.name = 'CA'
atom.resid = 0
atom.resname = 'XXX'
atom.chain = 'X'
atom.coords = [6, 3, 2]
newligand = ligand.copy()
newligand.insert(atom, 0)
for field in ['name', 'resid', 'resname', 'chain', 'coords']:
    print(field, newligand.get(field, sel='index 0'))
name ['CA']
resid [0]
resname ['XXX']
chain ['X']
coords [[ 6.  3.  2.]]

Playing with coordinates

Coordinates can be used to perform geometric tasks on your molecule, such as translations or rotations.

For example, calculate the geometric center of your molecule:

coo = mol.get('coords')
c = np.mean(coo, axis=0)

Then, Molecule.moveBy can be used to translate the molecule and center it on the origin [0, 0, 0] using the previosly calculated center c:

mol.moveBy(-c)

Finally, check the new center (which is [0, 0, 0] within the precision):

np.mean(mol.get('coords'),axis=0)
array([  2.15420940e-07,  -2.36497249e-06,   4.63504330e-06], dtype=float32)

Rotations

Another common task is rotations (e.g. to build a protein - ligand system). For example, load up your ligand and visualize its orientation:

ligand = Molecule('3PTB')
ligand.filter('resname BEN')
ligand_orig = ligand.copy()
2018-02-24 04:28:23,209 - htmd.molecule.readers - INFO - Using local copy for 3PTB: /home/joao/maindisk/software/repos/Acellera/htmd/htmd/data/pdb/3ptb.pdb
2018-02-24 04:28:23,332 - htmd.molecule.molecule - WARNING - Residue insertions were detected in the Molecule. It is recommended to renumber the residues using the Molecule.renumberResidues() method.
2018-02-24 04:28:23,367 - htmd.molecule.molecule - INFO - Removed 1692 atoms. 9 atoms remaining in the molecule.

Then, calculate its geometric center and use Molecule.rotateBy to rotate your ligand with the use of a rotation matrix M:

ligcenter = np.mean(ligand.get('coords'),axis=0)
M = uniformRandomRotation()
ligand.rotateBy(M, center=ligcenter)

One can use the transpose of rotation matrix M to rotate the ligand back to the original position and verify its coordinates are the same within precision:

ligand.rotateBy(np.transpose(M), center=ligcenter)
np.allclose(ligand.get('coords'), ligand_orig.get('coords'))
True

Another example is to rotate around an axis

ligand.rotateBy(rotationMatrix([1, 0, 0], math.pi),center=[0, 0, 0])

Note: the uniformRandomRotation() function provides the transformation matrix for uniformly distributed random rotation.

Working with MD trajectories

Molecule also allows to: * read MD trajectories:

from htmd.home import home
molTraj = Molecule(os.path.join(home(dataDir='dhfr'), 'dhfr.psf'))
molTraj.read(os.path.join(home(dataDir='dhfr'), 'dhfr.xtc'))
molTraj.numFrames
2018-02-24 04:28:37,143 - htmd.molecule.readers - WARNING - No time information read from /home/joao/maindisk/software/repos/Acellera/htmd/htmd/data/dhfr/dhfr.xtc. Defaulting to 0.1ns framestep.
100
  • and manipulate them (e.g. drop/keep frames):
frame1 = molTraj.copy()
frame1.dropFrames(keep=0)
frame1.numFrames
1
last10frames = molTraj.copy()
last10frames.dropFrames(drop=range(molTraj.numFrames)[-10:])
last10frames.numFrames
90

Biochemistry of HTMD Molecules

Print the protein sequence:

mol = Molecule('3PTB')
mol.sequence()
2018-02-24 04:28:49,658 - htmd.molecule.readers - INFO - Using local copy for 3PTB: /home/joao/maindisk/software/repos/Acellera/htmd/htmd/data/pdb/3ptb.pdb
2018-02-24 04:28:49,793 - htmd.molecule.molecule - WARNING - Residue insertions were detected in the Molecule. It is recommended to renumber the residues using the Molecule.renumberResidues() method.
{'0': 'IVGGYTCGANTVPYQVSLNSGYHFCGGSLINSQWVVSAAHCYKSGIQVRLGEDNINVVEGNEQFISASKSIVHPSYNSNTLNNDIMLIKLKSAASLNSRVASISLPTSCASAGTQCLISGWGNTKSSGTSYPDVLKCLKAPILSDSSCKSAYPGQITSNMFCAGYLEGGKDSCQGDSGGPVVCSGKLQGIVSWGSGCAQKNKPGVYTKVCNYVSWIKQTIASN'}

Create mutant proteins by mutating residues:

mutant = mol.copy()
print(mol.get('resname', 'resid 158'))
mutant.mutateResidue('resid 158', 'ARG')
print(mutant.get('resname', 'resid 158'))
['LEU' 'LEU' 'LEU' 'LEU' 'LEU' 'LEU' 'LEU' 'LEU']
['ARG' 'ARG' 'ARG' 'ARG']

Exercise 1: find residues in contact with a ligand

#Load the 'clean' molecule with 3PTB once again
mol = Molecule('3PTB')
#Identify residues in contact with the benzamidine ligand.
#We exploit the fact that there is exactly one carbon alpha (`name CA`) atom per residue.
mol.get('resid', sel='name CA and same residue as protein within 4 of resname BEN')

Exercise 2: find duplicate residues

#Load the 'clean' 3PTB molecule once again:
mol = Molecule('3PTB')

Solution 1.

Identify duplicate residues, relying on PDB’s insertion attribute:

# Quick and dirty
np.unique(mol.get('resid', sel='insertion A'))
# Or equivantly, more explicit and pretty-printed
ia = mol.copy()
ia.filter("insertion A and name CA")
rid = ia.get('resid') # ia.resid also works!
rn = ia.get('resname')

for f, b in zip(rn, rid):
    print(f, b)

Solution 2.

If one doesn’t want to rely on the insertion property, we can process the list of gaps between residues.

#Let's try to use the power of the return_counts property of numpy.unique
dups = mol.copy()                       # Make a working copy
dups.filter("name CA and protein")      # Keep only C-alphas
rid = dups.get('resid')                 # Get list of residue ids

nrid, count= np.unique(rid,return_counts=True)
                                        # Get how many times each residue id appeared
nrid[count>1]                           # Show duplicates

Exercise 3: find gaps in the sequence of residue numbers

#We can exploit the numpy.diff() function to compute deltas between array elements.
# array with the "holes":
# - 0 means duplicate residues
# - >1 means segments of protein missing
deltas = np.diff(rid)
print(deltas)
#Print residue IDs at which the holes (including duplicate residues) occur
new_rid = rid[:np.size(rid)-1] # no delta for last residue
new_rid[deltas!=1]
# Try printing a prettier and more explicit output
# Iterate over all residues (excluding last one)
rn = dups.get('resname')
for i in range(np.size(rid)-1):
    # If there is a break...
    if(deltas[i]>1):
        # Remember that deltas[i]=rid[i+1]-rid[i]
        print(rid[i],rn[i],' followed by ',rid[i+1],rn[i+1])