Structural alignment of proteins by sequence#

In the field of Bioinformatics it is usual to perform a BLAST search in order to find similar proteins to the one of your interest. If the structures of some of the results are available, you might want to align them all together to see the differences. These structures will usually have mutations, or vary in the number of residues and atom order in the PDB files, which makes simple, full-structural alignment functions fail.

Therefore, HTMD provides the function moleculekit.tools.sequenceStructureAlignment, which takes two proteins and aligns both structures using their longest sequence aligment.

In this example, we will use Dopamine receptor (PDB code: ‘3PBL’) and Beta Adrenergic receptor (PDB code: ‘3NYA’). Both are GPCR proteins, so they share a great fraction of their sequence. We will use this feature to align their structures.

Quick Example#

Adrenergic receptor will be used as the reference

from htmd.ui import *
from moleculekit.tools.sequencestructuralalignment import sequenceStructureAlignment

# We will use adrenaline receptor as the reference/template

# Load dopamine receptor and display in red
dop_receptor = Molecule('3PBL')
# The crystal is a dimer, so we discard one of the units
dop_receptor.filter('protein and chain A',_logger=False)
dop_receptor.view(style='NewCartoon',color='1',name='DopRec')

# Load adrenergic receptor and display in dark blue
adr_receptor = Molecule('3NYA')
adr_receptor.filter('protein')
adr_receptor.filter('resid 0 to 342',_logger=False) # filter out the G-protein
adr_receptor.view(style='NewCartoon',color='0',name='AdrRec')

# adr_receptor acts as the template
dop_receptor_results, _ = sequenceStructureAlignment(dop_receptor,adr_receptor)
# pick the top result
dop_receptor_aligned = dop_receptor_results[0]

# See the result, dopamine receptor is now aligned and displayed in green
dop_receptor_aligned.view(style='NewCartoon',color='7',name='DopAligned')
2024-06-12 13:11:06,516 - moleculekit.molecule - INFO - Removed 81 atoms. 3527 atoms remaining in the molecule.
2024-06-12 13:11:06,680 - moleculekit.tools.sequencestructuralalignment - WARNING - 20 alignments found. Limiting to 10 as specified in the maxalignments argument.
2024-06-12 13:11:06,681 - moleculekit.tools.sequencestructuralalignment - INFO - Alignment #0 was done on 65 residues: 89-153
2024-06-12 13:11:06,689 - moleculekit.tools.sequencestructuralalignment - INFO - Alignment #1 was done on 65 residues: 89-153
2024-06-12 13:11:06,696 - moleculekit.tools.sequencestructuralalignment - INFO - Alignment #2 was done on 65 residues: 89-153
2024-06-12 13:11:06,703 - moleculekit.tools.sequencestructuralalignment - INFO - Alignment #3 was done on 65 residues: 89-153
2024-06-12 13:11:06,716 - moleculekit.tools.sequencestructuralalignment - INFO - Alignment #4 was done on 65 residues: 89-153
2024-06-12 13:11:06,725 - moleculekit.tools.sequencestructuralalignment - INFO - Alignment #5 was done on 65 residues: 89-153
2024-06-12 13:11:06,733 - moleculekit.tools.sequencestructuralalignment - INFO - Alignment #6 was done on 65 residues: 89-153
2024-06-12 13:11:06,741 - moleculekit.tools.sequencestructuralalignment - INFO - Alignment #7 was done on 65 residues: 89-153
2024-06-12 13:11:06,748 - moleculekit.tools.sequencestructuralalignment - INFO - Alignment #8 was done on 65 residues: 89-153
2024-06-12 13:11:06,755 - moleculekit.tools.sequencestructuralalignment - INFO - Alignment #9 was done on 65 residues: 89-153

Detailed Explanation#

First, we load the dopamine receptor. The crystal is a dimer of two receptors, so we discard one of the units.

from htmd.ui import *
from moleculekit.tools.sequencestructuralalignment import sequenceStructureAlignment

dop_receptor = Molecule('3PBL')
dop_receptor.filter('protein and chain A') # discard one of the units
2024-06-12 13:11:13,446 - moleculekit.molecule - INFO - Removed 3398 atoms. 3389 atoms remaining in the molecule.
array([3389, 3390, 3391, ..., 6784, 6785, 6786], dtype=int32)

Next, we load the beta adrenergic receptor. The G-protein from the Adrenergic receptor (residues from 343 to last) is discarded, to ensure that these region is not used to align both proteins

adr_receptor = Molecule('3NYA')
adr_receptor.filter('protein')
adr_receptor.filter('resid 0 to 342') # discard the G-protein
2024-06-12 13:11:14,727 - moleculekit.molecule - INFO - Removed 81 atoms. 3527 atoms remaining in the molecule.
2024-06-12 13:11:14,746 - moleculekit.molecule - INFO - Removed 1275 atoms. 2252 atoms remaining in the molecule.
array([1589, 1590, 1591, ..., 2861, 2862, 2863], dtype=int32)

Let’s see the two proteins together before the alignment. The dopamine receptor is displayed in red.

dop_receptor.view(style='NewCartoon',color='1',name='DopRec') # visualize it in red
adr_receptor.view(style='NewCartoon',color='0',name='AdrRec') # visualize it in dark blue

Now, both proteins are ready to be aligned, so we call the sequenceStructureAlignment function. The second protein molecule passed to the function acts as the reference, in our case, the adrenaline receptor

dop_receptor_results, _ = sequenceStructureAlignment(dop_receptor,adr_receptor)
2024-06-12 13:11:22,156 - moleculekit.tools.sequencestructuralalignment - WARNING - 20 alignments found. Limiting to 10 as specified in the maxalignments argument.
2024-06-12 13:11:22,157 - moleculekit.tools.sequencestructuralalignment - INFO - Alignment #0 was done on 65 residues: 89-153
2024-06-12 13:11:22,164 - moleculekit.tools.sequencestructuralalignment - INFO - Alignment #1 was done on 65 residues: 89-153
2024-06-12 13:11:22,171 - moleculekit.tools.sequencestructuralalignment - INFO - Alignment #2 was done on 65 residues: 89-153
2024-06-12 13:11:22,178 - moleculekit.tools.sequencestructuralalignment - INFO - Alignment #3 was done on 65 residues: 89-153
2024-06-12 13:11:22,184 - moleculekit.tools.sequencestructuralalignment - INFO - Alignment #4 was done on 65 residues: 89-153
2024-06-12 13:11:22,191 - moleculekit.tools.sequencestructuralalignment - INFO - Alignment #5 was done on 65 residues: 89-153
2024-06-12 13:11:22,198 - moleculekit.tools.sequencestructuralalignment - INFO - Alignment #6 was done on 65 residues: 89-153
2024-06-12 13:11:22,205 - moleculekit.tools.sequencestructuralalignment - INFO - Alignment #7 was done on 65 residues: 89-153
2024-06-12 13:11:22,212 - moleculekit.tools.sequencestructuralalignment - INFO - Alignment #8 was done on 65 residues: 89-153
2024-06-12 13:11:22,219 - moleculekit.tools.sequencestructuralalignment - INFO - Alignment #9 was done on 65 residues: 89-153

dop_receptor_results stores different structural alignments, each using a different portion of the sequence alignment. By default, only the best 10 alignments are stored, but you can modify this behaviour setting the parameter maxalignments to an arbitrary number.

Let’s look at the best result by choosing the first item in dop_receptor_results. Dopamine receptor is shown in green.

dop_receptor_aligned = dop_receptor_results[0] # pick the best result
dop_receptor_aligned.view(style='NewCartoon',color='7',name='DopAligned')