{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Structural alignment of proteins by sequence"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In the field of Bioinformatics it is usual to perform a BLAST search in order to find\n",
    "similar proteins to the one of your interest. If the structures of some of the results\n",
    "are available, you might want to align them all together to see the differences. \n",
    "These structures will usually have mutations, or vary in the number of residues and atom order in the PDB files, which makes simple, full-structural alignment functions fail.\n",
    "\n",
    "Therefore, HTMD provides the function ```moleculekit.tools.sequenceStructureAlignment```, which takes two proteins and\n",
    "aligns both structures using their longest **sequence** aligment.\n",
    "\n",
    "In this example, we will use Dopamine receptor (PDB code: '3PBL') and Beta Adrenergic receptor (PDB code: '3NYA').\n",
    "Both are GPCR proteins, so they share a great fraction of their sequence. We will use this feature to align their structures."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Quick Example\n",
    "\n",
    "Adrenergic receptor will be used as the reference"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "2024-06-12 13:11:06,516 - moleculekit.molecule - INFO - Removed 81 atoms. 3527 atoms remaining in the molecule.\n",
      "2024-06-12 13:11:06,680 - moleculekit.tools.sequencestructuralalignment - WARNING - 20 alignments found. Limiting to 10 as specified in the `maxalignments` argument.\n",
      "2024-06-12 13:11:06,681 - moleculekit.tools.sequencestructuralalignment - INFO - Alignment #0 was done on 65 residues: 89-153\n",
      "2024-06-12 13:11:06,689 - moleculekit.tools.sequencestructuralalignment - INFO - Alignment #1 was done on 65 residues: 89-153\n",
      "2024-06-12 13:11:06,696 - moleculekit.tools.sequencestructuralalignment - INFO - Alignment #2 was done on 65 residues: 89-153\n",
      "2024-06-12 13:11:06,703 - moleculekit.tools.sequencestructuralalignment - INFO - Alignment #3 was done on 65 residues: 89-153\n",
      "2024-06-12 13:11:06,716 - moleculekit.tools.sequencestructuralalignment - INFO - Alignment #4 was done on 65 residues: 89-153\n",
      "2024-06-12 13:11:06,725 - moleculekit.tools.sequencestructuralalignment - INFO - Alignment #5 was done on 65 residues: 89-153\n",
      "2024-06-12 13:11:06,733 - moleculekit.tools.sequencestructuralalignment - INFO - Alignment #6 was done on 65 residues: 89-153\n",
      "2024-06-12 13:11:06,741 - moleculekit.tools.sequencestructuralalignment - INFO - Alignment #7 was done on 65 residues: 89-153\n",
      "2024-06-12 13:11:06,748 - moleculekit.tools.sequencestructuralalignment - INFO - Alignment #8 was done on 65 residues: 89-153\n",
      "2024-06-12 13:11:06,755 - moleculekit.tools.sequencestructuralalignment - INFO - Alignment #9 was done on 65 residues: 89-153\n"
     ]
    }
   ],
   "source": [
    "from htmd.ui import *\n",
    "from moleculekit.tools.sequencestructuralalignment import sequenceStructureAlignment\n",
    "\n",
    "# We will use adrenaline receptor as the reference/template\n",
    "\n",
    "# Load dopamine receptor and display in red\n",
    "dop_receptor = Molecule('3PBL')\n",
    "# The crystal is a dimer, so we discard one of the units\n",
    "dop_receptor.filter('protein and chain A',_logger=False) \n",
    "dop_receptor.view(style='NewCartoon',color='1',name='DopRec')  \n",
    "\n",
    "# Load adrenergic receptor and display in dark blue\n",
    "adr_receptor = Molecule('3NYA')\n",
    "adr_receptor.filter('protein')\n",
    "adr_receptor.filter('resid 0 to 342',_logger=False) # filter out the G-protein\n",
    "adr_receptor.view(style='NewCartoon',color='0',name='AdrRec')\n",
    "\n",
    "# adr_receptor acts as the template\n",
    "dop_receptor_results, _ = sequenceStructureAlignment(dop_receptor,adr_receptor) \n",
    "# pick the top result\n",
    "dop_receptor_aligned = dop_receptor_results[0]\n",
    "\n",
    "# See the result, dopamine receptor is now aligned and displayed in green\n",
    "dop_receptor_aligned.view(style='NewCartoon',color='7',name='DopAligned')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Detailed Explanation\n",
    "\n",
    "First, we load the dopamine receptor. The crystal is a dimer of two receptors, so we discard one of the units."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "2024-06-12 13:11:13,446 - moleculekit.molecule - INFO - Removed 3398 atoms. 3389 atoms remaining in the molecule.\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "array([3389, 3390, 3391, ..., 6784, 6785, 6786], dtype=int32)"
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "from htmd.ui import *\n",
    "from moleculekit.tools.sequencestructuralalignment import sequenceStructureAlignment\n",
    "\n",
    "dop_receptor = Molecule('3PBL')\n",
    "dop_receptor.filter('protein and chain A') # discard one of the units"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Next, we load the beta adrenergic receptor. The G-protein from the Adrenergic receptor (residues from 343 to last) is discarded, to ensure that these region is not used to align both proteins"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "2024-06-12 13:11:14,727 - moleculekit.molecule - INFO - Removed 81 atoms. 3527 atoms remaining in the molecule.\n",
      "2024-06-12 13:11:14,746 - moleculekit.molecule - INFO - Removed 1275 atoms. 2252 atoms remaining in the molecule.\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "array([1589, 1590, 1591, ..., 2861, 2862, 2863], dtype=int32)"
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "adr_receptor = Molecule('3NYA')\n",
    "adr_receptor.filter('protein')\n",
    "adr_receptor.filter('resid 0 to 342') # discard the G-protein "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let's see the two proteins together before the alignment. The dopamine receptor is displayed in red."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [],
   "source": [
    "dop_receptor.view(style='NewCartoon',color='1',name='DopRec') # visualize it in red \n",
    "adr_receptor.view(style='NewCartoon',color='0',name='AdrRec') # visualize it in dark blue"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now, both proteins are ready to be aligned, so we call the ```sequenceStructureAlignment``` function. The second protein molecule passed to the function acts as the reference, in our case, the adrenaline receptor"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "2024-06-12 13:11:22,156 - moleculekit.tools.sequencestructuralalignment - WARNING - 20 alignments found. Limiting to 10 as specified in the `maxalignments` argument.\n",
      "2024-06-12 13:11:22,157 - moleculekit.tools.sequencestructuralalignment - INFO - Alignment #0 was done on 65 residues: 89-153\n",
      "2024-06-12 13:11:22,164 - moleculekit.tools.sequencestructuralalignment - INFO - Alignment #1 was done on 65 residues: 89-153\n",
      "2024-06-12 13:11:22,171 - moleculekit.tools.sequencestructuralalignment - INFO - Alignment #2 was done on 65 residues: 89-153\n",
      "2024-06-12 13:11:22,178 - moleculekit.tools.sequencestructuralalignment - INFO - Alignment #3 was done on 65 residues: 89-153\n",
      "2024-06-12 13:11:22,184 - moleculekit.tools.sequencestructuralalignment - INFO - Alignment #4 was done on 65 residues: 89-153\n",
      "2024-06-12 13:11:22,191 - moleculekit.tools.sequencestructuralalignment - INFO - Alignment #5 was done on 65 residues: 89-153\n",
      "2024-06-12 13:11:22,198 - moleculekit.tools.sequencestructuralalignment - INFO - Alignment #6 was done on 65 residues: 89-153\n",
      "2024-06-12 13:11:22,205 - moleculekit.tools.sequencestructuralalignment - INFO - Alignment #7 was done on 65 residues: 89-153\n",
      "2024-06-12 13:11:22,212 - moleculekit.tools.sequencestructuralalignment - INFO - Alignment #8 was done on 65 residues: 89-153\n",
      "2024-06-12 13:11:22,219 - moleculekit.tools.sequencestructuralalignment - INFO - Alignment #9 was done on 65 residues: 89-153\n"
     ]
    }
   ],
   "source": [
    "dop_receptor_results, _ = sequenceStructureAlignment(dop_receptor,adr_receptor)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "```dop_receptor_results``` stores different structural alignments, each using a different portion of the sequence alignment.\n",
    "By default, only the best 10 alignments are stored, but you can modify this behaviour setting the parameter ```maxalignments``` to an arbitrary number.\n",
    "\n",
    "Let's look at the best result by choosing the first item in ```dop_receptor_results```. Dopamine receptor is shown in green."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [],
   "source": [
    "dop_receptor_aligned = dop_receptor_results[0] # pick the best result \n",
    "dop_receptor_aligned.view(style='NewCartoon',color='7',name='DopAligned')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.10.13"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}