{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Adaptive Bandit\n",
    "\n",
    "### Concept\n",
    "* Adaptive sampling algorithms usually employ empirical policies, and they are not based on any mathematical decission process.\n",
    "\n",
    "* We descrive adaptive sampling in terms of a multi-armed bandit problem to develop a novel adaptive sampling algorithm, Adaptive Bandit [ref], providing strong fundamentals to tackle the exploration-exploitation dilemma faced in adaptive sampling.\n",
    "\n",
    "* Adaptive Bandit is framed into a reinforcement-learning based framework, using an action-value function and an upper confidence bound selection algorithm, improving adaptive sampling\u2019s performance and versatility when faced against different free energy landscape.\n",
    "\n",
    "* Discretized conformational states are defined as actions, and each action has an associated reward distribution. When an action is picked, the algorithm computes the associated reward for that action, based on MSM free energy estimations, and applies a policy to select the next action.\n",
    "\n",
    "* AdaptiveBandit relies on the UCB1 algorithm to optimize the action-picking policy, defining an upper confidence bound for each action based on the number of times the agent has picked that action and the total number of actions taken"
   ]
  },
  {
   "cell_type": "raw",
   "metadata": {},
   "source": [
    ".. math::\n",
    "\n",
    "    a_t = argmax_{a\\in\\mathcal{A}}\\left[{Q_t(a) + c\\sqrt{\\frac{\\ln{t}}{N_t(a)}}}\\right]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A. P\u00e9rez, P. Herrera-Nieto, S. Doerr and G. De Fabritiis, [AdaptiveBandit: A multi-armed bandit framework for adaptive sampling in molecular simulations](https://arxiv.org/abs/2002.12582). arXiv preprint 2020; arXiv:2002.12582.\n",
    "\n",
    "Auer P. [Using confidence bounds for exploitation-exploration trade-offs](http://www.jmlr.org/papers/volume3/auer02a/auer02a.pdf). Journal of Machine Learning Research. 2002; 3(Nov):397-422."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Getting started\n",
    "\n",
    "This tutorial will show you how to properly set up an `AdaptiveBandit` project, highlighting the main differences with respect to the standard adaptive sampling. As an example, we will perform some folding simulations using the chicken villin headpice (PDB: 2F4K).\n",
    "\n",
    "Let's start by importing HTMD and the `AdaptiveBandit` class:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "2024-06-11 16:35:05,922 - numexpr.utils - INFO - Note: NumExpr detected 20 cores but \"NUMEXPR_MAX_THREADS\" not set, so enforcing safe limit of 8.\n",
      "2024-06-11 16:35:05,923 - numexpr.utils - INFO - NumExpr defaulting to 8 threads.\n",
      "2024-06-11 16:35:06,079 - rdkit - INFO - Enabling RDKit 2022.09.1 jupyter extensions\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n",
      "Please cite HTMD: Doerr et al.(2016)JCTC,12,1845. https://dx.doi.org/10.1021/acs.jctc.6b00049\n",
      "HTMD Documentation at: https://software.acellera.com/htmd/\n",
      "\n",
      "You are on the latest HTMD version (2.3.28+5.g9dbc5091a.dirty).\n",
      "\n"
     ]
    }
   ],
   "source": [
    "from htmd.ui import *\n",
    "from htmd.adaptive.adaptivebandit import AdaptiveBandit"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "AdaptiveBandit uses the same project structure as adaptive sampling, with each simulation being associated to a single directory with all the files to run it.\n",
    "\n",
    "To begin, get the starting generators [here](https://ndownloader.figshare.com/files/65181204). You can also download the data using `wget -O gen.tar.gz https://ndownloader.figshare.com/files/65181204`. You will have to uncompress that tar.gz file and allow execution in all `run.sh` files."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "for file in glob('./generators/*/run.sh'):\n",
    "    os.chmod(file, 0o755)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "These generators contain prepared unfolded structures of villin, which we want to simulate long enough to reach the folded native structure."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### AdaptiveBandit\n",
    "\n",
    "We start our `AdaptiveBandit` project in the same way as with adaptive sampling, by defining the queue used for simulations."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [],
   "source": [
    "queue = LocalGPUQueue()\n",
    "queue.datadir = './data'"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [],
   "source": [
    "ab = AdaptiveBandit()\n",
    "ab.app = queue"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Then, we define the `nmin`, `nmax` and `nframes` to set the maximum amount of simulated frames"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [],
   "source": [
    "ab.nmin=0\n",
    "ab.nmax=2\n",
    "ab.nframes = 1000000"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "And we choose the projection and clustering method used to construct a Markov model at each epoch"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [],
   "source": [
    "ab.clustmethod = MiniBatchKMeans\n",
    "ab.projection = MetricSelfDistance('protein and name CA')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Up until now, the setup is exactly the same as with `AdaptiveMD`. However, `AdaptiveBandit` has an additional parameter, which sets the $c$ parameter from the UCB1 equation:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [],
   "source": [
    "ab.exploration = 0.01  "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Additionally, `AdaptiveBandit` accepts a goal function as an input that will be used to initialize our action-value estimates. In this example, we will use the contacts goal function defined in the previous tutorial to initialize the $Q(a)$ values. The `goal_init` parameter sets an $N_t(a)$ initial value proportional to the max frames per cluster at the end of the run, which represents the statistical certainty we give to the goal function."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "2024-06-11 16:35:20,819 - moleculekit.molecule - WARNING - Alternative atom locations detected. Only altloc A was kept. If you prefer to keep all use the keepaltloc=\"all\" option when reading the file.\n",
      "2024-06-11 16:35:20,856 - moleculekit.molecule - INFO - Removed 40 atoms. 614 atoms remaining in the molecule.\n"
     ]
    }
   ],
   "source": [
    "ref = Molecule('2F4K')\n",
    "\n",
    "def contactGoal(mol, crystal):\n",
    "    crystalCO = MetricSelfDistance('protein and name CA', periodic=None,\n",
    "                                   metric='contacts',\n",
    "                                   threshold=10).project(crystal).squeeze()\n",
    "    proj = MetricSelfDistance('protein and name CA',\n",
    "                              metric='contacts',\n",
    "                              threshold=10).project(mol)\n",
    "    # How many crystal contacts are seen?\n",
    "    co_score = np.sum(proj[:, crystalCO] == 1, axis=1).astype(float)\n",
    "    co_score = co_score / np.sum(crystalCO)\n",
    "    return co_score\n",
    "\n",
    "ab.goalfunction = (contactGoal, (ref,))\n",
    "ab.goal_init = 0.3"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "And now, we just need to launch our `AdaptiveBandit` run:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "2024-06-11 17:19:05,905 - htmd.adaptive.adaptive - INFO - Processing epoch 0\n",
      "2024-06-11 17:19:05,907 - htmd.adaptive.adaptive - INFO - Epoch 0, generating first batch\n",
      "2024-06-11 17:19:05,923 - jobqueues.util - INFO - Trying to determine all GPU devices\n",
      "2024-06-11 17:19:05,952 - jobqueues.localqueue - INFO - Using GPU devices 0\n",
      "2024-06-11 17:19:05,953 - jobqueues.util - INFO - Trying to determine all GPU devices\n",
      "2024-06-11 17:19:05,979 - jobqueues.localqueue - INFO - Queueing /home/sdoerr/Work/htmd/tutorials/input/e1s1_villin_100ns_1\n",
      "2024-06-11 17:19:05,980 - jobqueues.localqueue - INFO - Queueing /home/sdoerr/Work/htmd/tutorials/input/e1s2_villin_100ns_2\n",
      "2024-06-11 17:19:05,980 - jobqueues.localqueue - INFO - Running /home/sdoerr/Work/htmd/tutorials/input/e1s1_villin_100ns_1 on device 0\n"
     ]
    }
   ],
   "source": [
    "ab.run()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.10.13"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}