Adaptive Bandit#
Concept#
Adaptive sampling algorithms usually employ empirical policies, and they are not based on any mathematical decission process.
We descrive adaptive sampling in terms of a multi-armed bandit problem to develop a novel adaptive sampling algorithm, Adaptive Bandit [ref], providing strong fundamentals to tackle the exploration-exploitation dilemma faced in adaptive sampling.
Adaptive Bandit is framed into a reinforcement-learning based framework, using an action-value function and an upper confidence bound selection algorithm, improving adaptive sampling’s performance and versatility when faced against different free energy landscape.
Discretized conformational states are defined as actions, and each action has an associated reward distribution. When an action is picked, the algorithm computes the associated reward for that action, based on MSM free energy estimations, and applies a policy to select the next action.
AdaptiveBandit relies on the UCB1 algorithm to optimize the action-picking policy, defining an upper confidence bound for each action based on the number of times the agent has picked that action and the total number of actions taken
A. Pérez, P. Herrera-Nieto, S. Doerr and G. De Fabritiis, AdaptiveBandit: A multi-armed bandit framework for adaptive sampling in molecular simulations. arXiv preprint 2020; arXiv:2002.12582.
Auer P. Using confidence bounds for exploitation-exploration trade-offs. Journal of Machine Learning Research. 2002; 3(Nov):397-422.
Getting started#
This tutorial will show you how to properly set up an AdaptiveBandit
project, highlighting the main differences with respect to the standard
adaptive sampling. As an example, we will perform some folding
simulations using the chicken villin headpice (PDB: 2F4K).
Let’s start by importing HTMD and the AdaptiveBandit
class:
from htmd.ui import *
from htmd.adaptive.adaptivebandit import AdaptiveBandit
2024-06-11 16:35:05,922 - numexpr.utils - INFO - Note: NumExpr detected 20 cores but "NUMEXPR_MAX_THREADS" not set, so enforcing safe limit of 8.
2024-06-11 16:35:05,923 - numexpr.utils - INFO - NumExpr defaulting to 8 threads.
2024-06-11 16:35:06,079 - rdkit - INFO - Enabling RDKit 2022.09.1 jupyter extensions
Please cite HTMD: Doerr et al.(2016)JCTC,12,1845. https://dx.doi.org/10.1021/acs.jctc.6b00049
HTMD Documentation at: https://software.acellera.com/htmd/
You are on the latest HTMD version (2.3.28+5.g9dbc5091a.dirty).
AdaptiveBandit uses the same project structure as adaptive sampling, with each simulation being associated to a single directory with all the files to run it.
To begin, get the starting generators
here. You can also
download the data using
wget -O gen.tar.gz https://ndownloader.figshare.com/files/25767026
.
You will have to uncompress that tar.gz file and allow execution in all
run.sh
files.
for file in glob('./generators/*/run.sh'):
os.chmod(file, 0o755)
These generators contain prepared unfolded structures of villin, which we want to simulate long enough to reach the folded native structure.
AdaptiveBandit#
We start our AdaptiveBandit
project in the same way as with adaptive
sampling, by defining the queue used for simulations.
queue = LocalGPUQueue()
queue.datadir = './data'
ab = AdaptiveBandit()
ab.app = queue
Then, we define the nmin
, nmax
and nframes
to set the
maximum amount of simulated frames
ab.nmin=0
ab.nmax=2
ab.nframes = 1000000
And we choose the projection and clustering method used to construct a Markov model at each epoch
ab.clustmethod = MiniBatchKMeans
ab.projection = MetricSelfDistance('protein and name CA')
Up until now, the setup is exactly the same as with AdaptiveMD
.
However, AdaptiveBandit
has an additional parameter, which sets the
\(c\) parameter from the UCB1 equation:
ab.exploration = 0.01
Additionally, AdaptiveBandit
accepts a goal function as an input
that will be used to initialize our action-value estimates. In this
example, we will use the contacts goal function defined in the previous
tutorial to initialize the \(Q(a)\) values. The goal_init
parameter sets an \(N_t(a)\) initial value proportional to the max
frames per cluster at the end of the run, which represents the
statistical certainty we give to the goal function.
ref = Molecule('2F4K')
def contactGoal(mol, crystal):
crystalCO = MetricSelfDistance('protein and name CA', periodic=None,
metric='contacts',
threshold=10).project(crystal).squeeze()
proj = MetricSelfDistance('protein and name CA',
metric='contacts',
threshold=10).project(mol)
# How many crystal contacts are seen?
co_score = np.sum(proj[:, crystalCO] == 1, axis=1).astype(float)
co_score = co_score / np.sum(crystalCO)
return co_score
ab.goalfunction = (contactGoal, (ref,))
ab.goal_init = 0.3
2024-06-11 16:35:20,819 - moleculekit.molecule - WARNING - Alternative atom locations detected. Only altloc A was kept. If you prefer to keep all use the keepaltloc="all" option when reading the file.
2024-06-11 16:35:20,856 - moleculekit.molecule - INFO - Removed 40 atoms. 614 atoms remaining in the molecule.
And now, we just need to launch our AdaptiveBandit
run:
ab.run()
2024-06-11 17:19:05,905 - htmd.adaptive.adaptive - INFO - Processing epoch 0
2024-06-11 17:19:05,907 - htmd.adaptive.adaptive - INFO - Epoch 0, generating first batch
2024-06-11 17:19:05,923 - jobqueues.util - INFO - Trying to determine all GPU devices
2024-06-11 17:19:05,952 - jobqueues.localqueue - INFO - Using GPU devices 0
2024-06-11 17:19:05,953 - jobqueues.util - INFO - Trying to determine all GPU devices
2024-06-11 17:19:05,979 - jobqueues.localqueue - INFO - Queueing /home/sdoerr/Work/htmd/tutorials/input/e1s1_villin_100ns_1
2024-06-11 17:19:05,980 - jobqueues.localqueue - INFO - Queueing /home/sdoerr/Work/htmd/tutorials/input/e1s2_villin_100ns_2
2024-06-11 17:19:05,980 - jobqueues.localqueue - INFO - Running /home/sdoerr/Work/htmd/tutorials/input/e1s1_villin_100ns_1 on device 0