Adaptive sampling algorithms usually employ empirical policies, and they are not based on any mathematical decission process.
We descrive adaptive sampling in terms of a multi-armed bandit problem to develop a novel adaptive sampling algorithm, Adaptive Bandit [ref], providing strong fundamentals to tackle the exploration-exploitation dilemma faced in adaptive sampling.
Adaptive Bandit is framed into a reinforcement-learning based framework, using an action-value function and an upper confidence bound selection algorithm, improving adaptive sampling’s performance and versatility when faced against different free energy landscape.
Discretized conformational states are defined as actions, and each action has an associated reward distribution. When an action is picked, the algorithm computes the associated reward for that action, based on MSM free energy estimations, and applies a policy to select the next action.
AdaptiveBandit relies on the UCB1 algorithm to optimize the action-picking policy, defining an upper confidence bound for each action based on the number of times the agent has picked that action and the total number of actions taken
A. Pérez, P. Herrera-Nieto, S. Doerr and G. De Fabritiis, AdaptiveBandit: A multi-armed bandit framework for adaptive sampling in molecular simulations. arXiv preprint 2020; arXiv:2002.12582.
Auer P. Using confidence bounds for exploitation-exploration trade-offs. Journal of Machine Learning Research. 2002; 3(Nov):397-422.
This tutorial will show you how to properly set up an
project, highlighting the main differences with respect to the standard
adaptive sampling. As an example, we will perform some folding
simulations using the chicken villin headpice (PDB: 2F4K).
Let’s start by importing HTMD and the
from htmd.ui import * from htmd.adaptive.adaptivebandit import AdaptiveBandit
AdaptiveBandit uses the same project structure as adaptive sampling, with each simulation being associated to a single directory with all the files to run it.
To begin, get the starting generators
here. You can also
download the data using
wget -O gen.tar.gz https://ndownloader.figshare.com/files/25767026.
You will have to uncompress that tar.gz file and allow execution in all
for file in glob('./generators/*/run.sh'): os.chmod(file, 0o755)
These generators contain prepared unfolded structures of villin, which we want to simulate long enough to reach the folded native structure.
We start our
AdaptiveBandit project in the same way as with adaptive
sampling, by defining the queue used for simulations.
queue = LocalGPUQueue() queue.datadir = './data'
ab = AdaptiveBandit() ab.app = queue
Then, we define the
nframes to set the
maximum amount of simulated frames
ab.nmin=0 ab.nmax=2 ab.nframes = 1000000
And we choose the projection and clustering method used to construct a Markov model at each epoch
ab.clustmethod = MiniBatchKMeans ab.projection = MetricSelfDistance('protein and name CA')
Up until now, the setup is exactly the same as with
AdaptiveBandit has an additional parameter, which sets the
\(c\) parameter from the UCB1 equation:
ab.exploration = 0.01
AdaptiveBandit accepts a goal function as an input
that will be used to initialize our action-value estimates. In this
example, we will use the contacts goal function defined in the previous
tutorial to initialize the \(Q(a)\) values. The
parameter sets an \(N_t(a)\) initial value proportional to the max
frames per cluster at the end of the run, which represents the
statistical certainty we give to the goal function.
ref = Molecule('2F4K') def contactGoal(mol, crystal): crystalCO = MetricSelfDistance('protein and name CA', periodic=None, metric='contacts', threshold=10).project(crystal).squeeze() proj = MetricSelfDistance('protein and name CA', metric='contacts', threshold=10).project(mol) # How many crystal contacts are seen? co_score = np.sum(proj[:, crystalCO] == 1, axis=1).astype(float) co_score = co_score / np.sum(crystalCO) return co_score ab.goalfunction = (contactGoal, (ref,)) ab.goal_init = 0.3
And now, we just need to launch our