Adaptive sampling ================= Concept ------- - Exploration of the conformational space using MD simulations can waste lots of simulation time sampling the same conformational regions which does not provide any new knowledge. - Instead it would be desirable to explore more under-sampled regions of the conformational space, to overcome energetic barriers and eventually reach the desired conformation (folded protein / bound ligand etc.) - In adaptive sampling, instead of launching thousands of simulations at once from a small set of initial structures as in naive high-throughput MD, simulations are launched in sequential batches called epochs utilizing knowledge of the conformational space obtained from all previous epochs. - The starting points of the simulations in each epoch are chosen based on some criteria; in this case, it selects conformations from the most under-sampled conformational regions detected in all previous simulations. This is done by using Markov state models which discretize the conformational space into a set of most metastable states. Then the starting conformations are sampled based on a distribution related to the population of each state. S. Doerr and G. De Fabritiis, `On-the-fly learning and sampling of ligand binding by high-throughput molecular simulations `__, J. Chem. Theory Comput. 10 (5), pp 2064–2069(2014). S. Doerr, M. J. Harvey, Frank Noé, and G. De Fabritiis, `HTMD: High-Throughput Molecular Dynamics for Molecular Discovery `__ J. Chem. Theory Comput. 2016 12 (4), 1845-1852 Unit of execution ----------------- Each simulation in adaptive is associated to a single directory which contains all files to run it. To run a project it is therefore necessary to provide one or more initial simulation directories, called generators. How to start ------------ Adaptive creates multiple directories which it uses to organize the simulations. The user only needs to provide a generators folder containing one sub-folder for each starting conformation containing all files needed to execute that simulation. For example: :: └── generators/ ├── gen1/ │   ├── structure.pdb │ ├── input │ └── ... ├── gen2/ │ ├── structure.pdb │ ├── input │ └── ... Then the adaptive will generate an ``input``, ``data`` and later a ``filtered`` folder as well, looking like this: :: ├── data/ # Contains the completed simulations (automatically created) ├── filtered/ # Contains the completed simulations without water atoms (automatically created) ├── generators/ # Contains the initial generators provided by the user └── input/ # Contains the files needed to start all simulations of all epochs (automatically created) Adaptive uses a naming scheme for simulations which follows the pattern: ``e4s3_e2s1p0f45``. This name tells us that this simulation was generated in epoch 4 as the 3rd simulation of the batch. The starting conformation was taken from simulation 1 of epoch 2 from the first piece of the simulation [*]_ and from frame 45 of that simulation piece. .. [*] some MD software might fragment simulations into pieces. Usually though this number will be 0 and can be ignored. Simulation length ----------------- 1. The length of each simulation is really system dependent. 2. It could be anything like tens of nanoseconds to hundred of nanoseconds. 3. As a rule of thumb use twice the expected lag time for your molecular process (e.g. for binding anything between 30 and 100 ns). Simulation details ------------------ As only the coordinates files are seeded for every new epoch, simulations cannot use a velocity file. Velocities are therefore reinitialized to the Maxwell Boltzmann distribution at the given temperature. Adaptive script example ----------------------- The power of adaptive sampling is accessible on HTMD through the :class:`~htmd.adaptive.adaptiverun.AdaptiveMD` class: .. code:: python from htmd.ui import * app = LocalGPUQueue() app.datadir = './data' md = AdaptiveMD() md.nmin=5 md.nmax=10 md.nepochs = 30 md.app = app md.projection = MetricDistance('name CA', '(resname BEN) and ((name C7) or (name C6))', periodic='selections', metric='contacts') md.ticadim = 3 md.updateperiod = 14400 # execute every 4 hours md.run() Execution in a notebook ----------------------- 1. It is possible to run the adaptive scheme syncronosly or asyncrounsly. 2. The option ``updateperiod`` controls this behaviour. 3. The default is to run and exit, so ``updateperiod`` needs to be specified if adaptive should be run synchronously Setting a simple cron job ------------------------- 1. This is useful for having the script execute automatically every x hours. 2. Do not set ``updateperiod`` then, or set it to zero such that the program will execute and exit .. code:: bash #!/bin/bash -login # cron.sh file # use crontab -e to add this line: # 0 */4 * * * cd /pathtomydir/; ./cron.sh # python conf.py Visualizing the starting conformations -------------------------------------- If we want to look at what structures were chosen as the starting conformations of a given epoch we can use a code snippet like the following: .. code:: python for s in glob('input/e28s*'): # Visualize all starting conf of epoch 28 mol = Molecule(s+'/structure.pdb') mol.read(s+'/input.coor') mol.view()