How to configure adaptive sampling parameters#
Goal#
Set up an AdaptiveMD run: choose how many simulations run per epoch, when to stop, and where input / data / filtered trajectories live on disk.
Minimal example#
from htmd.adaptive.adaptiverun import AdaptiveMD
from htmd.projections.tica import TICA
from moleculekit.projections.metricdistance import MetricSelfDistance
from jobqueues.localqueue import LocalGPUQueue
from htmd.clustering.kcenters import KCenter
ad = AdaptiveMD()
ad.app = LocalGPUQueue() # or SlurmQueue() / LsfQueue() / PBSQueue()
ad.nmin = 5 # minimum concurrent sims to keep running
ad.nmax = 10 # maximum concurrent sims at any time
ad.nepochs = 30 # stop after this many epochs (waits for all running sims to finish)
ad.generatorspath = "./generators" # input templates for new sims
ad.inputpath = "./input" # where adaptive writes per-epoch input dirs
ad.datapath = "./data" # where completed sims land
ad.filteredpath = "./filtered" # auto-filtered trajectories (water stripped)
ad.projection = MetricSelfDistance("protein and name CA", metric="contacts")
ad.clustmethod = KCenter
ad.macronum = 4
ad.ticadim = 3
ad.ticalag = 20 # in frames
ad.run()
ad.run() blocks until either nepochs adaptive rounds have completed or you KeyboardInterrupt. Each epoch: collect any newly-finished sims, project + cluster + build the model, choose new starting frames from under-explored microstates, write new input directories under inputpath/, submit them through app.
Parameters that matter#
Parameter |
What it does |
|---|---|
|
The |
|
Low-water mark on the running fleet. A new epoch only fires when the running count drops to |
|
Refill ceiling. When an epoch fires it spawns |
|
Independent OR-stop conditions: the run ends when either the epoch count reaches |
|
Directory containing template input directories. Only seeds epoch 1; later epochs copy from parent sim input dirs instead. |
|
Per-epoch input dirs / completed sim outputs / water-stripped filtered trajs. |
|
Name of the starting-coordinate file each new sim gets (the parent frame). Default |
|
Name of the starting-box file (PBC). Default |
|
A moleculekit |
|
TICA dimensions to keep / lag in frames. Set |
|
Clustering algorithm class + number of macrostates. |
|
Sub-sample frames before clustering - useful when sims write at high frequency. |
|
Auto-strip waters from completed sims before clustering. Defaults: |
|
Set to |
Common variations#
Conservative exploration (few sims, many epochs)#
ad.nmin = 2
ad.nmax = 4
ad.nepochs = 100
Useful when each simulation is long (e.g. 100 ns) and you want a slow, careful walk through state space.
Aggressive exploration (large fleet, fewer epochs)#
ad.nmin = 50
ad.nmax = 100
ad.nepochs = 10
Useful when you have a large GPU cluster and per-sim cost is low - throws many short trajectories at the problem.
Stop on aggregate sampling rather than epoch count#
ad.nframes = 1_000_000 # stop when we've simulated ~100 µs total at 0.1 ns/frame
ad.nepochs = 1000 # safety upper bound
Gotchas#
generatorspathmust contain complete, runnable ACEMD input directories - each subdir is whatacemdwould consume directly. Generators only seed epoch 1: adaptive picks one generator per new sim and overwritescoorname/boxnamewith the starting frame. From epoch 2 onwards adaptive copies the parent sim’s input directory instead and overwrites the same two files.nminis a low-water mark, not a floor. A new epoch fires only when the running fleet drops to<= nmin; the epoch then spawnsnmax - runningnew sims to refill back tonmax. Withnmin=5, nmax=10and 4 sims running, the next epoch fires (4 ≤ 5) and spawns 6 new sims (10 − 4). Adaptive does not actively keep the fleet abovenminbetween epochs.The
projectionis recomputed from scratch every epoch on all completed sims, so very large feature spaces (e.g. all heavy-atom self-distances on a 500-residue protein) slow each epoch. Project to Cα-only or use TICA aggressively.KeyboardInterruptduringad.run()is safe - the nextad.run()call resumes from the last completed epoch by globbinginputpath/e*/to count epochs already on disk.
See also#
How to write a custom goal function - directed adaptive via
AdaptiveGoal.How to inspect a finished adaptive run - reconstruct the parent / child tree from
datapath/.Adaptive sampling explanation - why under-explored microstates get re-spawned.