# How to configure adaptive sampling parameters ## Goal Set up an {py:class}`~htmd.adaptive.adaptiverun.AdaptiveMD` run: choose how many simulations run per epoch, when to stop, and where input / data / filtered trajectories live on disk. ## Minimal example ```python from htmd.adaptive.adaptiverun import AdaptiveMD from htmd.projections.tica import TICA from moleculekit.projections.metricdistance import MetricSelfDistance from jobqueues.localqueue import LocalGPUQueue from htmd.clustering.kcenters import KCenter ad = AdaptiveMD() ad.app = LocalGPUQueue() # or SlurmQueue() / LsfQueue() / PBSQueue() ad.nmin = 5 # minimum concurrent sims to keep running ad.nmax = 10 # maximum concurrent sims at any time ad.nepochs = 30 # stop after this many epochs (waits for all running sims to finish) ad.generatorspath = "./generators" # input templates for new sims ad.inputpath = "./input" # where adaptive writes per-epoch input dirs ad.datapath = "./data" # where completed sims land ad.filteredpath = "./filtered" # auto-filtered trajectories (water stripped) ad.projection = MetricSelfDistance("protein and name CA", metric="contacts") ad.clustmethod = KCenter ad.macronum = 4 ad.ticadim = 3 ad.ticalag = 20 # in frames ad.run() ``` `ad.run()` blocks until either `nepochs` adaptive rounds have completed or you `KeyboardInterrupt`. Each epoch: collect any newly-finished sims, project + cluster + build the model, choose new starting frames from under-explored microstates, write new input directories under `inputpath/`, submit them through `app`. ## Parameters that matter | Parameter | What it does | | --- | --- | | `app` | The {py:class}`~jobqueues.simqueue.SimQueue` that runs the actual ACEMD jobs. Local GPU, Slurm, LSF, PBS - same interface. | | `nmin` | Low-water mark on the running fleet. A new epoch only fires when the running count drops to `<= nmin`. Adaptive does **not** actively keep the fleet above `nmin` between epochs. | | `nmax` | Refill ceiling. When an epoch fires it spawns `nmax - running` new sims. | | `nepochs`, `nframes` | Independent OR-stop conditions: the run ends when *either* the epoch count reaches `nepochs` *or* the aggregate frame count reaches `nframes`. `nframes=0` (default) disables the frame check. | | `generatorspath` | Directory containing template input directories. Only seeds **epoch 1**; later epochs copy from parent sim input dirs instead. | | `inputpath`, `datapath`, `filteredpath` | Per-epoch input dirs / completed sim outputs / water-stripped filtered trajs. | | `coorname` | Name of the starting-coordinate file each new sim gets (the parent frame). Default `"input.coor"`. | | `boxname` | Name of the starting-box file (PBC). Default `"input.xsc"`; set to `"none"` to skip writing a box file. | | `projection` | A moleculekit `Metric*` (or list) - the feature space adaptive clusters on. | | `ticadim`, `ticalag` | TICA dimensions to keep / lag in frames. Set `ticadim=0` to disable TICA. `ticalag` is silently clamped at runtime to `min(min(trajLengths)/2, ticalag)` (floor 2). | | `clustmethod`, `macronum` | Clustering algorithm class + number of macrostates. | | `skip` | Sub-sample frames before clustering - useful when sims write at high frequency. | | `filter`, `filtersel` | Auto-strip waters from completed sims before clustering. Defaults: `True`, `"not water"`. | | `dryrun` | Set to `True` to test the spawning logic without submitting jobs. | ## Common variations ### Conservative exploration (few sims, many epochs) ```python ad.nmin = 2 ad.nmax = 4 ad.nepochs = 100 ``` Useful when each simulation is long (e.g. 100 ns) and you want a slow, careful walk through state space. ### Aggressive exploration (large fleet, fewer epochs) ```python ad.nmin = 50 ad.nmax = 100 ad.nepochs = 10 ``` Useful when you have a large GPU cluster and per-sim cost is low - throws many short trajectories at the problem. ### Stop on aggregate sampling rather than epoch count ```python ad.nframes = 1_000_000 # stop when we've simulated ~100 µs total at 0.1 ns/frame ad.nepochs = 1000 # safety upper bound ``` ## Gotchas - `generatorspath` must contain **complete, runnable** ACEMD input directories - each subdir is what `acemd` would consume directly. Generators only seed **epoch 1**: adaptive picks one generator per new sim and overwrites `coorname`/`boxname` with the starting frame. From epoch 2 onwards adaptive copies the **parent sim's input directory** instead and overwrites the same two files. - `nmin` is a **low-water mark**, not a floor. A new epoch fires only when the running fleet drops to `<= nmin`; the epoch then spawns `nmax - running` new sims to refill back to `nmax`. With `nmin=5, nmax=10` and 4 sims running, the next epoch fires (4 ≤ 5) and spawns 6 new sims (10 − 4). Adaptive does **not** actively keep the fleet above `nmin` between epochs. - The `projection` is recomputed from scratch every epoch on **all** completed sims, so very large feature spaces (e.g. all heavy-atom self-distances on a 500-residue protein) slow each epoch. Project to Cα-only or use TICA aggressively. - `KeyboardInterrupt` during `ad.run()` is safe - the next `ad.run()` call resumes from the last completed epoch by globbing `inputpath/e*/` to count epochs already on disk. ## See also - {doc}`How to write a custom goal function ` - directed adaptive via {py:class}`~htmd.adaptive.adaptivegoal.AdaptiveGoal`. - {doc}`How to inspect a finished adaptive run ` - reconstruct the parent / child tree from `datapath/`. - {doc}`Adaptive sampling explanation <../explanation/adaptive-sampling>` - why under-explored microstates get re-spawned.