How to write a custom goal function for AdaptiveGoal#
Goal#
Drive AdaptiveGoal toward a specific objective (a target secondary structure, a known binding pocket geometry, an RMSD to a reference, …) by writing a goal function that scores each frame against that objective. Adaptive then spawns new simulations from a blend of high-scoring states (directed sampling) and under-sampled microstates (the usual undirected exploration).
Minimal example#
import numpy as np
from htmd.adaptive.adaptivegoal import AdaptiveGoal
from moleculekit.molecule import Molecule
from moleculekit.projections.metricdistance import MetricSelfDistance
from moleculekit.projections.metricsecondarystructure import MetricSecondaryStructure
from jobqueues.localqueue import LocalGPUQueue
# Reference: per-residue secondary structure of the crystal target
crystal = Molecule("crystal.pdb")
crystal_ss = MetricSecondaryStructure().project(crystal)[0] # shape (n_residues,)
def goal_function(mol, crystal_ss=crystal_ss):
    """Score = fraction of residues whose SS matches the crystal.
    Higher is better. Returns one score per frame.
    """
    ss = MetricSecondaryStructure().project(mol) # (n_frames, n_residues)
    return (ss == crystal_ss).mean(axis=1)
ad = AdaptiveGoal()
ad.app = LocalGPUQueue()
ad.nmin, ad.nmax, ad.nepochs = 5, 10, 30
ad.projection = MetricSelfDistance("protein and name CA", metric="contacts")
ad.goalfunction = goal_function
ad.ucscale = 0.5 # 50% undirected / 50% directed
ad.run()
goal_function(mol) takes a Molecule (one or many frames) and returns a 1-D NumPy array with one score per frame. Higher scores are “better” - adaptive uses them to pick which sampled microstates to re-spawn from.
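The return contract described above can be checked with a thin wrapper while developing a goal. The sketch below is an illustration, not part of HTMD: checked_goal is a hypothetical helper name, and it assumes only that the object passed in exposes numFrames (as Molecule does).

```python
import numpy as np


def checked_goal(goal):
    """Wrap a goal function so a badly shaped return value fails early.

    Hypothetical helper, not part of HTMD. Assumes `mol.numFrames` exists.
    """
    def wrapper(mol):
        scores = np.asarray(goal(mol), dtype=float)
        # Accept (n_frames,) and (n_frames, 1); flatten the latter.
        if scores.ndim == 2 and scores.shape[1] == 1:
            scores = scores[:, 0]
        if scores.shape != (mol.numFrames,):
            raise ValueError(
                f"goal returned shape {scores.shape}, "
                f"expected ({mol.numFrames},)"
            )
        if not np.all(np.isfinite(scores)):
            raise ValueError("goal returned NaN or infinite scores")
        return scores
    return wrapper
```

Calling checked_goal(goal_function)(mol) on a short trajectory before starting an adaptive run surfaces shape mistakes where they are easy to read.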
Parameters that matter#
| Parameter | What it does |
|---|---|
| goalfunction | A callable that takes a Molecule and returns one score per frame. |
| ucscale | Mixing ratio between undirected (exploration of under-sampled microstates) and directed (goal-driven) components. |
| | When … |
| | Optional path to a … |
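To make the mixing ratio concrete, the sketch below blends two per-microstate scores in the way the ucscale comment in the minimal example describes: a weight of ucscale on the undirected term and 1 - ucscale on the directed term. The min-max normalisation, the 1 / count exploration term, and the function names are assumptions chosen for illustration; HTMD's internal computation may differ.

```python
import numpy as np


def minmax(x):
    """Rescale to [0, 1]; a constant vector maps to all zeros."""
    x = np.asarray(x, dtype=float)
    span = x.max() - x.min()
    return np.zeros_like(x) if span == 0 else (x - x.min()) / span


def blended_reward(counts, goal_scores, ucscale=0.5):
    """Illustrative reward per microstate (not HTMD's actual code).

    counts      : number of frames observed in each microstate (all > 0)
    goal_scores : mean goal score of each microstate
    """
    # Rarely visited microstates get a high undirected score.
    undirected = minmax(1.0 / np.asarray(counts, dtype=float))
    # Microstates with a high goal score get a high directed score.
    directed = minmax(goal_scores)
    return ucscale * undirected + (1.0 - ucscale) * directed


counts = np.array([100, 10, 1])
goal_scores = np.array([0.9, 0.5, 0.1])
for uc in (0.0, 0.5, 1.0):
    print(uc, blended_reward(counts, goal_scores, uc).round(3))
```

In this toy setup ucscale = 0 ranks microstates purely by goal score and ucscale = 1 purely by how rarely they were visited; intermediate values trade one off against the other.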
Common variations#
RMSD-to-reference goal#
from moleculekit.projections.metricrmsd import MetricRmsd
ref = Molecule("target.pdb")
def goal_rmsd(mol, ref=ref):
    # Higher score for lower RMSD: 1 / (1 + RMSD)
    rmsd = MetricRmsd(ref, "protein and name CA").project(mol)
    return 1.0 / (1.0 + rmsd)
Multi-criteria goal#
from moleculekit.projections.metricsasa import MetricSasa
def goal_combined(mol):
    # Reuses ref and MetricRmsd from the RMSD example above.
    rmsd = MetricRmsd(ref, "protein and name CA").project(mol)
    sasa = MetricSasa(sel="resname LIG").project(mol).sum(axis=1)
    rmsd_score = 1.0 / (1.0 + rmsd)
    sasa_score = 1.0 - np.clip(sasa / 1000.0, 0, 1) # bury the ligand
    return 0.6 * rmsd_score + 0.4 * sasa_score
Combine projections by hand inside the goal; adaptive just sees the final scalar score.
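When a goal mixes more than two criteria, the weighted sum can be factored into a small helper. The function below is a hypothetical sketch (combine_scores is not an HTMD or moleculekit function); it assumes each criterion has already been turned into a per-frame score in [0, 1].

```python
import numpy as np


def combine_scores(scores, weights):
    """Weighted average of per-frame score arrays.

    Hypothetical helper. Each entry of `scores` must have shape (n_frames,);
    weights are normalised so they sum to 1.
    """
    # np.stack raises ValueError if the arrays disagree on n_frames.
    stacked = np.stack([np.asarray(s, dtype=float) for s in scores])  # (k, n_frames)
    w = np.asarray(weights, dtype=float)
    if w.shape != (stacked.shape[0],) or np.any(w < 0) or w.sum() == 0:
        raise ValueError("need one non-negative weight per score, not all zero")
    return (w / w.sum()) @ stacked  # (n_frames,)
```

Inside goal_combined the last line would then read return combine_scores([rmsd_score, sasa_score], [0.6, 0.4]), and adding a third criterion only means extending both lists.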
Use cached reference data#
from functools import partial
ref_features = MetricSecondaryStructure().project(Molecule("ref.pdb"))[0]
def goal(mol, ref):
    ss = MetricSecondaryStructure().project(mol)
    return (ss == ref).mean(axis=1)
ad.goalfunction = partial(goal, ref=ref_features)
partial lets you pre-bind reference state so the goal doesn’t recompute it for every adaptive epoch.
Gotchas#
- The goal function must return one score per frame. Returning (n_frames, k) raises a confusing error inside the spawn logic.
- Adaptive calls the goal once per epoch on every accumulated frame, so an O(n²) goal becomes prohibitive after a few epochs. Vectorise / cache.
- Don’t make the goal too sharp - if only 0.01% of frames score above zero, the directed component degenerates to the highest-scoring single frame and you lose diversity. Use a smooth scoring function (1 / (1 + x), tanh-shaped, etc.).
- ucscale=0 (pure goal) tends to over-exploit one basin. ucscale=0.5 is a good default; autoscale=True adapts it dynamically.
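The difference between a sharp and a smooth goal is easy to see on synthetic data. The sketch below compares a hard cutoff with the two smooth shapes mentioned above; the function names, the cutoff, and the tanh midpoint and width are arbitrary values chosen for illustration, not recommendations.

```python
import numpy as np


def sharp_score(rmsd, cutoff=1.01):
    """Hard cutoff: 1 below the cutoff, 0 everywhere else."""
    return (np.asarray(rmsd, dtype=float) < cutoff).astype(float)


def inverse_score(rmsd):
    """Smooth score 1 / (1 + x): equals 1 at x = 0 and decays slowly."""
    return 1.0 / (1.0 + np.asarray(rmsd, dtype=float))


def tanh_score(rmsd, midpoint=6.0, width=2.0):
    """Smooth step: near 1 well below the midpoint, near 0 well above it."""
    x = (np.asarray(rmsd, dtype=float) - midpoint) / width
    return 0.5 * (1.0 - np.tanh(x))


rng = np.random.default_rng(0)
rmsd = rng.uniform(1.0, 12.0, size=10_000)  # synthetic per-frame values

for name, fn in [("sharp", sharp_score), ("1/(1+x)", inverse_score), ("tanh", tanh_score)]:
    s = fn(rmsd)
    print(f"{name:8s} fraction > 0: {np.mean(s > 0):.4f}  distinct scores: {np.unique(s).size}")
```

With the hard cutoff almost every frame ties at zero, so the ranking carries no information outside a handful of frames; the smooth variants give nearly every frame a distinct score, which lets the directed component still discriminate among states far from the target.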
See also#
- How to configure adaptive sampling - all the AdaptiveMD knobs that AdaptiveGoal inherits.
- Adaptive sampling explanation - how directed + undirected components combine in FAST.
- htmd.adaptive.adaptivegoal.AdaptiveGoal - API reference.