How to write a custom goal function for AdaptiveGoal#

Goal#

Drive AdaptiveGoal toward a specific objective (a target secondary structure, a known binding pocket geometry, an RMSD to a reference, …) by writing a goal function that scores each frame against that objective. Adaptive then spawns new simulations from a blend of high-scoring states (the directed component) and under-sampled microstates (the usual undirected exploration).

Minimal example#

import numpy as np
from htmd.adaptive.adaptivegoal import AdaptiveGoal
from moleculekit.molecule import Molecule
from moleculekit.projections.metricdistance import MetricSelfDistance
from moleculekit.projections.metricsecondarystructure import MetricSecondaryStructure
from jobqueues.localqueue import LocalGPUQueue

# Reference: per-residue secondary structure of the crystal target
crystal = Molecule("crystal.pdb")
crystal_ss = MetricSecondaryStructure().project(crystal)[0]   # shape (n_residues,)


def goal_function(mol, crystal_ss=crystal_ss):
    """Score = fraction of residues whose SS matches the crystal.
    Higher is better. Returns one score per frame.
    """
    ss = MetricSecondaryStructure().project(mol)               # (n_frames, n_residues)
    return (ss == crystal_ss).mean(axis=1)


ad = AdaptiveGoal()
ad.app = LocalGPUQueue()
ad.nmin, ad.nmax, ad.nepochs = 5, 10, 30
ad.projection = MetricSelfDistance("protein and name CA", metric="contacts")
ad.goalfunction = goal_function
ad.ucscale = 0.5                                               # 50% undirected / 50% directed
ad.run()

goal_function(mol) takes a Molecule (one or many frames) and returns a 1-D NumPy array with one score per frame. Higher scores are “better”; adaptive uses them to choose which sampled microstates to respawn from.
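
Before handing a goal to AdaptiveGoal, it can help to check its output contract offline. The following is a minimal sketch, not part of HTMD: `check_goal_output`, `FakeMol` and `toy_goal` are hypothetical names introduced here, and `FakeMol` only mimics the `numFrames` attribute of a real Molecule.

```python
import numpy as np


class FakeMol:
    """Hypothetical stand-in for a Molecule: only exposes numFrames."""

    def __init__(self, n_frames):
        self.numFrames = n_frames


def check_goal_output(goal, mol):
    """Call goal(mol) and verify it yields one finite score per frame."""
    scores = np.asarray(goal(mol), dtype=float)
    if scores.ndim != 1:
        raise ValueError(f"expected a 1-D array, got shape {scores.shape}")
    if scores.shape[0] != mol.numFrames:
        raise ValueError(f"expected {mol.numFrames} scores, got {scores.shape[0]}")
    if not np.all(np.isfinite(scores)):
        raise ValueError("goal scores contain NaN or inf")
    return scores


def toy_goal(mol):
    # Toy goal: the score rises linearly with the frame index.
    return np.linspace(0.0, 1.0, mol.numFrames)


scores = check_goal_output(toy_goal, FakeMol(4))
```

With a real trajectory, the same helper could be called as `check_goal_output(goal_function, mol)` on a loaded Molecule.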

Parameters that matter#

| Parameter | What it does |
| --- | --- |
| `goalfunction` | A callable `mol -> np.ndarray` of shape `(n_frames,)`. Closures / `functools.partial` are fine. |
| `ucscale` | Mixing ratio between undirected (exploration of under-sampled microstates) and directed (goal-driven) components. `0` = pure goal, `1` = pure exploration, `0.5` = balanced. |
| `statetype` | `"micro"` (default) or `"macro"` - which state granularity scores get aggregated to. (`"cluster"` is inherited from `AdaptiveMD` but not supported here - the directed-component code only handles `micro` / `macro`.) |
| `autoscale` | When `True`, adjusts `ucscale` automatically based on how stuck the run is on its goal score. Companion knobs: `autoscalediff` (default 10, epochs window), `autoscalemult` (default 1, ucscale step), `autoscaletol` (default 0.2, goal-improvement tolerance). |
| `savegoal` | Optional path to a `.pkl` file. If set, AdaptiveGoal pickles the projected goal values per trajectory each epoch - useful when iterating on the goal function. |
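
To build intuition for `ucscale`, here is an illustrative sketch of how a mixing ratio trades off the two components. It is an assumption-laden toy, not HTMD's internal code: the undirected term is modelled as the inverse visit count of each state, both terms are min-max scaled, and `minmax` and `blended_reward` are hypothetical helpers.

```python
import numpy as np


def minmax(x):
    """Rescale x to [0, 1]; a constant array maps to all zeros."""
    x = np.asarray(x, dtype=float)
    span = x.max() - x.min()
    if span == 0:
        return np.zeros_like(x)
    return (x - x.min()) / span


def blended_reward(goal_per_state, counts_per_state, ucscale=0.5):
    """Mix goal-driven and exploration-driven rewards (illustrative only)."""
    directed = minmax(goal_per_state)
    # Assumption: rarely visited states get a larger exploration reward.
    undirected = minmax(1.0 / np.asarray(counts_per_state, dtype=float))
    return (1.0 - ucscale) * directed + ucscale * undirected


goal_scores = np.array([0.1, 0.9, 0.5])   # mean goal score per state
visit_counts = np.array([1, 50, 10])      # frames observed per state

pure_goal = blended_reward(goal_scores, visit_counts, ucscale=0.0)
pure_explore = blended_reward(goal_scores, visit_counts, ucscale=1.0)
balanced = blended_reward(goal_scores, visit_counts, ucscale=0.5)
```

In this toy, `ucscale=0` ranks the high-scoring but heavily sampled state first, while `ucscale=1` ranks the barely visited state first.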

Common variations#

RMSD-to-reference goal#

from moleculekit.projections.metricrmsd import MetricRmsd

ref = Molecule("target.pdb")

def goal_rmsd(mol, ref=ref):
    # Higher score for lower RMSD: 1 / (1 + RMSD)
    rmsd = MetricRmsd(ref, "protein and name CA").project(mol)
    return 1.0 / (1.0 + rmsd)

Multi-criteria goal#

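from moleculekit.projections.metricsasa import MetricSasa

def goal_combined(mol):
    # Reuses MetricRmsd and ref from the RMSD example above.
    rmsd = MetricRmsd(ref, "protein and name CA").project(mol)
    # Total ligand SASA per frame (summed over the selected atoms).
    sasa = MetricSasa(sel="resname LIG").project(mol).sum(axis=1)
    rmsd_score = 1.0 / (1.0 + rmsd)
    # 1000.0 is an arbitrary normalisation; tune it to your ligand's SASA range.
    sasa_score = 1.0 - np.clip(sasa / 1000.0, 0, 1)     # bury the ligand
    return 0.6 * rmsd_score + 0.4 * sasa_score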

Combine projections by hand inside the goal; adaptive just sees the final scalar score.
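
If several goals share the same weighted-sum pattern, the combination step can be factored out. The sketch below is one possible way to do that, under the assumption that each scorer already returns values in a comparable range; `weighted_goal` and the toy scorers are hypothetical, and a plain dict of arrays stands in for a Molecule.

```python
import numpy as np


def weighted_goal(terms):
    """Build a goal from (weight, scorer) pairs; weights are normalised to sum to 1."""
    weights = np.array([w for w, _ in terms], dtype=float)
    if np.any(weights < 0) or weights.sum() == 0:
        raise ValueError("weights must be non-negative and not all zero")
    weights = weights / weights.sum()
    scorers = [s for _, s in terms]

    def goal(mol):
        # Stack per-term scores into (n_terms, n_frames), then take the weighted sum.
        stacked = np.stack([np.ravel(s(mol)) for s in scorers])
        return weights @ stacked

    return goal


# Toy scorers operating on a dict of per-frame values (stand-in for a Molecule).
def rmsd_score(frames):
    return 1.0 / (1.0 + frames["rmsd"])


def sasa_score(frames):
    return 1.0 - np.clip(frames["sasa"] / 1000.0, 0, 1)


frames = {
    "rmsd": np.array([0.0, 1.0, 3.0]),
    "sasa": np.array([0.0, 500.0, 2000.0]),
}
combined = weighted_goal([(3, rmsd_score), (2, sasa_score)])(frames)
```

The weights 3 and 2 normalise to 0.6 and 0.4, matching the hand-written combination.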

Use cached reference data#

from functools import partial

ref_features = MetricSecondaryStructure().project(Molecule("ref.pdb"))[0]

def goal(mol, ref):
    ss = MetricSecondaryStructure().project(mol)
    return (ss == ref).mean(axis=1)

ad.goalfunction = partial(goal, ref=ref_features)

partial pre-binds the reference data, so it is computed once up front rather than on every call to the goal.
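
The binding and the per-frame comparison can be tried without any trajectory. In this sketch, small integer arrays stand in for the projected secondary-structure codes; `match_fraction`, `ref_codes` and `frame_codes` are hypothetical names, and the goal here takes an array rather than a Molecule.

```python
from functools import partial

import numpy as np


def match_fraction(ss, ref):
    """Fraction of residues per frame whose code equals the reference code."""
    # (n_frames, n_residues) == (n_residues,) broadcasts across frames.
    return (np.asarray(ss) == np.asarray(ref)).mean(axis=1)


ref_codes = np.array([0, 1, 1, 2])            # hypothetical per-residue SS codes
goal = partial(match_fraction, ref=ref_codes)  # ref is bound once, here

frame_codes = np.array([
    [0, 1, 1, 2],   # identical to the reference
    [0, 0, 1, 2],   # one residue differs
    [2, 2, 2, 2],   # only the last residue matches
])
scores = goal(frame_codes)
```

The resulting `goal` takes a single argument, which is the calling convention the goal function needs.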

Gotchas#

The goal function must return one score per frame. Returning (n_frames, k) raises a confusing error inside the spawn logic.
Adaptive calls the goal once per epoch on every accumulated frame, so an O(n²) goal becomes prohibitive after a few epochs. Vectorise / cache.
Don’t make the goal too sharp: if only 0.01% of frames score above zero, the directed component degenerates to the highest-scoring single frame and you lose diversity. Use a smooth scoring function (1 / (1 + x), tanh-shaped, etc.).
ucscale=0 (pure goal) tends to over-exploit one basin. ucscale=0.5 is a good default; autoscale=True adapts it dynamically.
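
The sharpness problem is easy to see on synthetic numbers. The sketch below compares a hard threshold with the smooth 1 / (1 + x) form on made-up RMSD values; the 0.5 cutoff and the value range are arbitrary assumptions chosen for illustration.

```python
import numpy as np

# Synthetic RMSD values, evenly spread between 0.4 and 12.0.
rmsd = np.linspace(0.4, 12.0, 10_000)

# Sharp goal: 1 only when RMSD is below the cutoff, otherwise 0.
sharp = (rmsd < 0.5).astype(float)

# Smooth goal: every frame gets a distinct, monotonically decreasing score.
smooth = 1.0 / (1.0 + rmsd)

sharp_nonzero = np.count_nonzero(sharp)   # frames that carry any signal
smooth_distinct = np.unique(smooth).size  # distinct score levels available

print(f"sharp goal: {sharp_nonzero} of {rmsd.size} frames score above zero")
print(f"smooth goal: {smooth_distinct} distinct score values")
```

With the sharp goal, fewer than 1% of frames are distinguishable from the rest, whereas the smooth goal still ranks every frame.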