How to extract a representative structure per macrostate#
Goal#
Get one (or more) actual coordinate frames for each macrostate of a Model - so you can render them in any viewer, write them to PDB for figures, or use them as starting structures for downstream simulations. viewStates() opens VMD interactively; this how-to uses getStates() to get the same frames as plain Molecule objects without needing a display.
Minimal example#
# One representative frame per macrostate, returned as a list of Molecule objects
mols = model.getStates(numsamples=1)
for state_idx, m in enumerate(mols):
m.write(f"./state_{state_idx}.pdb")
getStates() samples the requested states, loads the corresponding frames out of the trajectories, wraps and aligns them, and returns one Molecule per state. Each molecule contains numsamples frames. Writing it directly with m.write("file.pdb") emits only the current frame (mol.frame) - see the ensemble variation below for multi-frame output.
Parameters that matter#
Parameter on |
What it does |
|---|---|
|
List of state indices to sample. Default: all states of |
|
Number of frames per state. Default 50. Set to 1 for a single representative; larger for an ensemble figure. |
|
|
|
For |
|
Atom selection used to recentre the box. Default |
|
Atom selection used to align all returned frames. Default |
|
Reference |
|
Override the simlist used to look up trajectories (e.g. a filtered simlist for water-stripped frames). |
Common variations#
Many frames per state for an ensemble figure#
mols = model.getStates(numsamples=20)
for state_idx, m in enumerate(mols):
m.write(f"./state_{state_idx}.pdb") # topology (first frame)
m.write(f"./state_{state_idx}.xtc") # all 20 aligned snapshots
By default the PDB writer emits only the current frame (mol.frame). To get all 20 in one PDB file you can pass m.write("state_N.pdb", frames=range(m.numFrames)) (multi-MODEL output), but most viewers handle XTC trajectory files better - write the topology to a single-frame .pdb and the 20 frames to a .xtc alongside, then load both together (Molecule("state_0.pdb").read("state_0.xtc")).
Sample microstates instead of macrostates#
mols = model.getStates(states=[42], statetype="micro", numsamples=10)
mols[0].write("./micro_42.pdb")
Highest-population microstate within each macrostate#
import numpy as np
micro_to_macro = model.macro_ofmicro
micro_eq = model.msm.stationary_distribution
best_micros = []
for macro in range(model.macronum):
micros_in_macro = np.where(micro_to_macro == macro)[0]
best_micros.append(micros_in_macro[np.argmax(micro_eq[micros_in_macro])])
mols = model.getStates(states=best_micros, statetype="micro", numsamples=1)
Useful when you want the single most-populated microstate of each macrostate (a tighter “representative” than weighted-random sampling).
Gotchas#
getStatesrequires every Sim in the simlist to share a structurally equivalent topology - the_singleMolfilecheck compares Molecules viamol_equal(..., exceptFields=["coords"]), so the file paths can differ as long as the atoms / bonds / charges match. If your simlist mixes genuinely different topologies, pre-filter to a common one (simfilter) and pass the filtered simlist viasimlist=.samplemode="weighted"and"even"only make sense forstatetype="macro"- the function falls back to random for micro / cluster.The macrostate assignment is fixed at the moment you call
model.markovModel(lag, macronum). Re-fitting with a differentmacronumre-numbers macrostates - state 0 in one fit isn’t state 0 in another.Macrostate 0 is not always the bound / folded basin - inspect the structures (or check
model.eqDistribution()) before assuming.
See also#
How to read off and interpret an ITS plot - upstream: how you chose
macronumin the first place.Villin folding MSM - the canonical macrostate-decomposition flow.
htmd.model.Model.getStates(),htmd.model.Model.sampleStates()- API references.