How to bootstrap an MSM for error bars on timescales#

Goal#

Estimate the uncertainty on implied timescales and macrostate populations by re-fitting the MSM on multiple random sub-samples of the trajectory set. Bootstrap is the standard sample-stability check: if different subsets give wildly different timescales, your dataset is undersampled.

Minimal example#

import numpy as np
from htmd.model import Model
from sklearn.cluster import MiniBatchKMeans

n_boots = 10
timescales = []
for i in range(n_boots):
    boot = dataTica.bootstrap(0.8)                       # 80% random subset
    boot.cluster(MiniBatchKMeans(n_clusters=1000))
    model = Model(boot)
    model.markovModel(lag=25, macronum=4, units="ns")
    # Timescales live on the underlying deeptime MSM at model.msm
    timescales.append(model.msm.timescales() * model.data.fstep)

# Each row has length (n_active_states - 1) - active = the largest strongly-connected
# microstate set kept by counts.submodel_largest() at this lag. The count can differ
# across bootstraps if connectivity changes, so the array may be ragged - wrap in a
# list and pad if you need a rectangular array.
timescales = np.array(timescales)                        # shape (n_boots, n_active_states - 1)
slowest = timescales[:, :3]                              # top three implied timescales
print("mean top-3 timescales (ns):", slowest.mean(axis=0))
print("std  top-3 timescales (ns):", slowest.std(axis=0))

bootstrap(ratio)() returns a new MetricData containing a random ratio fraction of the trajectories (no replacement by default). The kept count is int(floor(numtraj * ratio)) - on small N this rounds down (e.g. 7 trajs × 0.8 = 5 kept, not 5.6), so bootstrapping ratios are coarser than they look. Each bootstrap is independent - re-cluster + re-fit from scratch.

model.msm.timescales() returns timescales in frames, so each row above is multiplied by model.data.fstep to get ns - see How to read off and interpret an ITS plot (“Compare lag times by direct call”) for the full explanation.

Parameters that matter#

Parameter	What it does
`ratio`	Fraction of trajectories kept in the subset (e.g. `0.8`). The remaining trajs are dropped for that bootstrap.
`replacement`	`True` to allow drawing the same trajectory multiple times (classic bootstrap). `False` (default) gives a sub-sample.
`n_boots`	Number of bootstrap rounds to run. 10-20 is enough for ballpark error bars; 50+ for tight CIs.

Common variations#

Bootstrap full distributions, not just means#

boot_results = []
for _ in range(n_boots):
    boot = dataTica.bootstrap(0.8)
    boot.cluster(MiniBatchKMeans(n_clusters=1000))
    model = Model(boot)
    model.markovModel(lag=25, macronum=4, units="ns")
    boot_results.append({
        "ts":   model.msm.timescales() * model.data.fstep,   # in ns
        # plot=False suppresses the matplotlib window that eqDistribution opens by default
        "eq":   model.eqDistribution(plot=False),
    })

# Percentile-based 95% CI on the slowest timescale:
slow = np.array([b["ts"][0] for b in boot_results])
print(f"slowest timescale: {np.percentile(slow, 50):.1f} ns "
      f"(95% CI [{np.percentile(slow, 2.5):.1f}, {np.percentile(slow, 97.5):.1f}])")

ITS plot with bootstrap error bars#

its_per_boot = []
for _ in range(n_boots):
    boot = dataTica.bootstrap(0.8)
    boot.cluster(MiniBatchKMeans(n_clusters=1000))
    model = Model(boot)
    # plotTimescales returns the ITS table when plot=False, results=True
    its, lags = model.plotTimescales(
        lags=[5, 10, 25, 50, 100], units="frames",
        plot=False, results=True,
    )
    its_per_boot.append(its)                              # (n_lags, n_its)

its = np.array(its_per_boot)                              # (n_boots, n_lags, n_its)
mean_its = its.mean(axis=0)
std_its  = its.std(axis=0)

plot=False suppresses the matplotlib figure (otherwise plotTimescales opens one per bootstrap), and results=True makes it return the timescales array + lag list instead of None.

Plot mean_its ± std_its per slow process vs lag time - that’s the bootstrap-version of the ITS plot, much more informative than a single deterministic line.

Gotchas#

bootstrap(0.8) samples at the trajectory level, not the frame level. Each bootstrap drops or keeps whole trajectories - if you have 10 trajectories, you can’t get cleaner-than-10% error bars.
bootstrap() only copies the trajectory subset - independence across bootstraps only holds if you also call boot.cluster(...) on every draw, so each bootstrap gets its own cluster centroids and transition counts. Reusing cluster assignments from the full data biases the result toward the full model.
10 bootstraps with 1000 clusters each on a multi-million-frame dataset is the slow step in this workflow. Cache dataTica between bootstraps; only the cluster + Model fit re-runs.
If different bootstraps give qualitatively different macrostate count (PCCA picks different n_macro), the underlying dataset doesn’t support the macrostate decomposition - reduce n_macro until bootstraps converge.