Restart a simulation#

You will learn: how to resume an ACEMD simulation from its checkpoint after a crash, a time limit, or a planned stop.

Prerequisites:

A run directory containing a previous run and the restart.chk it produced.
The same GPU model that produced the checkpoint (an RTX 4090 checkpoint won’t restart on a different card).

Setup#

ACEMD writes restart.chk every trajectoryperiod steps, whether or not you plan to restart. To resume from the latest checkpoint, add restart: true to the input file:

input.yaml#

structure: dhfr.psf
parameters: dhfr.prm
coordinates: dhfr.pdb
boxsize: [62.23, 62.23, 62.23]
thermostat: true
run: 100ns
restart: true

Then re-run:

acemd

What the restart restores#

restart: true overrides the on-disk initial state with the checkpoint contents:

Atomic coordinates.
Atomic velocities.
Periodic box vectors.
Thermostat and barostat state.

It also skips minimization, even if minimize is set to a value greater than 0.

What the restart does not do#

It does not roll back the trajectory file. New frames are appended to output.xtc from the checkpoint’s step onward.
It does not re-run already-completed steps. run is a total target, not an additional duration: with run: 100ns and a checkpoint at 60 ns, the restart simulates the remaining 40 ns to reach the 100 ns total. If the checkpoint is already at or past run, the simulation exits immediately with nothing to do; raise run to extend a finished trajectory.
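The arithmetic behind run as a total target can be sketched in a few lines of Python. This is an illustration only, not ACEMD code: the function name remaining_ns is hypothetical, and times are plain numbers in nanoseconds.

```python
def remaining_ns(run_total_ns: float, checkpoint_ns: float) -> float:
    """Nanoseconds still to simulate when `run` is a total target.

    Hypothetical helper for illustration; ACEMD does this internally.
    """
    # A checkpoint at or past the target leaves nothing to do.
    return max(0.0, float(run_total_ns) - float(checkpoint_ns))


print(remaining_ns(100, 60))   # run: 100ns, checkpoint at 60 ns
print(remaining_ns(100, 100))  # finished run, same target
print(remaining_ns(200, 100))  # finished run, target raised to 200ns
```

The three calls correspond to a mid-run restart, a restart of a finished run with an unchanged target, and an extension of a finished run.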

Extend a finished simulation#

A trajectory that reached its original run target can be extended in place: bump run to the new total and restart. Because run is the total step count, set it to the new endpoint, not the additional duration:

input.yaml#

# Original run hit 100ns; extend to 200ns total
run: 200ns
restart: true

acemd

ACEMD picks up the checkpoint at the end of the first 100 ns and runs the integrator forward to 200 ns total. New frames append to output.xtc.

Continue without a checkpoint#

If restart.chk is missing (for example, accidentally deleted) or unusable (different GPU model, changed ACEMD version), you can pick up from the last successful frame by feeding ACEMD the run’s final output files as the starting state, without restart: true:

input.yaml#

structure: dhfr.psf
parameters: dhfr.prm
coordinates: output.coor   # final positions of previous run
velocities: output.vel     # final velocities
boxsize: output.xsc        # final box vectors
thermostat: true
run: 100ns
minimize: 0                # don't re-minimize already-equilibrated coordinates

acemd

Caveats:

Thermostat and barostat state is lost, and a new Langevin random seed is used. This matters if you need bitwise reproducibility; for typical statistical-sampling work it does not.
Previous output files are renamed, not deleted. Because this is a fresh run from ACEMD’s point of view, any existing output.csv, output.xtc, output.coor, output.vel, output.xsc (and any user-set trajforcefile / trajvelocityfile) is moved aside to output.1.<ext> (then output.2.<ext>, …) before new ones are written. Your previous trajectory and final state are preserved, just under different filenames.
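The renaming scheme can be illustrated with a short Python sketch. It assumes that the first unused index is chosen, which the description above implies but does not spell out. The function backup_name is hypothetical and is not part of ACEMD.

```python
from pathlib import Path


def backup_name(directory: Path, filename: str) -> Path:
    """Return the first free output.N.<ext> style path for `filename`.

    Illustrative only: assumes the lowest unused index N is taken.
    """
    # Split "output.xtc" into stem "output" and extension "xtc".
    stem, _, ext = filename.rpartition(".")
    n = 1
    while (directory / f"{stem}.{n}.{ext}").exists():
        n += 1
    return directory / f"{stem}.{n}.{ext}"
```

With no earlier backups in the run directory, backup_name(Path("."), "output.xtc") gives output.1.xtc; once that file exists, the next call gives output.2.xtc.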

Restart from Python#

from acemd import acemd

acemd(".", restart=True)

Gotchas#

Restarting on a different GPU model fails. If you need to migrate hardware, run a fresh start from the last output.coor/output.vel/output.xsc instead.
If the GPU drivers or ACEMD version changed since the checkpoint was written, the restart may fail. Re-running from output.* files is the safe fallback.
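Choosing between the two routes can be scripted before launching a job. The sketch below only checks which files exist in the run directory. It cannot tell whether a checkpoint is compatible with the current GPU or ACEMD version, so a failed restart still needs the manual fallback. The function name restart_strategy and its return values are hypothetical.

```python
from pathlib import Path

# Final-state files used by the "Continue without a checkpoint" route.
FINAL_STATE = ("output.coor", "output.vel", "output.xsc")


def restart_strategy(run_dir: str) -> str:
    """Suggest how to continue a run, based on file presence alone."""
    d = Path(run_dir)
    if (d / "restart.chk").is_file():
        return "checkpoint"    # add restart: true to the input file
    if all((d / name).is_file() for name in FINAL_STATE):
        return "final-state"   # fresh run from output.coor/.vel/.xsc
    return "none"              # nothing to continue from


print(restart_strategy("."))
```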
