How to write a structure
Goal
Save a Molecule to disk in a chosen file format.
Which format to use
For a prepared / templated Molecule that carries bonds, bond orders, and formal charges, prefer mmCIF (.cif or .bcif) — it round-trips nearly all the data a Molecule holds. PDB is still fine for quick interchange with legacy tools, but you lose bond orders and the file is bound by fixed column widths. JSON is for application development where you need a lossless round-trip of the in-memory representation.
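The guidance above can be condensed into a small decision helper. This is only a sketch: `pick_extension` and its three flags are hypothetical names introduced here for illustration and are not part of moleculekit.

```python
def pick_extension(needs_bond_orders: bool = True,
                   legacy_tool: bool = False,
                   lossless_app_state: bool = False) -> str:
    """Return a file extension following the guidance above (hypothetical helper)."""
    if lossless_app_state:
        # JSON mirrors the in-memory Molecule, for application development.
        return ".json"
    if legacy_tool and not needs_bond_orders:
        # PDB is fine for quick interchange, but bond orders are lost.
        return ".pdb"
    # mmCIF round-trips nearly all the data a Molecule holds.
    return ".cif"


print(pick_extension())                                            # .cif
print(pick_extension(needs_bond_orders=False, legacy_tool=True))   # .pdb
print(pick_extension(lossless_app_state=True))                     # .json
```

The result can be appended to a file stem and passed to `mol.write()`, which infers the format from the extension.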
Minimal example
from moleculekit.molecule import Molecule

# Load PDB entry 3PTB by its ID, then write it as mmCIF (format inferred from ".cif")
mol = Molecule("3PTB")
mol.write("output.cif")
Parameters that matter
The signature is write(filename, sel=None, type=None, **kwargs). Format-specific options (e.g. writebonds for PDB) are passed through **kwargs.
| Parameter | Type | Default | What it does |
|---|---|---|---|
| filename | | required | Output file path; the extension determines the format. |
| sel | | None | Atom selection; only the selected atoms are written. |
| type | | None | Explicitly override the format (e.g. …). |
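The rule described in the table (an explicit type wins, otherwise the extension decides) can be sketched in a few lines. This is an illustration of the documented behaviour under simple assumptions, not moleculekit's actual implementation; `resolve_format` and its `fmt` argument are hypothetical names.

```python
import os


def resolve_format(filename, fmt=None):
    """Mimic the documented rule: an explicit type wins, else use the extension."""
    if fmt is not None:
        return fmt.lower()
    ext = os.path.splitext(filename)[1]  # e.g. ".cif"
    if not ext:
        raise ValueError(f"cannot infer a format from {filename!r}; pass fmt explicitly")
    return ext[1:].lower()


print(resolve_format("output.cif"))              # cif
print(resolve_format("ligand.out", fmt="pdb"))   # pdb
```

With a real Molecule, the equivalent call for the second case would be along the lines of `mol.write("ligand.out", sel="protein", type="pdb")`, optionally followed by format-specific keyword arguments such as `writebonds`.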
Supported formats
| Extension | Format | Carries |
|---|---|---|
| .cif | mmCIF / PDBx | Recommended default. Full topology incl. bonds, bond orders, formal charges, segid, occupancy, B-factors. |
| .bcif | Binary mmCIF | Compact binary variant of mmCIF. |
| .pdb | Protein Data Bank | Coords + chain/resid/segid + formal charges (columns 79–80). Bond orders are not stored. |
| | AutoDock PDB | PDB + partial charges. |
| | Macromolecular Transmission Format | Compact binary. |
| | Tripos MOL2 | Bonds + bond orders + atom types. |
| | Structure-Data File | Bond orders; small molecules. |
| .psf | CHARMM / NAMD topology | Topology only (no coords). |
| | GROMACS structure | Coords + topology (single frame). |
| | Plain XYZ | Elements + coords only. |
| .xtc | GROMACS compressed traj | Coordinates only; lossy, reduced-precision compression. |
| .dcd | CHARMM/NAMD binary traj | Coordinates only, full precision. |
| .trr | GROMACS full-precision traj | Coords + optional velocities/forces. |
| .netcdf | AMBER NetCDF | Coordinates only, full precision. |
| .binpos | AMBER binpos | Coordinates only. |
| | NAMD extended system | Box / restart info. |
| | Coordinate files | Coordinates only; pair with a topology. |
| | mdtraj-supported | Requires mdtraj. |
| .json | moleculekit JSON | Lossless round-trip of the in-memory Molecule. Useful in app development; rarely the right choice for tutorials. |
For trajectory formats (xtc, dcd, trr, netcdf, binpos), only the coordinates are written — keep a topology file (PSF, PRMTOP, PDB) alongside.
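One way to keep a topology next to a coordinates-only trajectory is to write both from the same Molecule in one step. The helper below is a minimal sketch: `write_trajectory_with_topology` is a hypothetical name, `mol` is assumed to be a moleculekit Molecule, and the default extensions are just examples taken from the formats named above.

```python
def write_trajectory_with_topology(mol, stem, sel=None, traj_ext="xtc", topo_ext="pdb"):
    """Write a topology file and a trajectory file side by side (hypothetical helper)."""
    topo_path = f"{stem}.{topo_ext}"
    traj_path = f"{stem}.{traj_ext}"
    # Use the same selection for both files so the atom order and count match.
    mol.write(topo_path, sel=sel)   # atom names, residues, chains, ...
    mol.write(traj_path, sel=sel)   # coordinates only
    return topo_path, traj_path
```

Calling `write_trajectory_with_topology(mol, "run1", sel="protein")` would be expected to produce `run1.pdb` and `run1.xtc`, which can later be loaded together.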
Common variations
# Write a protein-only sub-selection as mmCIF (preserves bonds + charges)
mol.write("protein.cif", sel="protein")
# Write a PDB for interchange with a tool that doesn't read mmCIF
mol.write("output.pdb")
# Round-trip a Molecule losslessly through JSON (e.g. for app state)
mol.write("mol.json")
roundtrip = Molecule("mol.json")
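To see what a given format actually preserves, the original and reloaded objects can be compared field by field. The sketch below is an assumption-laden illustration: `compare_fields` is a hypothetical helper, and the `SimpleNamespace` objects are stand-ins for Molecules so the example runs without moleculekit. The field names (`name`, `resname`, `chain`, `segid`) are per-atom Molecule attributes.

```python
from types import SimpleNamespace


def compare_fields(original, reloaded, fields=("name", "resname", "chain", "segid")):
    """Report, per field, whether two Molecule-like objects hold identical values."""
    report = {}
    for field in fields:
        before = [str(v) for v in getattr(original, field)]
        after = [str(v) for v in getattr(reloaded, field)]
        report[field] = before == after
    return report


# Stand-ins for a Molecule before and after a write/read cycle that dropped segid.
before = SimpleNamespace(name=["N", "CA"], resname=["GLY", "GLY"],
                         chain=["A", "A"], segid=["P1", "P1"])
after = SimpleNamespace(name=["N", "CA"], resname=["GLY", "GLY"],
                        chain=["A", "A"], segid=["", ""])

print(compare_fields(before, after))
# {'name': True, 'resname': True, 'chain': True, 'segid': False}
```

With real data, `compare_fields(mol, roundtrip)` on the JSON example above should report `True` for every field if the round-trip is lossless as described.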
Gotchas
PDB cannot store explicit bond orders (single/double/aromatic) — use mmCIF, MOL2, or SDF if those are needed downstream. PDB can store formal charges in columns 79–80, but many third-party PDB parsers ignore them.
The chain field is written as a single character in PDB; segid (up to 4 characters) survives a PDB round-trip only in the SEGID column, which many programs ignore. mmCIF preserves both faithfully.
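The chain and segid limits above can be checked before choosing PDB as the output format. This is a minimal sketch: `pdb_field_problems` is a hypothetical helper that takes any iterables of strings and is not part of moleculekit.

```python
def pdb_field_problems(chains, segids):
    """List chain/segid values that exceed the PDB limits described above."""
    problems = []
    for chain in sorted(set(map(str, chains))):
        if len(chain) > 1:
            problems.append(f"chain {chain!r} is longer than 1 character")
    for segid in sorted(set(map(str, segids))):
        if len(segid) > 4:
            problems.append(f"segid {segid!r} is longer than 4 characters")
    return problems


# Example values: one chain ID and one segid that would not fit in a PDB file.
for problem in pdb_field_problems(["A", "AB"], ["PROT", "LIGAND"]):
    print(problem)
```

With a Molecule, the per-atom arrays can be passed directly, for example `pdb_field_problems(mol.chain, mol.segid)`; an empty list means neither limit is exceeded, and a non-empty one suggests writing mmCIF instead.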