How to write a structure#

Goal#

Save a Molecule to disk in a chosen file format.

Which format to use#

For a prepared or templated Molecule that carries bonds, bond orders, and formal charges, prefer mmCIF (.cif or .bcif): it round-trips nearly all the data a Molecule holds. PDB is still fine for quick interchange with legacy tools, but it drops bond orders and is constrained by fixed column widths. JSON is intended for application development, where you need a lossless round-trip of the in-memory representation.

Minimal example#

from moleculekit.molecule import Molecule

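# Load PDB entry 3PTB (fetched from the RCSB when given a PDB ID)
mol = Molecule("3PTB")
# The .cif extension selects the mmCIF writer
mol.write("output.cif")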

Parameters that matter#

The signature is write(filename, sel=None, type=None, **kwargs). Format-specific options (e.g. writebonds for PDB) are passed through **kwargs.

| Parameter | Type | Default | What it does |
| --- | --- | --- | --- |
| filename | str | required | Output file path; the extension determines the format. |
| sel | str, bool mask, or index array | None (all) | Atom selection; only the selected atoms are written. |
| type | str | None | Explicitly override the format (e.g. "pdb") when the extension is ambiguous. |
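The three parameters can be exercised together, as in the sketch below. `write_variants` is a hypothetical helper, not part of moleculekit; it only assumes that the object passed in offers `atomselect` and `write` the way a Molecule does. The output file names and the `.txt` extension are illustrative choices.

```python
import os


def write_variants(mol, outdir):
    """Write the same structure three ways and return the output paths.

    `mol` is assumed to behave like a moleculekit Molecule, i.e. to offer
    atomselect(sel) and write(filename, sel=None, type=None).
    """
    os.makedirs(outdir, exist_ok=True)

    # 1. Selection given as a string.
    protein_path = os.path.join(outdir, "protein.cif")
    mol.write(protein_path, sel="protein")

    # 2. Selection given as a boolean mask computed beforehand.
    ca_mask = mol.atomselect("protein and name CA")
    ca_path = os.path.join(outdir, "ca_only.pdb")
    mol.write(ca_path, sel=ca_mask)

    # 3. Extension that does not identify a format: name the format explicitly.
    txt_path = os.path.join(outdir, "model.txt")
    mol.write(txt_path, type="pdb")

    return [protein_path, ca_path, txt_path]
```

With a real structure this would be called as `write_variants(Molecule("3PTB"), "out")`, after importing Molecule from moleculekit.molecule.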

Supported formats#

| Extension | Format | Carries |
| --- | --- | --- |
| cif | mmCIF / PDBx | Recommended default. Full topology incl. bonds, bond orders, formal charges, segid, occupancy, B-factors. |
| bcif, bcif.gz | Binary mmCIF | Compact binary variant of cif; same content, smaller and faster. |
| pdb, pdb.gz | Protein Data Bank | Coords + chain/resid/segid + formal charges (columns 79–80). Bond orders are not stored. |
| pdbqt | AutoDock PDB | PDB + partial charges. |
| mmtf | Macromolecular Transmission Format | Compact binary. |
| mol2 | Tripos MOL2 | Bonds + bond orders + atom types. |
| sdf | Structure-Data File | Bond orders; small molecules. |
| psf | CHARMM / NAMD topology | Topology only (no coords). |
| gro | GROMACS structure | Coords + topology (single frame). |
| xyz, xyz.gz | Plain XYZ | Elements + coords only. |
| xtc | GROMACS compressed traj | Coordinates only; lossy, reduced-precision compression. |
| dcd | CHARMM/NAMD binary traj | Coordinates only, full precision. |
| trr | GROMACS full-precision traj | Coords + optional velocities/forces. |
| netcdf, nc, ncdf | AMBER NetCDF | Coordinates only, full precision. |
| binpos | AMBER binpos | Coordinates only. |
| xsc | NAMD extended system | Box / restart info. |
| coor, crd, inpcrd | Coordinate files | Coordinates only; pair with a topology. |
| h5, gro, mdcrd, lammpstrj, ncrst, rst7 | mdtraj-supported | Requires mdtraj. |
| json | moleculekit JSON | Lossless round-trip of the in-memory Molecule. Useful in app development; rarely the right choice for tutorials. |

For trajectory formats (xtc, dcd, trr, netcdf, binpos), no topology is written, only per-frame data such as coordinates, so keep a topology file (PSF, PRMTOP, PDB) alongside.
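One way to apply the "Carries" column when picking a format is a small lookup, sketched below. The `CARRIES` mapping is a hand-transcribed subset of the table, and `formats_carrying` and `needs_topology` are hypothetical helpers rather than moleculekit functions. Coordinates are listed for mol2 and sdf as an addition to what the table states.

```python
# Hand-transcribed subset of the "Carries" column. A feature missing here
# means the table does not list it, not that the format cannot store it.
CARRIES = {
    "cif": {"coords", "bonds", "bond_orders", "formal_charges"},
    "bcif": {"coords", "bonds", "bond_orders", "formal_charges"},
    "pdb": {"coords", "formal_charges"},
    "mol2": {"coords", "bonds", "bond_orders"},  # coords added beyond the table
    "sdf": {"coords", "bond_orders"},            # coords added beyond the table
    "psf": {"bonds"},                            # topology only, no coords
    "xyz": {"coords"},
    "xtc": {"coords"},
    "dcd": {"coords"},
}

# Trajectory formats named in the note above; they need a topology alongside.
TRAJECTORY_FORMATS = {"xtc", "dcd", "trr", "netcdf", "binpos"}


def formats_carrying(*required):
    """Return the formats (sorted by name) that carry every required feature."""
    wanted = set(required)
    return sorted(fmt for fmt, have in CARRIES.items() if wanted <= have)


def needs_topology(fmt):
    """True if the format should be paired with a separate topology file."""
    return fmt in TRAJECTORY_FORMATS


if __name__ == "__main__":
    print(formats_carrying("bond_orders", "formal_charges"))
    print(needs_topology("xtc"))
```

Asking for both bond orders and formal charges leaves only the mmCIF variants in this subset, which matches the recommendation to prefer mmCIF for prepared Molecules.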

Common variations#

# Write a protein-only sub-selection as mmCIF (preserves bonds + charges)
mol.write("protein.cif", sel="protein")
# Write a PDB for interchange with a tool that doesn't read mmCIF
mol.write("output.pdb")
# Round-trip a Molecule losslessly through JSON (e.g. for app state)
mol.write("mol.json")
roundtrip = Molecule("mol.json")

Gotchas#

  • PDB cannot store explicit bond orders (single/double/aromatic) — use mmCIF, MOL2, or SDF if those are needed downstream. PDB can store formal charges in columns 79–80, but many third-party PDB parsers ignore them.

  • The chain field is written as a single character in PDB; segid (up to 4 characters) survives a PDB round-trip only in the SEGID column, which many programs ignore. mmCIF preserves both faithfully.
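The fixed-width constraints behind both gotchas can be seen by formatting one ATOM record by hand. The sketch below follows the standard PDB column layout, with the segid in the legacy columns 73–76. `pdb_atom_line` is a hypothetical helper, not moleculekit's writer, and truncating overlong values is this sketch's own choice; how moleculekit treats them is not covered here.

```python
def pdb_atom_line(serial, name, resname, chain, resid, x, y, z,
                  occupancy=1.0, beta=0.0, segid="", element="", charge=0):
    """Format one 80-column ATOM record using the fixed PDB column layout."""
    # Common convention: names shorter than 4 characters start in column 14.
    name_field = name if len(name) >= 4 else " " + name
    # Formal charge is written as digit then sign, e.g. "1+" or "2-".
    if charge == 0:
        charge_field = ""
    else:
        charge_field = f"{abs(charge)}{'+' if charge > 0 else '-'}"
    parts = [
        "ATOM  ",                          # columns 1-6: record name
        f"{serial:>5}",                    # columns 7-11: atom serial
        " ",                               # column 12: blank
        f"{name_field:<4.4}",              # columns 13-16: atom name
        " ",                               # column 17: altLoc (blank here)
        f"{resname:>3.3}",                 # columns 18-20: residue name
        " ",                               # column 21: blank
        f"{chain:1.1}",                    # column 22: chain, ONE character
        f"{resid:>4}",                     # columns 23-26: residue number
        "    ",                            # columns 27-30: iCode + blanks
        f"{x:8.3f}{y:8.3f}{z:8.3f}",       # columns 31-54: coordinates
        f"{occupancy:6.2f}{beta:6.2f}",    # columns 55-66: occupancy, B-factor
        " " * 6,                           # columns 67-72: blank
        f"{segid:<4.4}",                   # columns 73-76: segid, max 4 chars
        f"{element:>2.2}",                 # columns 77-78: element symbol
        f"{charge_field:>2.2}",            # columns 79-80: formal charge
    ]
    return "".join(parts)


if __name__ == "__main__":
    # Chain "AB" and segid "PROT1" do not fit and are truncated by this sketch.
    print(pdb_atom_line(1, "CA", "LYS", "AB", 16, 1.0, 2.0, 3.0,
                        segid="PROT1", element="C", charge=1))
```

There is no column anywhere in the record for a bond order, which is why that information cannot survive a PDB round-trip.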

See also#