How to read a structure#
Goal#
Load a molecular structure into a Molecule object from a local file or directly from the RCSB PDB by accession ID.
Minimal example#
from moleculekit.molecule import Molecule
# Fetch directly from RCSB
mol = Molecule("3PTB")
# Load from a local file
mol = Molecule("structure.pdb")
Parameters that matter#
The Molecule constructor signature is Molecule(filename=None, name=None, **kwargs). When filename is given, the constructor forwards **kwargs to read(), so any read-time options can be passed alongside the path.
| Parameter | Type | Default | What it does |
|---|---|---|---|
| filename | | None | Path to the file (any supported format) or a 4-character RCSB PDB ID. If … |
| name | | None | Optional display label; does not affect atom data. |
| | | | File-type override (… |
| validateElements | | True | Raises if element symbols are missing or unrecognised; set to False if working with custom atom types. |
| | | | Which alternate-location indicator to keep when reading PDB/mmCIF. |
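The forwarding of read-time options from the constructor to read() can be pictured with a small stand-in class. This is only a sketch of the pattern: ToyMolecule is a hypothetical name, it parses nothing, and it is not moleculekit's implementation.

```python
class ToyMolecule:
    """Hypothetical stand-in used to illustrate kwargs forwarding only."""

    def __init__(self, filename=None, name=None, **kwargs):
        self.name = name
        self.read_calls = []  # record of (filename, options) for inspection
        if filename is not None:
            # Read-time options given to the constructor go straight to read().
            self.read(filename, **kwargs)

    def read(self, filename, **options):
        # A real reader would parse the file here; the toy only records the call.
        self.read_calls.append((filename, options))


# One-step form: the option travels through the constructor.
mol = ToyMolecule("structure.pdb", name="trypsin", validateElements=False)

# Two-step form: construct empty, then pass the same option to read().
empty = ToyMolecule()
empty.read("structure.pdb", validateElements=False)
```

Both forms end up issuing the same read() call, which is why options documented for read() can also be given to the constructor.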
Supported formats#
The reader picks a backend from the file extension. Trajectories are covered in How to read a trajectory; the formats below carry topology (atoms, residues, optional bonds) and in some cases a single frame of coordinates.
| Extension | Format | Notes |
|---|---|---|
| | Protein Data Bank | Most common; coords + optional … |
| | gzipped PDB | Transparently decompressed. |
| | mmCIF / PDBx | Modern PDB replacement; full bond / … |
| | Binary mmCIF | Compact binary mmCIF. |
| | Macromolecular Transmission Format | Compact binary; deprecated upstream but still readable. |
| | Tripos MOL2 | Bonds + bond orders + atom types. |
| | Structure-Data File | Small-molecule format with bond orders. |
| | Schrödinger Maestro | Bond orders + force-field atom types. |
| | AMBER topology | Topology only; pair with a coordinate file. |
| | CHARMM / NAMD topology | Topology only. |
| | GROMACS topology | Falls back to PRMTOP reader if not GROMACS-style. |
| | GROMACS structure | Via mdtraj. |
| | Plain XYZ | Coords + elements only; no bonds. |
| | AutoDock PDB | PDB with partial charges. |
| | Gaussian input | Coords + elements. |
| | CHARMM residue topology | Residue templates. |
| | AMBER prepi | Residue templates. |
| | Coordinate files | Coordinates only; load a topology first. |
| | Trajectory | |
| | mdtraj-supported | Requires the … |
| | moleculekit JSON | Lossless round-trip of a Molecule. |
| | AlphaFold output | AlphaFold-style topology. |
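Extension-based backend selection can be sketched as a lookup on the file's suffixes. The registry below is hypothetical: it lists only a handful of extensions, the ".pdb.gz" spelling is an assumption, and the real reader's table and matching rules may differ.

```python
from pathlib import Path

# Hypothetical registry: extension -> backend label (illustrative subset).
BACKENDS = {
    ".pdb": "pdb",
    ".pdb.gz": "pdb (gzipped)",  # assumed spelling of the gzipped-PDB extension
    ".mol2": "mol2",
    ".prmtop": "amber topology",
    ".psf": "charmm/namd topology",
}


def pick_backend(filename):
    """Return the backend label for filename, trying the longest suffix first."""
    suffixes = [s.lower() for s in Path(filename).suffixes]
    # Try ".pdb.gz" before ".gz" so compound extensions win.
    for start in range(len(suffixes)):
        ext = "".join(suffixes[start:])
        if ext in BACKENDS:
            return BACKENDS[ext]
    raise ValueError(f"no reader registered for {filename!r}")
```

Matching the longest suffix first is what lets a compressed file resolve to the same family of reader as its uncompressed form.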
Special case: a bare 4-character string with no extension (e.g. Molecule("3PTB")) is interpreted as an RCSB PDB ID and fetched over the network. Set the LOCAL_PDB_REPO environment variable to point at a local mirror to avoid repeated downloads.
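The special case can be pictured as a small classification step that runs before any file is opened. The sketch below is an assumption about the decision order, not the library's code: looks_like_pdb_id and describe_source are hypothetical helpers, and how a local mirror is organised on disk is deliberately left out.

```python
import os


def looks_like_pdb_id(spec):
    """True for a bare 4-character alphanumeric string (so no extension)."""
    return len(spec) == 4 and spec.isalnum()


def describe_source(spec, environ=None):
    """Classify spec as a local file, a mirror lookup, or an RCSB download."""
    environ = os.environ if environ is None else environ
    if not looks_like_pdb_id(spec):
        return ("file", spec)
    repo = environ.get("LOCAL_PDB_REPO")
    if repo:
        # A mirror is configured, so no network access should be needed.
        return ("mirror", repo)
    return ("rcsb", spec.upper())
```

A consequence worth noting is that a local file whose name is exactly four alphanumeric characters with no extension would be ambiguous under this rule; giving such files an extension avoids the problem.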
Common variations#
# Load an AMBER PRMTOP topology
mol = Molecule("topology.prmtop")
# Load a PSF topology, then read trajectory frames separately
mol = Molecule("topology.psf")
mol.read("trajectory.xtc")
# Two-step read: construct empty, then load
mol = Molecule()
mol.read("structure.mol2")
Gotchas#
PDB format has fixed column widths: residue names and atom names longer than 4 characters are silently truncated.
For PDB IDs that are not in the RCSB, Molecule raises a download error; check the spelling and network connectivity.
When validateElements=True (the default), the reader raises on unknown element symbols; set validateElements=False if working with custom atom types.
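Because the truncation is silent, it can help to check names before they pass through a PDB file. The helper below is a hypothetical, library-independent sketch; it assumes the leading characters are the ones kept, which may not match every reader or writer.

```python
from collections import defaultdict


def truncation_report(names, width=4):
    """Return (truncated, collisions) for names cut to a fixed column width.

    truncated maps each over-long name to its shortened form; collisions maps
    a shortened form to the distinct names that would become identical.
    """
    unique = list(dict.fromkeys(names))  # drop duplicates, keep order
    truncated = {n: n[:width] for n in unique if len(n) > width}
    groups = defaultdict(list)
    for n in unique:
        groups[n[:width]].append(n)
    collisions = {k: v for k, v in groups.items() if len(v) > 1}
    return truncated, collisions


truncated, collisions = truncation_report(["CA", "HG211", "HG212", "LIG01", "CA"])
```

Collisions are the more serious case: two atoms that end up sharing a name can no longer be told apart by name-based selections after a round trip.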