
Structure & Identifiers

A Structure is the core qcdata object for representing a molecule or molecular superstructure in 3D space. Structure objects can be created directly from symbol and geometry information (geometry must be in Bohr), from SMILES strings, from XYZ files, or opened from Structure objects previously saved to disk.

qcdata.Structure

Structure(**data: Any)

A Structure object with atoms and their corresponding Cartesian coordinates, charge, multiplicity, and identifiers such as name, smiles, etc.

Attributes:

Name Type Description
symbols list[str]

The atomic symbols of the structure.

geometry SerializableNDArray

The geometry of the structure in Cartesian coordinates. Units are Bohr (AU).

identifiers Identifiers

Identifiers for the structure such as name, smiles, etc.

charge int

The molecular charge.

multiplicity int

The molecular multiplicity.

connectivity list[tuple[int, int, float]]

Explicit description of the bonds between atoms. Each tuple contains the indices of the atoms in the bond and the order of the bond. E.g., [(0, 1, 1.0), (1, 2, 2.0)] indicates a single bond between atoms 0 and 1 and a double bond between atoms 1 and 2.

extras Dict[str, Any]

Additional information to bundle with the object. Use for schema development and scratch space.

ids Identifiers

@property Shortcut to access identifiers.

geometry_angstrom ndarray

@property The geometry of the structure in Angstrom.

atomic_numbers list[int]

@property The atomic numbers of the atoms in the structure.

formula str

@property The molecular formula of the structure using the Hill System.
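
The Hill System ordering behind `formula` can be sketched in plain Python. This is an illustrative stand-alone helper (the name `hill_formula` is hypothetical and not part of qcdata), assuming the usual convention: carbon first, hydrogen second, then the remaining elements alphabetically; when no carbon is present, every element, hydrogen included, is sorted alphabetically.

```python
from collections import Counter


def hill_formula(symbols):
    """Return a Hill System formula for a list of atomic symbols.

    Hypothetical helper for illustration; not the qcdata implementation.
    """
    counts = Counter(symbols)
    if "C" in counts:
        # With carbon present: C first, then H, then the rest alphabetically.
        ordered = ["C"]
        if "H" in counts:
            ordered.append("H")
        ordered += sorted(s for s in counts if s not in ("C", "H"))
    else:
        # Without carbon: all elements, hydrogen included, alphabetically.
        ordered = sorted(counts)
    # Omit the count when it is 1, e.g. "H2O" rather than "H2O1".
    return "".join(f"{s}{counts[s] if counts[s] > 1 else ''}" for s in ordered)


print(hill_formula(["H", "O", "H"]))  # H2O
```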

Example
from qcdata import Structure

structure = Structure(
    symbols=["H", "O", "H"],
    geometry=[[0.0, 0.0, 0.0], [0.0, 0.0, 1.0], [0.0, 0.0, 2.0]],
    charge=0,  # optional; defaults to 0
    multiplicity=1,  # optional; defaults to 1
    identifiers={"smiles": "O"},  # optional
)
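
Because `geometry` must be in Bohr, coordinates given in Angstrom need converting before they are passed to `Structure`. A minimal sketch, assuming the CODATA 2018 Bohr radius (0.529177210903 Angstrom); the constant and the helper name `angstrom_to_bohr` are defined here for illustration, and qcdata's own constant may differ in the last digits.

```python
# Assumed conversion factor (CODATA 2018 Bohr radius in Angstrom).
BOHR_TO_ANGSTROM = 0.529177210903


def angstrom_to_bohr(coords):
    """Convert a list of [x, y, z] rows from Angstrom to Bohr."""
    return [[value / BOHR_TO_ANGSTROM for value in row] for row in coords]


# Approximate water geometry in Angstrom (illustrative values only).
water_angstrom = [
    [0.000, 0.000, 0.117],
    [0.000, 0.757, -0.469],
    [0.000, -0.757, -0.469],
]
water_bohr = angstrom_to_bohr(water_angstrom)

# water_bohr can then be passed on, e.g.
# Structure(symbols=["O", "H", "H"], geometry=water_bohr)
```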
Source code in src/qcdata/models/structure.py
def __init__(self, **data: Any):
    """Create a new Structure object.

    Example:
        ```python
        from qcdata import Structure

        structure = Structure(
            symbols=["H", "O", "H"],
            geometry=[[0.0, 0.0, 0.0], [0.0, 0.0, 1.0], [0.0, 0.0, 2.0]],
            charge=0,  # optional; defaults to 0
            multiplicity=1,  # optional; defaults to 1
            identifiers={"smiles": "O"},  # optional
        )

        ```
    """
    # Backwards compatibility for 'ids' attribute.
    if identifiers := data.pop("ids", None):
        warnings.warn(
            "Passing 'ids' is deprecated and will be removed in a future "
            "release. Please use 'identifiers' instead. Once instantiated, "
            "you can use structure.ids to access the identifiers as a shortcut.",
            category=FutureWarning,
            stacklevel=2,
        )
        data["identifiers"] = identifiers
    super().__init__(**data)

from_xyz classmethod

from_xyz(
    xyz_str: str,
    *,
    charge: int | None = None,
    multiplicity: int | None = None,
) -> Self

Create a Structure from an XYZ string, such as the contents of an XYZ file.

Parameters:

Name Type Description Default
xyz_str str

The XYZ string.

required
charge int | None

The molecular charge of the structure. If not provided, will read from the XYZ string if set or default to 0.

None
multiplicity int | None

The molecular multiplicity of the structure. If not provided, will read from the XYZ string if set or default to 1.

None
Note

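Reads qcdata data such as charge and multiplicity from the comment line (the second line of the XYZ string) when present in qcdata_key=value format. Also reads qcdata__identifiers_* keys; any other whitespace-separated tokens on that line are kept as additional comments. Passing charge or multiplicity as an argument when the same key is also set in the comment line raises a ValueError.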

Example
struct = Structure.from_xyz(xyz_str)
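
How the comment line is split into structure fields, identifiers, and free-form comments can be sketched without qcdata. The function name `parse_xyz_comment` is hypothetical, and the sketch splits each token on its first `=` only, which is a choice made here so that values containing `=` (for example some SMILES strings) stay whole. Values are left as strings.

```python
IDENTIFIER_PREFIX = "qcdata__identifiers_"
FIELD_PREFIX = "qcdata_"


def parse_xyz_comment(comment):
    """Split an XYZ comment line into qcdata fields, identifiers, and other tokens."""
    structure_kwargs = {}
    identifier_kwargs = {}
    other_comments = []
    for item in comment.strip().split():
        # partition() splits on the first "=" only.
        key, _, value = item.partition("=")
        if key.startswith(IDENTIFIER_PREFIX):
            # Check the longer prefix first; it also starts with "qcdata_".
            identifier_kwargs[key[len(IDENTIFIER_PREFIX):]] = value
        elif key.startswith(FIELD_PREFIX):
            structure_kwargs[key[len(FIELD_PREFIX):]] = value
        else:
            other_comments.append(item)
    return structure_kwargs, identifier_kwargs, other_comments


fields, identifiers, others = parse_xyz_comment(
    "qcdata_charge=0 qcdata_multiplicity=1 qcdata__identifiers_smiles=C=C from experiment"
)
print(fields)       # {'charge': '0', 'multiplicity': '1'}
print(identifiers)  # {'smiles': 'C=C'}
print(others)       # ['from', 'experiment']
```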
Source code in src/qcdata/models/structure.py
@classmethod
def from_xyz(
    cls,
    xyz_str: str,
    *,
    charge: int | None = None,
    multiplicity: int | None = None,
) -> Self:
    """Create a Structure from an XYZ file or string.

    Args:
        xyz_str: The XYZ string.
        charge: The molecular charge of the structure. If not provided, will read
            from the XYZ string if set or default to 0.
        multiplicity: The molecular multiplicity of the structure. If not provided,
            will read from the XYZ string if set or default to 1.

    Note:
        Will read qcdata data such as `charge` and `multiplicity` from the comments
        line with a `qcdata_key=value` format (if it is present). Also will read in
        qcdata__identifiers_* keys and additional non-qcdata comments.

    Example:
        ```python
        struct = Structure.from_xyz(xyz_str)
        ```
    """

    lines = xyz_str.split("\n")

    num_atoms = int(lines[0])

    # Collect comments
    structure_kwargs: dict[str, Any] = {}
    identifier_kwargs: dict[str, Any] = {}
    other_comments: list[str] = []

    for item in lines[1].strip().split():
        if item.startswith("qcdata__identifiers_"):
            key = item.split("=")[0].replace("qcdata__identifiers_", "")
            value = item.split("=")[1]
            identifier_kwargs[key] = value
        elif item.startswith("qcdata_"):
            key = item.split("=")[0].replace("qcdata_", "")
            value = item.split("=")[1]
            structure_kwargs[key] = value
        else:
            other_comments.append(item)

    if charge is not None and "charge" in structure_kwargs:
        raise ValueError("Charge cannot be set in the file and as an argument.")
    if multiplicity is not None and "multiplicity" in structure_kwargs:
        raise ValueError(
            "Multiplicity cannot be set in the file and as an argument."
        )

    # Set charge and multiplicity if provided
    if charge is not None:
        structure_kwargs["charge"] = charge
    if multiplicity is not None:
        structure_kwargs["multiplicity"] = multiplicity

    symbols = []
    geometry = []
    for line in lines[2 : 2 + num_atoms]:
        split_line = line.split()
        symbols.append(split_line[0])
        geometry.append([float(val) / BOHR_TO_ANGSTROM for val in split_line[1:]])

    return cls(
        symbols=symbols,
        geometry=geometry,
        **structure_kwargs,
        identifiers=Identifiers(**identifier_kwargs),
        extras={cls._xyz_comment_key: other_comments},
    )

to_xyz

to_xyz(precision: int = 17) -> str

Return an xyz string representation of the structure.

Parameters:

Name Type Description Default
precision int

The number of decimal places written for each coordinate. Defaults to 17, which is enough to capture the full precision of a float64.

17

Note: Adds qcdata data such as charge and multiplicity to the comment line in qcdata_key=value format. Coordinates are written in Angstrom.
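
The effect of `precision` on the output layout can be illustrated with a stand-alone sketch. The helper name `format_xyz` and the conversion constant (CODATA 2018 Bohr radius) are assumptions made here; the column format follows the format string in the source listing.

```python
BOHR_TO_ANGSTROM = 0.529177210903  # assumed value (CODATA 2018)


def format_xyz(symbols, geometry_bohr, comment="", precision=17):
    """Build an XYZ string with coordinates converted from Bohr to Angstrom."""
    # Two-character symbol column, then three right-aligned 18-wide floats.
    fmt = f"{{:2s}} {{: >18.{precision}f}} {{: >18.{precision}f}} {{: >18.{precision}f}}"
    lines = [str(len(symbols)), comment]
    for symbol, row in zip(symbols, geometry_bohr):
        x, y, z = (value * BOHR_TO_ANGSTROM for value in row)
        lines.append(fmt.format(symbol, x, y, z))
    lines.append("")  # so the joined string ends with a newline
    return "\n".join(lines)


xyz = format_xyz(
    ["H", "H"],
    [[0.0, 0.0, 0.0], [0.0, 0.0, 1.4]],
    comment="qcdata_charge=0 qcdata_multiplicity=1",
    precision=6,
)
print(xyz)
```

A lower precision gives more compact files at the cost of an exact round trip through `from_xyz`.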

Source code in src/qcdata/models/structure.py
def to_xyz(self, precision: int = 17) -> str:
    """Return an xyz string representation of the structure.

    Args:
        precision: The number of decimal places to include in the xyz file. Default
            17 which captures all precision of float64.
    Notes:
        Will add qcdata data such as charge and multiplicity to the comments line with
        a `qcdata_key=value` format.
    """

    qcdata_data = {  # These get added to comments line (line 2) in xyz file
        "qcdata_charge": self.charge,
        "qcdata_multiplicity": self.multiplicity,
    }

    # Add identifiers to qcdata_data
    for key, value in self.identifiers.__dict__.items():
        if key != "extras" and value:
            qcdata_data[f"qcdata__identifiers_{key}"] = value

    assert isinstance(self.geometry, np.ndarray)  # For mypy
    geometry_angstrom = self.geometry * BOHR_TO_ANGSTROM

    xyz_lines = []
    xyz_lines.append(f"{len(self.symbols)}")
    # Add qcdata data to comments line
    comments = f"{' '.join([f'{k}={v}' for k, v in qcdata_data.items()])}"
    # Add any other comments
    if xyz_comments := self.extras.get(self._xyz_comment_key, []):
        comments += " " + " ".join(xyz_comments)
    xyz_lines.append(comments)

    # Create a format string using the precision parameter
    format_str = f"{{:2s}} {{: >18.{precision}f}} {{: >18.{precision}f}} {{: >18.{precision}f}}"  # noqa: E501

    for symbol, (x, y, z) in zip(self.symbols, geometry_angstrom):
        xyz_lines.append(format_str.format(symbol, x, y, z))
    xyz_lines.append("")  # Append newline to end of file
    return "\n".join(xyz_lines)

distance

distance(i: int, j: int, units: LengthUnit = BOHR) -> float

Calculate the distance between two atoms.

Parameters:

Name Type Description Default
i int

The index of the first atom.

required
j int

The index of the second atom.

required
units LengthUnit

The units to return the distance in. Defaults to "bohr". May be "bohr" or "angstrom".

BOHR

Returns:

Type Description
float

The distance between the two atoms, expressed in the requested units (Bohr or Angstrom).

Example
struct.distance(0, 1)
1.34
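
The calculation is a Euclidean norm of the difference between two coordinate rows, optionally scaled to Angstrom. A stand-alone sketch using only the standard library; the plain-string `units` argument and the conversion constant (CODATA 2018) are assumptions for illustration and stand in for qcdata's `LengthUnit` and its internal constant.

```python
import math

BOHR_TO_ANGSTROM = 0.529177210903  # assumed value (CODATA 2018)


def distance(geometry_bohr, i, j, units="bohr"):
    """Euclidean distance between atoms i and j of a geometry given in Bohr."""
    d = math.dist(geometry_bohr[i], geometry_bohr[j])
    if units == "angstrom":
        return d * BOHR_TO_ANGSTROM
    return d


geometry = [[0.0, 0.0, 0.0], [0.0, 0.0, 1.0], [0.0, 0.0, 2.0]]
print(distance(geometry, 0, 1))              # 1.0
print(distance(geometry, 0, 2, "angstrom"))  # about 1.0584
```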
Source code in src/qcdata/models/structure.py
def distance(self, i: int, j: int, units: LengthUnit = LengthUnit.BOHR) -> float:
    """Calculate the distance between two atoms.

    Args:
        i: The index of the first atom.
        j: The index of the second atom.
        units: The units to return the distance in. Defaults to "bohr".
            May be "bohr" or "angstrom".

    Returns:
        The distance between the atoms in units (Bohr or Angstrom).

    Example:
        ```python
        struct.distance(0, 1)
        1.34
        ```
    """
    distance = np.linalg.norm(self.geometry[i] - self.geometry[j])
    if units == LengthUnit.ANGSTROM:
        return float(distance * BOHR_TO_ANGSTROM)
    return float(distance)

add_smiles

add_smiles(
    *, program: str = "rdkit", hydrogens: bool = False
) -> None

!! DEPRECATED !!

This helper has been moved to qcinf (see qcinf.structure_to_smiles). It will be removed from qcdata in a future release. Calling it emits a DeprecationWarning and then raises NotImplementedError.
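
The replacement suggested by the deprecation warning can be wrapped as below. This is a sketch only: `attach_smiles` and its `to_smiles` parameter are hypothetical names introduced here, while `qcinf.structure_to_smiles`, its `backend` argument, and `add_identifiers` are taken from the warning text in the source listing and have not been verified beyond that.

```python
def attach_smiles(struct, backend="rdkit", to_smiles=None):
    """Generate a SMILES string for struct and record it as an identifier.

    to_smiles is an optional converter (hypothetical parameter); when omitted,
    qcinf.structure_to_smiles is used, as suggested by the deprecation warning.
    """
    if to_smiles is None:
        # Imported lazily so this sketch can be loaded without qcinf installed.
        from qcinf import structure_to_smiles as to_smiles
    smiles = to_smiles(struct, backend=backend)
    struct.add_identifiers(smiles=smiles)
    return smiles
```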

Source code in src/qcdata/models/structure.py
def add_smiles(
    self: "Structure",
    *,
    program: str = "rdkit",
    hydrogens: bool = False,
) -> None:
    """
    !! DEPRECATED !!

    This helper has been moved to **qcinf** (see `qcinf.structure_to_smiles`).
    It will be removed from qcdata in a future release.
    """
    warnings.warn(
        "`Structure.add_smiles()` has moved to `qcinf` and is no longer "
        "implemented here.\n\n"
        "Install qcinf and replace your call with:\n\n"
        "    from qcinf import structure_to_smiles\n"
        "    smiles = structure_to_smiles(struct, backend='rdkit|openbabel')\n"
        "    struct.add_identifiers(smiles=smiles)\n\n",
        DeprecationWarning,  # use FutureWarning if you want it visible by default
        stacklevel=2,
    )
    raise NotImplementedError(
        "Structure.add_smiles() is removed. "
        "Use qcinf.structure_to_smiles and struct.add_identifiers instead."
    )

qcdata.Identifiers

Structure identifiers.

Attributes:

Name Type Description
name str | None

A human-readable, common name for the structure.

name_IUPAC str | None

The IUPAC name of the structure.

smiles str | None

The SMILES representation of the structure.

canonical_smiles str | None

The canonical SMILES representation of the structure.

canonical_smiles_program str | None

The program used to generate the canonical SMILES.

canonical_explicit_hydrogen_smiles str | None

The canonical explicit hydrogen SMILES representation of the structure.

canonical_isomeric_smiles str | None

The canonical isomeric SMILES representation of the structure.

canonical_isomeric_explicit_hydrogen_smiles str | None

The canonical isomeric explicit hydrogen SMILES representation of the structure.

canonical_isomeric_explicit_hydrogen_mapped_smiles str | None

The canonical isomeric explicit hydrogen mapped SMILES representation of the structure.

inchi str | None

The InChI representation of the structure.

inchikey str | None

The InChIKey representation of the structure.

pubchem_cid str | None

The PubChem Compound ID of the structure.

pubchem_sid str | None

The PubChem Substance ID of the structure.

pubchem_conformerid str | None

The PubChem Conformer ID of the structure.

extras Dict[str, Any]

Additional information to bundle with the object. Use for schema development and scratch space.