Reading atoms from configuration files

MDMC enables reading atomic configurations and topologies (if included) from configuration files.

Atomic configurations are read through ASE, so any format supported by ASE is available.

In theory, it is possible to read a configuration file by simply passing it to the read function, as MDMC will determine the file format from the extension, if it is a supported format:

[1]:
from MDMC.readers.configurations import read
atoms = read('data/Paracetamol.cif')

# The return type of read is a list of atoms
print('The return type is: {}'.format(type(atoms)))
The return type is: <class 'list'>
[2]:
# We can use this to create a Molecule
from MDMC.MD import Molecule
paracetamol = Molecule(atoms=atoms)

The paracetamol Molecule read from the CIF file can be visualized using the view function:

[4]:
from MDMC.gui import view
view(paracetamol, viewer='ASE')

From this it appears that the configuration and topology has been correctly read; however, MDMC does not automatically determine if certain atoms or bonded interactions are equivalent. For example, all three hydrogens in the methyl group are equivalent, as are the bonds, bond angles and dihedrals these hydrogens are involved in. In the case of paracetamol, there are 20 bonds in total, however only 12 of them are unique.

[ ]:
# To see all of the Bond interactions, we can filter the paracetamol interactions by name
bonds = list(filter(lambda x: x.name == 'Bond', paracetamol.interactions))

# There are 20 bonds in total
print('Number of bonds: {}'.format(len(bonds)))

# We can cast the list of bonds to a set to see the number of unique bonds - in this case it is still 20
unique_bonds = set(bonds)
print('Number of unique bonds: {}'.format(len(unique_bonds)))

There are two methods for specifying which atoms are equivalent; these can be seen by viewing the documentation relevant to the file format of the configuartion file, which can be accessed by passing docstring=True when calling read. For example, with a CIF file:

[ ]:
read('Paracetamol.cif', docstring=True)

So it is possible to define equivalent atoms either by specifying the name of each Atom or the atom_type of each Atom. These names or atom types must be passed in the same order as the atoms are defined within the CIF file (i.e. in the _atom_site_label identifier).

Equivalence specified by atom_types

As with all atom_types specified in MDMC the integers used should be monotonically increasing.

[ ]:
# from MDMC.readers.configurations import read
atoms = read('data/Paracetamol.cif', atom_type=[1, 2, # Oxygens
                                           3, # Nitrogen
                                           4, 5, 6, 7, 6, 6, 6, 8, # Carbons
                                           9, 9, 9, 10, 10, 10, 10, 11, 12]) # Hydrogens
paracetamol = Molecule(atoms=atoms)

# Print the total number of bonds and the number of unique bonds
bonds = list(filter(lambda x: x.name == 'Bond', paracetamol.interactions))
print('Number of bonds: {}'.format(len(bonds)))
print('Number of unique bonds: {}'.format(len(set(bonds))))

Equivalence specified by names

The names must be strings which are only shared by equivalent atoms. These set the name attribute for each Atom. As covered in the tutorial Applying a Forcefield, if each Atom has an Atom.name which is defined in a ForceField, the ForceField can be applied to set the Parameter values for all interactions. An example of this is shown below, where the names provided are the OPLSAA atom types:

[ ]:
# from MDMC.readers.configurations import read
atoms = read('data/Paracetamol.cif', name=['109', '177', # Oxygens
                                            '207', # Nitrogen
                                            '208', '108', '90', '178', '90', '90', '90', '185', # Carbons
                                            '85', '85', '85', '91', '91', '91', '91', '183', '110']) # Hydrogens
paracetamol = Molecule(atoms=atoms)

# Print the total number of bonds and the number of unique bonds
bonds = list(filter(lambda x: x.name == 'Bond', paracetamol.interactions))
print('Number of bonds: {}'.format(len(bonds)))
print('Number of unique bonds: {}'.format(len(set(bonds))))

The paracetamol can now be added to a Universe and the OPLSAA ForceField can be applied to set the Parameter values:

[ ]:
from MDMC.MD import Universe
universe = Universe(10.)
universe.add_structure(paracetamol)
universe.add_force_field('OPLSAA')
[ ]:
print(universe)
[ ]:
universe.nonbonded_interactions

The effect of this can be seen if we visualize the paracetamol molecule using the 'ASE' viewer and select View -> Charges from the GUI menu bar.

[ ]:
view(paracetamol, viewer='ASE')

Display for paracetamol with charges

Getting configurations directly from databases or PubChem

As configurations are read via ASE, MDMC supports converting any ASE Atoms object into a list of MDMC Atoms. Indeed, all our file reader is really doing is this:

ase_atoms = ase.io.read(file)
atoms = MDMC.MD.ase.convert.ASE_to_MDMC(atoms)

and much in the same way, anything that can be read as ASE atoms can be converted into an MDMC Molecule. In particular, we will see how to read in molecules from ASE’s built-in databases and from PubChem.

A warning on using databases and the Internet: For quality of research, it is recommended that a file is used (and made available) for any read-in configurations used in research. This allows both the creator of the configuration to be credited, and also ensures your work is reproducible (as databases or websites can change!)

Building from an ASE built-in database

ASE has several built in databases containing many popular molecules. The ase.build.molecule function allows users to read in ASE molecules from these databases.

[ ]:
from ase.build import molecule
from MDMC.MD.ase.convert import ASE_to_MDMC

# we read in a benzene molecule
ase_atoms = molecule("C6H6")
MDMC_atoms = ASE_to_MDMC(ase_atoms)
benzene = Molecule(atoms=MDMC_atoms)
view(benzene)

Please see the relevant documentation for more information on ASE’s built-in databases.

Building from the Internet

ASE also has an integration with PubChem, which allows you to search the PubChem website directly and read in the result as an ASE Atoms object. This can again be easily converted to an MDMC molecule.

[ ]:
from ase.data.pubchem import pubchem_atoms_search

# search by name, CID or SMILES string
ase_aspirin = pubchem_atoms_search(name='aspirin')
aspirin = Molecule(atoms=ASE_to_MDMC(ase_aspirin))

ase_glucose = pubchem_atoms_search(cid='5793')
glucose = Molecule(atoms=ASE_to_MDMC(ase_glucose))

ase_benzaldehyde = pubchem_atoms_search(smiles='C1=CC=C(C=C1)C=O')
benzaldehyde = Molecule(atoms=ASE_to_MDMC(ase_benzaldehyde))
[ ]:
view(aspirin)
[ ]:
view(glucose)
[ ]:
view(benzaldehyde)