Molecule standardisation

This is the public version of a tool designed to provide a simple way of standardising molecules as a prelude to e.g. molecular modelling exercises.

A Python module performs the complete standardisation procedure. In addition, the modules that implement the individual steps in this procedure may be accessed separately if required, for example as part of a custom standardisation pipeline.

The tool is open-source and is available from GitHub.

A slide-set describing some of the background to the project is available.

In summary, the general procedure for standardising a molecule is as follows (the module that documents each step is named in brackets):

Break bonds to Group I or II metals [break_bonds]
Neutralise charges by adding/removing protons [neutralise]
Apply standardisation rules [rules]
Apply tautomerism rules [tautomerism]
Re-run neutralisation (in case any charges are exposed by rules)
Discard any salt/solvate components [unsalt]
Return standardised parent

The complete procedure is implemented by the standardise module; a bare-bones alternative workflow using the individual modules is shown here.
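The ordering of the steps listed above can be illustrated with a short sketch. This is not the project's actual API: the function standardise_parent and its five step arguments are hypothetical names introduced here, and each step is assumed to be a callable that takes a molecule and returns a molecule. In practice the callables would wrap the corresponding modules (break_bonds, neutralise, rules, tautomerism, unsalt), whose real signatures should be checked in the notebooks.

```python
from typing import Any, Callable

# Hypothetical type: a step takes a molecule-like object and returns one.
Step = Callable[[Any], Any]


def standardise_parent(
    mol: Any,
    break_bonds: Step,
    neutralise: Step,
    apply_rules: Step,
    apply_tautomerism: Step,
    discard_salts: Step,
) -> Any:
    """Chain the steps in the order given in the summary above.

    All step arguments are stand-ins supplied by the caller; none of
    them is assumed to match the real module interfaces.
    """
    mol = break_bonds(mol)        # break bonds to Group I or II metals
    mol = neutralise(mol)         # add/remove protons to neutralise charges
    mol = apply_rules(mol)        # apply standardisation rules
    mol = apply_tautomerism(mol)  # apply tautomerism rules
    mol = neutralise(mol)         # re-run: rules may have exposed charges
    return discard_salts(mol)     # drop salt/solvate parts, return parent
```

Passing the steps in as arguments keeps the ordering explicit and makes it easy to swap one step out when building a custom pipeline. Note that neutralisation is deliberately called twice, once before and once after the rule-based steps.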

The documentation is contained in the project docs/ directory, and consists of a set of Jupyter Notebooks, which can be viewed (and run and edited) by starting a notebook server in that directory.

A simple command-line driver program standardiser_mol.py is available in the project bin/ directory. It takes SD or SMILES as input and writes two files: one containing the structures that were successfully standardised and one containing the structures for which the procedure failed.
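The pass/fail split described above can be sketched in a few lines. This is an illustration of the idea only, not the code of standardiser_mol.py: the function split_by_outcome is a hypothetical name, and the standardise argument is assumed to be any callable that returns a standardised structure or raises an exception on failure.

```python
from typing import Any, Callable, Iterable, List, Tuple


def split_by_outcome(
    records: Iterable[Any],
    standardise: Callable[[Any], Any],
) -> Tuple[List[Any], List[Tuple[Any, str]]]:
    """Separate records into standardised results and failures.

    `standardise` is a caller-supplied stand-in; it is assumed to raise
    an exception when a structure cannot be standardised.
    """
    passed: List[Any] = []
    failed: List[Tuple[Any, str]] = []
    for record in records:
        try:
            passed.append(standardise(record))
        except Exception as err:  # broad on purpose: this is only a sketch
            # Keep the original input with the reason, so the failure
            # can be written out and inspected later.
            failed.append((record, str(err)))
    return passed, failed
```

Each of the two returned lists would then be written to its own output file. A real driver would probably catch only the library's own failure exception rather than every Exception.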

Acknowledgements

This work was funded by the IMI eTOX project.
The salt dictionary used is based on that used in the ChEMBL database; this was compiled by L.J. Bellis, A. Hersey and others and was in turn based on that used in the USAN nomenclature.
Some of the standardisation rules were inspired by those used in the InChI software.
This project is built using the RDKit chemistry toolkit.

Licensing

This code is released under the Apache 2.0 license. Copyright [2020] is retained by the EMBL-EBI.
