Molecule standardisation

This is the public version of a tool designed to provide a simple way of standardising molecules as a prelude to e.g. molecular modelling exercises.

A Python module performs the complete standardisation procedure. In addition, the modules that implement the individual steps in this procedure may be accessed separately if required, for example as part of a custom standardisation pipeline.

The tool is open-source and is available from GitHub.

A slide-set describing some of the background to the project is available: standardiser

In summary, the general procedure for standardising a molecule (with the documentation for the appropriate module linked) is...

The complete procedure is implemented by the standardise module; a bare-bones alternative workflow using the individual modules is shown here.
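As a rough illustration of calling the complete procedure from Python, the sketch below wraps it in a small helper. It rests on several assumptions that should be checked against the notebooks in docs/: that the package is importable as standardiser, that standardise.run accepts an RDKit molecule and returns the standardised parent, and that failures raise standardise.StandardiseException. The helper name standardise_smiles is hypothetical.

```python
def standardise_smiles(smiles):
    """Return the SMILES of the standardised parent, or None on failure.

    Hypothetical helper: the import path, entry point and exception name
    used below are assumptions about the standardiser package.
    """
    # Imported locally so the sketch can be read without the packages installed.
    from rdkit import Chem
    from standardiser import standardise  # assumed import path

    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        # RDKit could not parse the input at all.
        return None

    try:
        parent = standardise.run(mol)  # assumed entry point
    except standardise.StandardiseException:  # assumed exception type
        return None

    return Chem.MolToSmiles(parent)
```

Returning None on failure keeps the helper easy to use in a loop; a caller that needs the reason for a failure would catch the exception itself instead.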

The documentation is contained in the project docs/ directory, and consists of a set of Jupyter Notebooks, which can be viewed (and run and edited) by starting a notebook server in that directory.

A simple command-line driver program, standardiser_mol.py, is available in the project bin/ directory. It takes SD or SMILES input and writes two files: one containing the structures that were successfully standardised, and one containing the structures for which the procedure failed.
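The pass/fail split described above can be sketched in plain Python. This is not the code of the driver itself: the function names split_by_outcome and write_outcomes, the tab-separated failure format, and the standardise_fn callable (a stand-in for the real procedure that either returns a standardised SMILES string or raises) are all assumptions made for illustration.

```python
def split_by_outcome(smiles_lines, standardise_fn):
    """Partition input SMILES into (standardised, failed) lists.

    standardise_fn is a hypothetical stand-in for the real procedure: any
    callable that returns a standardised SMILES string or raises.
    """
    passed, failed = [], []
    for line in smiles_lines:
        smiles = line.strip()
        if not smiles:
            continue  # skip blank lines
        try:
            passed.append(standardise_fn(smiles))
        except Exception as err:
            # Keep the original input and the reason it was rejected.
            failed.append((smiles, str(err)))
    return passed, failed


def write_outcomes(passed, failed, passed_path, failed_path):
    """Write successes and failures to two separate text files."""
    with open(passed_path, "w") as out:
        out.writelines(s + "\n" for s in passed)
    with open(failed_path, "w") as out:
        out.writelines(f"{s}\t{reason}\n" for s, reason in failed)
```

Passing the standardisation step in as a callable keeps the file handling independent of the chemistry, so the same split works for any per-structure procedure.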

Acknowledgements

  • This work was funded by the IMI eTOX project.

  • The salt dictionary used is based on that used in the ChEMBL database; this was compiled by L.J. Bellis, A. Hersey and others, and was in turn based on that used in the USAN nomenclature.

  • Some of the standardisation rules were inspired by those used in the InChI software.

  • This project is built using the RDKit chemistry toolkit.

Licensing

This code is released under the Apache 2.0 license. Copyright [2020] is retained by the EMBL-EBI.
