Molecule standardisation
This is the public version of a tool designed to provide a simple way of standardising molecules as a prelude to e.g. molecular modelling exercises.
A Python module performs the complete standardisation procedure. In addition, the modules that implement the individual steps in this procedure may be accessed separately if required, for example as part of a custom standardisation pipeline.
The tool is open-source and is available from GitHub.
A slide-set describing some of the background to the project is available.
In summary, the general procedure for standardising a molecule (with the documentation for the appropriate module linked) is:
  • Break bonds to Group I or II metals [break_bonds]
  • Neutralise charges by adding/removing protons [neutralise]
  • Apply standardisation rules [rules]
  • Apply tautomerism rules [tautomerism]
  • Re-run neutralisation (in case any charges are exposed by rules)
  • Discard any salt/solvate components [unsalt]
  • Return standardised parent
The complete procedure is implemented by the standardise module; a bare-bones alternative workflow using the individual modules is shown here.
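As an illustration only, the ordering of the steps listed above can be sketched as a chain of callables, each taking a molecule and returning a (possibly modified) molecule. The step functions here are placeholders that merely record their names; `run_steps` and `make_placeholder` are hypothetical helpers invented for this sketch and are not part of the project's API. In a real pipeline, each placeholder would be swapped for the corresponding module's entry point.

```python
from typing import Callable, List, Sequence

# A step takes a molecule-like object and returns a (possibly modified) one.
Step = Callable[[object], object]


def run_steps(mol: object, steps: Sequence[Step]) -> object:
    """Apply each step in order, feeding the result of one into the next."""
    for step in steps:
        mol = step(mol)
    return mol


def make_placeholder(name: str, log: List[str]) -> Step:
    """Build a stand-in step that only records that it was called."""
    def step(mol: object) -> object:
        log.append(name)
        return mol  # a real step would return the transformed molecule
    return step


log: List[str] = []

# Order follows the list above; neutralisation appears twice on purpose,
# because applying the rules may expose new charges.
step_names = [
    "break_bonds",
    "neutralise",
    "rules",
    "tautomerism",
    "neutralise",
    "unsalt",
]
pipeline = [make_placeholder(name, log) for name in step_names]

# The SMILES string stands in for a molecule object (assumption for the sketch).
parent = run_steps("CC(=O)[O-].[Na+]", pipeline)
print(log)
```

Keeping the steps in a plain list makes it straightforward to drop, reorder, or add a step when building a custom pipeline.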
The documentation is contained in the project docs/ directory, and consists of a set of Jupyter Notebooks, which can be viewed (and run and edited) by starting a notebook server in that directory.
A simple command-line driver program is available in the project bin/ directory. It takes an SD or SMILES file as input and writes two output files: one containing the structures that were successfully standardised, and one containing the structures for which the procedure failed.
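The pass/fail split performed by such a driver can be sketched as follows. This is not the driver's actual code: `split_by_outcome` and `toy_standardise` are hypothetical names, and the sketch assumes that a failed standardisation is signalled by an exception. File reading and writing are left out so that the example stays self-contained.

```python
from typing import Callable, Iterable, List, Tuple


def split_by_outcome(
    records: Iterable[str],
    standardise: Callable[[str], str],
) -> Tuple[List[str], List[Tuple[str, str]]]:
    """Return (standardised records, [(original record, reason), ...])."""
    passed: List[str] = []
    failed: List[Tuple[str, str]] = []
    for record in records:
        try:
            passed.append(standardise(record))
        except Exception as err:  # assumption: any exception means failure
            failed.append((record, str(err)))
    return passed, failed


def toy_standardise(smiles: str) -> str:
    """Hypothetical stand-in: rejects empty input, otherwise strips whitespace."""
    cleaned = smiles.strip()
    if not cleaned:
        raise ValueError("empty structure")
    return cleaned


passed, failed = split_by_outcome(["CCO ", "", "c1ccccc1"], toy_standardise)
print(passed)
print(failed)
```

Recording the reason alongside each failed record keeps the "failed" output useful for later inspection.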


  • This work was funded by the IMI eTOX project.
  • The salt dictionary is based on the one used in the ChEMBL database, which was compiled by L.J. Bellis, A. Hersey and others and was in turn based on the one used in the USAN nomenclature.
  • Some of the standardisation rules were inspired by those used in the InChI software.
  • This project is built using the RDKit chemistry toolkit.


This code is released under the Apache 2.0 license. Copyright [2020] is retained by the EMBL-EBI.