# Molecule standardisation

A Python module performs the complete standardisation procedure. In addition, the modules that implement the individual steps in this procedure may be accessed separately if required, for example as part of a custom standardisation pipeline.

The tool is open-source and is available from [GitHub](https://github.com/flatkinson/standardiser).

A slide set describing some of the background to the project is available:

{% file src="<https://4087209198-files.gitbook.io/~/files/v0/b/gitbook-legacy-files/o/assets%2F-M3jybVs4wNCG5yxALsP%2F-M6_WIEDLUwSUtWdHT1D%2F-M6_X5U7HjV4LkBIi6pV%2Fstandardiser.pdf?alt=media&token=9f4eeced-667c-486e-839b-95969bd1a00f>" %}
standardiser
{% endfile %}

In summary, the general procedure for standardising a molecule is as follows (the documentation for each module is linked):

* Break bonds to Group I or II metals \[[**`break_bonds`**](https://github.com/flatkinson/standardiser/blob/master/standardiser/docs/01_break_bonds.ipynb)]
* Neutralise charges by adding/removing protons \[[**`neutralise`**](https://github.com/flatkinson/standardiser/blob/master/standardiser/docs/02_neutralise.ipynb)]
* Apply standardisation rules \[[**`rules`**](https://github.com/flatkinson/standardiser/blob/master/standardiser/docs/03_rules.ipynb)]
* Apply tautomerism rules \[[tautomerism](https://github.com/flatkinson/standardiser/blob/master/standardiser/docs/Keto-enol_tautomerism.ipynb)]
* Re-run neutralisation (in case any charges are exposed by rules)
* Discard any salt/solvate components \[[**`unsalt`**](https://github.com/flatkinson/standardiser/blob/master/standardiser/docs/04_unsalt.ipynb)]
* Return standardised parent

The complete procedure is implemented by the [**`standardise`**](https://github.com/flatkinson/standardiser/blob/master/standardiser/docs/05_standardise.ipynb) module; a bare-bones alternative workflow using the individual modules is shown [here](https://wwwdev.ebi.ac.uk/chembl/extra/francis/standardiser/06_alternative.html).
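The ordering of the steps listed above, including the second neutralisation pass, can be illustrated with a minimal sketch. This is not the package's own code: the function `standardise_sketch` and its keyword parameters are hypothetical names chosen for illustration, and each step is assumed to be a callable that takes a molecule and returns a (possibly modified) molecule. For the real call signatures of the individual modules, consult the notebooks linked above.

```python
from typing import Callable, TypeVar

# Any molecule representation will do for this sketch (e.g. an RDKit Mol).
Mol = TypeVar("Mol")
Step = Callable[[Mol], Mol]


def standardise_sketch(
    mol: Mol,
    *,
    break_bonds: Step,
    neutralise: Step,
    apply_rules: Step,
    apply_tautomer_rules: Step,
    unsalt: Step,
) -> Mol:
    """Apply the steps in the documented order and return the parent.

    All parameter names are hypothetical stand-ins for the real modules.
    """
    steps = (
        break_bonds,           # break bonds to Group I or II metals
        neutralise,            # add/remove protons to neutralise charges
        apply_rules,           # apply standardisation rules
        apply_tautomer_rules,  # apply tautomerism rules
        neutralise,            # re-run in case rules exposed new charges
        unsalt,                # discard salt/solvate components
    )
    for step in steps:
        mol = step(mol)
    return mol
```

Passing the steps in as arguments keeps the sketch independent of any particular toolkit, and makes it straightforward to swap or omit a step when building a custom pipeline.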

The documentation is contained in the project **`docs/`** directory, and consists of a set of [Jupyter Notebooks](https://jupyter.org/), which can be viewed (and run and edited) by starting a notebook server in that directory.

A simple command-line driver program **`standardiser_mol.py`** is available in the project **`bin/`** directory. It takes SD or SMILES as input and writes two files: one containing the structures that were successfully standardised, and one containing the structures for which the procedure failed.
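The pass/fail split performed by the driver can be sketched as follows. This is an illustrative outline only, not the driver's actual implementation: `partition_by_outcome` is a hypothetical helper, the `standardise` argument is assumed to be any callable that returns the standardised structure or raises an exception on failure, and catching the broad `Exception` class is a simplification made for the sketch.

```python
from typing import Callable, Iterable, List, Tuple, TypeVar

In = TypeVar("In")
Out = TypeVar("Out")


def partition_by_outcome(
    records: Iterable[In],
    standardise: Callable[[In], Out],
) -> Tuple[List[Out], List[Tuple[In, str]]]:
    """Split records into standardised results and failures.

    Hypothetical helper: `standardise` is assumed to raise on failure.
    Returns (succeeded, failed), where each failure pairs the original
    record with the error message.
    """
    succeeded: List[Out] = []
    failed: List[Tuple[In, str]] = []
    for record in records:
        try:
            succeeded.append(standardise(record))
        except Exception as error:  # broad catch, for illustration only
            failed.append((record, str(error)))
    return succeeded, failed
```

Each of the two returned lists could then be written to its own output file, mirroring the two files the driver produces.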

#### Acknowledgements <a href="#acknowledgements" id="acknowledgements"></a>

* This work was funded by the [IMI eTOX](https://www.imi.europa.eu/projects-results/project-factsheets/etox) project.
* The salt dictionary used is based on that used in the ChEMBL database; this was compiled by L.J. Bellis, A. Hersey and others, and was in turn based on that used in the [USAN](http://www.ama-assn.org/ama/pub/physician-resources/medical-science/united-states-adopted-names-council/naming-guidelines/organic-radicals-counterions-solvent-molecules-used.page) nomenclature.
* Some of the standardisation rules were inspired by those used in the [InChI](http://www.inchi-trust.org/home) software.
* This project is built using the [RDKit](http://www.rdkit.org/) chemistry toolkit.

#### Licensing <a href="#licensing" id="licensing"></a>

This code is released under the [Apache 2.0](http://opensource.org/licenses/apache2.0.php) license. Copyright \[2020] is retained by the [EMBL-EBI](http://www.ebi.ac.uk).
