
# Molecule standardisation

A Python module performs the complete standardisation procedure. In addition, the modules that implement the individual steps in this procedure may be accessed separately if required, for example as part of a custom standardisation pipeline.

The tool is open-source and is available from [GitHub](https://github.com/flatkinson/standardiser).

A slide set describing some of the background to the project is available:

{% file src="/files/-M6\_X5U7HjV4LkBIi6pV" %}
standardiser
{% endfile %}

In summary, the general procedure for standardising a molecule (with the documentation for the appropriate module linked) is:

* Break bonds to Group I or II metals \[[**`break_bonds`**](https://github.com/flatkinson/standardiser/blob/master/standardiser/docs/01_break_bonds.ipynb)]
* Neutralise charges by adding/removing protons \[[**`neutralise`**](https://github.com/flatkinson/standardiser/blob/master/standardiser/docs/02_neutralise.ipynb)]
* Apply standardisation rules \[[**`rules`**](https://github.com/flatkinson/standardiser/blob/master/standardiser/docs/03_rules.ipynb)]
* Apply tautomerism rules \[[tautomerism](https://github.com/flatkinson/standardiser/blob/master/standardiser/docs/Keto-enol_tautomerism.ipynb)]
* Re-run neutralisation (in case any charges are exposed by rules)
* Discard any salt/solvate components \[[**`unsalt`**](https://github.com/flatkinson/standardiser/blob/master/standardiser/docs/04_unsalt.ipynb)]
* Return standardised parent

The complete procedure is implemented by the [**`standardise`**](https://github.com/flatkinson/standardiser/blob/master/standardiser/docs/05_standardise.ipynb) module; a bare-bones alternative workflow using the individual modules is shown [here](https://wwwdev.ebi.ac.uk/chembl/extra/francis/standardiser/06_alternative.html).
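As a rough illustration of how the steps in the summary chain together, the sketch below composes placeholder step functions in the order listed. It does not use the `standardiser` API: the names `Step`, `build_pipeline`, `run_pipeline`, `identity` and `toy_unsalt`, the tiny counter-ion set, and the use of a plain SMILES string in place of an RDKit molecule are all assumptions made for illustration.

```python
from typing import Callable, List, Tuple

# Hypothetical stand-in: a "molecule" is just a SMILES string here, and a step
# is any callable that takes a molecule and returns a molecule.
Step = Callable[[str], str]

# Hypothetical, deliberately tiny counter-ion/solvent set; the real project
# uses a salt dictionary derived from ChEMBL.
TOY_SALTS = {"[Na+]", "[K+]", "[Cl-]", "O"}


def build_pipeline(break_bonds: Step, neutralise: Step, rules: Step,
                   tautomers: Step, unsalt: Step) -> List[Tuple[str, Step]]:
    """Return the steps in the order given in the summary."""
    return [
        ("break_bonds", break_bonds),
        ("neutralise", neutralise),
        ("rules", rules),
        ("tautomers", tautomers),
        ("neutralise", neutralise),  # re-run: rules may have exposed charges
        ("unsalt", unsalt),
    ]


def run_pipeline(mol: str, pipeline: List[Tuple[str, Step]]) -> Tuple[str, List[str]]:
    """Apply each step in turn; return the result and the names of steps applied."""
    applied = []
    for name, step in pipeline:
        mol = step(mol)
        applied.append(name)
    return mol, applied


def identity(mol: str) -> str:
    """Placeholder step that leaves the molecule unchanged."""
    return mol


def toy_unsalt(mol: str) -> str:
    """Toy salt stripping: drop dot-separated components found in TOY_SALTS."""
    kept = [part for part in mol.split(".") if part not in TOY_SALTS]
    return ".".join(kept)


pipeline = build_pipeline(identity, identity, identity, identity, toy_unsalt)
parent, applied = run_pipeline("CC(=O)[O-].[Na+]", pipeline)
```

Keeping the steps as an ordered list of callables makes it easy to drop or swap a step when building a custom pipeline, which is the use case the individual modules are intended to support.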

The documentation is contained in the project **`docs/`** directory, and consists of a set of [Jupyter Notebooks](https://jupyter.org/), which can be viewed (and run and edited) by starting a notebook server in that directory.

A simple command-line driver program **`standardiser_mol.py`** is available in the project **`bin/`** directory. It takes SD or SMILES files as input and writes two output files: one containing the structures that were successfully standardised, and one containing the structures for which the procedure failed.
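The pass/fail split performed by the driver can be sketched as follows. This is not the driver's own code: `standardise_all`, the `standardise` callable passed to it, the `toy_standardise` stand-in, and the use of a generic exception to signal failure are all assumptions made for illustration.

```python
from typing import Callable, Iterable, List, Tuple


def standardise_all(
    smiles_list: Iterable[str],
    standardise: Callable[[str], str],
) -> Tuple[List[Tuple[str, str]], List[Tuple[str, str]]]:
    """Split inputs into (input, parent) successes and (input, reason) failures."""
    passed, failed = [], []
    for smiles in smiles_list:
        try:
            passed.append((smiles, standardise(smiles)))
        except Exception as err:  # assumption: failure is signalled by an exception
            failed.append((smiles, str(err)))
    return passed, failed


def toy_standardise(smiles: str) -> str:
    """Hypothetical stand-in: rejects empty input, otherwise strips whitespace."""
    if not smiles.strip():
        raise ValueError("empty structure")
    return smiles.strip()


passed, failed = standardise_all(["CCO ", "", "c1ccccc1"], toy_standardise)
```

In a real run the two lists would be written to separate files, so that failed structures can be inspected without interrupting the processing of the rest of the input.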

#### Acknowledgements <a href="#acknowledgements" id="acknowledgements"></a>

* This work was funded by the [IMI eTOX](https://www.imi.europa.eu/projects-results/project-factsheets/etox) project.
* The salt dictionary used is based on that used in the ChEMBL database; this was compiled by L.J. Bellis, A. Hersey and others, and was in turn based on that used in the [USAN](http://www.ama-assn.org//ama/pub/physician-resources/medical-science/united-states-adopted-names-council/naming-guidelines/organic-radicals-counterions-solvent-molecules-used.page) nomenclature.
* Some of the standardisation rules were inspired by those used in the [InChI](http://www.inchi-trust.org/home) software.
* This project is built using the [RDKit](http://www.rdkit.org/) chemistry toolkit.

#### Licensing <a href="#licensing" id="licensing"></a>

This code is released under the [Apache 2.0](http://opensource.org/licenses/apache2.0.php) license. Copyright \[2020] is retained by the [EMBL-EBI](http://www.ebi.ac.uk).

