# Biomedical annotations

*Note: At the moment, our biomedical annotations are only available through the bulk data download. Bear with us as we are working on the UI integration.*

We adopted a new approach for non-chemical annotations, moving from our in-house NLP model to LeadMine, a commercial grammar- and dictionary-based system developed by [NextMove Software](https://www.nextmovesoftware.com/leadmine.html).

LeadMine uses curated and public dictionaries with a custom grammar for fast and accurate text annotation. It has options to automatically correct spelling mistakes, which are frequent in patent text because of OCR errors. Using the provided dictionaries, it can also resolve an annotation to a unique identifier.
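As a rough illustration of the ideas above (dictionary lookup, tolerance for OCR-style misspellings, and resolution to an identifier), here is a minimal Python sketch. It is not LeadMine and does not use its API: the `annotate` function, the `TOY_DICTIONARY` contents, the `cutoff` value, and the placeholder identifiers are all assumptions made for this example, and the fuzzy matching relies on Python's standard `difflib` rather than LeadMine's own spelling correction.

```python
import difflib
import re

# Hypothetical toy dictionary: lower-cased term -> (entity type, identifier).
# The identifiers are placeholders, not real HGNC, UniProt or MeSH accessions.
TOY_DICTIONARY = {
    "egfr": ("gene", "EXAMPLE-GENE:1"),
    "asthma": ("disease", "EXAMPLE-DISEASE:1"),
    "inhibitor": ("mechanism", None),   # mechanism terms carry no identifier here
    "antagonist": ("mechanism", None),
}


def annotate(text, dictionary=TOY_DICTIONARY, cutoff=0.85):
    """Return dictionary matches in `text`, tolerating small spelling errors."""
    annotations = []
    for match in re.finditer(r"[A-Za-z0-9]+", text):
        token = match.group(0)
        key = token.lower()
        if key not in dictionary:
            # Fall back to the closest dictionary term, if one is similar enough.
            close = difflib.get_close_matches(key, list(dictionary), n=1, cutoff=cutoff)
            if not close:
                continue
            key = close[0]
        entity_type, identifier = dictionary[key]
        annotations.append({
            "text": token,            # surface form as it appears in the text
            "start": match.start(),   # character offsets of the match
            "end": match.end(),
            "matched_term": key,      # dictionary term after spelling correction
            "type": entity_type,
            "id": identifier,
        })
    return annotations


# "inhibltor" imitates an OCR error for "inhibitor".
for annotation in annotate("An EGFR inhibltor for the treatment of asthma."):
    print(annotation)
```

A real system also has to handle multi-word terms, overlapping matches, and ambiguous abbreviations, which this single-token sketch deliberately ignores.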

Using these features, we annotated all patents in SureChEMBL for three key biomedical entity types and matched them to the relevant data source when possible:

* Gene/Protein: HGNC, UniProt
* Disease: MeSH, Human Disease Ontology
* Mechanism (terms such as inhibitor, antagonist, modulator, etc.)

LeadMine is fast, robust, and easily scalable: exactly what we need for SureChEMBL production-level annotation!

<figure><img src="/files/Y4bslaF7MFEPzRMenH9S" alt=""><figcaption></figcaption></figure>

*Patent annotation with LeadMine. Colour code: orange, generic chemical name; pink, generic molecule; grey, anatomy; violet, molecule dictionary; turquoise, mechanism; green, PubChem dictionary; dark red, gene; yellow, polymer; light red, journal; khaki, organism; dark orange, disease.*

For now, the new biomedical annotations are limited to these three types. More entity types, or custom dictionaries, may follow in later phases.


