ChEMBL Data Web Services

Getting Started

The best way to get started is to look at some example URLs that request data from the ChEMBL web services. The table below lists examples together with a description of the data returned.

Description

Example URL

Return all molecules

https://www.ebi.ac.uk/chembl/api/data/molecule

Keyword search for targets that contain 'cyclin' in pref_name

https://www.ebi.ac.uk/chembl/api/data/target?pref_name__contains=cyclin

Return molecules with molecular weight <= 300

https://www.ebi.ac.uk/chembl/api/data/molecule?molecule_properties__mw_freebase__lte=300

Return molecules with molecular weight <= 300 AND pref_name ends with nib

https://www.ebi.ac.uk/chembl/api/data/molecule?molecule_properties__mw_freebase__lte=300&pref_name__iendswith=nib

Return image for CHEMBL25

https://www.ebi.ac.uk/chembl/api/data/image/CHEMBL25

Flexmatch ("exact") search with SMILES c1ncccc1

https://www.ebi.ac.uk/chembl/api/data/molecule.json?molecule_structures__canonical_smiles__flexmatch=c1ncccc1

Substructure search with SMILES CC(=O)Oc1ccccc1C(=O)O (Aspirin)

https://www.ebi.ac.uk/chembl/api/data/substructure/CC(=O)Oc1ccccc1C(=O)O

Similarity search with SMILES CN1C(=O)C=C(c2cccc(Cl)c2)c3cc(ccc13)[C@@](N)(c4ccc(Cl)cc4)c5cncn5C with 80% Tanimoto similarity cut-off

https://www.ebi.ac.uk/chembl/api/data/similarity/CN1C(=O)C=C(c2cccc(Cl)c2)c3cc(ccc13)[C@@](N)(c4ccc(Cl)cc4)c5cncn5C/80

Resources

The following table provides a list of all ChEMBL web service resources currently available.

Resource Name

Description

URL

Activity

Activity values recorded in an Assay

https://www.ebi.ac.uk/chembl/api/data/activity

Assay

Assay details as reported in source Document/Dataset

https://www.ebi.ac.uk/chembl/api/data/assay

ATC

WHO ATC Classification for drugs

https://www.ebi.ac.uk/chembl/api/data/atc_class

Binding Site

Target binding site definition

https://www.ebi.ac.uk/chembl/api/data/binding_site

Biotherapeutic

Biotherapeutic molecules, which includes HELM notation and sequence data

https://www.ebi.ac.uk/chembl/api/data/biotherapeutic

Cell Line

Cell line information

https://www.ebi.ac.uk/chembl/api/data/cell_line

ChEMBL ID Lookup

Look up ChEMBL Id entity type

https://www.ebi.ac.uk/chembl/api/data/chembl_id_lookup

Compound Record

Occurrence of a given compound in a specific document

https://www.ebi.ac.uk/chembl/api/data/compound_record

Compound Structural Alert

Indicates a certain anomaly in the compound structure

https://www.ebi.ac.uk/chembl/api/data/compound_structural_alert

Document

Document/Dataset from which Assays have been extracted

https://www.ebi.ac.uk/chembl/api/data/document

Document Similarity

Provides documents similar to a given one

https://www.ebi.ac.uk/chembl/api/data/document_similarity

Document Term

Provides keywords extracted from a document using the TextRank algorithm

https://www.ebi.ac.uk/chembl/api/data/document_term

Drug

Approved drugs information, including (but not limited to) applicants, patent numbers and research codes

https://www.ebi.ac.uk/chembl/api/data/drug

Drug Indication

Joins drugs with diseases providing references to relevant sources

https://www.ebi.ac.uk/chembl/api/data/drug_indication

GO Slim

GO slim ontology

https://www.ebi.ac.uk/chembl/api/data/go_slim

Image

Graphical (png, svg, json) representation of Molecule

https://www.ebi.ac.uk/chembl/api/data/image/CHEMBL1

Mechanism

Mechanism of action information for FDA-approved drugs

https://www.ebi.ac.uk/chembl/api/data/mechanism

Metabolism

Metabolic pathways with references

https://www.ebi.ac.uk/chembl/api/data/metabolism

Molecule

Molecule information, including properties, structural representations and synonyms

https://www.ebi.ac.uk/chembl/api/data/molecule

Molecule Form

Relationships between molecule parents and salts

https://www.ebi.ac.uk/chembl/api/data/molecule_form

Protein Classification

Protein family classification of TargetComponents

https://www.ebi.ac.uk/chembl/api/data/protein_class

Substructure

Molecule substructure search

https://www.ebi.ac.uk/chembl/api/data/substructure/CN%28CCCN%29c1cccc2ccccc12

Similarity

Molecule similarity search

https://www.ebi.ac.uk/chembl/api/data/similarity/CC%28=O%29Oc1ccccc1C%28=O%29O/70

Source

Document/Dataset source

https://www.ebi.ac.uk/chembl/api/data/source

Status

API status with ChEMBL DB version number and API software version number

https://www.ebi.ac.uk/chembl/api/data/status

Target

Targets (protein and non-protein) defined in Assay

https://www.ebi.ac.uk/chembl/api/data/target

Target Component

Target sequence information (A Target may have 1 or more sequences)

https://www.ebi.ac.uk/chembl/api/data/target_component

Target Relation

Describes relations between targets

https://www.ebi.ac.uk/chembl/api/data/target_relation

Tissue

Tissue classification

https://www.ebi.ac.uk/chembl/api/data/tissue

Supported formats

These are formats currently supported by the API:

Meta Data and Pagination

It is now possible to download all data from a specific ChEMBL web service resource. This is made possible by returning responses from the web services in 'pages', which can be navigated through using a 'page_meta' section. The 'page_meta' section includes information about total number of hits, total number of pages and links to the next and previous pages. An example 'page_meta' section is displayed below:

"page_meta": {
"limit": 20,
"next": "/chembl/api/data/activity.json?limit=20&offset=20",
"offset": 0,
"previous": null,
"total_count": 13520737
}

To download all ChEMBL activity endpoints (>13 million), start from the following URL: https://www.ebi.ac.uk/chembl/api/data/activity. By inspecting the 'page_meta' section, the link to page 2 can be found, e.g. https://www.ebi.ac.uk/chembl/api/data/activity?limit=20&offset=20.
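The paging scheme described above can be followed programmatically. The following is a minimal sketch rather than an official recipe: it assumes that the JSON response keeps its records under a top-level key next to 'page_meta' (for the activity resource the key is assumed to be 'activities'), and the names iter_records and fetch_json are hypothetical. The page-fetching function is passed in as a parameter so that the loop can be tried without network access.

```python
import json
from urllib.parse import urljoin
from urllib.request import urlopen

# Host against which the relative 'next' links are resolved.
HOST = "https://www.ebi.ac.uk"


def fetch_json(url):
    """Download one page and decode it as JSON (performs a real HTTP request)."""
    with urlopen(url) as response:
        return json.load(response)


def iter_records(first_url, records_key, fetch=fetch_json, max_pages=None):
    """Yield records page by page, following page_meta['next'] until it is null."""
    url = first_url
    pages_read = 0
    while url is not None:
        page = fetch(url)
        # records_key is the assumed name of the list holding the records.
        yield from page[records_key]
        pages_read += 1
        if max_pages is not None and pages_read >= max_pages:
            break
        next_link = page["page_meta"]["next"]
        url = urljoin(HOST, next_link) if next_link else None
```

Because the activity resource holds millions of records, it is sensible to pass max_pages while experimenting, for example iter_records("https://www.ebi.ac.uk/chembl/api/data/activity.json?limit=20", "activities", max_pages=2).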

Filtering and Ordering

It is possible to apply search filters to all resource requests using a URL friendly query language. For example, it is possible to return all ChEMBL targets that contain the term 'kinase' in the pref_name attribute with the following URL: https://www.ebi.ac.uk/chembl/api/data/target?pref_name__contains=kinase.

The pattern for applying a filter is as follows:

https://www.ebi.ac.uk/chembl/api/data/[resource]?[field]__[filter_type]=[value]

Examples of other filter types are listed in the table below.

Filter Type

Description

Example URL

exact (iexact)

Exact match with query

https://www.ebi.ac.uk/chembl/api/data/assay?assay_type__exact=B

contains (icontains)

Wild card search with query

https://www.ebi.ac.uk/chembl/api/data/assay?description__icontains=toxicity

startswith (istartswith)

Starts with query

https://www.ebi.ac.uk/chembl/api/data/target?pref_name__istartswith=serotonin

endswith (iendswith)

Ends with query

https://www.ebi.ac.uk/chembl/api/data/cell_line?cell_source_tissue__iendswith=carcinoma

regex (iregex)

Regular expression query

https://www.ebi.ac.uk/chembl/api/data/target?pref_name__iregex=(cdk1|cdk2)

gt (gte)

Greater than (or equal)

https://www.ebi.ac.uk/chembl/api/data/molecule?molecule_properties__full_mwt__gte=100

lt (lte)

Less than (or equal)

https://www.ebi.ac.uk/chembl/api/data/molecule?molecule_properties__alogp__lte=5

range

Within a range of values

https://www.ebi.ac.uk/chembl/api/data/molecule?molecule_properties__full_mwt__range=200,500

in

Appears within list of query values

https://www.ebi.ac.uk/chembl/api/data/molecule?molecule_chembl_id__in=CHEMBL25,CHEMBL941,CHEMBL1000

isnull

Field is null

https://www.ebi.ac.uk/chembl/api/data/molecule?helm_notation__isnull=false

search

Special type of filter allowing a full text search based on Solr queries.

https://www.ebi.ac.uk/chembl/api/data/molecule/search.json?q=aspirin
https://www.ebi.ac.uk/chembl/api/data/target/search.json?q=lipoxygenase
https://www.ebi.ac.uk/chembl/api/data/chembl_id_lookup/search?q=morphine
https://www.ebi.ac.uk/chembl/api/data/activity/search?q=%22TG-GATES%22

To order the results by a particular field, the 'order_by=[field]' argument is added to a request. For example, a user can sort targets by pref_name using the following URL: https://www.ebi.ac.uk/chembl/api/data/target?order_by=pref_name

The default ordering is ascending. To return the results in descending order, place a '-' before the field name: https://www.ebi.ac.uk/chembl/api/data/target?order_by=-pref_name

Note that it is possible to combine order_by and filter arguments: https://www.ebi.ac.uk/chembl/api/data/target?pref_name__contains=kinase&order_by=-pref_name
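The filter pattern and the order_by argument can be combined in a small URL builder. This is only an illustrative sketch: the function name build_query_url and the convention of passing filters as (field, filter_type) pairs are assumptions made for the example, not part of the ChEMBL API.

```python
from urllib.parse import urlencode

BASE_URL = "https://www.ebi.ac.uk/chembl/api/data"


def build_query_url(resource, filters=None, order_by=None):
    """Compose [resource]?[field]__[filter_type]=[value]&order_by=[field]."""
    params = []
    for (field, filter_type), value in (filters or {}).items():
        # Filters such as 'in' and 'range' take comma-separated values.
        if isinstance(value, (list, tuple)):
            value = ",".join(str(v) for v in value)
        name = f"{field}__{filter_type}" if filter_type else field
        params.append((name, value))
    if order_by:
        # A leading '-' on the field name requests descending order.
        params.append(("order_by", order_by))
    # Keep commas readable; every other reserved character is percent-encoded.
    query = urlencode(params, safe=",")
    return f"{BASE_URL}/{resource}" + (f"?{query}" if query else "")
```

For instance, build_query_url("target", {("pref_name", "contains"): "kinase"}, order_by="-pref_name") reproduces the combined URL shown above.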

Chemical Searching

The 'Substructure' and 'Similarity' web service resources allow the chemical content of ChEMBL to be searched. Like the other resources, these search-based resources accept filtering, paging and ordering arguments. They accept a SMILES string, an InChI Key or a molecule ChEMBL_ID as the query, and similarity searches additionally require a similarity cut-off. Some example molecule searches are provided in the table below.

Chemical Search Description

Example URL

Substructure search against ChEMBL using aspirin SMILES string

https://www.ebi.ac.uk/chembl/api/data/substructure/CC(=O)Oc1ccccc1C(=O)O

Substructure search against ChEMBL using aspirin CHEMBL_ID

https://www.ebi.ac.uk/chembl/api/data/substructure/CHEMBL25

Substructure search against ChEMBL using aspirin InChI Key

https://www.ebi.ac.uk/chembl/api/data/substructure/BSYNRYMUTXBXSQ-UHFFFAOYSA-N

Similarity (80% cut-off) search against ChEMBL using aspirin SMILES string

https://www.ebi.ac.uk/chembl/api/data/similarity/CC(=O)Oc1ccccc1C(=O)O/80

Similarity (80% cut-off) search against ChEMBL using aspirin CHEMBL_ID

https://www.ebi.ac.uk/chembl/api/data/similarity/CHEMBL25/80

Similarity (80% cut-off) search against ChEMBL using aspirin InChI Key

https://www.ebi.ac.uk/chembl/api/data/similarity/BSYNRYMUTXBXSQ-UHFFFAOYSA-N/80

Searching with an InChI Key is only possible for InChI Keys found in the ChEMBL database. The system does not try to convert an InChI Key into a chemical representation.
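A short sketch of how the three query types might be told apart and turned into search URLs. The helper names query_kind and structure_search_url are hypothetical, the regular expressions are a simple heuristic rather than the service's own validation, and the 0-100 range check merely reflects that the cut-off is a percentage.

```python
import re
from urllib.parse import quote

BASE_URL = "https://www.ebi.ac.uk/chembl/api/data"

# Heuristic patterns (assumptions): 'CHEMBL' plus digits, and the 14-10-1 InChI Key layout.
CHEMBL_ID = re.compile(r"CHEMBL\d+")
INCHI_KEY = re.compile(r"[A-Z]{14}-[A-Z]{10}-[A-Z]")


def query_kind(query):
    """Classify a query string as 'chembl_id', 'inchi_key' or 'smiles'."""
    if CHEMBL_ID.fullmatch(query):
        return "chembl_id"
    if INCHI_KEY.fullmatch(query):
        return "inchi_key"
    return "smiles"


def structure_search_url(query, similarity=None):
    """Return a substructure URL, or a similarity URL when a cut-off is given."""
    # Characters in 'safe' stay literal, matching the unencoded URLs in the table.
    encoded = quote(query, safe="()=@+/")
    if similarity is None:
        return f"{BASE_URL}/substructure/{encoded}"
    if not 0 < similarity <= 100:
        raise ValueError("the similarity cut-off is a percentage")
    return f"{BASE_URL}/similarity/{encoded}/{similarity}"
```

Knowing the query kind is useful mainly for InChI Keys, since a key that is absent from ChEMBL cannot be resolved by the service.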

Molecule Images

The Image resource returns a graphical representation of a ChEMBL molecule. Unlike the other resources, it does not accept filtering and paging arguments, but it does accept image-specific arguments. These are defined in the table below.

Image Argument

Description

Allowed Values

Default Value

Example URL

format

Image format

png, svg and json

png

https://www.ebi.ac.uk/chembl/api/data/image/CHEMBL25?format=svg

dimensions

Size of image in pixels

1-500

500

https://www.ebi.ac.uk/chembl/api/data/image/CHEMBL25?dimensions=200

ignoreCoords

Choose to use or ignore coordinates in ChEMBL molfiles

1 or 0

0 (Use ChEMBL molfile coordinates)

https://www.ebi.ac.uk/chembl/api/data/image/CHEMBL25?ignoreCoords=1

engine

Chemical toolkit used to generate image

rdkit or indigo

rdkit

https://www.ebi.ac.uk/chembl/api/data/image/CHEMBL25?engine=indigo

bgColor

Background color

Full list here

transparent

https://www.ebi.ac.uk/chembl/api/data/image/CHEMBL1.png?bgColor=orange
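The image arguments listed in the table can be checked before a request is made. The sketch below is illustrative only: image_url is a hypothetical helper, the allowed values are copied from the table, and arguments without a closed list of values (such as bgColor) are passed through unchecked.

```python
from urllib.parse import urlencode

IMAGE_URL = "https://www.ebi.ac.uk/chembl/api/data/image"

# Allowed values copied from the table of image arguments.
ALLOWED = {
    "format": {"png", "svg", "json"},
    "engine": {"rdkit", "indigo"},
    "ignoreCoords": {0, 1},
}


def image_url(chembl_id, **options):
    """Build an Image resource URL after validating the known arguments."""
    for name, value in options.items():
        if name in ALLOWED and value not in ALLOWED[name]:
            raise ValueError(f"{name}={value!r} is not an allowed value")
        if name == "dimensions" and not 1 <= value <= 500:
            raise ValueError("dimensions must be between 1 and 500 pixels")
    query = urlencode(options)
    return f"{IMAGE_URL}/{chembl_id}" + (f"?{query}" if query else "")
```

For example, image_url("CHEMBL25", format="svg") gives the first example URL in the table.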

GET, POST and special characters

In a GET request, all parameters have to be encoded into the URL. Because there is a limit on how long a URL can be, it is often more convenient to use POST requests instead. POST parameters are embedded in the request body and can be of any size. This is especially important when retrieving a long list of entities identified by (random) IDs.

The ChEMBL API supports both GET and POST, but since POST has a special meaning in REST (CREATE), a special header has to be added to every POST request:

X-HTTP-Method-Override:GET
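As a rough sketch, such a request can be prepared with the Python standard library. The helper name build_override_request is hypothetical, and sending the parameters as a form-encoded body is an assumption; the service may expect a different content type. The function only builds the request and does not send it.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_override_request(url, params):
    """Prepare (without sending) a POST request that carries the override header."""
    # Assumption: parameters are sent form-encoded in the request body.
    body = urlencode(params, safe=",").encode("ascii")
    return Request(
        url,
        data=body,
        headers={
            "X-HTTP-Method-Override": "GET",
            "Content-Type": "application/x-www-form-urlencoded",
        },
        method="POST",
    )
```

The prepared request could then be sent with urllib.request.urlopen(request). The parameter list travels in the body, so its length is not constrained by URL limits.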

Another issue is character encoding. SMILES strings often contain characters (such as #, % or \) that have a special meaning in URLs. This is why when using GET, all parameters should be percent-encoded.

One example is the following SMILES string:

[Na+].CO[C@@H](CCC#C\C=C/CCCC(C)CCCCC=C)C(=O)[O-]

It can be encoded into a URL in the following way: https://www.ebi.ac.uk/chembl/api/data/molecule/%5BNa+%5D.CO%5BC@@H%5D(CCC%23C%5CC=C/CCCC(C)CCCCC=C)C(=O)%5BO-%5D
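The encoding step can be reproduced with urllib.parse.quote from the Python standard library. This is a sketch under two assumptions: the stereocentre in the example SMILES is written as [C@@H], and the characters '+', '@', '(', ')', '=' and '/' are left unescaped in the URL path.

```python
from urllib.parse import quote, unquote

MOLECULE_URL = "https://www.ebi.ac.uk/chembl/api/data/molecule"

# Raw string so that the backslash in the SMILES is kept literally.
smiles = r"[Na+].CO[C@@H](CCC#C\C=C/CCCC(C)CCCCC=C)C(=O)[O-]"

# Characters listed in 'safe' are left untouched; '[', ']', '#' and '\' are escaped.
encoded = quote(smiles, safe="+@()=/")
url = f"{MOLECULE_URL}/{encoded}"

# Decoding recovers the original SMILES string.
decoded = unquote(encoded)
```

A stricter alternative is quote(smiles, safe=""), which escapes every reserved character, including '/'.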

Below is another example of retrieving a molecule (CHEMBL1628285) that has the longest SMILES string currently stored in ChEMBL. The original SMILES string is:

CCCCCCCCCCCCCCCC[NH2+]OC(CO)C(O)C(OC1OC(CO)C(O)C(O)C1O)C(O)CO.CCCCCCCCCCCCCCCC[NH2+]OC(CO)C(O)C(OC2OC(CO)C(O)C(O)C2O)C(O)CO.CCCCCCCCCCCCCCCC[NH2+]OC(CO)C(O)C(OC3OC(CO)C(O)C(O)C3O)C(O)CO.CCCCCCCCCCCCCCCC[NH2+]OC(CO)C(O)C(OC4OC(CO)C(O)C(O)C4O)C(O)CO.CCCCCCCCCCCCCCCC[NH2+]OC(CO)C(O)C(OC5OC(CO)C(O)C(O)C5O)C(O)CO.CCCCCCCCCCCCCCCC[NH2+]OC(CO)C(O)C(OC6OC(CO)C(O)C(O)C6O)C(O)CO.CCCCCCCCCCCCCCCC[NH2+]OC(CO)C(O)C(OC7OC(CO)C(O)C(O)C7O)C(O)CO.CCCCCCCCCCCCCCCC[NH2+]OC(CO)C(O)C(OC8OC(CO)C(O)C(O)C8O)C(O)CO.CCCCCCCCCCCCCCCC[NH2+]OC(CO)C(O)C(OC9OC(CO)C(O)C(O)C9O)C(O)CO.CCCCCCCCCCCCCCCC[NH2+]OC(CO)C(O)C(OC%10OC(CO)C(O)C(O)C%10O)C(O)CO.CCCCCCCCCCCCCCCC[NH2+]OC(CO)C(O)C(OC%11OC(CO)C(O)C(O)C%11O)C(O)CO.CCCCCCCCCCCCCCCC[NH2+]OC(CO)C(O)C(OC%12OC(CO)C(O)C(O)C%12O)C(O)CO.CCCCCCCCCC(C(=O)NCCc%13ccc(OP(=S)(Oc%14ccc(CCNC(=O)C(CCCCCCCCC)P(=O)(O)[O-])cc%14)N(C)\\N=C\\c%15ccc(Op%16(Oc%17ccc(\\C=N\\N(C)P(=S)(Oc%18ccc(CCNC(=O)C(CCCCCCCCC)P(=O)(O)[O-])cc%18)Oc%19ccc(CCNC(=O)C(CCCCCCCCC)P(=O)(O)[O-])cc%19)cc%17)np(Oc%20ccc(\\C=N\\N(C)P(=S)(Oc%21ccc(CCNC(=O)C(CCCCCCCCC)P(=O)(O)[O-])cc%21)Oc%22ccc(CCNC(=O)C(CCCCCCCCC)P(=O)(O)[O-])cc%22)cc%20)(Oc%23ccc(\\C=N\\N(C)P(=S)(Oc%24ccc(CCNC(=O)C(CCCCCCCCC)P(=O)(O)[O-])cc%24)Oc%25ccc(CCNC(=O)C(CCCCCCCCC)P(=O)(O)[O-])cc%25)cc%23)np(Oc%26ccc(\\C=N\\N(C)P(=S)(Oc%27ccc(CCNC(=O)C(CCCCCCCCC)P(=O)(O)[O-])cc%27)Oc%28ccc(CCNC(=O)C(CCCCCCCCC)P(=O)(O)[O-])cc%28)cc%26)(Oc%29ccc(\\C=N\\N(C)P(=S)(Oc%30ccc(CCNC(=O)C(CCCCCCCCC)P(=O)(O)[O-])cc%30)Oc%31ccc(CCNC(=O)C(CCCCCCCCC)P(=O)(O)[O-])cc%31)cc%29)n%16)cc%15)cc%13)P(=O)(O)[O-]

After encoding, the URL becomes this

CORS and JSONP

Both techniques are supported

Web Service Client

To help users get started with the updated ChEMBL web services, the existing web service client has also been released. It is written in Python and can be installed from the Python Package Index by typing:

pip install chembl_webresource_client

The client code is open and hosted on GitHub: https://github.com/chembl/chembl_webresource_client.

The following list provides some example use cases of the client:

Search molecule by synonym

from chembl_webresource_client.new_client import new_client
molecule = new_client.molecule
res = molecule.search('viagra')

Search target by gene name

from chembl_webresource_client.new_client import new_client
target = new_client.target
gene_name = 'GABRB2'
res = target.search(gene_name)

Search target by synonym

from chembl_webresource_client.new_client import new_client
target = new_client.target
gene_name = 'GABRB2'
res = target.filter(target_synonym__icontains=gene_name)

Having a list of molecules ChEMBL IDs in a CSV file, produce another CSV file that maps every compound ID into a list of UniProt accession numbers and save the mapping into output csv file

import csv
from chembl_webresource_client.new_client import new_client
# This will be our resulting structure mapping compound ChEMBL IDs into target UniProt IDs
compounds2targets = dict()
# First, let's just parse the CSV file to extract compound ChEMBL IDs:
with open('compounds_list.csv', newline='') as csvfile:
    reader = csv.reader(csvfile)
    for row in reader:
        compounds2targets[row[0]] = set()
# OK, we have our source IDs, let's process them in chunks:
chunk_size = 50
# A list is needed here because dictionary key views cannot be sliced
keys = list(compounds2targets.keys())
for i in range(0, len(keys), chunk_size):
    # we jump from compounds to targets through activities:
    activities = new_client.activity.filter(molecule_chembl_id__in=keys[i:i + chunk_size])
    # extracting target ChEMBL IDs from activities:
    for act in activities:
        compounds2targets[act['molecule_chembl_id']].add(act['target_chembl_id'])
# OK, now our dictionary maps from compound ChEMBL IDs into target ChEMBL IDs
# We would like to replace target ChEMBL IDs with UniProt IDs
for key, val in compounds2targets.items():
    # We don't know how many targets are assigned to a given compound so again it's
    # better to process targets in chunks:
    lval = list(val)
    uniprots = set()
    for i in range(0, len(lval), chunk_size):
        targets = new_client.target.filter(target_chembl_id__in=lval[i:i + chunk_size])
        for t in targets:
            uniprots |= {comp['accession'] for comp in t['target_components']}
    compounds2targets[key] = uniprots
# Finally write it to the output CSV file
with open('compounds_2_targets.csv', 'w', newline='') as csvfile:
    writer = csv.writer(csvfile)
    for key, val in compounds2targets.items():
        writer.writerow([key] + list(val))

Having a list of molecules ChEMBL IDs in a CSV file, produce another CSV file that maps every compound ID into a list of human gene names.

import csv
from chembl_webresource_client.new_client import new_client
# This will be our resulting structure mapping compound ChEMBL IDs into gene names
compounds2targets = dict()
# First, let's just parse the CSV file to extract compound ChEMBL IDs:
with open('compounds_list.csv', newline='') as csvfile:
    reader = csv.reader(csvfile)
    for row in reader:
        compounds2targets[row[0]] = set()
# OK, we have our source IDs, let's process them in chunks:
chunk_size = 50
# A list is needed here because dictionary key views cannot be sliced
keys = list(compounds2targets.keys())
for i in range(0, len(keys), chunk_size):
    # we jump from compounds to targets through activities:
    activities = new_client.activity.filter(molecule_chembl_id__in=keys[i:i + chunk_size])
    # extracting target ChEMBL IDs from activities:
    for act in activities:
        compounds2targets[act['molecule_chembl_id']].add(act['target_chembl_id'])
# OK, now our dictionary maps from compound ChEMBL IDs into target ChEMBL IDs
# We would like to replace target ChEMBL IDs with gene symbols
for key, val in compounds2targets.items():
    # We don't know how many targets are assigned to a given compound so again it's
    # better to process targets in chunks:
    lval = list(val)
    genes = set()
    for i in range(0, len(lval), chunk_size):
        targets = new_client.target.filter(target_chembl_id__in=lval[i:i + chunk_size])
        for target in targets:
            for component in target['target_components']:
                for synonym in component['target_component_synonyms']:
                    if synonym['syn_type'] == "GENE_SYMBOL":
                        genes.add(synonym['component_synonym'])
    compounds2targets[key] = genes
# Finally write it to the output CSV file
with open('compounds_2_genes.csv', 'w', newline='') as csvfile:
    writer = csv.writer(csvfile)
    for key, val in compounds2targets.items():
        writer.writerow([key] + list(val))
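Both scripts above send IDs to the __in filter in batches of 50. The batching itself can be factored into a small generator; the name chunked is hypothetical and the helper is independent of the ChEMBL client. Unlike slicing, it also works on sets and dictionary key views.

```python
def chunked(items, size):
    """Yield successive lists of at most `size` items from any iterable."""
    if size < 1:
        raise ValueError("size must be a positive integer")
    chunk = []
    for item in items:
        chunk.append(item)
        if len(chunk) == size:
            yield chunk
            chunk = []
    # Emit the final, possibly shorter, chunk.
    if chunk:
        yield chunk
```

With it, a loop such as "for ids in chunked(compounds2targets, 50)" yields lists that can be passed directly to a filter like molecule_chembl_id__in.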

Find compounds similar to given SMILES query with similarity threshold of 85%

from chembl_webresource_client.new_client import new_client
similarity = new_client.similarity
res = similarity.filter(smiles=r"CO[C@@H](CCC#C\C=C/CCCC(C)CCCCC=C)C(=O)[O-]", similarity=85)

Find compounds similar to aspirin (CHEMBL25) with similarity threshold of 70%

from chembl_webresource_client.new_client import new_client
molecule = new_client.molecule
similarity = new_client.similarity
aspirin_chembl_id = molecule.search('aspirin')[0]['molecule_chembl_id']
res = similarity.filter(chembl_id=aspirin_chembl_id, similarity=70)

Perform substructure search using SMILES

from chembl_webresource_client.new_client import new_client
substructure = new_client.substructure
res = substructure.filter(smiles="CN(CCCN)c1cccc2ccccc12")

Perform substructure search using ChEMBL ID

from chembl_webresource_client.new_client import new_client
substructure = new_client.substructure
substructure.filter(chembl_id="CHEMBL25")

Get a single molecule by ChEMBL ID

from chembl_webresource_client.new_client import new_client
molecule = new_client.molecule
m1 = molecule.get('CHEMBL25')

Get a single molecule by SMILES

from chembl_webresource_client.new_client import new_client
molecule = new_client.molecule
m1 = molecule.get('CC(=O)Oc1ccccc1C(=O)O')

Get a single molecule by InChi Key

from chembl_webresource_client.new_client import new_client
molecule = new_client.molecule
molecule.get('BSYNRYMUTXBXSQ-UHFFFAOYSA-N')

Get many compounds by their ChEMBL IDs

from chembl_webresource_client.new_client import new_client
molecule = new_client.molecule
records = molecule.get(['CHEMBL6498', 'CHEMBL6499', 'CHEMBL6505'])

Get many compounds by a list of SMILES

from chembl_webresource_client.new_client import new_client
molecule = new_client.molecule
records = molecule.get(['CNC(=O)c1ccc(cc1)N(CC#C)Cc2ccc3nc(C)nc(O)c3c2',
                        'Cc1cc2SC(C)(C)CC(C)(C)c2cc1\\N=C(/S)\\Nc3ccc(cc3)S(=O)(=O)N',
                        'CC(C)C[C@H](NC(=O)[C@@H](NC(=O)[C@H](Cc1c[nH]c2ccccc12)NC(=O)[C@H]3CCCN3C(=O)C(CCCCN)CCCCN)C(C)(C)C)C(=O)O'])

Get many compounds by a list of InChi Keys

from chembl_webresource_client.new_client import new_client
molecule = new_client.molecule
records = molecule.get(['XSQLHVPPXBBUPP-UHFFFAOYSA-N', 'JXHVRXRRSSBGPY-UHFFFAOYSA-N', 'TUHYVXGNMOGVMR-GASGPIRDSA-N'])

Obtain the pChEMBL values for compound

from chembl_webresource_client.new_client import new_client
activities = new_client.activity
res = activities.filter(molecule_chembl_id="CHEMBL25", pchembl_value__isnull=False)

Obtain the pChEMBL value for a specific compound AND a specific target

from chembl_webresource_client.new_client import new_client
activities = new_client.activity
activities.filter(molecule_chembl_id="CHEMBL25", target_chembl_id="CHEMBL612545", pchembl_value__isnull=False)

Get all approved drugs

from chembl_webresource_client.new_client import new_client
molecule = new_client.molecule
approved_drugs = molecule.filter(max_phase=4)

Get approved drugs for lung cancer

from chembl_webresource_client.new_client import new_client
drug_indication = new_client.drug_indication
molecules = new_client.molecule
lung_cancer_ind = drug_indication.filter(efo_term__icontains="LUNG CARCINOMA")
lung_cancer_mols = molecules.filter(molecule_chembl_id__in=[x['molecule_chembl_id'] for x in lung_cancer_ind])

Get all molecules in ChEMBL with no Rule-of-Five violations

from chembl_webresource_client.new_client import new_client
molecule = new_client.molecule
no_violations = molecule.filter(molecule_properties__num_ro5_violations=0)

Get all biotherapeutic molecules

from chembl_webresource_client.new_client import new_client
molecule = new_client.molecule
biotherapeutics = molecule.filter(biotherapeutic__isnull=False)

Return molecules with molecular weight <= 300

from chembl_webresource_client.new_client import new_client
molecule = new_client.molecule
light_molecules = molecule.filter(molecule_properties__mw_freebase__lte=300)

Return molecules with molecular weight <= 300 AND pref_name ends with nib

from chembl_webresource_client.new_client import new_client
molecule = new_client.molecule
light_nib_molecules = molecule.filter(molecule_properties__mw_freebase__lte=300).filter(pref_name__iendswith="nib")

Get all Ki activities related to the hERG target

from chembl_webresource_client.new_client import new_client
target = new_client.target
activity = new_client.activity
herg = target.search('herg')[0]
herg_activities = activity.filter(target_chembl_id=herg['target_chembl_id']).filter(standard_type="Ki")

Get all activities related to the Open TG-GATES project

from chembl_webresource_client.new_client import new_client
activity = new_client.activity
res = activity.search('"TG-GATES"')

Get all activities for a specific target with assay type 'B' OR 'F'

from chembl_webresource_client.new_client import new_client
activity = new_client.activity
res = activity.filter(target_chembl_id='CHEMBL3938', assay_type__iregex='(B|F)')

Search for ADMET-related inhibitor assays

from chembl_webresource_client.new_client import new_client
assay = new_client.assay
res = assay.search('inhibitor').filter(assay_type='A')

Get cell line by cellosaurus id

from chembl_webresource_client.new_client import new_client
cell_line = new_client.cell_line
res = cell_line.filter(cellosaurus_id="CVCL_0417")

Filter drugs by approval year and name

from chembl_webresource_client.new_client import new_client
drug = new_client.drug
res = drug.filter(first_approval=1976).filter(usan_stem="-azosin")

Get tissue by BTO ID

from chembl_webresource_client.new_client import new_client
tissue = new_client.tissue
res = tissue.filter(bto_id="BTO:0001073")

Get tissue by Caloha id

from chembl_webresource_client.new_client import new_client
tissue = new_client.tissue
res = tissue.filter(caloha_id="TS-0490")

Get tissue by Uberon id

from chembl_webresource_client.new_client import new_client
tissue = new_client.tissue
res = tissue.filter(uberon_id="UBERON:0000173")

Get tissue by name

from chembl_webresource_client.new_client import new_client
tissue = new_client.tissue
res = tissue.filter(pref_name__istartswith='blood')

Search documents for 'cytokine'

from chembl_webresource_client.new_client import new_client
document = new_client.document
res = document.search('cytokine')

Search for compound in Unichem

from chembl_webresource_client.unichem import unichem_client as unichem
ret = unichem.get('AIN')

Resolve InChi Key to Inchi using Unichem

from chembl_webresource_client.unichem import unichem_client as unichem
ret = unichem.inchiFromKey('AAOVKJBEBIDNHE-UHFFFAOYSA-N')

Convert SMILES to CTAB

from chembl_webresource_client.utils import utils
aspirin = utils.smiles2ctab('O=C(Oc1ccccc1C(=O)O)C')

Convert SMILES to image and image back to SMILES

from chembl_webresource_client.utils import utils
aspirin = 'CC(=O)Oc1ccccc1C(=O)O'
im = utils.smiles2image(aspirin)
mol = utils.image2ctab(im)
smiles = utils.ctab2smiles(mol).split()[2]
assert smiles == aspirin

Compute fingerprints

from chembl_webresource_client.utils import utils
aspirin = utils.smiles2ctab('O=C(Oc1ccccc1C(=O)O)C')
fingerprints = utils.sdf2fps(aspirin)

Compute Maximal Common Substructure

from chembl_webresource_client.utils import utils
smiles = ["O=C(NCc1cc(OC)c(O)cc1)CCCC/C=C/C(C)C", "CC(C)CCCCCC(=O)NCC1=CC(=C(C=C1)O)OC", "c1(C=O)cc(OC)c(O)cc1"]
mols = [utils.smiles2ctab(smile) for smile in smiles]
sdf = ''.join(mols)
result = utils.mcs(sdf)

Compute various molecular descriptors

import json
from chembl_webresource_client.utils import utils
aspirin = utils.smiles2ctab('O=C(Oc1ccccc1C(=O)O)C')
num_atoms = json.loads(utils.getNumAtoms(aspirin))[0]
mol_wt = json.loads(utils.molWt(aspirin))[0]
log_p = json.loads(utils.logP(aspirin))[0]
tpsa = json.loads(utils.tpsa(aspirin))[0]
descriptors = json.loads(utils.descriptors(aspirin))[0]

Standardise molecule

from chembl_webresource_client.utils import utils
mol = utils.smiles2ctab("[Na]OC(=O)Cc1ccc(C[NH3+])cc1.c1nnn[n-]1.O")
st = utils.standardise(mol)

Example Queries

The table below provides a list of example searches a user may wish to carry out using the ChEMBL web services. The aim of the list is to highlight the type of data that can be retrieved from ChEMBL using the web services. The examples can be adapted, extended and chained together to build up more complex workflows.

Description

Example URL

Get all approved drugs

https://www.ebi.ac.uk/chembl/api/data/molecule?max_phase=4

Get all molecules in ChEMBL with no Rule-of-Five violations

https://www.ebi.ac.uk/chembl/api/data/molecule?molecule_properties__num_ro5_violations=0

Get all biotherapeutic molecules

https://www.ebi.ac.uk/chembl/api/data/molecule?biotherapeutic__isnull=false

Get all functional/phenotypic assays (assay_type=F), from the literature (src_id 1)

https://www.ebi.ac.uk/chembl/api/data/assay?assay_type=F&src_id=1

Get all binding assays (assay_type=B), which also contain the term 'insulin'

https://www.ebi.ac.uk/chembl/api/data/assay?assay_type=B&description__icontains=insulin

Get all Clearance activity values (standard_type=CL), for rat, mouse and human

https://www.ebi.ac.uk/chembl/api/data/activity?standard_type=CL&target_organism__in=Homo%20sapiens,Rattus%20norvegicus,Mus%20musculus

Get all cell lines, which end with the term 'carcinoma'

https://www.ebi.ac.uk/chembl/api/data/cell_line?cell_source_tissue__iendswith=carcinoma

Get all mechanism of action details for muscarinic acetylcholine receptor antagonists

https://www.ebi.ac.uk/chembl/api/data/mechanism?mechanism_of_action__icontains=Muscarinic%20acetylcholine%20receptor&action_type=ANTAGONIST

Get all targets (single protein, protein complexes, protein families etc.), which contain UniProt accession Q13936

https://www.ebi.ac.uk/chembl/api/data/target?target_components__accession=Q13936

Get the entity type for CHEMBL1000

https://www.ebi.ac.uk/chembl/api/data/chembl_id_lookup?chembl_id=CHEMBL1000

Use Cases

The following use cases are provided as examples of how ChEMBL web service calls can be chained together to answer complicated questions using ChEMBL data.

Investigating the potency of approved drugs against their efficacy targets

Since ChEMBL includes both mechanism of action information for approved drugs and pharmacology data from published assays, it is interesting to combine this information and investigate the potency of a drug against its efficacy target. This might be done either to confirm or refute the proposed target assignment, or to better understand how the in-vitro potency of a compound might relate to clinical efficacy or ADMET properties. The following example shows how this type of analysis could be carried out using the ChEMBL web services.

  1. Use the molecule end point to retrieve a list of approved drugs (max_phase=4): https://www.ebi.ac.uk/chembl/api/data/molecule?max_phase=4 Using ChEMBL_20, this will retrieve 2795 drugs in total. We will use CHEMBL998 (loratadine) as an example, but the same workflow could be repeated for the others.

  2. Use the mechanism end point to retrieve the mechanism of action and target of each drug: https://www.ebi.ac.uk/chembl/api/data/mechanism?molecule_chembl_id=CHEMBL998 Loratadine is reported to be a histamine H1 receptor antagonist, represented by the ChEMBL target CHEMBL231.

  3. Use the assay end point to identify any binding assays (assay_type=B) for the human histamine H1 receptor: https://www.ebi.ac.uk/chembl/api/data/assay?target_chembl_id=CHEMBL231&relationship_type=D&assay_type=B Note the relationship_type=D filter restricts the results to assays where we are confident that the human receptor was tested and not an orthologue. A total of 213 assays are identified. We will take CHEMBL1909156 as an example.

  4. Combine the results of the above queries to identify any potency measurements for loratadine in these assays: https://www.ebi.ac.uk/chembl/api/data/activity?molecule_chembl_id=CHEMBL998&assay_chembl_id=CHEMBL1909156 This assay reports an IC50 value of 170 nM and a Ki measurement of 20 nM for loratadine against the histamine H1 receptor. This process could be repeated for the other 212 assays by either iterating through them individually or, where a sufficiently small number of assays are returned, using the __in filter on the activity end point to retrieve several assays at once e.g., https://www.ebi.ac.uk/chembl/api/data/activity?molecule_chembl_id=CHEMBL998&assay_chembl_id__in=CHEMBL830379,CHEMBL1909156,CHEMBL882906,CHEMBL691450 An additional 2 assays are identified in this way, reporting an IC50 value of 290 nM and a Ki value of 414 nM. These values could be averaged, or the lowest taken, to give an indication of the average potency of the compound, or the assay conditions could be investigated further to try to identify which assay might be most reliable or informative (the ChEMBL identifier for the document from which the data are extracted is also provided by the activity end point).
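Steps 2 to 4 of this workflow can be chained in code. The following is only a sketch under stated assumptions: the JSON responses are assumed to list their records under the keys 'mechanisms', 'assays' and 'activities'; the field names 'target_chembl_id', 'assay_chembl_id', 'standard_type', 'standard_value' and 'standard_units' are assumed to be present; only the first page of each response is read; and the helper names get_page and potency_against_efficacy_targets are hypothetical. A fetch function can be injected so that the logic can be exercised without network access.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "https://www.ebi.ac.uk/chembl/api/data"


def get_page(resource, fetch=None, **filters):
    """Request the first page of `resource` as JSON, applying the given filters."""
    url = f"{BASE_URL}/{resource}.json?{urlencode(filters, safe=',')}"
    if fetch is not None:
        return fetch(url)
    with urlopen(url) as response:
        return json.load(response)


def potency_against_efficacy_targets(drug_id, fetch=None):
    """Follow mechanism -> binding assays -> activities for a single drug."""
    results = []
    mechanisms = get_page("mechanism", fetch, molecule_chembl_id=drug_id)["mechanisms"]
    for mechanism in mechanisms:
        target_id = mechanism["target_chembl_id"]
        if target_id is None:
            continue
        # relationship_type=D keeps assays run against the stated target itself.
        assays = get_page("assay", fetch, target_chembl_id=target_id,
                          relationship_type="D", assay_type="B")["assays"]
        assay_ids = [assay["assay_chembl_id"] for assay in assays]
        if not assay_ids:
            continue
        # The __in filter retrieves the activities of several assays at once.
        activities = get_page("activity", fetch, molecule_chembl_id=drug_id,
                              assay_chembl_id__in=",".join(assay_ids))["activities"]
        for act in activities:
            results.append((target_id, act["assay_chembl_id"], act["standard_type"],
                            act["standard_value"], act["standard_units"]))
    return results
```

Calling potency_against_efficacy_targets("CHEMBL998") would issue the same sequence of requests as the loratadine example, restricted to the first page of each result set.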

More examples

There is an IPython notebook with example API calls and the corresponding Python client code: https://github.com/chembl/mychembl/blob/master/ipython_notebooks/09_myChEMBL_web_services.ipynb. Additionally, there is a comprehensive test suite covering almost all the functionality offered by the API. It can be found in the Python client library: https://github.com/chembl/chembl_webresource_client/blob/master/chembl_webresource_client/tests.py.

Contents
Getting Started
Resources
Supported formats
Meta Data and Pagination
Filtering and Ordering
Chemical Searching
Molecule Images
GET, POST and special characters
CORS and JSONP
Web Service Client
Search molecule by synonym
Search target by gene name
Search target by synonym
Having a list of molecules ChEMBL IDs in a CSV file, produce another CSV file that maps every compound ID into a list of UniProt accession numbers and save the mapping into output csv file
Having a list of molecules ChEMBL IDs in a CSV file, produce another CSV file that maps every compound ID into a list of human gene names.
Find compounds similar to given SMILES query with similarity threshold of 85%
Find compounds similar to aspirin (CHEMBL25) with similarity threshold of 70%
Perform substructure search using SMILES
Perform substructure search using ChEMBL ID
Get a single molecule by ChEMBL ID
Get a single molecule by SMILES
Get a single molecule by InChi Key
Get many compounds by their ChEMBL IDs
Get many compounds by a list of SMILES
Get many compounds by a list of InChi Keys
Obtain the pChEMBL values for compound
Obtain the pChEMBL value for a specific compound AND a specific target
Get all approved drugs
Get approved drugs for lung cancer
Get all molecules in ChEMBL with no Rule-of-Five violations
Get all biotherapeutic molecules
Return molecules with molecular weight <= 300
Return molecules with molecular weight <= 300 AND pref_name ends with nib
Get all Ki activities related to the hERG target
Get all activities related to the Open TG-GATES project
Get all activities for a specific target with assay type 'B' OR 'F'
Search for ADMET-related inhibitor assays
Get cell line by cellosaurus id
Filter drugs by approval year and name
Get tissue by BTO ID
Get tissue by Caloha id
Get tissue by Uberon id
Get tissue by name
Search documents for 'cytokine'
Search for compound in Unichem
Resolve InChi Key to Inchi using Unichem
Convert SMILES to CTAB
Convert SMILES to image and image back to SMILES
Compute fingerprints
Compute Maximal Common Substructure
Compute various molecular descriptors
Standardise molecule
Example Queries
Use Cases