ChEMBL Data Web Services

Getting Started

The best way to get started is to look at some example URLs that request data from the ChEMBL web services. The table below lists examples with a description of the data each one returns.
Description
Example URL
Keyword search for targets that contain 'cyclin' in pref_name
Return molecules with molecular weight <= 300 AND pref_name ends with nib
Substructure search with SMILES CC(=O)Oc1ccccc1C(=O)O (Aspirin)
Similarity search with SMILES CN1C(=O)C=C(c2cccc(Cl)c2)c3cc(ccc13)[C@@](N)(c4ccc(Cl)cc4)c5cncn5C with 80% Tanimoto similarity cut-off

Resources

The following table provides a list of all ChEMBL web service resources currently available.
Resource Name | Description
Activity | Activity values recorded in an Assay
Assay | Assay details as reported in source Document/Dataset
ATC | WHO ATC Classification for drugs
Binding Site | Target binding site definition
Biotherapeutic | Biotherapeutic molecules, including HELM notation and sequence data
Cell Line | Cell line information
ChEMBL ID Lookup | Look up the entity type of a ChEMBL ID
Compound Record | Occurrence of a given compound in a specific document
Compound Structural Alert | Indicates certain anomalies in compound structure
Document | Document/Dataset from which Assays have been extracted
Document Similarity | Provides documents similar to a given one
Document Term | Provides keywords extracted from a document using the TextRank algorithm
Drug | Approved drug information, including (but not limited to) applicants, patent numbers and research codes
Drug Indication | Joins drugs with diseases, providing references to relevant sources
Image | Graphical (png, svg, json) representation of Molecule
Mechanism | Mechanism of action information for FDA-approved drugs
Metabolism | Metabolic pathways with references
Molecule | Molecule information, including properties, structural representations and synonyms
Molecule Form | Relationships between molecule parents and salts
Protein Classification | Protein family classification of TargetComponents
Source | Document/Dataset source
Status | API status with ChEMBL DB version number and API software version number
Target | Targets (protein and non-protein) defined in Assay
Target Component | Target sequence information (a Target may have 1 or more sequences)
Target Relation | Describes relations between targets
Tissue | Tissue classification

Supported formats

These are formats currently supported by the API:

Meta Data and Pagination

It is possible to download all data from a specific ChEMBL web service resource. The web services return responses in 'pages', which can be navigated using the 'page_meta' section of each response. The 'page_meta' section includes the total number of hits, the page size and offset, and links to the next and previous pages. An example 'page_meta' section is displayed below:
"page_meta": {
    "limit": 20,
    "next": "/chembl/api/data/activity.json?limit=20&offset=20",
    "offset": 0,
    "previous": null,
    "total_count": 13520737
}
To download all ChEMBL activities (more than 13 million), start from the following URL: https://www.ebi.ac.uk/chembl/api/data/activity. The 'page_meta' section of the response gives the link to page 2, e.g. https://www.ebi.ac.uk/chembl/api/data/activity?limit=20&offset=20.
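The paging loop described above can be sketched as a small generator. This is a minimal sketch, not the official client: `iter_records`, `fetch_json` and `resource_key` are hypothetical names, and the assumption that records sit under a key such as "activities" next to "page_meta" is not stated in this page. The HTTP call is injected as a callable so the paging logic can be exercised without network access.

```python
from urllib.parse import urljoin

# Host used to resolve the site-relative 'next' links found in page_meta.
BASE = "https://www.ebi.ac.uk"


def iter_records(first_url, fetch_json, resource_key):
    """Yield every record of a resource by following page_meta['next'].

    fetch_json is any callable that takes an absolute URL and returns the
    decoded JSON response as a dict (hypothetical helper, e.g. a thin
    wrapper around an HTTP library).
    resource_key is the key holding the records (assumed, e.g. "activities").
    """
    url = first_url
    while url:
        page = fetch_json(url)
        yield from page.get(resource_key, [])
        next_link = page["page_meta"]["next"]
        # 'next' is a relative path, or None/null on the last page.
        url = urljoin(BASE, next_link) if next_link else None
```

Because each page is fetched only when the previous one has been consumed, a caller can stop early without downloading the whole resource.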

Filtering and Ordering

It is possible to apply search filters to all resource requests using a URL friendly query language. For example, it is possible to return all ChEMBL targets that contain the term 'kinase' in the pref_name attribute with the following URL: https://www.ebi.ac.uk/chembl/api/data/target?pref_name__contains=kinase.
The pattern for applying a filter is as follows:
https://www.ebi.ac.uk/chembl/api/data/[resource]?[field]__[filter_type]=[value]
Examples of other filter types are listed in the table below.
Filter Type | Description
exact (iexact) | Exact match with query
contains (icontains) | Wild card search with query
To order the results by a particular field, the 'order_by=[field]' argument is added to a request. For example, a user can sort targets by pref_name using the following URL: https://www.ebi.ac.uk/chembl/api/data/target?order_by=pref_name
The default ordering is ascending. To return the results in descending order, place a '-' before the field name: https://www.ebi.ac.uk/chembl/api/data/target?order_by=-pref_name
Note that it is possible to combine order_by and filter arguments: https://www.ebi.ac.uk/chembl/api/data/target?pref_name__contains=kinase&order_by=-pref_name
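The filter and ordering pattern can also be assembled programmatically. The following is a minimal sketch using only the Python standard library; `build_query_url` is a hypothetical helper name, not part of the ChEMBL client.

```python
from urllib.parse import urlencode

API_ROOT = "https://www.ebi.ac.uk/chembl/api/data"


def build_query_url(resource, order_by=None, **filters):
    """Build a request URL from [field]__[filter_type]=value pairs.

    Filters are passed as keyword arguments, e.g.
    pref_name__contains="kinase". order_by is appended last when given.
    """
    params = dict(filters)
    if order_by is not None:
        params["order_by"] = order_by
    query = urlencode(params)  # also percent-encodes the values
    url = f"{API_ROOT}/{resource}"
    return f"{url}?{query}" if query else url


url = build_query_url("target", order_by="-pref_name", pref_name__contains="kinase")
```

Here `url` reproduces the combined filter-and-order example given above.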

Chemical Searching

The 'Substructure' and 'Similarity' web service resources allow the chemical content of ChEMBL to be searched. Like the other resources, these search-based resources accept filtering, paging and ordering arguments. They accept a SMILES string, InChI Key or molecule ChEMBL_ID as the query, and similarity searches additionally require a similarity cut-off. Some example molecule searches are provided in the table below.
Chemical Search Description
Example URL
Substructure search against ChEMBL using aspirin SMILES string
Substructure search against ChEMBL using aspirin CHEMBL_ID
Substructure search against ChEMBL using aspirin InChI Key
Similarity (80% cut-off) search against ChEMBL using aspirin SMILES string
Similarity (80% cut-off) search against ChEMBL using aspirin CHEMBL_ID
Similarity (80% cut-off) search against ChEMBL using aspirin InChI Key
Searching with an InChI Key is only possible for InChI Keys found in the ChEMBL database. The system does not try to convert an InChI Key to a chemical representation.

Molecule Images

The Image resource returns a graphical representation of a ChEMBL molecule. Unlike the other resources it does not accept filtering and paging arguments, but it does accept image-specific arguments. These are defined in the table below.
Image Argument | Description | Allowed Values | Default Value
format | Image format | png, svg and json | png
dimensions | Size of image in pixels | 1-500 | 500
ignoreCoords | Choose to use or ignore coordinates in ChEMBL molfiles | 1 or 0 | 0 (Use ChEMBL molfile coordinates)
engine | Chemical toolkit used to generate image | rdkit or indigo | rdkit
bgColor | Background color | Full list here | transparent
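The image arguments can be validated before a request is made. The sketch below is illustrative only: `build_image_url` is a hypothetical helper, and the URL path `/image/[chembl_id]` is an assumption, since the example URLs are not reproduced in this page. Allowed values and defaults are taken from the table above.

```python
from urllib.parse import urlencode

# Assumed location of the Image resource (not confirmed by this page).
IMAGE_ROOT = "https://www.ebi.ac.uk/chembl/api/data/image"
ALLOWED_FORMATS = {"png", "svg", "json"}
ALLOWED_ENGINES = {"rdkit", "indigo"}


def build_image_url(chembl_id, image_format="png", dimensions=500,
                    ignore_coords=0, engine="rdkit"):
    """Return an Image resource URL after checking each argument."""
    if image_format not in ALLOWED_FORMATS:
        raise ValueError(f"format must be one of {sorted(ALLOWED_FORMATS)}")
    if not 1 <= dimensions <= 500:
        raise ValueError("dimensions must be between 1 and 500 pixels")
    if ignore_coords not in (0, 1):
        raise ValueError("ignoreCoords must be 0 or 1")
    if engine not in ALLOWED_ENGINES:
        raise ValueError(f"engine must be one of {sorted(ALLOWED_ENGINES)}")
    params = {
        "format": image_format,
        "dimensions": dimensions,
        "ignoreCoords": ignore_coords,
        "engine": engine,
    }
    return f"{IMAGE_ROOT}/{chembl_id}?{urlencode(params)}"
```

Rejecting out-of-range values locally avoids a round trip for requests the service would not accept.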

GET, POST and special characters

In a GET request, all parameters have to be encoded into the URL. Because URL length is limited, it is often more convenient to use POST requests instead. POST parameters are embedded in the request body and can be of any size. This is especially important when retrieving a long list of entities identified by (random) IDs.
The ChEMBL API supports both GET and POST, but since POST has a special meaning in REST (CREATE), a special header has to be added to every POST request:
X-HTTP-Method-Override:GET
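A POST request carrying this header can be prepared with the Python standard library. This is a sketch under assumptions: `build_post_request` is a hypothetical helper, and sending the parameters as a form-encoded body is an assumption about what the service accepts, not something stated in this page. The request is only constructed here, not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_post_request(url, params):
    """Prepare a POST request that the API should treat as a GET."""
    # Assumption: parameters go into a form-encoded request body.
    body = urlencode(params, doseq=True).encode("utf-8")
    return Request(
        url,
        data=body,
        method="POST",
        headers={"X-HTTP-Method-Override": "GET"},
    )


req = build_post_request(
    "https://www.ebi.ac.uk/chembl/api/data/molecule",
    {"molecule_chembl_id__in": "CHEMBL25,CHEMBL941"},
)
```

Passing `req` to `urllib.request.urlopen` would send it; the body can grow to any size without hitting URL length limits.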
Another issue is character encoding. SMILES strings often contain characters (such as #, % or \) that have a special meaning in URLs. This is why when using GET, all parameters should be percent-encoded.
One example is the following SMILES string:
[Na+].CO[C@@H](CCC#C\C=C/CCCC(C)CCCCC=C)C(=O)[O-]
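Percent-encoding can be done with the Python standard library. The sketch below encodes every reserved character, so that characters such as #, / and \ cannot be misread as URL syntax; `encode_smiles` is a hypothetical helper name, and the SMILES strings are taken from the client examples on this page.

```python
from urllib.parse import quote, unquote


def encode_smiles(smiles):
    """Percent-encode a SMILES string for use inside a URL.

    safe="" makes quote() encode '/' as well, which it would otherwise
    leave untouched.
    """
    return quote(smiles, safe="")


# '#' (triple bond) would otherwise start a URL fragment.
alkyne = encode_smiles("CNC(=O)c1ccc(cc1)N(CC#C)Cc2ccc3nc(C)nc(O)c3c2")

# Raw string: the backslashes are part of the SMILES, not Python escapes.
thiourea = r"Cc1cc2SC(C)(C)CC(C)(C)c2cc1\N=C(/S)\Nc3ccc(cc3)S(=O)(=O)N"
encoded = encode_smiles(thiourea)
```

Decoding with `unquote` returns the original string, so the encoding is lossless.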
Below is another example of retrieving a molecule (CHEMBL1628285) that has the longest SMILES string currently stored in ChEMBL. The original SMILES string is:
CCCCCCCCCCCCCCCC[NH2+]OC(CO)C(O)C(OC1OC(CO)C(O)C(O)C1O)C(O)CO.CCCCCCCCCCCCCCCC[NH2+]OC(CO)C(O)C(OC2OC(CO)C(O)C(O)C2O)C(O)CO.CCCCCCCCCCCCCCCC[NH2+]OC(CO)C(O)C(OC3OC(CO)C(O)C(O)C3O)C(O)CO.CCCCCCCCCCCCCCCC[NH2+]OC(CO)C(O)C(OC4OC(CO)C(O)C(O)C4O)C(O)CO.CCCCCCCCCCCCCCCC[NH2+]OC(CO)C(O)C(OC5OC(CO)C(O)C(O)C5O)C(O)CO.CCCCCCCCCCCCCCCC[NH2+]OC(CO)C(O)C(OC6OC(CO)C(O)C(O)C6O)C(O)CO.CCCCCCCCCCCCCCCC[NH2+]OC(CO)C(O)C(OC7OC(CO)C(O)C(O)C7O)C(O)CO.CCCCCCCCCCCCCCCC[NH2+]OC(CO)C(O)C(OC8OC(CO)C(O)C(O)C8O)C(O)CO.CCCCCCCCCCCCCCCC[NH2+]OC(CO)C(O)C(OC9OC(CO)C(O)C(O)C9O)C(O)CO.CCCCCCCCCCCCCCCC[NH2+]OC(CO)C(O)C(OC%10OC(CO)C(O)C(O)C%10O)C(O)CO.CCCCCCCCCCCCCCCC[NH2+]OC(CO)C(O)C(OC%11OC(CO)C(O)C(O)C%11O)C(O)CO.CCCCCCCCCCCCCCCC[NH2+]OC(CO)C(O)C(OC%12OC(CO)C(O)C(O)C%12O)C(O)CO.CCCCCCCCCC(C(=O)NCCc%13ccc(OP(=S)(Oc%14ccc(CCNC(=O)C(CCCCCCCCC)P(=O)(O)[O-])cc%14)N(C)\\N=C\\c%15ccc(Op%16(Oc%17ccc(\\C=N\\N(C)P(=S)(Oc%18ccc(CCNC(=O)C(CCCCCCCCC)P(=O)(O)[O-])cc%18)Oc%19ccc(CCNC(=O)C(CCCCCCCCC)P(=O)(O)[O-])cc%19)cc%17)np(Oc%20ccc(\\C=N\\N(C)P(=S)(Oc%21ccc(CCNC(=O)C(CCCCCCCCC)P(=O)(O)[O-])cc%21)Oc%22ccc(CCNC(=O)C(CCCCCCCCC)P(=O)(O)[O-])cc%22)cc%20)(Oc%23ccc(\\C=N\\N(C)P(=S)(Oc%24ccc(CCNC(=O)C(CCCCCCCCC)P(=O)(O)[O-])cc%24)Oc%25ccc(CCNC(=O)C(CCCCCCCCC)P(=O)(O)[O-])cc%25)cc%23)np(Oc%26ccc(\\C=N\\N(C)P(=S)(Oc%27ccc(CCNC(=O)C(CCCCCCCCC)P(=O)(O)[O-])cc%27)Oc%28ccc(CCNC(=O)C(CCCCCCCCC)P(=O)(O)[O-])cc%28)cc%26)(Oc%29ccc(\\C=N\\N(C)P(=S)(Oc%30ccc(CCNC(=O)C(CCCCCCCCC)P(=O)(O)[O-])cc%30)Oc%31ccc(CCNC(=O)C(CCCCCCCCC)P(=O)(O)[O-])cc%31)cc%29)n%16)cc%15)cc%13)P(=O)(O)[O-]
After encoding, the URL becomes this

CORS and JSONP

Both techniques are supported

Web Service Client

To help users get started with the updated ChEMBL web services, a web service client has also been released. It is written in Python and can be installed from the Python Package Index by typing:
pip install chembl_webresource_client
The client code is open and hosted on GitHub: https://github.com/chembl/chembl_webresource_client.
The following list provides some example use cases of the client:

Search molecule by synonym

from chembl_webresource_client.new_client import new_client
molecule = new_client.molecule
res = molecule.search('viagra')

Search target by gene name

from chembl_webresource_client.new_client import new_client
target = new_client.target
gene_name = 'GABRB2'
res = target.search(gene_name)

Search target by synonym

from chembl_webresource_client.new_client import new_client
target = new_client.target
gene_name = 'GABRB2'
res = target.filter(target_synonym__icontains=gene_name)

Given a CSV file of molecule ChEMBL IDs, produce another CSV file that maps every compound ID to a list of UniProt accession numbers

import csv
from chembl_webresource_client.new_client import new_client

# This will be our resulting structure mapping compound ChEMBL IDs to target UniProt IDs
compounds2targets = dict()

# First, parse the CSV file to extract compound ChEMBL IDs
# (text mode with newline='' is what the csv module expects in Python 3):
with open('compounds_list.csv', newline='') as csvfile:
    reader = csv.reader(csvfile)
    for row in reader:
        compounds2targets[row[0]] = set()

# OK, we have our source IDs, let's process them in chunks:
chunk_size = 50
keys = list(compounds2targets.keys())  # a list, so that it can be sliced

for i in range(0, len(keys), chunk_size):
    # we jump from compounds to targets through activities:
    activities = new_client.activity.filter(molecule_chembl_id__in=keys[i:i + chunk_size])
    # extracting target ChEMBL IDs from activities:
    for act in activities:
        compounds2targets[act['molecule_chembl_id']].add(act['target_chembl_id'])

# OK, now our dictionary maps from compound ChEMBL IDs to target ChEMBL IDs
# We would like to replace target ChEMBL IDs with UniProt IDs

for key, val in compounds2targets.items():
    # We don't know how many targets are assigned to a given compound so again it's
    # better to process targets in chunks:
    lval = list(val)
    uniprots = set()
    for i in range(0, len(lval), chunk_size):
        targets = new_client.target.filter(target_chembl_id__in=lval[i:i + chunk_size])
        uniprots |= set(sum([[comp['accession'] for comp in t['target_components']] for t in targets], []))
    compounds2targets[key] = uniprots

# Finally write it to the output CSV file
with open('compounds_2_targets.csv', 'w', newline='') as csvfile:
    writer = csv.writer(csvfile)
    for key, val in compounds2targets.items():
        writer.writerow([key] + list(val))

Given a CSV file of molecule ChEMBL IDs, produce another CSV file that maps every compound ID to a list of human gene names

import csv
from chembl_webresource_client.new_client import new_client

# This will be our resulting structure mapping compound ChEMBL IDs to gene names
compounds2targets = dict()

# First, parse the CSV file to extract compound ChEMBL IDs
# (text mode with newline='' is what the csv module expects in Python 3):
with open('compounds_list.csv', newline='') as csvfile:
    reader = csv.reader(csvfile)
    for row in reader:
        compounds2targets[row[0]] = set()

# OK, we have our source IDs, let's process them in chunks:
chunk_size = 50
keys = list(compounds2targets.keys())  # a list, so that it can be sliced

for i in range(0, len(keys), chunk_size):
    # we jump from compounds to targets through activities:
    activities = new_client.activity.filter(molecule_chembl_id__in=keys[i:i + chunk_size])
    # extracting target ChEMBL IDs from activities:
    for act in activities:
        compounds2targets[act['molecule_chembl_id']].add(act['target_chembl_id'])

# OK, now our dictionary maps from compound ChEMBL IDs to target ChEMBL IDs
# We would like to replace target ChEMBL IDs with gene symbols

for key, val in compounds2targets.items():
    # We don't know how many targets are assigned to a given compound so again it's
    # better to process targets in chunks:
    lval = list(val)
    genes = set()
    for i in range(0, len(lval), chunk_size):
        targets = new_client.target.filter(target_chembl_id__in=lval[i:i + chunk_size])
        for target in targets:
            for component in target['target_components']:
                for synonym in component['target_component_synonyms']:
                    if synonym['syn_type'] == "GENE_SYMBOL":
                        genes.add(synonym['component_synonym'])
    compounds2targets[key] = genes

# Finally write it to the output CSV file
with open('compounds_2_genes.csv', 'w', newline='') as csvfile:
    writer = csv.writer(csvfile)
    for key, val in compounds2targets.items():
        writer.writerow([key] + list(val))
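Both CSV examples split their ID lists into chunks of 50 before passing them to an `__in` filter, which keeps each request small. That batching step can be isolated in a small helper; `chunked` is a hypothetical name used here for illustration and is not part of the client.

```python
def chunked(items, size):
    """Yield successive lists of at most `size` items.

    Accepts any iterable (including dict key views and sets), because
    the input is copied into a list before slicing.
    """
    items = list(items)
    for start in range(0, len(items), size):
        yield items[start:start + size]


batches = list(chunked(["CHEMBL1", "CHEMBL2", "CHEMBL3", "CHEMBL4", "CHEMBL5"], 2))
```

Each batch could then be passed to a call such as `new_client.activity.filter(molecule_chembl_id__in=batch)`.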

Find compounds similar to given SMILES query with similarity threshold of 85%

from chembl_webresource_client.new_client import new_client
similarity = new_client.similarity
# Raw string, so that the backslash in the SMILES is not read as a Python escape
res = similarity.filter(smiles=r"CO[C@@H](CCC#C\C=C/CCCC(C)CCCCC=C)C(=O)[O-]", similarity=85)

Find compounds similar to aspirin (CHEMBL25) with similarity threshold of 70%

from chembl_webresource_client.new_client import new_client
molecule = new_client.molecule
similarity = new_client.similarity
# Look up the ChEMBL ID for aspirin, then use it as the similarity query
aspirin_chembl_id = molecule.search('aspirin')[0]['molecule_chembl_id']
res = similarity.filter(chembl_id=aspirin_chembl_id, similarity=70)

Perform substructure search using SMILES

from chembl_webresource_client.new_client import new_client
substructure = new_client.substructure
res = substructure.filter(smiles="CN(CCCN)c1cccc2ccccc12")

Perform substructure search using ChEMBL ID

from chembl_webresource_client.new_client import new_client
substructure = new_client.substructure
res = substructure.filter(chembl_id="CHEMBL25")

Get a single molecule by ChEMBL ID

from chembl_webresource_client.new_client import new_client
molecule = new_client.molecule
m1 = molecule.get('CHEMBL25')

Get a single molecule by SMILES

from chembl_webresource_client.new_client import new_client
molecule = new_client.molecule
m1 = molecule.get('CC(=O)Oc1ccccc1C(=O)O')

Get a single molecule by InChI Key

from chembl_webresource_client.new_client import new_client
molecule = new_client.molecule
m1 = molecule.get('BSYNRYMUTXBXSQ-UHFFFAOYSA-N')

Get many compounds by their ChEMBL IDs

from chembl_webresource_client.new_client import new_client
molecule = new_client.molecule
records = molecule.get(['CHEMBL6498', 'CHEMBL6499', 'CHEMBL6505'])

Get many compounds by a list of SMILES

from chembl_webresource_client.new_client import new_client
molecule = new_client.molecule
records = molecule.get(['CNC(=O)c1ccc(cc1)N(CC#C)Cc2ccc3nc(C)nc(O)c3c2',
                        'Cc1cc2SC(C)(C)CC(C)(C)c2cc1\\N=C(/S)\\Nc3ccc(cc3)S(=O)(=O)N',
                        'CC(C)C[C@H](NC(=O)[C@@H](NC(=O)[C@H](Cc1c[nH]c2ccccc12)NC(=O)[C@H]3CCCN3C(=O)C(CCCCN)CCCCN)C(C)(C)C)C(=O)O'])

Get many compounds by a list of InChI Keys

from chembl_webresource_client.new_client import new_client
molecule = new_client.molecule
records = molecule.get(['XSQLHVPPXBBUPP-UHFFFAOYSA-N', 'JXHVRXRRSSBGPY-UHFFFAOYSA-N', 'TUHYVXGNMOGVMR-GASGPIRDSA-N'])

Obtain the pChEMBL values for a compound

from chembl_webresource_client.new_client import new_client
activities = new_client.activity
res = activities.filter(molecule_chembl_id="CHEMBL25", pchembl_value__isnull=False)

Obtain the pChEMBL value for a specific compound AND a specific target

from chembl_webresource_client.new_client import new_client
activities = new_client.activity
res = activities.filter(molecule_chembl_id="CHEMBL25", target_chembl_id="CHEMBL612545", pchembl_value__isnull=False)

Get all approved drugs

from chembl_webresource_client.new_client import new_client
molecule = new_client.molecule
approved_drugs = molecule.filter(max_phase=4)

Get approved drugs for lung cancer

from chembl_webresource_client.new_client import new_client
drug_indication = new_client.drug_indication
molecules = new_client.molecule
lung_cancer_ind = drug_indication.filter(efo_term__icontains="LUNG CARCINOMA")
lung_cancer_mols = molecules.filter(molecule_chembl_id__in=[x['molecule_chembl_id'] for x in lung_cancer_ind])

Get all molecules in ChEMBL with no Rule-of-Five violations

from chembl_webresource_client.new_client import new_client
molecule = new_client.molecule
no_violations = molecule.filter(molecule_properties__num_ro5_violations=0)

Get all biotherapeutic molecules

from chembl_webresource_client.new_client import new_client
molecule = new_client.molecule
biotherapeutics = molecule.filter(biotherapeutic__isnull=False)

Return molecules with molecular weight <= 300

from chembl_webresource_client.new_client import new_client
molecule = new_client.molecule
light_molecules = molecule.filter(molecule_properties__mw_freebase__lte=300)

Return molecules with molecular weight <= 300 AND pref_name ends with nib

from chembl_webresource_client.new_client import new_client
molecule = new_client.molecule
light_nib_molecules = molecule.filter(molecule_properties__mw_freebase__lte=300).filter(pref_name__iendswith="nib")

Get all Ki activities for the hERG target

from chembl_webresource_client.new_client import new_client
target = new_client.target
activity = new_client.activity
# Take the first search hit for 'herg' and fetch its Ki activity records
herg = target.search('herg')[0]
herg_activities = activity.filter(target_chembl_id=herg['target_chembl_id']).filter(standard_type="Ki")

Search activities for 'TG-GATES'

from chembl_webresource_client.new_client import new_client
activity = new_client.activity
res = activity.search('"TG-GATES"')

Get all activities for a specific target with assay type 'B' OR 'F'

from chembl_webresource_client.new_client import new_client
activity = new_client.activity
res = activity.filter(target_chembl_id='CHEMBL3938', assay_type__iregex='(B|F)')

Search for ADMET-related inhibitor assays

from chembl_webresource_client.new_client import new_client
assay = new_client.assay
res = assay.search('inhibitor').filter(assay_type='A')

Get cell line by cellosaurus id

from chembl_webresource_client.new_client import new_client
cell_line = new_client.cell_line
res = cell_line.filter(cellosaurus_id="CVCL_0417")

Filter drugs by approval year and name

from chembl_webresource_client.new_client import new_client
drug = new_client.drug
res = drug.filter(first_approval=1976).filter(usan_stem="-azosin")

Get tissue by BTO ID

from chembl_webresource_client.new_client import new_client
tissue = new_client.tissue
res = tissue.filter(bto_id="BTO:0001073")

Get tissue by Caloha id

from chembl_webresource_client.new_client import new_client
tissue = new_client.tissue
res = tissue.filter(caloha_id="TS-0490")

Get tissue by Uberon id

from chembl_webresource_client.new_client import new_client
tissue = new_client.tissue
res = tissue.filter(uberon_id="UBERON:0000173")

Get tissue by name

from chembl_webresource_client.new_client import new_client
tissue = new_client.tissue
res = tissue.filter(pref_name__istartswith='blood')

Search documents for 'cytokine'

from chembl_webresource_client.new_client import new_client
document = new_client.document
res = document.search('cytokine')

Search for compound in UniChem

# unichem must be imported from the unichem module, as in the next example
from chembl_webresource_client.unichem import unichem_client as unichem
ret = unichem.get('AIN')

Resolve InChI Key to InChI using UniChem

from chembl_webresource_client.unichem import unichem_client as unichem
ret = unichem.inchiFromKey('AAOVKJBEBIDNHE-UHFFFAOYSA-N')

Convert SMILES to CTAB

# smiles2ctab lives in utils, so import utils rather than the UniChem client
from chembl_webresource_client.utils import utils
aspirin = utils.smiles2ctab('O=C(Oc1ccccc1C(=O)O)C')

Convert SMILES to image and image back to SMILES

from chembl_webresource_client.utils import utils
aspirin = 'CC(=O)Oc1ccccc1C(=O)O'
im = utils.smiles2image(aspirin)
mol = utils.image2ctab(im)
smiles = utils.ctab2smiles(mol).split()[2]
# Plain assert, since this snippet runs outside a unittest.TestCase
assert smiles == aspirin

Compute fingerprints

from chembl_webresource_client.utils import utils
aspirin = utils.smiles2ctab('O=C(Oc1ccccc1C(=O)O)C')
fingerprints = utils.sdf2fps(aspirin)

Compute Maximal Common Substructure

from chembl_webresource_client.utils import utils
smiles = ["O=C(NCc1cc(OC)c(O)cc1)CCCC/C=C/C(C)C", "CC(C)CCCCCC(=O)NCC1=CC(=C(C=C1)O)OC", "c1(C=O)cc(OC)c(O)cc1"]
mols = [utils.smiles2ctab(smile) for smile in smiles]
sdf = ''.join(mols)
result = utils.mcs(sdf)

Compute various molecular descriptors

import json
from chembl_webresource_client.utils import utils
aspirin = utils.smiles2ctab('O=C(Oc1ccccc1C(=O)O)C')
num_atoms = json.loads(utils.getNumAtoms(aspirin))[0]
mol_wt = json.loads(utils.molWt(aspirin))[0]
log_p = json.loads(utils.logP(aspirin))[0]
tpsa =