ChEMBL Data Web Services
Getting Started
The best way to get started is to look at some example URLs that request data from the ChEMBL web services. The table below lists examples along with a description of the data each one returns.
Description | Example URL |
Return all molecules | |
Keyword search for targets that contain 'cyclin' in pref_name | |
Return molecules with molecular weight <= 300 | |
Return molecules with molecular weight <= 300 AND pref_name ends with nib | |
Return image for CHEMBL25 | |
Connectivity search with SMILES c1ncccc1 | |
Substructure search with SMILES CC(=O)Oc1ccccc1C(=O)O (Aspirin) | |
Similarity search with SMILES CN1C(=O)C=C(c2cccc(Cl)c2)c3cc(ccc13)[C@@](N)(c4ccc(Cl)cc4)c5cncn5C with 80% Tanimoto similarity cut off |
Resources
The following table provides a list of all ChEMBL web service resources currently available.
Resource Name | Description | URL |
Activity | Activity values recorded in an Assay | |
Assay | Assay details as reported in source Document/Dataset | |
ATC | WHO ATC Classification for drugs | |
Binding Site | Target binding site definition | |
Biotherapeutic | Biotherapeutic molecules, which includes HELM notation and sequence data | |
Cell Line | Cell line information | |
ChEMBL ID Lookup | Look up ChEMBL Id entity type | |
Compound Record | Occurrence of a given compound in a specific document | |
Compound Structural Alert | Indicates a certain anomaly in the compound structure | |
Document | Document/Dataset from which Assays have been extracted | |
Document Similarity | Provides documents similar to a given one | |
Document Term | Provides keywords extracted from a document using the TextRank algorithm | |
Drug | Approved drugs information, including (but not limited to) applicants, patent numbers and research codes. This endpoint aggregates data on the parent; please use the parent ChEMBL ID found in other endpoints | |
Drug Indication | Joins drugs with diseases providing references to relevant sources | |
Drug Warning | Safety information for drugs withdrawn from one or more regions of the world and drugs that carry a warning for severe or life-threatening adverse effects | |
GO Slim | GO slim ontology | |
Image | Graphical (svg) representation of Molecule | |
Mechanism | Mechanism of action information for approved drugs | |
Metabolism | Metabolic pathways with references | |
Molecule | Molecule information, including properties, structural representations and synonyms | |
Molecule Form | Relationships between molecule parents and salts | |
Organism | Simple organism classification | |
Protein Classification | Protein family classification of TargetComponents | |
Similarity | Molecule similarity search | |
Source | Document/Dataset source | |
Status | API status with ChEMBL DB version number and API software version number | |
Substructure | Molecule substructure search | |
Target | Targets (protein and non-protein) defined in Assay | |
Target Component | Target sequence information (a Target may have 1 or more sequences) | |
Target Relation | Describes relations between targets | |
Tissue | Tissue classification | |
X-Ref (Cross references) | Cross references to other resources for compounds |
Supported formats
These are the formats currently supported by the API:
NOTE: if both the extension and the format parameter are specified, the format parameter takes precedence.
For example, the request https://www.ebi.ac.uk/chembl/api/data/molecule.yaml?format=json will return a JSON document.
Format Name | Example URL | Comments |
XML | | If format is not specified it defaults to XML. |
JSON | | |
YAML | | |
SVG | | Only compound images |
SDF | | Only compounds |
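As a minimal sketch, the two ways of selecting a format can be expressed with the Python standard library. The helper name `resource_url` is hypothetical, and the `.json` extension form is assumed by analogy with the `.yaml` example above.

```python
from urllib.parse import urlencode

BASE_URL = "https://www.ebi.ac.uk/chembl/api/data"


def resource_url(resource, extension=None, **params):
    """Build a resource URL with an optional extension and query parameters."""
    url = f"{BASE_URL}/{resource}"
    if extension:
        # Format selected through the file extension, e.g. molecule.yaml
        url += f".{extension}"
    query = urlencode(params)
    return f"{url}?{query}" if query else url


# Both an extension and a format parameter: per the note above,
# the format parameter is the one that takes precedence.
print(resource_url("molecule", extension="yaml", format="json"))
```

Only the URL is built here; no request is sent.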
Meta Data and Pagination
It is now possible to download all data from a specific ChEMBL web service resource. This is made possible by returning responses from the web services in 'pages', which can be navigated through using a 'page_meta' section. The 'page_meta' section includes information about total number of hits, total number of pages and links to the next and previous pages. An example 'page_meta' section is displayed below:
To download all ChEMBL activity records (>13 million), start from the following URL: https://www.ebi.ac.uk/chembl/api/data/activity. Inspecting the 'page_meta' section of the response reveals the link to page 2, e.g. https://www.ebi.ac.uk/chembl/api/data/activity?limit=20&offset=20.
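Paging through a resource can be sketched in Python as follows. This is an illustration under assumptions, not the official client: it assumes the JSON response holds a 'page_meta' object with a 'next' field (empty on the last page) and lists the records under a resource-specific key such as 'activities'. The function names are hypothetical.

```python
import json
from urllib.parse import urljoin
from urllib.request import urlopen


def fetch_json(url):
    """Download one page and decode it as JSON (performs a network request)."""
    with urlopen(url) as response:
        return json.load(response)


def iter_records(start_url, key, fetch=fetch_json, max_pages=None):
    """Yield records page by page, following the 'next' link in page_meta."""
    url, pages = start_url, 0
    while url and (max_pages is None or pages < max_pages):
        page = fetch(url)
        yield from page[key]  # assumed: records are listed under `key`
        next_link = page["page_meta"].get("next")  # assumed field name
        # The link may be relative, so resolve it against the current URL.
        url = urljoin(url, next_link) if next_link else None
        pages += 1


# Example call (commented out because it contacts the live service):
# start = "https://www.ebi.ac.uk/chembl/api/data/activity.json?limit=20"
# for activity in iter_records(start, "activities", max_pages=2):
#     print(activity)
```

Passing `max_pages` keeps an exploratory run small; omitting it walks every page, which for the activity resource means millions of records.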
Filtering and Ordering
It is possible to apply search filters to all resource requests using a URL friendly query language. For example, it is possible to return all ChEMBL targets that contain the term 'kinase' in the pref_name attribute with the following URL: https://www.ebi.ac.uk/chembl/api/data/target?pref_name__contains=kinase.
The pattern for applying a filter is as follows: [field]__[filter_type]=[value]
Examples of other filter types are listed in the table below.
Filter Type | Description | Example URL |
exact (iexact) | Exact match with query (case insensitive equivalent) | |
contains (icontains) | Wild card search with query (case insensitive equivalent) | |
startswith (istartswith) | Starts with query (case insensitive equivalent) | |
endswith (iendswith) | Ends with query (case insensitive equivalent) | |
regex (iregex) | Regular expression query (case insensitive equivalent) | |
gt (gte) | Greater than (or equal) | |
lt (lte) | Less than (or equal) | |
range | Within a range of values | |
in | Appears within list of query values | |
isnull | Field is null | |
search | Special type of filter allowing a full text search based on Elasticsearch queries | |
only | Selects specific properties from the original endpoint and returns only those properties on each record |
To order the results by a particular field, add the 'order_by=[field]' argument to a request. For example, a user can sort targets by pref_name using the following URL: https://www.ebi.ac.uk/chembl/api/data/target?order_by=pref_name
The default ordering is ascending. To return the results in descending order, place a '-' before the field name: https://www.ebi.ac.uk/chembl/api/data/target?order_by=-pref_name
Note that it is possible to combine order_by and filter arguments: https://www.ebi.ac.uk/chembl/api/data/target?pref_name__contains=kinase&order_by=-pref_name
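The filter and ordering URLs above can be assembled programmatically. The sketch below uses only the Python standard library; `filter_url` is a hypothetical helper name, and each keyword argument is written as field, double underscore, filter type.

```python
from urllib.parse import urlencode

BASE_URL = "https://www.ebi.ac.uk/chembl/api/data"


def filter_url(resource, order_by=None, **filters):
    """Build a URL from field__filter=value keyword arguments."""
    params = dict(filters)
    if order_by:
        # A leading '-' on the field name requests descending order.
        params["order_by"] = order_by
    # Commas are left unescaped so list-valued filters stay readable.
    query = urlencode(params, safe=",")
    url = f"{BASE_URL}/{resource}"
    return f"{url}?{query}" if query else url


print(filter_url("target", pref_name__contains="kinase"))
print(filter_url("target", pref_name__contains="kinase", order_by="-pref_name"))
```

The two printed URLs reproduce the 'kinase' examples given in this section.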
Chemical Searching
The 'Substructure' and 'Similarity' web service resources allow the chemical content of ChEMBL to be searched. Like the other resources, these search-based resources accept filtering, paging and ordering arguments. They accept SMILES, InChI Key and molecule ChEMBL_ID as arguments, and similarity searches additionally require a similarity cut-off. Some example molecule searches are provided in the table below.
Chemical Search Description | Example URL |
Substructure search against ChEMBL using aspirin SMILES string | |
Substructure search against ChEMBL using aspirin CHEMBL_ID | |
Substructure search against ChEMBL using aspirin InChI Key | |
Similarity (80% cut off) search against ChEMBL using aspirin SMILES string | |
Similarity (80% cut off) search against ChEMBL using aspirin CHEMBL_ID | |
Similarity (80% cut off) search against ChEMBL using aspirin InChI Key |
Searching with an InChI Key is only possible for InChI Keys found in the ChEMBL database. The system does not try to convert an InChI Key to a chemical representation.
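A small Python sketch of how such search URLs might be put together is given below. The path layout (the query placed after the resource name, followed by the cut-off for similarity searches) is an assumption, since the example URLs are not reproduced in this text, and the helper names are hypothetical.

```python
from urllib.parse import quote

BASE_URL = "https://www.ebi.ac.uk/chembl/api/data"

# Characters left unescaped in the path; everything else is percent-encoded.
SAFE_CHARS = "/+@()="


def substructure_url(query):
    """URL for a substructure search (assumed path layout)."""
    return f"{BASE_URL}/substructure/{quote(query, safe=SAFE_CHARS)}"


def similarity_url(query, cutoff):
    """URL for a similarity search with a percentage cut-off (assumed layout)."""
    return f"{BASE_URL}/similarity/{quote(query, safe=SAFE_CHARS)}/{int(cutoff)}"


ASPIRIN_SMILES = "CC(=O)Oc1ccccc1C(=O)O"
print(substructure_url(ASPIRIN_SMILES))
print(similarity_url(ASPIRIN_SMILES, 80))
```

The same helpers accept a ChEMBL ID or an InChI Key in place of the SMILES string.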
Molecule Images
The Image resource returns a graphical representation of a ChEMBL molecule. Unlike the other resources, it does not accept filtering and paging arguments, but it does accept image-specific arguments. These are defined in the table below.
Image Argument | Description | Allowed Values | Default Value | Example URL |
format | Image format | svg | svg | |
dimensions | Size of image in pixels | 1-500 | 500 | |
ignoreCoords | Choose to use or ignore coordinates in ChEMBL molfiles | 1 or 0 | 0 (Use ChEMBL molfile coordinates) |
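The image arguments can be combined into a URL as in the following Python sketch. The argument names and allowed values come from the table above; the path layout (the ChEMBL ID placed after `image`) and the helper name `image_url` are assumptions.

```python
from urllib.parse import urlencode

BASE_URL = "https://www.ebi.ac.uk/chembl/api/data"


def image_url(chembl_id, dimensions=500, ignore_coords=False):
    """Build an image URL, validating arguments against the table above."""
    if not 1 <= dimensions <= 500:
        raise ValueError("dimensions must be between 1 and 500 pixels")
    params = {
        "format": "svg",  # the only allowed value listed in the table
        "dimensions": dimensions,
        "ignoreCoords": int(ignore_coords),  # 1 or 0
    }
    return f"{BASE_URL}/image/{chembl_id}?{urlencode(params)}"


print(image_url("CHEMBL25", dimensions=200))
```

Validating the size locally avoids sending a request that falls outside the documented range.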
GET, POST and special characters
In a GET request, all parameters have to be encoded into the URL. Because there is a limit on how long a URL can be, it is often more convenient to use POST requests instead. POST parameters are embedded in the request body and can be of any size. This is especially important when retrieving a long list of entities identified by (random) IDs.
The ChEMBL API supports both GET and POST, but since POST has a special meaning in REST (CREATE), a special header has to be added to every POST request:
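The Python sketch below prepares such a POST request without sending it. The header name `X-HTTP-Method-Override` with the value `GET` is not shown in this text and should be treated as an assumption to verify against the live documentation, as should the form-encoded body. The helper name is hypothetical.

```python
from urllib.parse import urlencode
from urllib.request import Request

BASE_URL = "https://www.ebi.ac.uk/chembl/api/data"


def build_post_request(resource, **filters):
    """Prepare a POST request that the server should treat as a query."""
    # Filters travel in the request body, so URL length limits do not apply.
    body = urlencode(filters, safe=",").encode("utf-8")
    return Request(
        f"{BASE_URL}/{resource}",
        data=body,
        # Assumed header: tells the server to handle this POST like a GET.
        headers={"X-HTTP-Method-Override": "GET"},
        method="POST",
    )


request = build_post_request(
    "molecule.json", molecule_chembl_id__in="CHEMBL25,CHEMBL998"
)
print(request.get_method(), request.full_url)
# To send it: urllib.request.urlopen(request)
```

The `__in` filter with a comma-separated list of IDs is the kind of long query that benefits most from being moved into the body.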
Another issue is character encoding. SMILES strings often contain characters (such as #, % or \) that have a special meaning in URLs. This is why when using GET, all parameters should be percent-encoded.
One example is the following SMILES string: [Na+].CO[C@@H](CCC#C\C=C/CCCC(C)CCCCC=C)C(=O)[O-]
It can be encoded into a URL in the following way: https://www.ebi.ac.uk/chembl/api/data/molecule/%5BNa+%5D.CO%5BC@@H%5D(CCC%23C%5CC=C/CCCC(C)CCCCC=C)C(=O)%5BO-%5D
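The encoding step can be reproduced with `urllib.parse.quote` from the Python standard library. In this sketch the SMILES literal was obtained by decoding the URL above, and the set of characters left unescaped is chosen to match that URL; it is an assumption rather than a documented requirement.

```python
from urllib.parse import quote, unquote

BASE_URL = "https://www.ebi.ac.uk/chembl/api/data"

# Raw string so the backslash in the SMILES is kept literally.
SMILES = r"[Na+].CO[C@@H](CCC#C\C=C/CCCC(C)CCCCC=C)C(=O)[O-]"


def molecule_url(smiles):
    """Percent-encode a SMILES string and append it to the molecule resource."""
    # '[', ']', '#' and '\' are escaped; the characters in `safe` are kept.
    return f"{BASE_URL}/molecule/{quote(smiles, safe='/+@()=')}"


url = molecule_url(SMILES)
print(url)

# Decoding the path segment recovers the original SMILES string.
assert unquote(url.rsplit("/molecule/", 1)[1]) == SMILES
```

Passing `safe=''` instead would escape every reserved character, which is the more conservative choice when the SMILES is sent as a query parameter.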
Below is another example of retrieving a molecule (CHEMBL1628285) that has the longest SMILES string currently stored in ChEMBL. The original SMILES string is:
After encoding, the URL becomes this
CORS and JSONP
You can call our web services directly from JavaScript in your own web application, since both techniques are supported.
Web Services Python Client
To help users get started with the updated ChEMBL web services, the existing web service client has also been released. It is written in Python and can be installed from the Python Package Index by typing:
The client code is open and hosted on GitHub: https://github.com/chembl/chembl_webresource_client.
A Jupyter notebook with examples is available.
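As a hedged illustration of the client in use, the sketch below lists a few approved drugs. It assumes the package has been installed (for instance with `pip install chembl_webresource_client`, the name of the GitHub repository above) and that the `new_client` interface with `filter` and `only` behaves as described in the client's README. The function name is hypothetical.

```python
from itertools import islice


def approved_drug_ids(limit=5):
    """Return ChEMBL IDs of a few approved drugs (performs network requests)."""
    # Imported lazily so this module loads even if the client is not installed.
    from chembl_webresource_client.new_client import new_client

    # max_phase=4 selects approved drugs; only() trims each record
    # to the listed properties.
    drugs = new_client.molecule.filter(max_phase=4).only(["molecule_chembl_id"])
    return [drug["molecule_chembl_id"] for drug in islice(drugs, limit)]


# Example call (commented out because it contacts the live service):
# print(approved_drug_ids())
```

The keyword arguments to `filter` follow the same field and filter-type naming used in the URL query language.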
Example Queries
The table below provides a list of example searches a user may wish to carry out using the ChEMBL web services. The aim of the list is to highlight the types of data that can be retrieved from ChEMBL using the web services. The examples can be adapted, extended and chained together to build up more complex workflows.
Description | Example URL |
Get all approved drugs | |
Get all molecules in ChEMBL with no Rule-of-Five violations | |
Get all biotherapeutic molecules | |
Get all functional/phenotypic assays (assay_type=F), from the literature (src_id 1) | |
Get all binding assays (assay_type=B), which also contain the term 'insulin' | |
Get all Clearance activity values (standard_type=CL), for rat, mouse and human | |
Get all cell lines, which end with the term 'carcinoma' | |
Get all mechanism of action details for muscarinic acetylcholine receptor antagonists | |
Get all targets (single protein, protein complexes, protein families etc.), which contain UniProt accession Q13936 | |
Get the entity type for CHEMBL1000 |
Use Cases
The following use cases are provided as examples of how ChEMBL web service calls can be chained together to answer complicated questions using ChEMBL data.
Investigating the potency of approved drugs against their efficacy targets
Since ChEMBL includes both mechanism of action information for approved drugs and pharmacology data from published assays, it is interesting to combine this information and investigate the potency of a drug against its efficacy target. This might be done either to confirm or refute the proposed target assignment, or to better understand how the in-vitro potency of a compound might relate to clinical efficacy or ADMET properties. The following example shows how this type of analysis could be carried out using the ChEMBL web services.
1. Use the molecule endpoint to retrieve a list of approved drugs (max_phase=4): https://www.ebi.ac.uk/chembl/api/data/molecule?max_phase=4 Using ChEMBL_20, this will retrieve 2795 drugs in total. We will use CHEMBL998 (loratadine) as an example, but the same workflow could be repeated for the others.
2. Use the mechanism endpoint to retrieve the mechanism of action and target of each drug: https://www.ebi.ac.uk/chembl/api/data/mechanism?molecule_chembl_id=CHEMBL998 Loratadine is reported to be a histamine H1 receptor antagonist, represented by the ChEMBL target CHEMBL231.
3. Use the assay endpoint to identify any binding assays (assay_type=B) for the human histamine H1 receptor: https://www.ebi.ac.uk/chembl/api/data/assay?target_chembl_id=CHEMBL231&relationship_type=D&assay_type=B Note the relationship_type=D filter restricts the results to assays where we are confident that the human receptor was tested and not an orthologue. A total of 213 assays are identified. We will take CHEMBL1909156 as an example.
4. Combine the results of the above queries to identify any potency measurements for loratadine in these assays: https://www.ebi.ac.uk/chembl/api/data/activity?molecule_chembl_id=CHEMBL998&assay_chembl_id=CHEMBL1909156 This assay reports an IC50 value of 170 nM and a Ki measurement of 20 nM for loratadine against the histamine H1 receptor. This process could be repeated for the other 212 assays by either iterating through them individually or, where a sufficiently small number of assays are returned, using the __in filter on the activity endpoint to retrieve several assays at once, e.g. https://www.ebi.ac.uk/chembl/api/data/activity?molecule_chembl_id=CHEMBL998&assay_chembl_id__in=CHEMBL830379,CHEMBL1909156,CHEMBL882906,CHEMBL691450 An additional 2 assays are identified in this way, reporting an IC50 value of 290 nM and a Ki value of 414 nM. These values could be averaged, or the lowest taken, to give an indication of the average potency of the compound, or the assay conditions could be investigated further to try to identify which assay might be most reliable or informative (the ChEMBL identifier for the document from which the data are extracted is also provided by the activity endpoint).
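The chain of requests in this use case can be sketched in Python as follows. The code only builds the URLs quoted above and summarises the potency values reported in the text; it does not contact the service. The helper names are hypothetical, and grouping the values by measurement type before averaging is one possible reading of the suggestion above.

```python
from statistics import mean
from urllib.parse import urlencode

BASE_URL = "https://www.ebi.ac.uk/chembl/api/data"


def query_url(resource, **filters):
    """Build a filtered resource URL, keeping commas readable for __in lists."""
    return f"{BASE_URL}/{resource}?{urlencode(filters, safe=',')}"


def workflow_urls(drug_id, target_id, assay_ids):
    """Return the four URLs of the workflow, in the order they are used."""
    return [
        query_url("molecule", max_phase=4),
        query_url("mechanism", molecule_chembl_id=drug_id),
        query_url("assay", target_chembl_id=target_id,
                  relationship_type="D", assay_type="B"),
        query_url("activity", molecule_chembl_id=drug_id,
                  assay_chembl_id__in=",".join(assay_ids)),
    ]


assays = ["CHEMBL830379", "CHEMBL1909156", "CHEMBL882906", "CHEMBL691450"]
for url in workflow_urls("CHEMBL998", "CHEMBL231", assays):
    print(url)

# Potency values for loratadine quoted in the text above, in nM.
reported_nm = {"IC50": [170, 290], "Ki": [20, 414]}
summary = {kind: (min(values), mean(values))
           for kind, values in reported_nm.items()}
print(summary)
```

Keeping IC50 and Ki values separate avoids averaging two different kinds of measurement.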