General Questions
Last updated
The data is updated regularly, with releases approximately 2-3 times a year.
The ChEMBLID is a unique ID that has been assigned to compounds, targets, assays, documents, tissues and cell types in ChEMBL. It can be used to retrieve a Report Card page for these entities, or to search for them using the keyword search.
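As a small illustration of working with these identifiers, the sketch below pulls ChEMBL IDs out of free text before they are used in a keyword search. It assumes an ID is the prefix "CHEMBL" followed by a positive integer; the function names are hypothetical and not part of any ChEMBL library.

```python
import re

# Assumption: a ChEMBL ID is "CHEMBL" followed by a positive integer
# (no leading zero), e.g. CHEMBL25.
CHEMBL_ID_PATTERN = re.compile(r"\bCHEMBL[1-9]\d*\b")


def is_chembl_id(value: str) -> bool:
    """Return True if the whole string looks like a ChEMBL ID."""
    return CHEMBL_ID_PATTERN.fullmatch(value.strip()) is not None


def find_chembl_ids(text: str) -> list[str]:
    """Return every ChEMBL-ID-shaped token in the text, in order of appearance."""
    return CHEMBL_ID_PATTERN.findall(text)


if __name__ == "__main__":
    sample = "Compare CHEMBL25 with CHEMBL1234 before searching."
    print(find_chembl_ids(sample))
```

The check is purely syntactic: it says nothing about whether an ID exists or is still active in the current release.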
Molregno is the internal ChEMBL identifier assigned to each compound. The ChEMBLID is the externally visible identifier for each compound.
1) can be used to map between 2 sets of identifiers.
The are on the FTP website and include a ChEMBL_ID to PubChem_ID mapping table (source mapping table for src1 to src22).
2) In the ChEMBL database, identifiers may be found in the compound_records table (as the compound_key).
Data are routinely extracted from seven core journals:
Bioorg Med Chem Lett
J Med Chem
Bioorg Med Chem
J Nat Prod
Eur J Med Chem
ACS Med Chem Lett
MedChemComm.
However, we also have data from selected articles in more than 200 journals including Antimicrob Agents Chemother, Med Chem Res, J Agric Food Chem and Drug Metab Dispos. ChEMBL currently contains data from more than 86,000 journal articles, and we also now include data from selected patents.
Because the data and the compounds are continually being curated, it is not possible to keep track of every alteration or addition. The CHEMBL_ID_LOOKUP table lists all active and inactive (obsolete) entities, flagged in the status field as ACTIVE or OBS. It is also possible to download previous releases and identify changes by comparing the data.
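Both approaches can be sketched in a few lines of Python. The example below uses a tiny in-memory SQLite table as a stand-in for CHEMBL_ID_LOOKUP; the column names (chembl_id, entity_type, status), the sample rows and the helper names are assumptions made for illustration, so check them against the schema of the release you download.

```python
import sqlite3

# Stand-in for a downloaded release: a minimal table with assumed columns.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE chembl_id_lookup ("
    "chembl_id TEXT PRIMARY KEY, entity_type TEXT, status TEXT)"
)
conn.executemany(
    "INSERT INTO chembl_id_lookup VALUES (?, ?, ?)",
    [
        ("CHEMBL1", "COMPOUND", "ACTIVE"),
        ("CHEMBL2", "COMPOUND", "OBS"),
        ("CHEMBL3", "TARGET", "ACTIVE"),
    ],
)


def ids_by_status(connection: sqlite3.Connection, status: str) -> list[str]:
    """Return the ChEMBL IDs whose status field equals the given value."""
    rows = connection.execute(
        "SELECT chembl_id FROM chembl_id_lookup "
        "WHERE status = ? ORDER BY chembl_id",
        (status,),
    )
    return [row[0] for row in rows]


def compare_releases(old_ids: set[str], new_ids: set[str]) -> tuple[list[str], list[str]]:
    """Return (added, removed) IDs between two releases, each sorted."""
    return sorted(new_ids - old_ids), sorted(old_ids - new_ids)


if __name__ == "__main__":
    print(ids_by_status(conn, "OBS"))
    print(compare_releases({"CHEMBL1", "CHEMBL2"}, {"CHEMBL1", "CHEMBL3"}))
```

Against a real dump, the same query would run on a connection to the installed database, and the two ID sets would come from the same table in two different releases.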
ChEMBL Database:
Zdrazil B, Felix E, Hunter F, Manners EJ, Blackshaw J, Corbett S, de Veij M, Ioannidis H, Lopez DM, Mosquera JF, Magarinos MP, Bosc N, Arcila R, Kizilören T, Gaulton A, Bento AP, Adasme MF, Monecke P, Landrum GA, Leach AR. The ChEMBL Database in 2023: a drug discovery platform spanning multiple bioactivity data types and time periods. Nucleic Acids Res. 2024 Jan 5;52(D1):D1180-D1192.
ChEMBL Web Services:
Davies M, Nowotka M, Papadatos G, Dedman N, Gaulton A, Atkinson F, Bellis L, Overington JP. (2015) 'ChEMBL web services: streamlining access to drug discovery data and utilities.' Nucleic Acids Res., 43(W1) W612-W620.
ChEMBL RDF:
S. Jupp, J. Malone, J. Bolleman, M. Brandizi, M. Davies, L. Garcia, A. Gaulton, S. Gehant, C. Laibe, N. Redaschi, S.M Wimalaratne, M. Martin, N. Le Novère, H. Parkinson, E. Birney and A.M Jenkinson (2014) The EBI RDF Platform: Linked Open Data for the Life Sciences Bioinformatics 30 1338-1339
The best way to access large amounts of data is to install a database instance on your own computer using MySQL.
Yes, there is a ChEMBL RDF. It is stored on the FTP site.
At present we do not maintain the RDF, and therefore any recently changed data fields in the ChEMBL database will not be included in the existing RDF version.
ChEMBL_14 is an earlier release of the data, with ChEMBL_15 being an updated version. Subsequent updates will have consecutive numbers, so the one with the highest number will be the most recent full version of the data.
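When several releases sit side by side on disk, the most recent one can be picked programmatically. The sketch below assumes release names follow the "ChEMBL_<number>" pattern used above; the function names are hypothetical. It compares the numeric part as an integer, since a plain string sort would place ChEMBL_9 after ChEMBL_15.

```python
import re


def release_number(name: str) -> int:
    """Extract the integer release number from a name such as 'ChEMBL_15'."""
    match = re.fullmatch(r"ChEMBL_(\d+)", name, flags=re.IGNORECASE)
    if match is None:
        raise ValueError(f"Not a recognised release name: {name!r}")
    return int(match.group(1))


def latest_release(names: list[str]) -> str:
    """Return the release name with the highest release number."""
    return max(names, key=release_number)


if __name__ == "__main__":
    print(latest_release(["ChEMBL_9", "ChEMBL_14", "ChEMBL_15"]))
```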
ChEMBL is a database of bioactivity data and we do not supply compounds.
However, you can use the cross-references on the compound report cards to view commercial chemical providers.
ChEMBL also includes compound property calculations, derived from commercial software. When using these commercial calculations, users should ensure that they refer to and abide by commercial licensing agreements. For example, calculations should not be extracted in isolation with the aim to train models intended to replicate the commercial process.
Each new ChEMBL release includes literature data from the previous 12-18 months. The precise cutoff point for extraction of literature data varies slightly between ChEMBL releases and between journals. Occasionally, selected articles are added to ChEMBL at a later date, so the ChEMBL version and the publication date of an article do not always correspond. A new release may also contain new data for existing compounds.
ChEMBL provides periodic updates to the database that include new data, corrections, additional curation and new features. Subsequent updates will have consecutive numbers, so the version with the highest number will be the most recent full version of the data. Version information is found in the VERSION table.
The API and interface always reflect the data in the current release and cannot currently be used to extract data from previous versions. However, old data dumps remain available for download and can be queried with a database tool and SQL to obtain version-specific data.
ChEMBL is a database of bioactivity data for drug-like compounds and includes data from seven core medicinal chemistry journals, patents and deposited data that fits within the scope of ChEMBL. Deposited data is primarily dose-response and ADMET type data. Please get in touch if you believe your data could be included in ChEMBL.
You can sign up to our ChEMBL Announce mailing list, where you will be kept up to date with all new releases and changes to the database.
If you need help, please contact us and we will be able to let you know what, if anything, has happened to the data you are interested in.
We have a dedicated email address for data queries, error reporting or help requests. This is:
You can in our to report issues with the interface only.
You can in our to report issues with the data only.
You can in our to report issues with the ChEMBL API.
You can in our to report issues with the Python client library.
ChEMBL is a database of bioactive drug-like small molecules. It contains 2-D structures, calculated properties (e.g. logP, molecular weight, Lipinski parameters) and abstracted bioactivities (e.g. binding constants, pharmacology and ADMET data). We attempt to normalise the bioactivities into a uniform set of end-points and units where possible, and also to tag the links between a molecular target and a published assay with a set of varying confidence levels. The data are abstracted and curated from the primary scientific literature, and cover a significant fraction of the SAR and discovery of modern drugs.
ChEMBL-NTD is a repository for Open Access primary screening and medicinal chemistry data directed at neglected diseases: endemic tropical diseases of the developing regions of Africa, Asia and the Americas. The primary purpose of ChEMBL-NTD is to provide a freely accessible and permanent archive and distribution centre for deposited data. ChEMBL-NTD is a subset of the data in the free medicinal chemistry and drug discovery database ChEMBLdb.
Web Service documentation and example queries can be found at:
The ChEMBL database is licensed under a Creative Commons Attribution-Share Alike 3.0 Unported License. This allows use, redistribution and adaptation of ChEMBL as long as appropriate attribution is given (e.g., cite the current ) and any adaptations are redistributed under the same license.
Regular depositors can sign up to the chembl_depositor mailing list to receive updates on ChEMBL deposition timelines.
Documentation for depositors can be found here - where you can see the information required to submit to ChEMBL.
The ChEMBL KNIME nodes are no longer maintained but our recommendation is to use the generic KNIME REST nodes to access the ChEMBL web services, as this will be simpler to maintain and adapt to cope with schema changes etc. There are some example workflows available, using this approach, that have been provided by Daria Goldmann e.g.,