General Questions
The data is updated regularly, with releases approximately 2-3 times a year.
The ChEMBLID is a unique ID that has been assigned to compounds, targets, assays, documents, tissues and cell types in ChEMBL. It can be used to retrieve a Report Card page for these entities, or to search for them using the keyword search.
Molregno is the internal ChEMBL identifier assigned to each compound. The ChEMBLID is the externally visible identifier for the same compound.
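The relationship between the two identifiers can be sketched as a lookup against a local database instance. The example below is a minimal illustration, assuming a `molecule_dictionary` table that holds both `molregno` and `chembl_id`. The in-memory SQLite database, the toy rows and the helper names `molregno_for` and `chembl_id_for` are hypothetical stand-ins, not ChEMBL data or an official API.

```python
import sqlite3

# Toy stand-in for a local ChEMBL instance (assumption: molecule_dictionary
# carries both the internal molregno and the external chembl_id).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE molecule_dictionary ("
    "molregno INTEGER PRIMARY KEY, chembl_id TEXT UNIQUE, pref_name TEXT)"
)
# Made-up rows for illustration only; these are not real ChEMBL records.
conn.executemany(
    "INSERT INTO molecule_dictionary VALUES (?, ?, ?)",
    [
        (101, "CHEMBL_TOY_A", "TOY COMPOUND A"),
        (102, "CHEMBL_TOY_B", "TOY COMPOUND B"),
    ],
)


def molregno_for(conn, chembl_id):
    """Return the internal molregno for an external ChEMBL ID, or None."""
    row = conn.execute(
        "SELECT molregno FROM molecule_dictionary WHERE chembl_id = ?",
        (chembl_id,),
    ).fetchone()
    return row[0] if row else None


def chembl_id_for(conn, molregno):
    """Return the external ChEMBL ID for an internal molregno, or None."""
    row = conn.execute(
        "SELECT chembl_id FROM molecule_dictionary WHERE molregno = ?",
        (molregno,),
    ).fetchone()
    return row[0] if row else None


print(molregno_for(conn, "CHEMBL_TOY_A"))
print(chembl_id_for(conn, 102))
```

Against a real installation, the same parameterised queries would be run through the driver for whichever database engine hosts the data.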
The source mapping tables are on the FTP website and include a ChEMBL_ID to PubChem_ID mapping table (source mapping table for src1 to src22).
2) In the ChEMBL database, identifiers may be found in the compound_records table (as the compound_key).
Data are routinely extracted from seven core journals:
- Bioorg Med Chem Lett (https://www.sciencedirect.com/journal/bioorganic-and-medicinal-chemistry-letters)
However, we also have data from selected articles in more than 200 journals including Antimicrob Agents Chemother, Med Chem Res, J Agric Food Chem and Drug Metab Dispos. ChEMBL currently contains data from more than 86,000 journal articles, and we also now include data from selected patents.
You can sign up to our ChEMBL Announce mailing list (http://listserver.ebi.ac.uk/mailman/listinfo/chembl-announce), where you will be kept up to date with all new releases and changes to the database.
As the data and the compounds are continually being curated, it is not possible to keep track of every alteration or addition. However, the CHEMBL_ID_LOOKUP table lists all active and inactive (obsolete) entities, flagged in the status field as ACTIVE or OBS. It is also possible to download previous releases and identify changes by comparing the data.
If you need help, please contact us and we will be able to let you know what, if anything, has happened to the data you are interested in.
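As a rough illustration of the status check described above, the sketch below queries a toy copy of the CHEMBL_ID_LOOKUP table. The column names `chembl_id`, `entity_type` and `status` are assumptions about the schema, the rows are invented, and an in-memory SQLite database is used only to keep the example self-contained.

```python
import sqlite3

# Toy copy of CHEMBL_ID_LOOKUP; column names are assumed, rows are made up.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE chembl_id_lookup ("
    "chembl_id TEXT PRIMARY KEY, entity_type TEXT, status TEXT)"
)
conn.executemany(
    "INSERT INTO chembl_id_lookup VALUES (?, ?, ?)",
    [
        ("CHEMBL_TOY_1", "COMPOUND", "ACTIVE"),
        ("CHEMBL_TOY_2", "COMPOUND", "OBS"),
        ("CHEMBL_TOY_3", "TARGET", "ACTIVE"),
    ],
)


def status_of(conn, chembl_id):
    """Return (entity_type, status) for an ID, or None if it is unknown."""
    return conn.execute(
        "SELECT entity_type, status FROM chembl_id_lookup WHERE chembl_id = ?",
        (chembl_id,),
    ).fetchone()


def obsolete_ids(conn):
    """Return every ID whose status field is OBS, in sorted order."""
    rows = conn.execute(
        "SELECT chembl_id FROM chembl_id_lookup "
        "WHERE status = 'OBS' ORDER BY chembl_id"
    )
    return [row[0] for row in rows]


print(status_of(conn, "CHEMBL_TOY_2"))
print(obsolete_ids(conn))
```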
ChEMBL Database:
Mendez D, Gaulton A, Bento AP, Chambers J, De Veij M, Félix E, Magariños MP, Mosquera JF, Mutowo P, Nowotka M, Gordillo-Marañón M, Hunter F, Junco L, Mugumbate G, Rodriguez-Lopez M, Atkinson F, Bosc N, Radoux CJ, Segura-Cabrera A, Hersey A, Leach AR. (2019) 'ChEMBL: towards direct deposition of bioassay data.' Nucleic Acids Res., 47(D1) D930-D940.
ChEMBL Web Services:
Davies M, Nowotka M, Papadatos G, Dedman N, Gaulton A, Atkinson F, Bellis L, Overington JP. (2015) 'ChEMBL web services: streamlining access to drug discovery data and utilities.' Nucleic Acids Res., 43(W1) W612-W620.
ChEMBL RDF:
Jupp S, Malone J, Bolleman J, Brandizi M, Davies M, Garcia L, Gaulton A, Gehant S, Laibe C, Redaschi N, Wimalaratne SM, Martin M, Le Novère N, Parkinson H, Birney E, Jenkinson AM. (2014) 'The EBI RDF Platform: Linked Open Data for the Life Sciences.' Bioinformatics, 30 1338-1339.
- 1. We have a dedicated email address for data queries, error reporting or help requests. This is: [email protected]
- 2. You can create an issue in our GitHub Interface Repository to report issues with the interface only.
- 3.
- 4.
- 5. You can create an issue in our GitHub Python Client Repository to report issues with the Python client library.
The best way to access large amounts of data is to install a database instance on your own computer using MySQL.
Yes, there is a ChEMBL RDF. It is stored on the FTP site.
ChEMBL_14 is an earlier release of the data, with ChEMBL_15 being an updated version. Subsequent updates will have consecutive numbers, so the one with the highest number will be the most recent full version of the data.
ChEMBL is a database of bioactive drug-like small molecules. It contains 2-D structures, calculated properties (e.g. logP, Molecular Weight, Lipinski Parameters, etc.) and abstracted bioactivities (e.g. binding constants, pharmacology and ADMET data). We attempt to normalise the bioactivities into a uniform set of end-points and units where possible, and also to tag the links between a molecular target and a published assay with a set of varying confidence levels. The data are abstracted and curated from the primary scientific literature, and cover a significant fraction of the SAR and discovery of modern drugs.
ChEMBL-NTD is a repository for Open Access primary screening and medicinal chemistry data directed at neglected diseases: endemic tropical diseases of the developing regions of Africa, Asia and the Americas. The primary purpose of ChEMBL-NTD is to provide a freely accessible and permanent archive and distribution centre for deposited data. ChEMBL-NTD is a subset of the data in the free medicinal chemistry and drug discovery database ChEMBLdb.
ChEMBL is a database of bioactivity data and we do not supply compounds.
However, you can use the cross-references on the compound report cards to view commercial chemical providers.
The ChEMBL database is licensed under a Creative Commons Attribution-Share Alike 3.0 Unported License (https://creativecommons.org/licenses/by-sa/3.0/). This allows use, redistribution and adaptation of ChEMBL as long as appropriate attribution is given (e.g., by citing the current ChEMBL paper) and any adaptations are redistributed under the same license.
Each new ChEMBL release includes literature data from the previous 12-18 months. The precise cutoff point for extraction of literature data varies slightly between ChEMBL releases and for different journals. Occasionally, selected articles are added to ChEMBL at a later date, so the ChEMBL version in which an article first appears does not always reflect its publication date. A new release may also contain new data for existing compounds.
ChEMBL provides periodic updates to the database that include new data, corrections, additional curation and new features. Subsequent updates will have consecutive numbers, so the version with the highest number will be the most recent full version of the data. Version information is found in the VERSION table.
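Reading the release from the VERSION table and ordering releases by number might look like the sketch below. It assumes the table has a `name` column holding values of the form `ChEMBL_15`; the other columns, the toy row and both helper functions are hypothetical, and the in-memory SQLite database is only a stand-in for a real installation.

```python
import re
import sqlite3

# Toy VERSION table; the column layout is an assumption for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE version (name TEXT, creation_date TEXT, comments TEXT)"
)
conn.execute(
    "INSERT INTO version VALUES (?, ?, ?)",
    ("ChEMBL_15", None, "toy row for illustration"),
)


def release_number(name):
    """Extract the trailing integer from a release name such as 'ChEMBL_15'."""
    match = re.search(r"(\d+)$", name)
    if match is None:
        raise ValueError(f"no release number in {name!r}")
    return int(match.group(1))


def current_release(conn):
    """Return the highest-numbered release name stored in the version table."""
    names = [row[0] for row in conn.execute("SELECT name FROM version")]
    return max(names, key=release_number)


print(current_release(conn))
# Picking the most recent of several downloaded dumps by release number.
print(max(["ChEMBL_14", "ChEMBL_15"], key=release_number))
```

Comparing the parsed integers rather than the raw strings avoids ordering mistakes once release numbers differ in digit count.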
The API and interface always reflect the data in the current release and cannot currently be used to extract data from previous versions. However, old data dumps remain available for download and can be queried with a database tool and SQL to obtain version-specific data.
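One way to obtain version-specific differences is to load two downloaded dumps side by side and compare them. The sketch below is a minimal illustration of that idea on toy data: each "release" is an in-memory SQLite database with an assumed `chembl_id_lookup` table, and the function names are hypothetical. Real dumps would be opened with the driver for whichever engine they were loaded into.

```python
import sqlite3


def make_release(rows):
    """Build a toy 'release' holding (chembl_id, entity_type, status) rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE chembl_id_lookup ("
        "chembl_id TEXT PRIMARY KEY, entity_type TEXT, status TEXT)"
    )
    conn.executemany("INSERT INTO chembl_id_lookup VALUES (?, ?, ?)", rows)
    return conn


def statuses(conn):
    """Map each ID in a release to its status."""
    return dict(conn.execute("SELECT chembl_id, status FROM chembl_id_lookup"))


def compare_releases(old_conn, new_conn):
    """Return sorted lists of added, removed and status-changed IDs."""
    old, new = statuses(old_conn), statuses(new_conn)
    added = sorted(set(new) - set(old))
    removed = sorted(set(old) - set(new))
    changed = sorted(k for k in set(old) & set(new) if old[k] != new[k])
    return added, removed, changed


# Made-up rows standing in for an older and a newer release.
older = make_release([
    ("CHEMBL_TOY_1", "COMPOUND", "ACTIVE"),
    ("CHEMBL_TOY_2", "COMPOUND", "ACTIVE"),
])
newer = make_release([
    ("CHEMBL_TOY_1", "COMPOUND", "ACTIVE"),
    ("CHEMBL_TOY_2", "COMPOUND", "OBS"),
    ("CHEMBL_TOY_3", "ASSAY", "ACTIVE"),
])

added, removed, changed = compare_releases(older, newer)
print("added:", added)
print("removed:", removed)
print("status changed:", changed)
```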
ChEMBL is a database of bioactivity data for drug-like compounds and includes data from seven core medicinal chemistry journals, patents and deposited data that fits within the scope of ChEMBL. Deposited data is primarily dose-response and ADMET type data. Please get in touch if you believe your data could be included in ChEMBL.
Regular depositors can sign up to the chembl_depositor mailing list (https://listserver.ebi.ac.uk/mailman/listinfo/chembl-depositors) to receive updates on ChEMBL deposition timelines.
Documentation for depositors, including the information required to submit to ChEMBL, can be found at https://chembl.gitbook.io/chembl-loader/.
The ChEMBL KNIME nodes are no longer maintained. We recommend using the generic KNIME REST nodes to access the ChEMBL web services instead, as this is simpler to maintain and to adapt to schema changes. Example workflows using this approach have been provided by Daria Goldmann, e.g.:
https://www.knime.com/blog/a-restful-way-to-find-and-retrieve-data
https://hub.knime.com/knime/workflows/Examples/50_Applications/30_RESTful_ChEMBL/01_ChEMBL_REST_Services*qBCbzaFZ85qUMoWZ
https://hub.knime.com/knime/workflows/Examples/50_Applications/30_RESTful_ChEMBL/02_ChEMBL_Structure_Search*wCygb7lTvPqnYXuB
https://hub.knime.com/knime/workflows/Examples/50_Applications/30_RESTful_ChEMBL/03_ChEMBL_Bioactivity_Search*tGSG1nPdGg4cW7Qp
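The web services that the KNIME REST nodes call can be reached from any HTTP client, so the kind of request such a workflow issues can be sketched in plain Python. The base URL, the `.json` suffix, the `molecule` and `activity` resource names and the `molecule_chembl_id` and `limit` filters below are assumptions that this FAQ does not state and should be checked against the web services documentation. The helper names are hypothetical, and CHEMBL25 is used only as an illustrative identifier.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed base URL of the ChEMBL data web services (not stated in this FAQ).
BASE_URL = "https://www.ebi.ac.uk/chembl/api/data"


def resource_url(resource, chembl_id):
    """Build the URL for one entity, e.g. a single molecule record."""
    return f"{BASE_URL}/{resource}/{chembl_id}.json"


def search_url(resource, **filters):
    """Build a filtered list URL; filters become query-string parameters."""
    return f"{BASE_URL}/{resource}.json?{urlencode(filters)}"


def fetch_json(url, timeout=30):
    """Fetch a URL and decode the JSON body (performs a network request)."""
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)


# Only the URLs are built here; call fetch_json(...) to issue the request.
print(resource_url("molecule", "CHEMBL25"))
print(search_url("activity", molecule_chembl_id="CHEMBL25", limit=5))
```

Keeping URL construction separate from the network call makes the request easy to inspect, and the same URL can be pasted into a KNIME REST node.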