ChEMBL Data Deposition Guide

File structure

Last updated 1 month ago

These pages describe all aspects of the requirements for loading bioactivity data, including the column headers and the permitted content of the cells (or 'fields') within the deposition files.

Files must be supplied as tab-separated text files with UTF-8 or ASCII encoding. Spreadsheets will not work with the loader.
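As a minimal sketch using Python's standard `csv` module, the following writes a tab-separated, UTF-8 encoded text file and reads it back. The function names `write_tsv` and `read_tsv` and the column headers are hypothetical illustrations, not part of the ChEMBL loader. The empty third cell shows a null value left blank rather than filled with a placeholder.

```python
import csv
import tempfile
from pathlib import Path


def write_tsv(path, header, rows):
    """Write a header row plus data rows as a tab-separated UTF-8 text file."""
    # newline="" hands line-ending control to the csv module.
    with open(path, "w", encoding="utf-8", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
        writer.writerow(header)
        writer.writerows(rows)


def read_tsv(path):
    """Read a tab-separated UTF-8 text file into a list of dicts keyed by header."""
    with open(path, encoding="utf-8", newline="") as handle:
        return list(csv.DictReader(handle, delimiter="\t"))


if __name__ == "__main__":
    # Hypothetical headers and values, for illustration only.
    header = ["CIDX", "NAME"]
    rows = [["CPD-001", "aspirin"], ["CPD-002", "β-alanine"], ["CPD-003", ""]]
    path = Path(tempfile.mkdtemp()) / "example.tsv"
    write_tsv(path, header, rows)
    for record in read_tsv(path):
        print(record)
```

Passing `encoding="utf-8"` explicitly avoids relying on the platform's default encoding, which is not UTF-8 on every system.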

Within these files, depositor-defined identifiers (DDIs) are used for compounds/substances, assays and references, as described earlier. These identifiers provide a point of reference between different files, and column headers indicate which column holds each depositor-defined ID.

  • For the three key entities, AIDX, CIDX and RIDX, a single primary file is used to define or redefine them.

  • Secondary files may reference these identifiers to define or redefine properties of existing entities, i.e. identifiers that either exist in the corresponding primary file in the same deposition job or already exist in the database.

  • Do not use N/A, NA, Null or other values as placeholders. Simply leave the value blank.
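The relationship between primary and secondary files can be checked before submission. The sketch below assumes the files are already available as TSV text. It locates an identifier column by its header name and reports IDs that a secondary file uses but that are defined neither in the primary file nor in a depositor-supplied set of IDs already in the database. The helper names, the `existing_ids` set and the example file contents are hypothetical.

```python
import csv
import io


def ids_in_column(tsv_text, column):
    """Return the non-blank values of a column, located by its header name."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    if column not in (reader.fieldnames or []):
        raise KeyError(f"missing column header: {column}")
    # Short rows yield None for missing cells; skip those and blanks.
    return [row[column] for row in reader if row[column]]


def undefined_references(secondary_ids, primary_ids, existing_ids=()):
    """IDs used in a secondary file but defined neither in this job's primary
    file nor in the (hypothetical) set of IDs already in the database."""
    known = set(primary_ids) | set(existing_ids)
    return sorted({i for i in secondary_ids if i not in known})


if __name__ == "__main__":
    # Hypothetical file contents; CIDX sits in a different column in each file.
    primary = "CIDX\tNAME\nC1\tcompound one\nC2\tcompound two\n"
    secondary = "AIDX\tCIDX\tVALUE\nA1\tC1\t5.2\nA1\tC3\t6.0\nA1\tC9\t7.1\n"
    missing = undefined_references(
        ids_in_column(secondary, "CIDX"),
        ids_in_column(primary, "CIDX"),
        existing_ids={"C3"},  # assume C3 was deposited in an earlier job
    )
    print(missing)  # ['C9']
```

Because the column is found by its header, the check does not depend on where the identifier column sits in each file.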

Guidance

  • Consider limiting numeric data to a small number of decimal places so that users can compare values easily.

  • Leave values empty if they are null. If you use placeholders such as "-", "None" or "NULL", the data may load into the database while being invalid, or may be difficult to search for, because these values will not be recorded as null. The loader attempts to convert such values to nulls, but it cannot cover every possibility.

  • The absence of data from a secondary file does not mean that the depositor wishes to delete that DDI's existing secondary-file data from the database. However, new data deposited under a previously used DDI will overwrite the data from the older deposition. This loader behaviour is designed to minimise the risk of inadvertently deleting data while still allowing data to be updated.

  • Therefore, to add data to an existing DDI rather than replace its earlier data, submit the original deposition data alongside the new data.
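A depositor-side clean-up pass can blank out placeholders and trim excess decimal places before the files are written. This is a sketch under stated assumptions: the placeholder list is taken from the examples in this guide and matched exactly (so a legitimate value such as "Na" is left alone), rounding uses Python's `decimal` module with its default round-half-to-even mode, and the column names `VALUE`, `UNITS` and `COMMENT` are hypothetical.

```python
from decimal import Decimal, InvalidOperation

# Placeholder tokens taken from the examples in this guide (assumption:
# matched exactly, so a legitimate value such as "Na" is left untouched).
PLACEHOLDERS = {"N/A", "NA", "Null", "NULL", "None", "-"}


def clean_value(value, max_decimals=None):
    """Blank out placeholders; optionally round numbers with too many decimals."""
    text = (value or "").strip()  # DictReader may give None for short rows
    if not text or text in PLACEHOLDERS:
        return ""
    if max_decimals is None:
        return text
    try:
        number = Decimal(text)
    except InvalidOperation:
        return text  # not a number: leave unchanged
    # Only round when the value has more decimal places than allowed.
    if number.is_finite() and -number.as_tuple().exponent > max_decimals:
        quantum = Decimal(1).scaleb(-max_decimals)  # e.g. 0.001 for 3
        return str(number.quantize(quantum))  # default: round half to even
    return text


def clean_row(row, numeric_columns=(), max_decimals=3):
    """Clean every cell of a row dict; round only the named numeric columns."""
    return {
        key: clean_value(val, max_decimals if key in numeric_columns else None)
        for key, val in row.items()
    }


if __name__ == "__main__":
    # Hypothetical column names, for illustration only.
    row = {"VALUE": "12.34567", "UNITS": "nM", "COMMENT": "None"}
    print(clean_row(row, numeric_columns={"VALUE"}))
    # {'VALUE': '12.346', 'UNITS': 'nM', 'COMMENT': ''}
```

Rounding is lossy, so how many decimal places are meaningful remains a judgement for the depositor for each measurement type.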

Some depositors wish to deposit bioactivity data obtained using assays (AIDXs) or compounds (CIDXs) defined by other depositors. To do this, those AIDXs and CIDXs must already have been deposited in ChEMBL by their respective owners. When citing them in the ACTIVITY file, the depositor must add an extra column giving the src_id of the owner of these IDs.
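As an illustration of adding the owner column, the sketch below appends one column to an ACTIVITY file and fills it with the owner's src_id for each borrowed identifier. The column header `OWNER_SRC_ID`, the src_id value `12` and the choice to leave the cell blank for the depositor's own identifiers are assumptions for illustration only; check the exact column name and format in the page on depositing activity data against other depositors' entities.

```python
import csv
import io


def add_owner_column(tsv_text, id_column, owners, new_column):
    """Append a column holding the owner's src_id for each borrowed ID.

    owners maps an identifier (e.g. a CIDX) to the src_id of the depositor
    who defined it. IDs absent from the mapping get a blank cell; treating
    blank as "own entity" is an assumption made for this sketch.
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    fieldnames = list(reader.fieldnames or []) + [new_column]
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames, delimiter="\t", lineterminator="\n")
    writer.writeheader()
    for row in reader:
        row[new_column] = owners.get(row[id_column], "")
        writer.writerow(row)
    return out.getvalue()


if __name__ == "__main__":
    # Hypothetical ACTIVITY rows; X7 is a CIDX owned by another depositor
    # whose src_id is assumed here to be 12.
    activity = "CIDX\tAIDX\tVALUE\nC1\tA1\t5.2\nX7\tA1\t6.0\n"
    print(add_owner_column(activity, "CIDX", {"X7": "12"}, "OWNER_SRC_ID"), end="")
```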

For details, see Depositing Activity Data against other depositors' entities.
Attachment: Data deposition ChEMBL slides.pptx (3MB)