File structure

These pages describe all aspects of the requirements for loading bioactivity data, including the column headers and the permitted content of the cells (or 'fields') within these files.

Files must be supplied as tab-separated text files with UTF-8 or ASCII encoding. Spreadsheet files (such as Excel workbooks) will not work with the loader.
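As a minimal sketch, assuming the data is prepared in Python, the standard csv module can produce tab-separated text that is then saved as UTF-8 (which also covers ASCII-only content). The helper names to_tsv and save_tsv and the column COMPOUND_NAME are illustrative only, not part of the loader's specification.

```python
import csv
import io

def to_tsv(header, rows):
    """Render rows as tab-separated text, leaving missing (None) values blank."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(header)
    for row in rows:
        # Write an empty cell for None instead of a placeholder such as "NA"
        writer.writerow(["" if value is None else value for value in row])
    return buffer.getvalue()

def save_tsv(path, header, rows):
    """Write the tab-separated text to disk with UTF-8 encoding."""
    with open(path, "w", encoding="utf-8", newline="") as handle:
        handle.write(to_tsv(header, rows))

# Hypothetical example rows; "C2" has no name, so its cell stays blank
print(repr(to_tsv(["CIDX", "COMPOUND_NAME"], [["C1", "aspirin"], ["C2", None]])))
# 'CIDX\tCOMPOUND_NAME\nC1\taspirin\nC2\t\n'
```

Opening the file with newline="" stops Python from translating the "\n" line terminators on platforms that use a different line ending.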

Within these files, depositor-defined identifiers are used for compounds/substances, assays and references, as described earlier. These identifiers link records across different files, and column headers indicate which column holds each depositor-defined ID.

  • Each of the three key entity types, identified by AIDX (assays), CIDX (compounds) and RIDX (references), has a single primary file in which its identifiers are defined or redefined.

  • Secondary files may reference these identifiers to define or redefine properties of existing entities, i.e. entities whose identifiers either exist in the corresponding primary file in the same deposition job or already exist in the database.

  • Do not use N/A, NA, Null or other values as placeholders. Simply leave the value blank.
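Before submitting, a depositor might check a secondary file against the two rules above. The sketch below is hypothetical: it assumes the file has been read into one dictionary per row (for example with csv.DictReader), that the set of identifiers already in the database is known from earlier depositions, and that the placeholder list is illustrative rather than the loader's own list.

```python
import csv
import io

# Illustrative placeholder spellings; not the loader's actual list
PLACEHOLDERS = {"n/a", "na", "null", "none", "-"}

def check_secondary_rows(rows, id_column, primary_ids, existing_ids):
    """Return human-readable problems found in a secondary file.

    rows:         one dict per data line, keyed by column header
    primary_ids:  IDs defined in the primary file of the same deposition job
    existing_ids: IDs assumed to be already present in the database
    """
    known = set(primary_ids) | set(existing_ids)
    problems = []
    # Line 1 holds the headers, so the first data row is line 2
    for line_no, row in enumerate(rows, start=2):
        ddi = row.get(id_column) or ""
        if ddi not in known:
            problems.append(f"line {line_no}: unknown {id_column} {ddi!r}")
        for column, value in row.items():
            # Skip non-string cells that DictReader creates for ragged rows
            if isinstance(value, str) and value.strip().lower() in PLACEHOLDERS:
                problems.append(
                    f"line {line_no}: placeholder {value!r} in {column}; leave it blank"
                )
    return problems

if __name__ == "__main__":
    text = "CIDX\tCOMPOUND_NAME\nC1\taspirin\nC9\tNULL\n"
    rows = list(csv.DictReader(io.StringIO(text), delimiter="\t"))
    # Reports the unknown CIDX 'C9' and the 'NULL' placeholder on line 3
    for problem in check_secondary_rows(rows, "CIDX", {"C1"}, set()):
        print(problem)
```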

Guidance

  • Consider limiting numeric data to a small number of decimal places so that users can compare values easily.

  • Leave null values empty. If you use placeholders such as "-", "None" or "NULL", the data may load into the database despite being invalid, and it may be difficult to search for because we will not record these placeholders as null values. The loader attempts to convert such values to nulls, but we cannot cover every possibility.

  • The absence of data from a secondary file does not mean that the depositor wishes to delete the existing secondary-file data for a depositor-defined identifier (DDI) from the database. However, new data deposited under a previously used DDI will overwrite the data from the older deposition. This loader behaviour is designed to minimise the risk of inadvertently deleting data while still allowing data to be updated.

  • Therefore, when adding data to existing DDIs, you must resubmit the original data alongside the new data; otherwise the original data will be replaced.
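The update rule can be illustrated with a simplified, hypothetical model in which the database maps each DDI to a list of records. A DDI that appears in a new deposition has its records replaced; a DDI that does not appear is left untouched. This is a sketch of the behaviour described above, not the loader's implementation, and the function name apply_deposition is invented for illustration.

```python
def apply_deposition(database, deposition):
    """Return the database state after a deposition (simplified model).

    Both arguments map a DDI to a list of records. DDIs present in the
    deposition are overwritten; DDIs absent from it are kept unchanged.
    """
    updated = dict(database)
    for ddi, records in deposition.items():
        updated[ddi] = list(records)  # replace, never append
    return updated

database = {"A1": ["old result"], "A2": ["other result"]}

# Submitting only the new record for A1 discards the old one...
print(apply_deposition(database, {"A1": ["new result"]}))
# {'A1': ['new result'], 'A2': ['other result']}

# ...so the original record must be resubmitted alongside the new data
print(apply_deposition(database, {"A1": ["old result", "new result"]}))
# {'A1': ['old result', 'new result'], 'A2': ['other result']}
```

In both cases A2 survives unchanged, which mirrors the point that leaving a DDI out of a deposition does not delete its data.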

Some depositors wish to deposit bioactivity data obtained using assays (AIDXs) or compounds (CIDXs) defined by other depositors. To do this, those AIDXs and CIDXs must already have been deposited in ChEMBL by their respective owners. When citing them in the ACTIVITY file, the depositor must add an extra column to the file giving the src_id of the owner of these IDs.
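A minimal sketch of adding that column, assuming the ACTIVITY rows are held as dictionaries with AIDX and CIDX keys. The header OWNER_SRC_ID, the example src_id value and the rule that a row may cite borrowed IDs from only one other depositor are all hypothetical; the actual column name and rules should be taken from the ACTIVITY file specification.

```python
# Hypothetical header name; use the one given in the ACTIVITY file specification
OWNER_COLUMN = "OWNER_SRC_ID"

def add_owner_src_id(activity_rows, foreign_owner):
    """Return copies of ACTIVITY rows with the owner's src_id column filled in.

    foreign_owner maps (column, ID), e.g. ("CIDX", "EXT-7"), to the src_id of
    the depositor who owns that borrowed ID. Rows citing only the depositor's
    own IDs get a blank cell. A row citing IDs from two different owners is
    rejected, on the assumption that a single column cannot describe both.
    """
    result = []
    for row in activity_rows:
        owners = {
            foreign_owner[(column, row[column])]
            for column in ("AIDX", "CIDX")
            if (column, row.get(column)) in foreign_owner
        }
        if len(owners) > 1:
            raise ValueError(f"row cites IDs owned by several depositors: {row}")
        new_row = dict(row)
        new_row[OWNER_COLUMN] = str(owners.pop()) if owners else ""
        result.append(new_row)
    return result

rows = [{"AIDX": "A1", "CIDX": "C1"}, {"AIDX": "A1", "CIDX": "EXT-7"}]
for row in add_owner_src_id(rows, {("CIDX", "EXT-7"): 42}):
    print(row)
```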
