File structure

These pages describe all aspects of the requirements for loading bioactivity data, including the column headers and the permitted content of the cells (or 'fields') within these files.

Files must be supplied as tab-separated text files with UTF-8 or ASCII encoding. Spreadsheets will not work with the loader.
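The encoding and delimiter requirements above can be checked before submission. The following is a minimal sketch, not the loader's own validation: the function name check_tsv_bytes and the specific checks are assumptions made for illustration. It relies on ASCII being a subset of UTF-8, so a single decode attempt covers both permitted encodings.

```python
import csv
import io


def check_tsv_bytes(data: bytes) -> list[str]:
    """Return a list of problems found in a candidate file (hypothetical helper)."""
    try:
        # ASCII is a subset of UTF-8, so one decode covers both encodings.
        text = data.decode("utf-8")
    except UnicodeDecodeError as exc:
        return [f"not valid UTF-8 (byte offset {exc.start})"]

    reader = csv.reader(
        io.StringIO(text, newline=""), delimiter="\t", quoting=csv.QUOTE_NONE
    )
    rows = list(reader)
    if not rows:
        return ["file is empty"]

    problems = []
    width = len(rows[0])
    if width < 2:
        # Heuristic only: a single column often means the wrong delimiter.
        problems.append("header has a single column; is the file tab-separated?")
    for line_no, row in enumerate(rows[1:], start=2):
        if len(row) != width:
            problems.append(f"line {line_no}: {len(row)} fields, expected {width}")
    return problems
```

To check a file on disk, read it in binary mode and pass the bytes, for example check_tsv_bytes(open(path, "rb").read()). An empty list means none of these particular checks failed; it does not guarantee that the loader will accept the file.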

Within these files, depositor-defined identifiers are used for compounds/substances (CIDX), assays (AIDX) and references (RIDX), as described earlier. These identifiers provide a point of reference between different files.

  • For the three key entities, AIDX, CIDX and RIDX, a single primary file is used to define or redefine them.

    • RIDX is defined in the REFERENCE file

    • AIDX is defined in the ASSAYS file

    • CIDX is defined in the COMPOUND_RECORD file

  • Secondary files may reference these identifiers to define or redefine properties of existing entities. Each identifier used in a secondary file must appear in the corresponding primary file in the same deposition job.

  • Do not use N/A, NA, Null or other values as placeholders. Simply leave the value blank.
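The relationship between primary and secondary files described above can be checked with a simple set comparison. This is a sketch under stated assumptions: the helper names, the DESCRIPTION and VALUE columns, and the example rows are hypothetical, and the secondary file shown is not a specific file from the deposition format. It assumes only that each file carries a column named after the identifier it uses (here AIDX).

```python
import csv
import io


def read_tsv(text: str) -> list[dict[str, str]]:
    """Parse tab-separated text into one dict per data row."""
    reader = csv.DictReader(
        io.StringIO(text, newline=""), delimiter="\t", quoting=csv.QUOTE_NONE
    )
    return list(reader)


def undefined_identifiers(primary, secondary, column):
    """Identifiers used under `column` in `secondary` but absent from `primary`."""
    defined = {row[column] for row in primary if row.get(column)}
    used = {row[column] for row in secondary if row.get(column)}
    return sorted(used - defined)


# Hypothetical contents: a primary ASSAYS file and a secondary file that uses AIDX.
assays = read_tsv("AIDX\tDESCRIPTION\nassay_1\tBinding assay\n")
secondary = read_tsv(
    "AIDX\tCIDX\tVALUE\nassay_1\tcmpd_1\t5.2\nassay_2\tcmpd_1\t7.1\n"
)

print(undefined_identifiers(assays, secondary, "AIDX"))  # ['assay_2']
```

The same call with "CIDX" or "RIDX" and the matching primary file covers the other two entities.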

Guidance

  • Consider limiting numeric data to a small number of decimal places so it is easily compared by users.

  • Leave values empty if they are null. If you use placeholders like "-", "None" or "NULL" then data may load into the database while being invalid, or may be difficult to search for as we will not record these as a null value. The loader attempts to convert such values to nulls, but we cannot cover every possibility.

  • New data deposited to the same source under a previously used depositor-defined identifier (DDI) overwrites the data from the older deposition. It is therefore strongly advised that the DDIs in a deposition are unique and meaningful to that dataset.
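Placeholder values such as those mentioned in the guidance above can be located before submission so that they can be reviewed and blanked by hand. The sketch below is illustrative only: the PLACEHOLDERS set is built from the examples given on this page and is not the loader's actual list, and the column names in the example rows are hypothetical.

```python
# Illustrative set taken from the examples on this page, compared case-insensitively.
PLACEHOLDERS = {"n/a", "na", "null", "none", "-"}


def find_placeholders(rows):
    """Return (line_number, column, value) for each cell that looks like a placeholder."""
    hits = []
    for line_no, row in enumerate(rows, start=2):  # line 1 is the header
        for column, value in row.items():
            if value is not None and value.strip().lower() in PLACEHOLDERS:
                hits.append((line_no, column, value))
    return hits


# Hypothetical rows, shaped like the output of csv.DictReader.
rows = [
    {"CIDX": "cmpd_1", "VALUE": "5.2", "COMMENT": ""},
    {"CIDX": "cmpd_2", "VALUE": "N/A", "COMMENT": "-"},
]

for hit in find_placeholders(rows):
    print(hit)
```

The function reports cells rather than rewriting them, because a string such as "NA" may occasionally be a legitimate value; each hit is best checked before the cell is left blank.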
