# File structure

### The ChEMBL Deposition Files, an Overview

These pages describe all aspects of the requirements for loading bioactivity data, including the column headers and the permitted content of the cells (or 'fields') within these files.

Files must be supplied as tab-separated text files (.tsv) with [UTF-8](https://en.wikipedia.org/wiki/UTF-8) or [ASCII](https://en.wikipedia.org/wiki/ASCII) encoding. Spreadsheet files (such as .xlsx) will not work with the loader.
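As a minimal sketch, assuming files are generated from Python with only the standard library, the following writes rows as tab-separated text, leaves missing values blank, and opens the output file with UTF-8 encoding. The column names `CIDX`, `RIDX` and `COMPOUND_NAME` are illustrative only; consult the individual file pages for the real headers. Rejecting tabs and line breaks inside values, and using `\n` line endings, are precautions assumed here rather than documented loader rules.

```python
import io


def _cell(value):
    """Convert a value to text; None becomes an empty (blank) field."""
    text = "" if value is None else str(value)
    # Assumed precaution: an embedded tab or line break would shift columns.
    if any(ch in text for ch in "\t\r\n"):
        raise ValueError(f"value contains a tab or line break: {text!r}")
    return text


def write_tsv(rows, fieldnames, handle):
    """Write dict rows as tab-separated text, header line first."""
    handle.write("\t".join(_cell(name) for name in fieldnames) + "\n")
    for row in rows:
        # Columns missing from a row are written as blank fields.
        handle.write("\t".join(_cell(row.get(name)) for name in fieldnames) + "\n")


def write_tsv_file(path, rows, fieldnames):
    """Write a .tsv file encoded as UTF-8 (newline='' avoids line-ending translation)."""
    with open(path, "w", encoding="utf-8", newline="") as handle:
        write_tsv(rows, fieldnames, handle)


def tsv_text(rows, fieldnames):
    """Return the TSV content as a string, useful for checking output."""
    buffer = io.StringIO()
    write_tsv(rows, fieldnames, buffer)
    return buffer.getvalue()


if __name__ == "__main__":
    # Hypothetical example rows; column names are illustrative only.
    fields = ["CIDX", "RIDX", "COMPOUND_NAME"]
    rows = [
        {"CIDX": "CPD1", "RIDX": "REF1", "COMPOUND_NAME": "aspirin"},
        {"CIDX": "CPD2", "RIDX": "REF1"},  # missing name is left blank
    ]
    print(tsv_text(rows, fields), end="")
```

Writing the joined lines directly, rather than through a CSV writer, avoids any quoting of values, so the file contains exactly the text that was supplied.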

Within these files, depositor-defined identifiers are used for compounds/substances (CIDX), assays (AIDX) and references (RIDX), as described [earlier](/chembl-data-deposition-guide/deposition-overview/depositor-defined-identifiers.md). These identifiers provide a point of reference between different files.

* Each of the three key identifiers, AIDX, CIDX and RIDX, is defined (or redefined) in a single primary file:
  * RIDX is defined in the REFERENCE file
  * AIDX is defined in the ASSAYS file
  * CIDX is defined in the COMPOUND\_RECORD file
* Secondary files may use these identifiers to define or redefine properties of the existing entities, but only identifiers that appear in the corresponding primary file of the same deposition job.
* **Do not** use `N/A`, `NA`, `Null`, `NULL`, `None`, `-` or any other value as a placeholder; simply leave the field blank. Placeholders are not recorded as null values, so data containing them may load into the database while being invalid, or may be difficult to search for.
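The identifier and placeholder rules above lend themselves to a pre-submission check. The sketch below, in Python using only the standard library, flags cells containing placeholder strings and lists identifiers used in a secondary file that are missing from the corresponding primary file. The exact placeholder list and the case-insensitive matching are assumptions, and the sample data and column names are hypothetical.

```python
import csv
import io

# Placeholder strings named in the guidance; case-insensitive matching is an assumption.
PLACEHOLDERS = {"n/a", "na", "null", "none", "-"}


def read_tsv(text):
    """Parse TSV text into a list of dicts keyed by column header."""
    # QUOTE_NONE keeps quote characters as literal text instead of CSV quoting.
    reader = csv.DictReader(io.StringIO(text), delimiter="\t", quoting=csv.QUOTE_NONE)
    return list(reader)


def find_placeholders(rows):
    """Return (row_number, column, value) for cells holding a placeholder.

    Row numbers count the header as row 1.
    """
    hits = []
    for row_number, row in enumerate(rows, start=2):
        for column, value in row.items():
            if column is None or value is None:
                continue  # ragged rows: extra or missing fields
            if value.strip().lower() in PLACEHOLDERS:
                hits.append((row_number, column, value))
    return hits


def find_undefined_ids(primary_rows, secondary_rows, id_column):
    """Identifiers used in a secondary file but absent from the primary file."""
    defined = {row[id_column] for row in primary_rows if row.get(id_column)}
    used = {row[id_column] for row in secondary_rows if row.get(id_column)}
    return sorted(used - defined)


if __name__ == "__main__":
    # Hypothetical primary (compound) and secondary file contents.
    compounds = read_tsv("CIDX\tRIDX\nC1\tR1\nC2\tR1\n")
    secondary = read_tsv("CIDX\tAIDX\tVALUE\nC1\tA1\t5.2\nC3\tA1\tN/A\n")
    print(find_undefined_ids(compounds, secondary, "CIDX"))  # ['C3']
    print(find_placeholders(secondary))  # [(3, 'VALUE', 'N/A')]
```

The same `find_undefined_ids` call can be repeated with `AIDX` against the ASSAYS file and `RIDX` against the REFERENCE file.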

### Guidance

* Consider limiting numeric data to a small number of decimal places so that users can compare values easily.
* Data deposited to the same source under previously used depositor-defined identifiers (DDIs) will **overwrite** the data from the earlier deposition. It is therefore strongly advised that the DDIs in a deposition are unique and meaningful to that dataset.
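Both guidance points can be applied while preparing files. As a sketch in Python, the hypothetical helper `limit_decimals` formats a number to at most a chosen number of decimal places (dropping trailing zeros is a stylistic choice made here), and `ids_that_would_overwrite` compares the DDIs of a new deposition with those used earlier for the same source. Where that earlier list comes from, for example the depositor's own records, is an assumption.

```python
def limit_decimals(value, places=3):
    """Format a number with at most `places` decimal places, trimming trailing zeros."""
    text = f"{float(value):.{places}f}"
    if "." in text:
        # Only trim after a decimal point, so "100" stays "100".
        text = text.rstrip("0").rstrip(".")
    return "0" if text == "-0" else text


def ids_that_would_overwrite(new_ids, previous_ids):
    """DDIs in this deposition that were already used by an earlier one (same source)."""
    return sorted(set(new_ids) & set(previous_ids))


if __name__ == "__main__":
    print(limit_decimals(3.14159, 2))  # 3.14
    print(limit_decimals(7.0, 2))  # 7
    # Hypothetical DDIs: one new, one reused from an earlier deposition.
    print(ids_that_would_overwrite(["ASSAY_2024_01", "ASSAY_2024_02"],
                                   {"ASSAY_2023_07", "ASSAY_2024_02"}))  # ['ASSAY_2024_02']
```

An empty result from `ids_that_would_overwrite` suggests no earlier data would be replaced; a non-empty one is worth reviewing unless the overwrite is intended.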

### Further Resources

[This presentation](https://docs.google.com/presentation/d/1K3bbp0SZ-NNhijhUPUL5osg__YpY_HnVumFWpDs76hg/edit?slide=id.g3894db9e123_0_3#slide=id.g3894db9e123_0_3) contains slides that cover this chapter and can be reused to explain ChEMBL deposition within your organisation. A recording will be available soon.


