File hierarchy

Deposition Files

During the loading process, primary, secondary and tertiary files are treated slightly differently.

Primary Files

Primary files create and update the major entities in ChEMBL. The three primary files (ASSAY, REFERENCE and COMPOUND_RECORD) have the following effects at load time.

ASSAY files define AIDXs and REFERENCE files define RIDXs. If an AIDX or RIDX does not yet exist in the relevant table (the ASSAYS table, in the case of an AIDX) for this src_id, a new record is created. If a record with that identifier already exists, the data from the earlier submission is overwritten with the data in the new one.

COMPOUND_RECORD files define combinations of CIDX and RIDX. If a CIDX-RIDX combination does not yet exist in the COMPOUND_RECORDS table for this src_id, a new record is created. If a record with that identifier combination already exists, the data from the earlier submission is overwritten with the COMPOUND_NAME and other fields present in the incoming file.
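The create-or-overwrite behaviour described above can be sketched as follows. This is a minimal illustration and not the actual loader: the function `load_primary` and the use of a Python dict as a stand-in for the COMPOUND_RECORDS table are assumptions made for the example.

```python
def load_primary(table, key_fields, rows):
    """Create a record for each new key; overwrite the record for an existing key.

    `table` is a hypothetical in-memory stand-in for a database table,
    keyed on the identifier fields named in `key_fields`.
    """
    created, overwritten = 0, 0
    for row in rows:
        key = tuple(row[field] for field in key_fields)
        if key in table:
            overwritten += 1
        else:
            created += 1
        table[key] = dict(row)  # the incoming submission replaces the old data
    return created, overwritten


compound_records = {}
key = ("src_id", "CIDX", "RIDX")
first = [{"src_id": 1, "CIDX": "C1", "RIDX": "R1", "COMPOUND_NAME": "old name"}]
second = [
    {"src_id": 1, "CIDX": "C1", "RIDX": "R1", "COMPOUND_NAME": "new name"},
    {"src_id": 1, "CIDX": "C2", "RIDX": "R1", "COMPOUND_NAME": "another"},
]
print(load_primary(compound_records, key, first))   # (1, 0)
print(load_primary(compound_records, key, second))  # (1, 1)
```

After the second load, the record for C1-R1 carries the new COMPOUND_NAME, and C2-R1 has been created alongside it.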

Secondary Files

Secondary files, for example ASSAY_PARAM and COMPOUND_CTAB, do not create entities but serve to add data to the three key DDIs (AIDX, RIDX and CIDX). The DDIs used in these files must already be present, either in the database or in a corresponding primary file in the same deposition.

Apart from this dependency, secondary files work relatively independently of the primary files. If DDIs were established in earlier submissions, the presence of data in secondary files will wipe and replace the secondary data associated with these DDIs, regardless of whether the corresponding DDI is present in the corresponding primary file.

Similarly, if an established DDI is absent from a secondary file, the secondary data already associated with it is unaffected; the data will be assigned to the default RIDX. To delete associated secondary data, empty records for the DDI must be present in the secondary files.
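The rules for secondary files can be sketched as follows. This is an illustration under stated assumptions, not the real loading code: `load_secondary`, the `TYPE` and `VALUE` column names, and the reading of an "empty record" as a row whose non-identifier fields are all blank are hypothetical.

```python
def load_secondary(known_ddis, secondary_data, rows, ddi_field="AIDX"):
    """Wipe and replace secondary data for every DDI named in `rows`.

    DDIs that do not appear in `rows` are left untouched. A row whose other
    fields are all blank is treated as an empty record (an assumption) and
    deletes the data held for that DDI.
    """
    incoming = {}
    for row in rows:
        ddi = row[ddi_field]
        if ddi not in known_ddis:
            # The DDI must come from the database or from a primary file
            # in the same deposition.
            raise ValueError(f"{ddi_field} {ddi!r} is not defined")
        payload = {
            k: v for k, v in row.items() if k != ddi_field and v not in ("", None)
        }
        incoming.setdefault(ddi, [])
        if payload:
            incoming[ddi].append(payload)
    for ddi, records in incoming.items():
        if records:
            secondary_data[ddi] = records   # replace the old data
        else:
            secondary_data.pop(ddi, None)   # only empty records: delete
    return secondary_data


known = {"A1", "A2", "A3"}
params = {
    "A1": [{"TYPE": "pH", "VALUE": "7.4"}],
    "A2": [{"TYPE": "temperature", "VALUE": "37"}],
    "A3": [{"TYPE": "dose", "VALUE": "10"}],
}
rows = [
    {"AIDX": "A1", "TYPE": "pH", "VALUE": "7.0"},  # replaces the data for A1
    {"AIDX": "A3", "TYPE": "", "VALUE": ""},       # empty record: deletes A3
]
load_secondary(known, params, rows)
print(sorted(params))  # ['A1', 'A2']
```

A2 does not appear in the incoming rows, so its existing parameters are kept. All rows are validated before anything is changed, so an unknown DDI leaves the existing data intact.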

Tertiary Files

Files which in some way define relationships between primary identifiers are termed tertiary files, for example ACTIVITY, ACTIVITY_PARAMETERS, ACTIVITY_SUPP. A requirement for such files is that referential integrity must be maintained with respect to the three key DDIs that are required to define an activity record: AIDX, CIDX and RIDX.
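A referential integrity check of this kind could look like the sketch below. The function `check_activity_rows` is hypothetical, and the sets of known identifiers are assumed to have been collected beforehand from the database and from the primary files in the same deposition.

```python
def check_activity_rows(rows, aidxs, cidxs, ridxs):
    """Return (line number, field, value) for every identifier that is not known."""
    known = {"AIDX": aidxs, "CIDX": cidxs, "RIDX": ridxs}
    problems = []
    for line_no, row in enumerate(rows, start=1):
        for field, valid in known.items():
            value = row.get(field)
            if value not in valid:
                problems.append((line_no, field, value))
    return problems


activity_rows = [
    {"AIDX": "A1", "CIDX": "C1", "RIDX": "R1"},
    {"AIDX": "A1", "CIDX": "C9", "RIDX": "R1"},  # C9 has not been defined
]
print(check_activity_rows(activity_rows, {"A1"}, {"C1"}, {"R1"}))
# [(2, 'CIDX', 'C9')]
```

A stricter check might also confirm that each CIDX-RIDX pair exists as a combination in COMPOUND_RECORDS; the sketch only tests each identifier on its own.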

Updating of tertiary files can only be achieved by a wipe and replace process. The process requires that all ACTIVITY data assigned to a particular JOB_ID is wiped (but not primary and secondary file data for the same JOB_ID) and the updated ACTIVITY data then re-loaded.
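The wipe and replace step can be sketched as follows. The `store` dictionary and the function `replace_activities` are hypothetical stand-ins for the database and its loader; the point is only that ACTIVITY rows for the given JOB_ID are removed and re-loaded while other data for the same JOB_ID is left alone.

```python
def replace_activities(store, job_id, new_rows):
    """Wipe all ACTIVITY rows for `job_id`, then load `new_rows` in their place.

    Returns the number of rows wiped. Other tables in `store` are not touched.
    """
    kept = [row for row in store["ACTIVITY"] if row["JOB_ID"] != job_id]
    wiped = len(store["ACTIVITY"]) - len(kept)
    store["ACTIVITY"] = kept + [dict(row, JOB_ID=job_id) for row in new_rows]
    return wiped


store = {
    "ASSAYS": [{"JOB_ID": 7, "AIDX": "A1"}],
    "ACTIVITY": [
        {"JOB_ID": 7, "AIDX": "A1", "CIDX": "C1", "RIDX": "R1", "VALUE": 5.0},
        {"JOB_ID": 8, "AIDX": "A2", "CIDX": "C2", "RIDX": "R1", "VALUE": 1.5},
    ],
}
updated = [{"AIDX": "A1", "CIDX": "C1", "RIDX": "R1", "VALUE": 6.2}]
print(replace_activities(store, 7, updated))  # 1
```

The ASSAYS record for JOB_ID 7 and the ACTIVITY row for JOB_ID 8 are unchanged after the call.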
