Linking supplementary files through Depositor Defined Identifiers

Supporting data are stored in the ACTIVITY_SUPP table and require the use of two additional depositor defined identifiers: the supplementary activity mapping ID (SAMID) and the record grouping ID (REGID). The table is used for bulk storage of raw data relating to ACTIVITY values. This includes data used directly to calculate the ACTIVITY value, as well as miscellaneous data from the same experiment that are not used in the calculation but may be of interest to the specialist user.

| *IDX | Description | Primary File |
| --- | --- | --- |
| SAMID | Supplementary Activity Mapping ID | ACTIVITY_SUPP |
| REGID | REcord Grouping ID | ACTIVITY_SUPP |

SAMID identifiers in the ACTIVITY_SUPP_MAP file

The ACTIVITY_SUPP_MAP file acts as an intermediate file that connects data in the ACTIVITY file, represented by ACT_IDs, to data in the ACTIVITY_SUPP file, represented by SAMIDs. The ACTIVITY_SUPP_MAP file should therefore contain only two columns, ACT_ID and SAMID, which together map the records in the ACTIVITY_SUPP table to the records in the ACTIVITY table for which they are supporting evidence.

SAMIDs are mandatory in the ACTIVITY_SUPP_MAP file, and all SAMIDs used in a deposition must be present at least once in the ACTIVITY_SUPP file. SAMID can be left as null in the ACTIVITY_SUPP file if the supplementary record does not directly point to any ACTIVITY value in particular.
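The two rules above can be checked before submission. The following is a minimal sketch, assuming each file has already been read into a list of dictionaries keyed by column name; the function name `check_samid_links` and the sample identifiers and values are hypothetical and not part of any ChEMBL tooling.

```python
def check_samid_links(supp_map_rows, supp_rows):
    """Return a list of problems with SAMID links between the two files."""
    # SAMID may be null in ACTIVITY_SUPP, so collect only the non-empty ones.
    supp_samids = {
        row["SAMID"] for row in supp_rows if row.get("SAMID") not in (None, "")
    }
    problems = []
    for line_no, row in enumerate(supp_map_rows, start=1):
        samid = row.get("SAMID")
        if samid in (None, ""):
            # SAMID is mandatory in ACTIVITY_SUPP_MAP.
            problems.append(f"ACTIVITY_SUPP_MAP row {line_no}: SAMID is mandatory")
        elif samid not in supp_samids:
            # Every SAMID used in the map must appear in ACTIVITY_SUPP.
            problems.append(
                f"ACTIVITY_SUPP_MAP row {line_no}: SAMID {samid!r} "
                "not found in ACTIVITY_SUPP"
            )
    return problems


# Made-up example data; identifiers and values are illustrative only.
supp_map_rows = [
    {"ACT_ID": "A1", "SAMID": "S1"},
    {"ACT_ID": "A1", "SAMID": "S2"},
    {"ACT_ID": "A2", "SAMID": "S9"},  # S9 is missing from ACTIVITY_SUPP
]
supp_rows = [
    {"SAMID": "S1", "REGID": 1, "TYPE": "ALT", "VALUE": "41"},
    {"SAMID": "S2", "REGID": 2, "TYPE": "ALT", "VALUE": "38"},
    {"SAMID": None, "REGID": 1, "TYPE": "Body weight", "VALUE": "251"},
]

for problem in check_samid_links(supp_map_rows, supp_rows):
    print(problem)
```

The third ACTIVITY_SUPP row has a null SAMID, which is allowed because it does not point to any particular ACTIVITY value.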

REGID identifiers

A REGID is used to cluster or group together related records in the ACTIVITY_SUPP file, just as the TEOID is used in the ACTIVITY table. A SAMID usually refers to a single specific measurement, such as a time point, rather than a group, animal or well.

A REGID must be a positive integer such as 1, 2, 3, etc. The REGID only has meaning within the deposition, so values can be re-used between depositions without the potential to overwrite previous depositions, as is the case with some other depositor defined identifiers. When preparing data for loading it can be useful to think of REGIDs as 'row numbers' in a 2D table of data.

A SAMID (Single Measurement ID) uniquely identifies a single data point in ACTIVITY_SUPP.

This is one individual measurement for one compound, at one dose and time point, in one entity (animal, plate well, etc.). Biological replicates (animals) therefore have differing SAMIDs.

A REGID (Regimen ID) groups different measurements for the same entity and conditions.

For example, all measurements for hematology, clinical chemistry, organ weight and pathology observed for one animal at the same compound, dose and time point share one REGID.
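The difference between the two identifiers can be illustrated with a small sketch: each row carries its own SAMID, while rows for the same entity and conditions share a REGID. The helper name `group_by_regid`, the row layout (a list of dictionaries) and the sample TYPEs and VALUEs are assumptions made for illustration only.

```python
from collections import defaultdict


def group_by_regid(supp_rows):
    """Group ACTIVITY_SUPP rows by REGID, rejecting invalid REGIDs."""
    groups = defaultdict(list)
    for row in supp_rows:
        regid = row["REGID"]
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(regid, bool) or not isinstance(regid, int) or regid < 1:
            raise ValueError(f"REGID must be a positive integer, got {regid!r}")
        groups[regid].append(row)
    return dict(groups)


# Made-up data: two animals (REGID 1 and 2), two measurements each.
supp_rows = [
    {"SAMID": "S1", "REGID": 1, "TYPE": "ALT", "VALUE": "41"},
    {"SAMID": "S2", "REGID": 1, "TYPE": "Liver weight", "VALUE": "9.8"},
    {"SAMID": "S3", "REGID": 2, "TYPE": "ALT", "VALUE": "38"},
    {"SAMID": "S4", "REGID": 2, "TYPE": "Liver weight", "VALUE": "10.1"},
]

groups = group_by_regid(supp_rows)
for regid in sorted(groups):
    samids = [row["SAMID"] for row in groups[regid]]
    print(f"REGID {regid}: SAMIDs {samids}")
```

Here the two ALT measurements come from different animals, so they have different SAMIDs and different REGIDs, while the ALT and liver weight values for one animal share a REGID.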

Complex result sets as a 2D matrix

For many scientists one of the most convenient ways of browsing the values in a highly complex result set is in the form of a 2D matrix, such as a spreadsheet. Indeed, this is the most common format for complex result sets presented to ChEMBL administrators for loading into ChEMBL.

The ACTIVITY_SUPP table can be thought of as a transformation of a 2D matrix where REGIDs represent rows, the TYPEs are the column headers, and the VALUEs are the cells. In fact, one of the best ways of creating the ACTIVITY_SUPP file can be to start with a 2D matrix of all the data to be loaded, and transform these data into the ACTIVITY_SUPP file with a script. By doing this, reconstructing the original data set is straightforward when such data are subsequently exported from ChEMBL.
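As a rough sketch of such a script, the functions below unpivot a 2D matrix into one record per cell and pivot the records back again. The function names, the list-of-dictionaries layout and the sample TYPEs and VALUEs are assumptions for illustration. Empty cells are skipped, and the assignment of SAMIDs and any other ACTIVITY_SUPP columns is left out for brevity.

```python
def matrix_to_supp_rows(header, matrix):
    """Unpivot a 2D matrix into one record per non-empty cell."""
    rows = []
    # The row number becomes the REGID, starting at 1.
    for regid, record in enumerate(matrix, start=1):
        for type_name, value in zip(header, record):
            if value in (None, ""):
                continue  # nothing was measured for this cell
            rows.append({"REGID": regid, "TYPE": type_name, "VALUE": value})
    return rows


def supp_rows_to_matrix(rows, header):
    """Pivot records back into a 2D matrix, one row per REGID."""
    by_regid = {}
    for row in rows:
        by_regid.setdefault(row["REGID"], {})[row["TYPE"]] = row["VALUE"]
    return [
        [by_regid[regid].get(type_name, "") for type_name in header]
        for regid in sorted(by_regid)
    ]


# Made-up spreadsheet: column headers are TYPEs, each row is one REGID.
header = ["ALT", "Liver weight"]
matrix = [
    ["41", "9.8"],
    ["38", ""],  # liver weight not recorded for the second row
]

supp_rows = matrix_to_supp_rows(header, matrix)
for row in supp_rows:
    print(row)
```

Because each record keeps its REGID and TYPE, `supp_rows_to_matrix(supp_rows, header)` rebuilds the original layout, provided every matrix row has at least one non-empty cell.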
