# Linking supplementary data through SAMID and REGID

Supporting data are stored in the ACTIVITY\_SUPP table and require two additional depositor-defined identifiers: the supplementary activity mapping ID (SAMID) and the record grouping ID (REGID). The table is used for bulk storage of raw data relating to ACTIVITY values. This includes both the data used directly to calculate an ACTIVITY value and miscellaneous data from the same experiment that are not used in that calculation but may be of interest to specialist users.

<table data-header-hidden><thead><tr><th width="103.33333333333331"></th><th width="317"></th><th></th></tr></thead><tbody><tr><td><strong>*IDX</strong></td><td><strong>Description</strong></td><td><strong>Primary File</strong></td></tr><tr><td><strong>SAMID</strong></td><td><strong>S</strong>upplementary <strong>A</strong>ctivity <strong>M</strong>apping <strong>ID</strong></td><td>ACTIVITY_SUPP</td></tr><tr><td><strong>REGID</strong></td><td><strong>RE</strong>cord <strong>G</strong>rouping <strong>ID</strong></td><td>ACTIVITY_SUPP </td></tr></tbody></table>

### SAMID identifiers in the ACTIVITY\_SUPP\_MAP file

The ACTIVITY\_SUPP\_MAP file acts as an intermediate file that connects data in the ACTIVITY file, represented by ACT\_IDs, to data in the ACTIVITY\_SUPP file, represented by SAMIDs. The ACTIVITY\_SUPP\_MAP file should therefore contain only two columns, ACT\_ID and SAMID, which record which rows in the ACTIVITY\_SUPP table are supporting evidence for which rows in the ACTIVITY table.

SAMIDs are mandatory in the ACTIVITY\_SUPP\_MAP file, and every SAMID used in a deposition must appear at least once in the ACTIVITY\_SUPP file. In the ACTIVITY\_SUPP file itself, SAMID can be left null if the supplementary record does not relate directly to any particular ACTIVITY value.
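The rules above can be checked before submission with a short script. The following is a minimal sketch, not an official ChEMBL validator: it assumes each file has already been read into a list of Python dicts keyed by column name, and the function name `check_supp_map` is hypothetical. It checks that the map has only the ACT\_ID and SAMID columns, that every map row has a SAMID, and that each mapped SAMID occurs in ACTIVITY\_SUPP, while ignoring null SAMIDs in ACTIVITY\_SUPP.

```python
from __future__ import annotations

from typing import Iterable, Mapping, Optional


def check_supp_map(
    supp_map: Iterable[Mapping[str, str]],
    activity_supp: Iterable[Mapping[str, Optional[str]]],
) -> list[str]:
    """Return a list of problems found in hypothetical ACTIVITY_SUPP_MAP rows."""
    # SAMIDs present in ACTIVITY_SUPP; null/empty SAMIDs are allowed there and skipped.
    supp_samids = {row["SAMID"] for row in activity_supp if row.get("SAMID")}
    problems: list[str] = []
    for i, row in enumerate(supp_map, start=1):
        # The map file should contain exactly two columns: ACT_ID and SAMID.
        if set(row) != {"ACT_ID", "SAMID"}:
            problems.append(
                f"map row {i}: expected columns ACT_ID and SAMID, got {sorted(row)}"
            )
            continue
        if not row["SAMID"]:
            # SAMID is mandatory in the map file.
            problems.append(f"map row {i}: SAMID is mandatory")
        elif row["SAMID"] not in supp_samids:
            # Every SAMID used in the map must appear at least once in ACTIVITY_SUPP.
            problems.append(
                f"map row {i}: SAMID {row['SAMID']!r} not found in ACTIVITY_SUPP"
            )
    return problems


# Illustrative placeholder rows only (not real deposition data).
activity_supp = [
    {"SAMID": "S1", "REGID": "1", "TYPE": "T1", "VALUE": "1"},
    {"SAMID": None, "REGID": "1", "TYPE": "T2", "VALUE": "2"},  # null SAMID is allowed
]
supp_map = [{"ACT_ID": "A1", "SAMID": "S1"}, {"ACT_ID": "A2", "SAMID": "S9"}]
print(check_supp_map(supp_map, activity_supp))
```

With the placeholder rows shown, only the second map row is reported, because SAMID `S9` never appears in ACTIVITY\_SUPP.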

### REGID identifiers

A REGID is used to group related records in the ACTIVITY\_SUPP file, in much the same way as the TEOID is used in the ACTIVITY table. By contrast, a SAMID usually refers to a single specific measurement, such as one time point, rather than to a group, animal or well.

A REGID must be a positive integer (1, 2, 3, etc.). REGIDs only have meaning within a single deposition, so the same values can be reused in later depositions without overwriting earlier ones; this is not the case for some other depositor-defined identifiers, where reuse can overwrite previously deposited data. When preparing data for loading, it can be useful to think of REGIDs as row numbers in a 2D table of data.
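As a minimal sketch of these rules, assuming the ACTIVITY\_SUPP rows of one deposition are held as Python dicts with a `REGID` key (the function name `group_by_regid` is hypothetical), the script below rejects any REGID that is not a positive integer and collects the rows sharing each REGID, i.e. each "row" of the underlying 2D table.

```python
import re
from collections import defaultdict


def group_by_regid(activity_supp):
    """Group one deposition's ACTIVITY_SUPP rows by REGID.

    REGIDs only have meaning within a single deposition, so this should be
    run per deposition, never across several depositions at once.
    """
    groups = defaultdict(list)
    for i, row in enumerate(activity_supp, start=1):
        raw = str(row["REGID"]).strip()
        # Accept ASCII digits only, then require the value to be >= 1,
        # which rejects 0, negatives, decimals and free text.
        if not re.fullmatch(r"[0-9]+", raw) or int(raw) < 1:
            raise ValueError(f"row {i}: REGID {row['REGID']!r} is not a positive integer")
        groups[int(raw)].append(row)
    return dict(groups)


rows = [{"REGID": "1", "TYPE": "T1"}, {"REGID": "2", "TYPE": "T1"}, {"REGID": "1", "TYPE": "T2"}]
for regid, members in sorted(group_by_regid(rows).items()):
    print(regid, [m["TYPE"] for m in members])
```

Each REGID ends up with the list of records for one entity under one set of conditions, which matches the "row number" view described above.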

<table data-header-hidden><thead><tr><th width="258"></th><th></th></tr></thead><tbody><tr><td><strong>A SAMID (Supplementary Activity Mapping ID) uniquely identifies a </strong><em><strong>single data point</strong></em><strong> in ACTIVITY_SUPP</strong></td><td><strong>One individual measurement</strong> at <strong>one compound, dose and time point</strong>, in <strong>one entity</strong> (animal, plate well, etc.). <br>Biological replicates (e.g. individual animals) therefore have different SAMIDs.</td></tr><tr><td><strong>A REGID (Record Grouping ID) groups </strong><em><strong>different</strong></em> <em><strong>measurements</strong></em><strong> for the </strong><em><strong>same</strong></em> <em><strong>entity</strong></em> and conditions</td><td>E.g. all hematology, clinical chemistry, organ weight AND pathology measurements observed <strong>for one animal</strong> at the <strong>same compound, dose &#x26; time point.</strong></td></tr></tbody></table>

### Complex result sets as a 2D matrix

For many scientists one of the most convenient ways of browsing the values in a highly complex result set is in the form of a 2D matrix, such as a spreadsheet. Indeed, this is the most common format for complex result sets presented to ChEMBL administrators for loading into ChEMBL.

The ACTIVITY\_SUPP table can be thought of as a transformation of a 2D matrix in which REGIDs represent rows, TYPEs are the column headers and VALUEs are the cells. In fact, one of the best ways to create the ACTIVITY\_SUPP file is to start with a 2D matrix of all the data to be loaded and transform it into the ACTIVITY\_SUPP format with a script. Building the file this way also makes it straightforward to reconstruct the original data set when the data are later exported from ChEMBL.
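A minimal sketch of such a script is shown below, assuming the matrix is given as a header list plus a list of rows of strings. The function names `matrix_to_supp` and `supp_to_matrix` are hypothetical, and the records carry only REGID, TYPE and VALUE; any other columns required by the ACTIVITY\_SUPP specification (such as SAMID) would have to be added separately. Row position becomes the REGID, each column header becomes a TYPE, and each non-empty cell becomes a VALUE. The reverse function pivots the long-format records back into a matrix.

```python
from __future__ import annotations


def matrix_to_supp(header: list[str], rows: list[list[str]]) -> list[dict]:
    """Unpivot a 2D matrix into long-format ACTIVITY_SUPP-style records."""
    records = []
    for regid, row in enumerate(rows, start=1):  # 1-based row number -> REGID
        if len(row) != len(header):
            raise ValueError(f"row {regid} has {len(row)} cells, expected {len(header)}")
        for type_, value in zip(header, row):
            if value == "":  # skip empty cells instead of loading blank values
                continue
            records.append({"REGID": regid, "TYPE": type_, "VALUE": value})
    return records


def supp_to_matrix(records: list[dict]) -> tuple[list[str], list[list[str]]]:
    """Rebuild the matrix: one row per REGID, one column per TYPE."""
    types: list[str] = []
    by_regid: dict[int, dict[str, str]] = {}
    for rec in records:
        if rec["TYPE"] not in types:
            types.append(rec["TYPE"])  # columns appear in first-seen order
        by_regid.setdefault(rec["REGID"], {})[rec["TYPE"]] = rec["VALUE"]
    rows = [[cells.get(t, "") for t in types] for _, cells in sorted(by_regid.items())]
    return types, rows


header = ["T1", "T2"]  # placeholder TYPE names
rows = [["1", "2"], ["3", ""]]  # placeholder values
records = matrix_to_supp(header, rows)
print(records)
print(supp_to_matrix(records) == (header, rows))
```

Because empty cells are skipped, a column that is empty in every row, or a column whose first non-empty cell appears after another column's, will not round-trip to exactly the same header order; a real script might keep the original header list for the reverse step.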
