Complex result sets
ChEMBL has always handled simple forms of bioactivity data extremely well, such as 'cpdX has affinity (Ki) of Y uM in assay Z.' Such data is represented by a single row in an ACTIVITY file, such as those shown in the Example dataset.
Some users choose to store supplementary and supporting data outside ChEMBL, but it is also possible to deposit such additional data in ChEMBL using the ACTIVITY_SUPP file. For example, a depositor might want to store the individual percentage inhibition points on a curve where an IC50 value has been quoted in the ACTIVITY file.
Other examples of more complex data could include:
Assays where multiple different result types are required to describe the activity of the test compound, not just a single value as in the example above.
Data where different values of the same result type must be qualified or contextualised to convey the result faithfully.
A need to provide supporting or supplementary data, and not simply a reference to a literature paper.
Other practicalities
One of the practical considerations for us at ChEMBL is that we have fairly limited resources, so we will not always have time to get to grips with understanding extremely complex assays to the same level as the creators of the data. Depositors of complex result sets often ask us to reformat the data for loading. For this reason, our process may involve asking depositors to:
Provide a 2D matrix (spreadsheet) of the data to be loaded.
Identify the minimal self-defining, self-referencing 'units' or 'sets' within the data, including both test and control data.
Identify the top-level result types from the set, singly or in combination. Depositors should also advise on how these should be averaged, if necessary, and what properties may be associated with them, for example test concentrations, age ranges or Hill Slopes.
Advise on how the set(s) may be pivoted to one or more top-level result types.
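As a rough illustration of the steps above, the sketch below takes a small 2D matrix, unpivots it into one record per measurement, and averages replicates within each set. It is only a sketch under stated assumptions: the matrix layout, the field names (`compound`, `replicate`, `concentration_uM`, `result_type`) and all values are hypothetical, and none of them are ChEMBL file column names or a required format.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical 2D matrix as a depositor might supply it: one row per
# replicate, one column per test concentration (uM), values are % inhibition.
# Compound names, column labels and numbers are illustrative only.
matrix = [
    {"compound": "cpdX", "replicate": 1, "0.1": 12.0, "1": 48.0, "10": 91.0},
    {"compound": "cpdX", "replicate": 2, "0.1": 10.0, "1": 52.0, "10": 89.0},
    {"compound": "control", "replicate": 1, "0.1": 1.0, "1": 2.0, "10": 3.0},
]

# Columns that identify a row rather than hold a measurement.
ID_COLUMNS = ("compound", "replicate")


def unpivot(rows):
    """Turn each matrix cell into one long-format record."""
    records = []
    for row in rows:
        for column, value in row.items():
            if column in ID_COLUMNS:
                continue
            records.append({
                "compound": row["compound"],
                "replicate": row["replicate"],
                "concentration_uM": float(column),
                "result_type": "Inhibition",  # assumed label
                "value": value,
                "units": "%",
            })
    return records


def average_by_set(records):
    """Average replicates within each (compound, concentration) set."""
    groups = defaultdict(list)
    for rec in records:
        key = (rec["compound"], rec["concentration_uM"])
        groups[key].append(rec["value"])
    return {key: mean(values) for key, values in sorted(groups.items())}


long_records = unpivot(matrix)
averaged = average_by_set(long_records)
```

Here the 'set' is assumed to be a compound at a given test concentration, with the control kept alongside the test data. A real deposition may define its sets and averaging rules quite differently, which is why the depositor is asked to specify them.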