# FAQs

**UniChem Contents**

**What sort of sources are used in UniChem?**

* Any chemically aware database that contains compounds that have been assigned IDs (called src\_compound\_ids in UniChem) and structures. A source may store structures in a variety of ways.
* If a source provides standard InChIs, UniChem uses them.
* If a source does not provide standard InChIs, UniChem generates them while loading the data.
* There need not be a 1:1 relationship between src\_compound\_ids and InChIs in a source. Many sources define chemical uniqueness by means other than the standard InChI, so UniChem handles both 1:many and many:1 relationships between src\_compound\_ids and standard InChIs.
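One way to picture these 1:many and many:1 relationships is as a pair of dictionaries of sets, one indexed by src\_compound\_id and one by standard InChI. This is only an illustrative sketch: the identifiers and the `INCHI_*` strings are hypothetical placeholders rather than real InChIs, and it does not describe how UniChem stores the data internally.

```python
# Illustrative sketch: src_compound_id <-> standard InChI links modelled as a
# many-to-many relation. All IDs and InChI strings are hypothetical placeholders.
from collections import defaultdict

# (src_compound_id, standard_inchi) pairs as a source might supply them.
pairs = [
    ("SRC1", "INCHI_A"),  # SRC1 is linked to two InChIs (1:many)
    ("SRC1", "INCHI_B"),
    ("SRC2", "INCHI_C"),  # SRC2 and SRC3 share one InChI (many:1), e.g. when the
    ("SRC3", "INCHI_C"),  # source distinguishes compounds by other criteria
]

ids_to_inchis: dict[str, set[str]] = defaultdict(set)
inchis_to_ids: dict[str, set[str]] = defaultdict(set)
for src_id, inchi in pairs:
    ids_to_inchis[src_id].add(inchi)
    inchis_to_ids[inchi].add(src_id)

print(sorted(ids_to_inchis["SRC1"]))    # ['INCHI_A', 'INCHI_B']
print(sorted(inchis_to_ids["INCHI_C"]))  # ['SRC2', 'SRC3']
```

Keeping both directions makes it equally cheap to ask "which structures does this ID cover?" and "which IDs share this structure?".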

**What sources are currently used in UniChem?**

These are listed on the 'Sources' page: go [here](https://www.ebi.ac.uk/unichem/sources).

**Some src\_compound\_ids are missing from a source in UniChem. Why is this?**

This is explained on the 'Reasons for data omission' page: go [here](https://chembl.gitbook.io/unichem/submission-of-data-to-unichem/rules-for-loading).

**The source I am trying to create hyperlinks to does not use the src\_compound\_id to create the URL for compound-specific pages. How does UniChem deal with this?**

In these circumstances, UniChem makes use of 'auxiliary data' to create links, as described immediately below.

**What is 'auxiliary data' for a src\_compound\_id and when would I need to use it?**

* Most sources within UniChem create URLs for compound-specific pages by simply appending a src\_compound\_id to a 'base URL' (e.g. appending 'CHEMBL59' to <https://www.ebi.ac.uk/chembldb/compound/inspect/> gives <https://www.ebi.ac.uk/chembldb/compound/inspect/CHEMBL59>).
* However, some UniChem instances include a small number of sources that create URLs for compound-specific pages from strings or identifiers ('auxiliary data') that differ from the source's src\_compound\_ids. This is uncommon, and UniChem handles it with an additional mapping step for these sources, in which each src\_compound\_id is mapped to its auxiliary data.
* Sources that need this step are marked with a '1' in the 'AUX\_FOR\_URL' field of the UC\_SOURCES table.
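The two URL patterns above can be sketched as a single function that chooses between the src\_compound\_id and its auxiliary data. This is a minimal illustration, assuming a hypothetical `Source` record whose `aux_for_url` field mirrors the AUX\_FOR\_URL flag and whose `aux_map` dictionary stands in for the mapping step; the `example_aux` source and its URL are invented, and none of this is UniChem's actual code or schema.

```python
# Hypothetical sketch of compound URL construction. The Source class and its
# fields are illustrative assumptions, not UniChem's real schema.
from dataclasses import dataclass, field


@dataclass
class Source:
    name: str
    base_url: str
    # Mirrors the idea of the AUX_FOR_URL flag: 1 means URLs are built from
    # auxiliary data rather than from the src_compound_id itself.
    aux_for_url: int = 0
    # Hypothetical mapping step: src_compound_id -> auxiliary data.
    aux_map: dict[str, str] = field(default_factory=dict)


def compound_url(source: Source, src_compound_id: str) -> str:
    """Append either the src_compound_id or its auxiliary data to the base URL."""
    if source.aux_for_url == 1:
        # Extra mapping step: look up the auxiliary string for this compound.
        suffix = source.aux_map[src_compound_id]
    else:
        suffix = src_compound_id
    return source.base_url + suffix


chembl = Source("chembl", "https://www.ebi.ac.uk/chembldb/compound/inspect/")
print(compound_url(chembl, "CHEMBL59"))
# prints https://www.ebi.ac.uk/chembldb/compound/inspect/CHEMBL59

# A made-up source whose compound pages are keyed by a different string.
aux_source = Source(
    "example_aux",
    "https://example.org/compound?key=",
    aux_for_url=1,
    aux_map={"X001": "abc-123"},
)
print(compound_url(aux_source, "X001"))
# prints https://example.org/compound?key=abc-123
```

Keeping the flag on the source record means the common case needs no lookup at all; only flagged sources pay for the extra mapping step.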

**It looks like some data is replicated in multiple sources in UniChem; for example, 'pubchem', 'pubchem\_tpharma' and 'pubchem\_dotf' all come from PubChem. Why is this?**

* Some users wish to create hyperlinks to an entire data source, others only to sub-sets of data within a data source. For this reason, UniChem may hold one source covering an entire data source alongside separate sources for sub-sets of it.
* Some sources in UniChem, such as PubChem, integrate structures from a wide variety of primary depositing sources. In PubChem, structures as originally deposited are assigned one set of identifiers (SIDs), while the 'integrated' equivalent of each molecule is assigned a 'CID'. SIDs therefore represent the original depositor-defined version of a structure, and the CID represents PubChem's normalized, integrated form of the molecule.
* These structures sometimes differ from one another, because the normalization process may change the structure.
* Because UniChem includes some PubChem sub-sets defined by the original depositor, it has adopted the following policy:
  * For the entire PubChem data source, CIDs are used.
  * For depositor-defined sub-sets of PubChem, SIDs are used instead.
* Please go [here](http://www.jcheminf.com/content/5/1/3) for a full discussion of provenance in relation to UniChem.
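The identifier policy for PubChem-derived sources can be sketched as a small selection function. This assumes a hypothetical `Deposit` record linking each SID to its depositor and to the CID it was integrated into; the identifiers and depositor names are placeholders, and the function illustrates the policy only, not UniChem's loading code.

```python
# Hypothetical sketch of the CID/SID policy. All SIDs, CIDs and depositor
# names are invented placeholders.
from dataclasses import dataclass


@dataclass(frozen=True)
class Deposit:
    sid: str        # depositor-defined record (substance ID)
    cid: str        # PubChem's normalized, integrated compound ID
    depositor: str


deposits = [
    Deposit("SID_1", "CID_100", "depositor_a"),
    Deposit("SID_2", "CID_100", "depositor_b"),  # two SIDs normalized to one CID
    Deposit("SID_3", "CID_200", "depositor_a"),
]


def src_compound_ids(records, depositor=None):
    """Whole source: distinct CIDs. Depositor sub-set: that depositor's SIDs."""
    if depositor is None:
        return sorted({r.cid for r in records})
    return sorted(r.sid for r in records if r.depositor == depositor)


print(src_compound_ids(deposits))                 # ['CID_100', 'CID_200']
print(src_compound_ids(deposits, "depositor_a"))  # ['SID_1', 'SID_3']
```

Here SID\_1 and SID\_2 collapse into CID\_100 in the whole-source view, while a depositor sub-set keeps each depositor's own records and structures.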
