Structure search type
There are several approaches to locating chemical structures within patent documents. SureChEMBL focuses on the most popular methods.
By definition, the examined molecule within a patent is called a target, the structure we seek is called a query, and a target molecule matching the query structure is called a hit.
The following chemical searches are available:
Substructure
Chemists are most often interested in Substructure search, that is, whether a target structure contains the query structure within it.
Note: If special molecular features are present on the query (e.g., stereochemistry or charge), only targets containing those features are considered hits. However, if a feature is missing from the query, it is not checked, and targets with or without that feature may appear as hits.
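The feature rule in the note above can be pictured with a toy model. The sketch below assumes the graph-level substructure match has already succeeded and only checks the optional features. The function name `features_allow_hit` and the dictionary representation of features are illustrative assumptions, not part of SureChEMBL.

```python
def features_allow_hit(query_features, target_features):
    """Return True if every feature set on the query is matched by the target.

    Features absent from the query are not checked, so the target may
    carry any value for them (or none at all).
    """
    return all(
        target_features.get(name) == value
        for name, value in query_features.items()
    )


# Hypothetical query: a charge is specified, stereochemistry is not.
query = {"charge": 1}

charged_stereo_target = {"charge": 1, "stereo": "R"}
neutral_target = {"charge": 0}

print(features_allow_hit(query, charged_stereo_target))  # True: stereo is not checked
print(features_allow_hit(query, neutral_target))         # False: charge differs
print(features_allow_hit({}, neutral_target))            # True: nothing to check
```

In this model, adding a feature to the query can only narrow the set of hits, which is why a fully featureless query returns the broadest result.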
Similarity
A Similarity structure search looks for target structures that are similar to the query structure. The similarity concept implemented is based on hashed binary chemical fingerprints that are compared using the Tanimoto metric. That is to say, the presence of molecular features is recorded for both the query and the target, and the two records are then compared using a standard formula.
Note: See Tanimoto coefficient and fingerprint generation for a complete description of the concept and method used.
Note: If you choose Similarity as your search type, you will be prompted to provide a Tanimoto coefficient between 50% and 100%.
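As a rough illustration of the comparison step, the Tanimoto coefficient of two binary fingerprints is the number of bits set in both divided by the number of bits set in either. The sketch below represents each fingerprint as a set of on-bit positions. The bit values, target names, and the helper `similarity_hits` are made up for the example, and the fingerprints SureChEMBL actually generates are not reproduced here.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient of two fingerprints given as sets of on-bit indices."""
    shared = len(fp_a & fp_b)   # bits set in both fingerprints
    total = len(fp_a | fp_b)    # bits set in at least one fingerprint
    # Assumed convention for this sketch: two empty fingerprints score 0.
    return shared / total if total else 0.0


def similarity_hits(query_fp, targets, threshold=0.5):
    """Return (name, score) pairs whose score meets the threshold, best first."""
    scored = ((name, tanimoto(query_fp, fp)) for name, fp in targets.items())
    return sorted(
        ((name, score) for name, score in scored if score >= threshold),
        key=lambda pair: pair[1],
        reverse=True,
    )


# Hypothetical fingerprints, written as the positions of the bits that are set.
query_fp = {1, 4, 7, 9}
targets = {
    "target_a": {1, 4, 7, 9},    # same bits as the query
    "target_b": {1, 4, 7, 12},   # 3 shared bits, 5 bits in the union
    "target_c": {2, 3, 9},       # 1 shared bit, 6 bits in the union
}

print(similarity_hits(query_fp, targets, threshold=0.5))
# [('target_a', 1.0), ('target_b', 0.6)]
```

A threshold of 0.5 corresponds to the 50% lower bound mentioned above; raising it toward 1.0 (100%) keeps only targets whose fingerprints are nearly the same as the query's.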
Identical
In an Identical structure search, all molecular features need to be equal (e.g. a non-stereo query will only match a non-stereo target).
Connectivity
Connectivity search retrieves compounds with common atomic connectivity but differing stereochemistry or isotopic forms.
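The difference between Identical and Connectivity matching can be sketched with toy records. The `ToyMolecule` class and both match functions are hypothetical stand-ins, and the sketch assumes that a connectivity match ignores stereochemistry and isotopes entirely, so a target that is fully identical to the query also matches.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyMolecule:
    connectivity: str   # stand-in for the atom and bond graph
    stereo: str = ""    # empty string means no stereochemistry specified
    isotopes: str = ""  # empty string means no isotopic labelling


def identical_match(query, target):
    """All features must be equal, including unspecified ones."""
    return query == target


def connectivity_match(query, target):
    """Only the atomic connectivity is compared (assumption of this sketch)."""
    return query.connectivity == target.connectivity


# Hypothetical records for alanine in several forms.
plain = ToyMolecule("CC(N)C(=O)O")
l_form = ToyMolecule("CC(N)C(=O)O", stereo="S")
d_form = ToyMolecule("CC(N)C(=O)O", stereo="R")
labelled = ToyMolecule("CC(N)C(=O)O", isotopes="13C")

targets = [plain, l_form, d_form, labelled]

print([identical_match(plain, t) for t in targets])     # [True, False, False, False]
print([connectivity_match(plain, t) for t in targets])  # [True, True, True, True]
```

The first line of output reflects the rule that a non-stereo query only matches a non-stereo target in an Identical search, while the second shows Connectivity grouping all forms that share the same skeleton.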