
Structure search type

There are several approaches to locating chemical structures within patent documents. SureChEMBL supports the most widely used of them, described below.

In the descriptions below, a molecule examined within a patent is called the target, the structure being searched for is called the query, and a target that matches the query is called a hit.

The following chemical searches are available:

Search type
Description

Substructure

Substructure search is the type chemists use most often: it checks whether a target structure contains the query structure within it.

Note: If special molecular features are specified on the query (e.g. stereochemistry, charge), only targets that carry the same feature are considered hits. If a feature is not specified on the query, it is not checked, so targets can be hits whether or not they carry that feature.
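The matching rule, including the note on query features, can be sketched as a toy backtracking subgraph matcher. This is a minimal illustration under loudly stated assumptions, not SureChEMBL's implementation: the `Atom`, `Mol`, `chain` and `has_substructure` names are hypothetical, molecules contain heavy atoms only, bond orders are ignored, and charge is the only optional "feature".

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class Atom:
    element: str
    charge: Optional[int] = None  # on a query atom, None means "not specified"


@dataclass
class Mol:
    atoms: list
    bonds: set = field(default_factory=set)  # undirected bonds as frozenset({i, j})

    def bonded(self, i: int, j: int) -> bool:
        return frozenset((i, j)) in self.bonds


def chain(*atoms: Atom) -> Mol:
    """Hypothetical helper: build a linear molecule, bonding consecutive atoms."""
    return Mol(list(atoms), {frozenset((i, i + 1)) for i in range(len(atoms) - 1)})


def atom_matches(q: Atom, t: Atom) -> bool:
    if q.element != t.element:
        return False
    # A feature specified on the query must be present on the target;
    # an unspecified feature (None) is simply not checked.
    return q.charge is None or q.charge == t.charge


def has_substructure(target: Mol, query: Mol) -> bool:
    """True if every query atom maps to a distinct target atom and every query bond exists."""
    mapping = {}  # query atom index -> target atom index

    def extend(qi: int) -> bool:
        if qi == len(query.atoms):
            return True  # all query atoms placed
        for ti, t_atom in enumerate(target.atoms):
            if ti in mapping.values() or not atom_matches(query.atoms[qi], t_atom):
                continue
            # Bonds from this query atom to already-mapped atoms must exist in the target.
            if all(target.bonded(ti, mapping[qj]) for qj in mapping if query.bonded(qi, qj)):
                mapping[qi] = ti
                if extend(qi + 1):
                    return True
                del mapping[qi]  # backtrack
        return False

    return extend(0)


ethanol = chain(Atom("C", 0), Atom("C", 0), Atom("O", 0))
ethoxide = chain(Atom("C", 0), Atom("C", 0), Atom("O", -1))
query_any = chain(Atom("C"), Atom("O"))        # charge not specified
query_anion = chain(Atom("C"), Atom("O", -1))  # charge specified

print(has_substructure(ethanol, query_any))     # True
print(has_substructure(ethoxide, query_any))    # True
print(has_substructure(ethanol, query_anion))   # False
print(has_substructure(ethoxide, query_anion))  # True
```

The unspecified-charge query hits both the neutral and the charged target, while the query with an explicit charge hits only the target that carries it.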

Similarity

A Similarity structure search looks for target structures that are similar to the query structure. Similarity is based on hashed binary chemical fingerprints compared using the Tanimoto metric. That is, the presence of molecular features is recorded as bits for both the query and the target, and the two bit sets are then compared using the Tanimoto coefficient.
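For reference, the standard Tanimoto coefficient for two binary fingerprints $A$ and $B$ is shown below, where $a$ and $b$ are the numbers of bits set in $A$ and $B$ and $c$ is the number of bits set in both. This is the textbook definition; the Tanimoto coefficient and fingerprint generation page remains the authoritative description of SureChEMBL's method.

$$
T(A, B) = \frac{|A \cap B|}{|A \cup B|} = \frac{c}{a + b - c}, \qquad 0 \le T(A, B) \le 1
$$

A threshold of 50% therefore corresponds to requiring $T(A, B) \ge 0.5$.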

Note: See Tanimoto coefficient and fingerprint generation for a complete description of the concept and method used.

Note: If you choose Similarity as your search type, you will be prompted to provide a Tanimoto coefficient threshold between 50% and 100%.
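The workflow described above (hash features into bits, compare with Tanimoto, keep targets above the chosen threshold) can be sketched as follows. This is a toy under stated assumptions: the feature strings, the 1024-bit length, SHA-1 as the hash, and the names `fingerprint`, `tanimoto` and `similarity_hits` are all hypothetical choices for illustration, not SureChEMBL's fingerprint.

```python
import hashlib

N_BITS = 1024  # assumed fingerprint length, for illustration only


def fingerprint(features, n_bits=N_BITS):
    """Hash each feature string to one bit position; return the set of 'on' bits."""
    bits = set()
    for feat in features:
        digest = hashlib.sha1(feat.encode("utf-8")).digest()
        bits.add(int.from_bytes(digest[:4], "big") % n_bits)
    return frozenset(bits)


def tanimoto(a, b):
    """Tanimoto coefficient c / (a + b - c) for two sets of 'on' bits."""
    if not a and not b:
        return 0.0  # convention chosen here for two empty fingerprints
    common = len(a & b)
    return common / (len(a) + len(b) - common)


def similarity_hits(query_features, targets, threshold_pct):
    """Return (name, score) for targets whose Tanimoto score meets the threshold."""
    if not 50 <= threshold_pct <= 100:
        raise ValueError("threshold must be between 50 and 100%")
    q = fingerprint(query_features)
    hits = []
    for name, feats in targets.items():
        score = tanimoto(q, fingerprint(feats))
        if score * 100 >= threshold_pct:
            hits.append((name, round(score, 3)))
    return sorted(hits, key=lambda h: h[1], reverse=True)


# Hypothetical path-like features; real fingerprints encode much richer patterns.
query = ["C", "O", "C-C", "C-O", "C-C-O"]
targets = {
    "ethanol": ["C", "O", "C-C", "C-O", "C-C-O"],
    "propanol": ["C", "O", "C-C", "C-O", "C-C-C", "C-C-O", "C-C-C-O"],
    "benzene": ["c", "c:c", "ring6"],
}
print(similarity_hits(query, targets, 60))
```

Because bits are hashed, unrelated features can occasionally collide on the same bit, which slightly inflates scores; this is an inherent trade-off of hashed fingerprints.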

Identical

In an Identical structure search, all molecular features must match exactly (e.g. a non-stereo query will only match a non-stereo target).

Connectivity

Connectivity search retrieves compounds that share the query's atomic connectivity, including those that differ from it in stereochemistry or isotopic form.
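The difference between Identical and Connectivity search can be illustrated with a layered structure record, similar in spirit to how the first block of an InChIKey encodes only the connectivity layer. This is a hypothetical sketch, not SureChEMBL's method: the `LayeredStructure` record, its string layers and the two match functions are assumptions, and the skeleton strings are assumed to be precomputed canonical forms.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LayeredStructure:
    skeleton: str       # canonical atom connectivity (assumed precomputed)
    stereo: str = ""    # stereo descriptors; "" means no stereo specified
    isotopes: str = ""  # isotope labels; "" means no isotopic labelling


def identical_match(query: LayeredStructure, target: LayeredStructure) -> bool:
    # Every layer must agree, so a non-stereo query only matches a non-stereo target.
    return query == target


def connectivity_match(query: LayeredStructure, target: LayeredStructure) -> bool:
    # Only the skeleton is compared; stereo and isotope layers are ignored.
    return query.skeleton == target.skeleton


alanine = LayeredStructure("CC(N)C(=O)O")
l_alanine = LayeredStructure("CC(N)C(=O)O", stereo="2S")
d_alanine = LayeredStructure("CC(N)C(=O)O", stereo="2R")
alanine_13c = LayeredStructure("CC(N)C(=O)O", isotopes="13C1")  # hypothetical label

print(identical_match(alanine, l_alanine))       # False: stereo layers differ
print(identical_match(l_alanine, l_alanine))     # True
print(connectivity_match(alanine, d_alanine))    # True: same skeleton
print(connectivity_match(alanine, alanine_13c))  # True: isotopes ignored
```

Splitting a structure into layers makes each search type a choice of which layers to compare, rather than a separate matching algorithm.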
