Tanimoto Coefficient and Fingerprint Generation

A number of thresholds and measures are available for Similarity searching. The higher the threshold, the closer the target structures are to the query structure. By default, the Similarity search within SureChEMBL uses the Tanimoto coefficient to calculate the degree of similarity between the query and the target structures. The Tanimoto coefficient takes two arguments:

  • The fingerprint of the query structure

  • The fingerprint of the target structure

A fingerprint comprises a list of predefined structure fragments or features found within a structure. Each feature that is present is represented as "on" by setting the corresponding bit to 1.

Tanimoto coefficient formula

T = N_{A&B} / (N_A + N_B - N_{A&B})

N_A represents the number of "on" features (bits) in structure A.

N_B represents the number of "on" features (bits) in structure B.

N_{A&B} represents the number of "on" features (bits) common to both fingerprints A and B.
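As an illustration of the formula, the following Python sketch computes the Tanimoto coefficient from two fingerprints represented as sets of "on" bit positions. The function name `tanimoto`, the example bit positions, and the choice to return 0.0 for two empty fingerprints are assumptions made for this sketch; they are not taken from SureChEMBL.

```python
from typing import Set


def tanimoto(fp_a: Set[int], fp_b: Set[int]) -> float:
    """Tanimoto coefficient of two fingerprints given as sets of "on" bit positions."""
    n_a = len(fp_a)          # number of "on" bits in fingerprint A
    n_b = len(fp_b)          # number of "on" bits in fingerprint B
    n_ab = len(fp_a & fp_b)  # number of "on" bits common to A and B
    denominator = n_a + n_b - n_ab
    if denominator == 0:
        # Both fingerprints are empty; returning 0.0 is a convention
        # assumed for this sketch.
        return 0.0
    return n_ab / denominator


# Hypothetical fingerprints: 6 and 5 "on" bits, 4 of them in common.
fp_query = {1, 5, 9, 12, 20, 33}
fp_target = {1, 5, 9, 14, 33}

print(tanimoto(fp_query, fp_target))  # 4 / (6 + 5 - 4) = 4/7
```

The result ranges from 0 (no "on" bits in common) to 1 (identical sets of "on" bits), which is why a higher threshold returns structures closer to the query.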

The hashed binary chemical fingerprint of a molecule is a fixed-length bit string (a sequence of "0" and "1" digits) that contains information on the structure.

Fingerprint type

SureChEMBL makes use of the Chemaxon chemical fingerprint. It is a hashed binary fingerprint of 1024 bits, with a maximum pattern length of 7 bonds.

The process of fingerprint generation is as follows:

  1. All linear paths (linear patterns) consisting of bonds and atoms of a structure are detected, up to a given number of bonds.

  2. Branching points at the end of each linear pattern are also detected.

  3. All cycles (cyclic patterns) are detected.

Using a proprietary hashing method, a given number of bits in the bit string are set for each pattern. It is possible that the same bit is set by multiple patterns. This phenomenon is called bit collision. A few bit collisions in the fingerprint are tolerable, but too many may result in losing information in the fingerprint.

Example
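The following Python sketch gives a rough illustration of the linear-path step and the hashing step described above. It is not the Chemaxon implementation: the actual hashing method is proprietary, so SHA-256 is used here purely as a stand-in. Detection of branching points and cycles is omitted for brevity. The value of `BITS_PER_PATTERN`, the inclusion of single atoms as zero-bond patterns, the pattern string format, and all function names are assumptions made for illustration.

```python
import hashlib

FP_SIZE = 1024        # fingerprint length in bits (from the text)
MAX_BONDS = 7         # maximum pattern length in bonds (from the text)
BITS_PER_PATTERN = 2  # assumed value, chosen for illustration only

BOND_SYMBOL = {1: "-", 2: "=", 3: "#"}


def linear_patterns(atoms, bonds, max_bonds=MAX_BONDS):
    """Return the set of linear path strings containing 0..max_bonds bonds."""
    neighbours = {i: [] for i in range(len(atoms))}
    for a, b, order in bonds:
        neighbours[a].append((b, order))
        neighbours[b].append((a, order))

    patterns = set()

    def extend(path_atoms, tokens):
        # A path and its reverse describe the same pattern, so keep one form.
        forward = "".join(tokens)
        backward = "".join(reversed(tokens))
        patterns.add(min(forward, backward))
        if len(path_atoms) - 1 == max_bonds:
            return
        for nxt, order in neighbours[path_atoms[-1]]:
            if nxt not in path_atoms:  # linear paths never revisit an atom
                extend(path_atoms + [nxt],
                       tokens + [BOND_SYMBOL[order], atoms[nxt]])

    for start in range(len(atoms)):
        extend([start], [atoms[start]])
    return patterns


def pattern_bits(pattern, size=FP_SIZE, bits_per_pattern=BITS_PER_PATTERN):
    """Map a pattern string to bit positions (SHA-256 stand-in hash)."""
    digest = hashlib.sha256(pattern.encode("utf-8")).digest()
    return {
        int.from_bytes(digest[4 * k: 4 * k + 4], "big") % size
        for k in range(bits_per_pattern)
    }


def fingerprint(atoms, bonds):
    """Return the set of "on" bits and the number of bit collisions seen."""
    on_bits = set()
    collisions = 0
    for pattern in sorted(linear_patterns(atoms, bonds)):
        for bit in pattern_bits(pattern):
            if bit in on_bits:
                collisions += 1  # bit was already set by an earlier pattern
            on_bits.add(bit)
    return on_bits, collisions


# Acetic acid heavy atoms: C-C(=O)-O, bonds given as (atom, atom, bond order).
atoms = ["C", "C", "O", "O"]
bonds = [(0, 1, 1), (1, 2, 2), (1, 3, 1)]

print(sorted(linear_patterns(atoms, bonds)))
on_bits, collisions = fingerprint(atoms, bonds)
print(len(on_bits), "bits set,", collisions, "collisions")
```

A small molecule such as this one produces only a handful of patterns, so collisions in a 1024-bit fingerprint are unlikely. Larger structures generate many more patterns, and the chance that two patterns set the same bit grows accordingly.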

Reference websites

Details presented within this documentation were kindly provided by, and are reproduced from, Chemaxon (www.chemaxon.com). For a more thorough explanation of the Chemaxon fingerprints, see https://docs.chemaxon.com/display/docs/chemical-hashed-fingerprint.md
