MAIP returns a score for each compound that passed the standardisation procedure. Compounds failing standardisation are included in the output file but with no score.
To help users with their decision making, we provide performance metrics calculated for three different validation sets. We calculated enrichment factors for different fractions of each dataset and report the model score that delimits each dataset fraction.
The score (active class logarithmic joint probability) is generated by the Naïve Bayes model described in Xia et al. (2004). For a given compound it is defined by:

$$\text{score} = \sum_{i} \log P(\text{active} \mid F_i)$$
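Where P(active|Fi) is the model posterior probability for each feature Fi present in the compound. Following the Laplacian-corrected estimator of Xia et al. (2004), it is defined by:

$$P(\text{active} \mid F_i) = \frac{A_{F_i} + 1}{T_{F_i} \cdot \frac{A}{T} + 1}$$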
Where A is the number of active molecules in the set, T the total number of molecules in the set, TFi the total number of molecules that contain feature Fi and AFi the number of active molecules that contain feature Fi.
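As an illustration only, the score calculation can be sketched in a few lines of Python. This is not the MAIP implementation: it assumes the Laplacian-corrected estimator (AFi + 1) / (TFi * A/T + 1) from Xia et al. (2004), natural logarithms, and a hypothetical `feature_counts` dictionary mapping each feature to its (AFi, TFi) counts in the training set.

```python
import math


def feature_log_posterior(a_f, t_f, a, t):
    """Log of the Laplacian-corrected P(active|Fi) (assumed form, after Xia et al.)."""
    return math.log((a_f + 1) / (t_f * a / t + 1))


def naive_bayes_score(compound_features, feature_counts, a, t):
    """Sum the per-feature log posteriors over the features of one compound.

    feature_counts is a hypothetical mapping: feature -> (AFi, TFi).
    Features never seen in training are skipped; with AFi = TFi = 0 the
    corrected estimator is log(1) = 0, so skipping gives the same result.
    """
    score = 0.0
    for feature in compound_features:
        if feature in feature_counts:
            a_f, t_f = feature_counts[feature]
            score += feature_log_posterior(a_f, t_f, a, t)
    return score


# Toy training set: 10 actives out of 100 molecules.
counts = {"f1": (9, 10), "f2": (0, 10)}
example_score = naive_bayes_score(["f1", "f2", "unseen"], counts, a=10, t=100)
```

In this toy example, feature `f1` is found mostly in actives and raises the score, while `f2` is found only in inactives and lowers it.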
We used two metrics to measure the model performance:
ROC AUC score. The receiver operating characteristic (ROC) curve is obtained by plotting the false positive rate on the x-axis against the true positive rate on the y-axis as the score threshold is varied. The area under the curve (AUC) measures how well a model is able to classify binary data. A value of 0.5 corresponds to selecting compounds at random, while a perfect model returns an ROC AUC value of 1. This metric is often used to measure the performance of classification models as it is insensitive to class imbalance. The figure below shows an example of an ROC curve.
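For readers who want to reproduce this metric on their own data, here is a minimal sketch in plain Python. It uses the rank-based (Mann-Whitney) formulation, which is equal to the area under the ROC curve; it is not the code used to produce the values reported on this page. The function name and the 0/1 label encoding are assumptions made for the example.

```python
def roc_auc(scores, labels):
    """ROC AUC via the rank-sum formulation; labels are 1 (active) or 0 (inactive)."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        # Find the end of the current group of tied scores.
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1  # 1-based average rank of the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1

    n_actives = sum(labels)
    n_inactives = len(labels) - n_actives
    if n_actives == 0 or n_inactives == 0:
        raise ValueError("both classes must be present")

    rank_sum = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (rank_sum - n_actives * (n_actives + 1) / 2) / (n_actives * n_inactives)


perfect = roc_auc([0.9, 0.8, 0.3, 0.1], [1, 1, 0, 0])
uninformative = roc_auc([1.0, 1.0, 1.0, 1.0], [1, 0, 1, 0])
```

A model that ranks every active above every inactive gives 1, while a model that assigns the same score to all compounds gives 0.5.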
Enrichment factor. The enrichment factor (EF) is the hit rate (the proportion of active compounds) within a defined fraction of the score-sorted dataset, divided by the hit rate of the whole dataset. It is defined by:

$$EF[X] = \frac{P[X] / N[X]}{P / N}$$
Where X is the user-defined fraction, P[X] the number of actives in the fraction, N[X] the total number of compounds in the fraction, P the total number of actives and N the total number of compounds. The EF is frequently used as a pragmatic measure of performance, reflecting the common use of in silico models to identify a subset of compounds for experimental evaluation. In this work, we calculated this metric at 1%, 10% and 50%.
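The calculation can be sketched as follows. This is an illustrative example rather than the code used for the reported values: the function name, the rounding of the fraction size and the handling of ties (compounds are simply taken in sorted order) are assumptions. The function also returns the lowest score inside the selected fraction, which corresponds to the score threshold for that fraction.

```python
def enrichment_factor(scores, labels, fraction):
    """Return (EF, score threshold) for the top `fraction` of compounds by score.

    labels are 1 (active) or 0 (inactive); fraction is X, e.g. 0.01 for 1%.
    """
    n_total = len(scores)
    n_actives = sum(labels)
    if n_total == 0 or n_actives == 0:
        raise ValueError("need at least one compound and one active")

    # N[X]: number of compounds in the selected fraction (at least one).
    n_top = max(1, round(fraction * n_total))

    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    top = ranked[:n_top]

    actives_in_top = sum(label for _, label in top)  # P[X]
    threshold = top[-1][0]  # lowest score still inside the fraction

    ef = (actives_in_top / n_top) / (n_actives / n_total)
    return ef, threshold


scores = [10, 9, 8, 7, 6, 5, 4, 3, 2, 1]
labels = [1, 1, 0, 0, 1, 0, 0, 0, 0, 0]
ef_20, threshold_20 = enrichment_factor(scores, labels, 0.2)
```

In this toy dataset the top 20% contains two compounds, both active, so the hit rate in the fraction is 100% against 30% overall.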
Our model was validated with three experimental validation sets. Here are some details:
Dataset name | dataset size | number of actives | number of inactives | hit rate |
St. Jude Screening Set | 220,691 | 9,082 | 211,609 | 4.12% |
MMV test set | 5,869 | 1,198 | 4,671 | 20.41% |
PubChem | 91,796 | 384 | 91,412 | 0.42% |
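Note the difference in hit rate between the sets, particularly for the MMV test set, where about one compound in five is active. Because the enrichment factor cannot exceed the reciprocal of the overall hit rate, the maximum achievable EF for the MMV test set is about 4.9.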
Dataset name | ROC AUC score | EF[1%] | EF[10%] | EF[50%] | number of actives | number of inactives |
St. Jude Screening Set | 0.81 | 12.1 (71) | 4.8 (36) | 1.8 (15) | 9,082 | 211,609 |
MMV test set | 0.67 | 3.5 (60) | 2.1 (41) | 1.4 (23) | 1,198 | 4,671 |
PubChem | 0.69 | 7.0 (56) | 2.8 (47) | 1.5 (34) | 384 | 91,412 |
(The numbers in parentheses indicate the model score threshold above which the top x% of the dataset is selected.)
The model shows enrichment when tested against all three validation sets. For each dataset, the higher the model score, the greater the observed enrichment. Also, the thresholds needed to pick 1%, 10% and 50% of the predictions correlate with the dataset size.
We recommend that users refer to these data when analysing their results. For instance, if a library of 1 million compounds is scored but few of those compounds have a high score (for example, only 1% of them score above 15), we would expect the enrichment for that subset to be modest.
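One simple way to make this comparison is to compute the fraction of a scored library that exceeds a given score. The sketch below is an assumption-laden example: it supposes the scores have already been read from the output file into a list, with `None` standing in for compounds that failed standardisation and therefore have no score.

```python
def fraction_above_score(scores, threshold):
    """Fraction of scored compounds whose score is strictly above `threshold`.

    Entries equal to None (no score returned) are ignored.
    """
    scored = [s for s in scores if s is not None]
    if not scored:
        raise ValueError("no scored compounds")
    return sum(s > threshold for s in scored) / len(scored)


library_scores = [20, 10, None, 16, 3]  # toy values, not real MAIP output
share = fraction_above_score(library_scores, 15)
```

The resulting fraction can then be set against the score thresholds reported for the validation sets.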
Finally, users are advised to apply additional in silico filters to assess the suitability of any hits from MAIP prior to screening. High-scoring compounds may have physicochemical properties and/or substructures that make them unsuitable as starting points for a malaria drug discovery programme. In addition, some of the training sets used in MAIP contain examples of known anti-malarial compounds (e.g. aminoquinolines). Thus, molecules with a high score in the model may already have been worked on extensively in anti-malarial programmes. Public bioactivity resources such as ChEMBL can be used to check whether anti-malarial activity is already known for particular structural classes.