
# Using MAIP results

How to use the MAIP results for your dataset

MAIP returns a score for each compound that passed the standardisation procedure. Compounds failing standardisation are included in the output file but with no score.

MAIP returns the model score. To help users with their decision making, we provide performance metrics calculated on three different validation sets. We calculated enrichment factors for different fractions of each dataset and report the model score thresholds that delimit each of those fractions.

The score (the logarithmic joint probability of the active class) is generated by the Naïve Bayes model described in Xia *et al.* (2004). For a given compound it is defined by:

$$
\text{score} = \sum_{i} \log P(\text{active} \mid F_i)
$$

where *P(active|Fi)* is the model posterior probability for each feature *Fi* of the compound. It is defined by:

$$
P(\text{active} \mid F_i) = \frac{A_{F_i} + 1}{T_{F_i} \cdot \dfrac{A}{T} + 1}
$$

where *A* is the number of active molecules in the set, *T* the total number of molecules in the set, *TFi* the total number of molecules that contain feature *Fi* and *AFi* the number of active molecules that contain feature *Fi*.

We used two metrics to measure the model performance: the ROC AUC score and the enrichment factor.
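As a rough illustration of the score definition above, the sketch below sums log-transformed per-feature estimates for one compound. It assumes the Laplacian-corrected form (AFi + 1) / (TFi · A/T + 1) from Xia *et al.* (2004) and a natural logarithm. The function names and all counts are hypothetical and are not taken from the MAIP code base.

```python
import math


def feature_log_term(a_fi, t_fi, a, t):
    """Log of the per-feature estimate (assumed Laplacian-corrected form).

    a_fi: actives containing the feature, t_fi: all molecules containing it,
    a: actives in the training set, t: all molecules in the training set.
    """
    base_rate = a / t
    return math.log((a_fi + 1) / (t_fi * base_rate + 1))


def compound_score(feature_counts, a, t):
    """Sum the log terms over the features present in one compound."""
    return sum(feature_log_term(a_fi, t_fi, a, t) for a_fi, t_fi in feature_counts)


# Hypothetical training set: 100 actives out of 1000 molecules.
A, T = 100, 1000

# Hypothetical (AFi, TFi) counts for the three features of one compound.
counts = [
    (30, 50),  # feature seen mostly in actives: positive contribution
    (2, 40),   # feature seen mostly in inactives: negative contribution
    (0, 0),    # feature never seen in training: contributes nothing
]

score = compound_score(counts, A, T)
print(f"score = {score:.3f}")
```

Under this form, a feature that is absent from the training set contributes zero, so rare features cannot dominate the score on the basis of very few observations.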

*ROC AUC score.* The receiver operating characteristic (ROC) curve is obtained by plotting the false positive rate on the x-axis against the true positive rate on the y-axis as the score threshold is varied. The area under the curve (AUC) measures how well a model is able to classify binary data. A value of 0.5 corresponds to selecting compounds at random, while a perfect model returns an ROC AUC value of 1. This metric is often used to measure the performance of classification models as it is insensitive to class imbalance. The figure below shows an example of an ROC curve.

Example of ROC curve
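The ROC AUC can also be read as the probability that a randomly chosen active compound receives a higher score than a randomly chosen inactive one. The minimal sketch below computes it that way on made-up data. The function name and the example values are illustrative only, and the pairwise loop is only practical for small sets.

```python
def roc_auc(labels, scores):
    """ROC AUC via pairwise comparison of actives and inactives.

    labels: 1 for active, 0 for inactive. Tied scores count as half a win.
    """
    active_scores = [s for label, s in zip(labels, scores) if label == 1]
    inactive_scores = [s for label, s in zip(labels, scores) if label == 0]
    if not active_scores or not inactive_scores:
        raise ValueError("need at least one active and one inactive compound")

    wins = 0.0
    for a in active_scores:
        for i in inactive_scores:
            if a > i:
                wins += 1.0
            elif a == i:
                wins += 0.5
    return wins / (len(active_scores) * len(inactive_scores))


# Made-up example: six compounds with their model scores.
labels = [1, 1, 0, 1, 0, 0]
scores = [70, 55, 50, 30, 20, 10]

auc = roc_auc(labels, scores)
print(f"ROC AUC = {auc:.3f}")
```

For datasets of the size discussed on this page, a rank-based implementation from an established library would be the more practical choice.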

*Enrichment factor.* The enrichment factor (EF) is the hit rate (the proportion of active compounds) within a defined fraction of the score-sorted dataset divided by the total hit rate. It is defined by:

$$
EF[X] = \frac{P[X] / N[X]}{P / N}
$$

where *X* is the user-defined fraction, *P[X]* the number of actives in the fraction, *N[X]* the total number of compounds in the fraction, *P* the total number of actives and *N* the total number of compounds. The EF is frequently used as a pragmatic measure of performance, reflecting the common use of *in silico* models to identify a subset of compounds for experimental evaluation. In this work, we calculated this metric at 1%, 10% and 50% of each dataset.

Our model was validated with three experimental validation sets. Here are some details:

| Dataset name | Dataset size | Number of actives | Number of inactives | Hit rate |
|---|---|---|---|---|
| St. Jude Screening Set | 220,691 | 9,082 | 211,609 | 4.12% |
| MMV test set | 5,869 | 1,198 | 4,671 | 20.41% |
| PubChem | 91,796 | 384 | 91,412 | 0.42% |

Note the difference in the proportion of active compounds between the datasets, particularly for the MMV test set, where about one compound in five is active.

Individual distribution of model scores for the active (orange) and inactive (blue) compounds for the St. Jude Screening Set. Grey lines illustrate the fractions of compounds used to calculate the enrichment factors.

Individual distribution of model scores for the active (orange) and inactive (blue) compounds for the MMV test set. Grey lines illustrate the fractions of compounds used to calculate the enrichment factors.

Individual distribution of model scores for the active (orange) and inactive (blue) compounds for the PubChem set. Grey lines illustrate the fractions of compounds used to calculate the enrichment factors.

| Dataset | ROC AUC score | EF[1%] | EF[10%] | EF[50%] | #actives | #inactives |
|---|---|---|---|---|---|---|
| St. Jude Screening Set | 0.81 | 12.1 (71) | 4.8 (36) | 1.8 (15) | 9,082 | 211,609 |
| MMV test set | 0.67 | 3.5 (60) | 2.1 (41) | 1.4 (23) | 1,198 | 4,671 |
| PubChem | 0.69 | 7.0 (56) | 2.8 (47) | 1.5 (34) | 384 | 91,412 |

(The number in parentheses is the model score threshold at which the corresponding fraction of the dataset is selected.)
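To make the table entries concrete, the sketch below computes both quantities reported per cell, the enrichment factor and the score threshold, for the top fraction of a ranked list. It follows the EF definition given earlier (hit rate in the fraction divided by the overall hit rate). The function name and the ten-compound dataset are invented for illustration, and tied scores at the cut-off are not treated specially.

```python
def enrichment_at_fraction(labels, scores, fraction):
    """Return (EF, score threshold) for the top `fraction` of compounds.

    labels: 1 for active, 0 for inactive. Compounds are ranked by
    decreasing score and the top round(fraction * N) are selected.
    """
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    n_total = len(ranked)
    n_selected = max(1, round(fraction * n_total))
    selected = ranked[:n_selected]

    actives_selected = sum(label for _, label in selected)
    actives_total = sum(labels)
    if actives_total == 0:
        raise ValueError("the dataset contains no active compounds")

    hit_rate_selected = actives_selected / n_selected
    hit_rate_total = actives_total / n_total
    threshold = selected[-1][0]  # lowest score still inside the fraction
    return hit_rate_selected / hit_rate_total, threshold


# Invented example: ten compounds, three of them active.
scores = [80, 72, 65, 50, 41, 33, 20, 12, 5, -3]
labels = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]

for fraction in (0.2, 0.5, 1.0):
    ef, threshold = enrichment_at_fraction(labels, scores, fraction)
    print(f"top {fraction:.0%}: EF = {ef:.2f}, score threshold = {threshold}")
```

Selecting the whole dataset always gives an EF of 1, which is why the EF values in the table fall towards 1 as the fraction grows.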

The model shows enrichment when tested against all three validation sets. For each dataset, the higher the model score, the greater the observed enrichment. Also, the thresholds needed to pick 1%, 10% and 50% of the predictions correlate with the dataset size.
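Because the EF is a ratio of hit rates, multiplying a reported EF by the overall hit rate of its dataset recovers the hit rate observed inside the selected fraction. The short sketch below does this arithmetic with the figures from the two tables on this page. The dictionary layout and the function name are illustrative choices.

```python
# Figures copied from the validation tables on this page.
validation_sets = {
    "St. Jude Screening Set": {"actives": 9082, "inactives": 211609,
                               "ef": {0.01: 12.1, 0.10: 4.8, 0.50: 1.8}},
    "MMV test set": {"actives": 1198, "inactives": 4671,
                     "ef": {0.01: 3.5, 0.10: 2.1, 0.50: 1.4}},
    "PubChem": {"actives": 384, "inactives": 91412,
                "ef": {0.01: 7.0, 0.10: 2.8, 0.50: 1.5}},
}


def implied_hit_rate(actives, inactives, ef):
    """Hit rate inside the selected fraction: EF times the overall hit rate."""
    overall_hit_rate = actives / (actives + inactives)
    return ef * overall_hit_rate


for name, data in validation_sets.items():
    for fraction, ef in data["ef"].items():
        rate = implied_hit_rate(data["actives"], data["inactives"], ef)
        print(f"{name}, top {fraction:.0%}: about {rate:.1%} actives")
```

For example, an EF of 12.1 on the St. Jude Screening Set (overall hit rate 4.12%) corresponds to roughly half of the top 1% being active, whereas an EF of 7.0 on PubChem (overall hit rate 0.42%) corresponds to about 3%. The same EF can therefore mean very different absolute hit rates depending on the dataset.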

We recommend that users refer to these data when analysing their results. For instance, if a library of 1 million compounds is run through MAIP and few of those compounds obtain a high score (say only 1% of them score above 15), we would expect the enrichment for that subset to be modest.
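One way to carry out this check is to measure what fraction of a scored library lies above the thresholds reported for the validation sets. The sketch below assumes the MAIP scores have already been loaded into a Python list, with `None` standing for compounds that failed standardisation. That representation, the function name and the example scores are assumptions made for illustration, while the thresholds are the St. Jude Screening Set values from the table above.

```python
def fraction_above(scores, threshold):
    """Fraction of scored compounds whose score is at least `threshold`.

    Compounds without a score (None) are ignored.
    """
    scored = [s for s in scores if s is not None]
    if not scored:
        raise ValueError("no compound in the library has a score")
    return sum(s >= threshold for s in scored) / len(scored)


# Score thresholds reported for the St. Jude Screening Set.
st_jude_thresholds = {"top 1%": 71, "top 10%": 36, "top 50%": 15}

# Invented library scores; None marks a compound that failed standardisation.
library_scores = [82.0, 40.5, None, 16.2, 3.1, -7.4, 12.0, None, 37.9, 0.5]

for label, threshold in st_jude_thresholds.items():
    share = fraction_above(library_scores, threshold)
    print(f"score >= {threshold} ({label} in St. Jude): {share:.1%} of the library")
```

If the share of a library above a given threshold is much smaller than the corresponding validation fraction, the library contains few compounds in the score range where the strongest enrichment was observed.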

Finally, users are advised to apply additional *in silico* filters to assess the suitability of any hits from MAIP prior to screening. High-scoring compounds may have physicochemical properties and/or substructures that are unsuitable as starting points for a malaria drug discovery programme. In addition, some of the training sets used in MAIP contain examples of known anti-malarial compounds (e.g. aminoquinolines). Thus, molecules with a high score in the model may have already been worked on extensively in anti-malarial programmes. Public bioactivity resources such as ChEMBL can be used to check whether anti-malarial activity is already known for particular structural classes.
