
# Using MAIP results

How to use the MAIP results for your dataset
MAIP returns a score for each compound that passed the standardisation procedure. Compounds failing standardisation are included in the output file but with no score.

## Usage

MAIP returns a model score for each compound. To help users with their decision making, we provide performance metrics calculated on three different validation sets. For each set, we calculated enrichment factors at different fractions of the dataset and report the model score thresholds that delimit each fraction.

## Score calculation

The score (active class logarithmic joint probability) is generated by the Naïve Bayes model described in Xia et al. (2004). For a given compound, it is computed from P(active|F_i), the model posterior probability for each feature F_i of the compound. Each P(active|F_i) is in turn computed from four counts: A, the number of active molecules in the set; T, the total number of molecules in the set; T_Fi, the total number of molecules that contain feature F_i; and A_Fi, the number of active molecules that contain feature F_i.
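As a sketch only, and assuming MAIP uses the Laplacian-corrected estimator of Xia et al. (2004) unchanged, the two definitions above could be written as follows. The exact form, including the base of the logarithm, is an assumption and should be checked against the MAIP publication.

$$
\mathrm{score} = \sum_{i} \log P(\mathrm{active} \mid F_i)
$$

$$
P(\mathrm{active} \mid F_i) = \frac{A_{F_i} + 1}{T_{F_i} \cdot \dfrac{A}{T} + 1}
$$

Under this form, a feature that occurs among actives exactly as often as the overall hit rate A/T predicts contributes zero to the score, features enriched among actives contribute positively, and features depleted among actives contribute negatively.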

## Model validation

### Performance metrics

We used two metrics to measure model performance:

- ROC AUC score. The receiver operating characteristic (ROC) curve plots the fraction of false positives on the x-axis against the fraction of true positives on the y-axis. The area under the curve (AUC) measures how well a model is able to classify binary data. A value of 0.5 corresponds to selecting compounds at random, while a perfect model returns an ROC AUC value of 1. This metric is often used to measure the performance of classification models as it is insensitive to class imbalance.

  Figure: Example of an ROC curve.
- Enrichment factor. The enrichment factor (EF) is the hit rate (the proportion of active compounds) within a defined fraction of the score-sorted dataset, divided by the total hit rate. It is defined by:

  $$
  \mathrm{EF}[X] = \frac{P[X] / N[X]}{P / N}
  $$

  where X is the user-defined fraction, P[X] the number of actives in the fraction, N[X] the total number of compounds in the fraction, P the total number of actives and N the total number of compounds. The EF is frequently used as a pragmatic measure of performance, reflecting the common use of in silico models to identify a subset of compounds for experimental evaluation. In this work, we calculated this metric at 1%, 10% and 50%.
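The two metrics above can be sketched in a few lines of Python. This is an illustrative implementation, not the code used to produce the MAIP validation results: the function names, the tie handling and the rounding of the fraction size are assumptions, and the pairwise AUC is quadratic in the number of compounds, so it is only suitable for small sets.

```python
from typing import Sequence


def enrichment_factor(scores: Sequence[float], labels: Sequence[int], fraction: float) -> float:
    """EF[X] = (P[X] / N[X]) / (P / N) for the top-scoring fraction X.

    labels are 1 for active and 0 for inactive compounds.
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    n_total = len(scores)
    p_total = sum(labels)
    if n_total == 0 or p_total == 0:
        raise ValueError("need at least one compound and one active")
    # Sort by descending score; tied scores keep their input order (an arbitrary choice).
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    # Size of the selected fraction, at least one compound (rounding is an assumption).
    n_top = max(1, round(fraction * n_total))
    p_top = sum(label for _, label in ranked[:n_top])
    return (p_top / n_top) / (p_total / n_total)


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """ROC AUC as the probability that a random active outscores a random inactive.

    Ties between an active and an inactive count as half a win.
    """
    actives = [s for s, y in zip(scores, labels) if y == 1]
    inactives = [s for s, y in zip(scores, labels) if y == 0]
    if not actives or not inactives:
        raise ValueError("need at least one active and one inactive")
    wins = sum((a > i) + 0.5 * (a == i) for a in actives for i in inactives)
    return wins / (len(actives) * len(inactives))


# Toy data: ten compounds, three of them active.
toy_scores = [9.0, 8.0, 7.0, 6.0, 5.0, 4.0, 3.0, 2.0, 1.0, 0.0]
toy_labels = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]
print(enrichment_factor(toy_scores, toy_labels, 0.2))
print(roc_auc(toy_scores, toy_labels))
```

In the toy data, the top 20% contains two compounds, both active, so the hit rate in the fraction is 1.0 against an overall hit rate of 0.3.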

### Validation sets

Our model was validated with three experimental validation sets. Here are some details:

| Dataset name | Dataset size | Number of actives | Number of inactives | Hit rate |
| --- | --- | --- | --- | --- |
| St. Jude Screening Set | 220,691 | 9,082 | 211,609 | 4.12% |
| MMV test set | 5,869 | 1,198 | 4,671 | 20.41% |
| PubChem | 91,796 | 384 | 91,412 | 0.42% |

Note the differences in hit rate between the sets: the MMV test set in particular contains a much higher proportion of active compounds (20.41%) than the other two.

### Results

#### St. Jude Screening Set

Figure: Individual distribution of model scores for the active (orange) and inactive (blue) compounds for the St. Jude Screening Set. Grey lines mark the fractions of compounds used to calculate the enrichment factors.

#### MMV test set

Figure: Individual distribution of model scores for the active (orange) and inactive (blue) compounds for the MMV test set. Grey lines mark the fractions of compounds used to calculate the enrichment factors.

#### PubChem

Figure: Individual distribution of model scores for the active (orange) and inactive (blue) compounds for the PubChem set. Grey lines mark the fractions of compounds used to calculate the enrichment factors.

#### Result summary

| Dataset | ROC AUC score | EF[1%] | EF[10%] | EF[50%] | #actives | #inactives |
| --- | --- | --- | --- | --- | --- | --- |
| St. Jude Screening Set | 0.81 | 12.1 (71) | 4.8 (36) | 1.8 (15) | 9,082 | 211,609 |
| MMV test set | 0.67 | 3.5 (60) | 2.1 (41) | 1.4 (23) | 1,198 | 4,671 |
| PubChem | 0.69 | 7.0 (56) | 2.8 (47) | 1.5 (34) | 384 | 91,412 |

(The number in parentheses is the model score threshold at which the corresponding fraction of the dataset is selected.)
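Because the EF is the hit rate in the selected fraction divided by the overall hit rate, multiplying an EF from the result summary by the overall hit rate of the dataset gives the hit rate within that fraction. The following sketch works through this arithmetic with the values copied from the summary; the dictionary layout and the function name are illustrative only.

```python
# Values copied from the result summary: (actives, inactives, {fraction: EF}).
SUMMARY = {
    "St. Jude Screening Set": (9082, 211609, {0.01: 12.1, 0.10: 4.8, 0.50: 1.8}),
    "MMV test set": (1198, 4671, {0.01: 3.5, 0.10: 2.1, 0.50: 1.4}),
    "PubChem": (384, 91412, {0.01: 7.0, 0.10: 2.8, 0.50: 1.5}),
}


def hit_rate_in_fraction(actives: int, inactives: int, ef: float) -> float:
    """Hit rate inside a selected fraction: EF times the overall hit rate."""
    overall = actives / (actives + inactives)
    return ef * overall


for name, (actives, inactives, efs) in SUMMARY.items():
    for fraction, ef in efs.items():
        rate = hit_rate_in_fraction(actives, inactives, ef)
        print(f"{name}: top {fraction:.0%} -> hit rate {rate:.1%}")
```

For the St. Jude Screening Set, for example, this gives a hit rate of roughly 50% in the top 1% of compounds, against 4.12% overall.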

## Wrap-up

The model shows enrichment when tested against all three validation sets. For each dataset, the higher the model score, the greater the observed enrichment. Also, the thresholds needed to pick 1%, 10% and 50% of the predictions correlate with the dataset size.
We recommend that users refer to these data when analysing their results. For instance, if a library of 1 million compounds is scored but few of those compounds have a high score (for example, only 1% of them score above 15), we would expect the enrichment for that subset to be modest.
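
The check described above, how much of a scored library lies above a given threshold, can be sketched as follows. The example scores are made up, the function name is hypothetical, and representing compounds that failed standardisation as `None` is an assumption about how the output file has been loaded. The reference thresholds are those reported for the St. Jude Screening Set.

```python
from typing import Iterable, Optional


def fraction_above(scores: Iterable[Optional[float]], threshold: float) -> float:
    """Fraction of scored compounds with a score at or above the threshold.

    Compounds without a score (None) are ignored.
    """
    scored = [s for s in scores if s is not None]
    if not scored:
        raise ValueError("no scored compounds")
    return sum(s >= threshold for s in scored) / len(scored)


# Hypothetical library scores; None marks a compound that failed standardisation.
library_scores = [80.0, 40.0, 20.0, 10.0, None, -5.0]

# Score thresholds reported for the St. Jude Screening Set at 1%, 10% and 50%.
for threshold in (71, 36, 15):
    share = fraction_above(library_scores, threshold)
    print(f"score >= {threshold}: {share:.1%} of scored compounds")
```

If the share of a library above a threshold is much smaller or larger than the fraction that threshold selected in the validation sets, the enrichment factors reported above may not transfer directly to that library.
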
Finally, users are advised to apply additional in silico filters to assess the suitability of any hits from MAIP prior to screening. High-scoring compounds may have physicochemical properties and/or substructures that are unsuitable as starting points for a malaria drug discovery programme. In addition, some of the training sets used in MAIP contain examples of known anti-malarial compounds (e.g. aminoquinolines). Thus, molecules with a high score in the model may have already been worked on extensively in anti-malarial programmes. Public bioactivity resources such as ChEMBL can be used to check whether anti-malarial activity is already known for particular structural classes.