The Combined panel is associated with a report as a whole, not with individual indications within the report. It shows the performance, per target value, of a predictive model based on the combination of discovered indications for high/low target values.
At the top of the above panel, you can select the target value. The panel you see depends on whether this target value is numerical or categorical. We first explain the numerical case; the categorical panel is described further below.
Note that all information in this panel is always based on a separate validation set, i.e., a fraction of the data used neither for finding the individual indications nor for building the model.
In the bottom-right plot, Predicted-Actual, predicted target values are plotted against actual ones. Ideally, all dots lie on the diagonal, in which case the root-mean-squared error is 0. The correlation value refers to the correlation between predicted and actual target values. The rank correlation is a variant that relates predicted rank to actual rank.
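The three statistics mentioned above can be sketched in a few lines of plain Python. This is only an illustration, not the tool's own code: the function names are hypothetical, the correlation is assumed to be Pearson's, and the rank correlation is assumed to be Spearman's (Pearson's correlation applied to ranks, with ties sharing their mean rank).

```python
import math

def rmse(predicted, actual):
    """Root-mean-squared error between two equally long sequences."""
    squared = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared) / len(squared))

def correlation(x, y):
    """Pearson correlation (assumed to be the panel's 'correlation')."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)

def ranks(values):
    """1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result

def rank_correlation(x, y):
    """Spearman-style rank correlation (an assumption about the panel)."""
    return correlation(ranks(x), ranks(y))

# Made-up validation data: predictions follow the actual order, not the scale.
predicted = [1.0, 2.0, 3.0, 4.0]
actual = [1.0, 4.0, 9.0, 16.0]
print(rmse(predicted, actual), correlation(predicted, actual),
      rank_correlation(predicted, actual))
```

In this made-up example the rank correlation is perfect while the plain correlation is slightly lower, because the predictions order the examples correctly without matching the actual values.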
The three following curves provide information on the quality of a ranking based on predicted values. Given:
- a particular cutoff value (see bottom right box) and
- a definition of hit as being above (rank high) or below (rank low) that cutoff,
the question is how well the model is able to prioritize, i.e., sort hits to the top.
In all three curves, the red line indicates the performance of a trivial model on this task, i.e., one that produces a random ranking.
In the Cumulative response curve, you can find the fraction of hits (y-axis) contained within the first N percent of the data (x-axis). With a random ranking, you can expect the top N percent to contain N percent of the hits (i.e., the diagonal). If the model produces a ranking with more than N percent of the hits in the top N percent, it does better than random, as the ranking contains an enriched top N.
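As a rough sketch of how such a curve could be derived from validation data (not the tool's actual implementation; the function and parameter names are hypothetical), examples are first marked as hits using the cutoff, then sorted by predicted value, and the share of hits found so far is recorded at every position. The sketch assumes at least one hit and ignores ties between predicted values.

```python
def cumulative_response(predicted, actual, cutoff, rank_high=True):
    """Return (percent of data, percent of hits) points of the curve.

    A hit is an example whose actual value lies above the cutoff
    (rank_high=True) or below it (rank_high=False).
    """
    if rank_high:
        is_hit = [a > cutoff for a in actual]
    else:
        is_hit = [a < cutoff for a in actual]
    # Most promising examples first: highest predictions when ranking high,
    # lowest predictions when ranking low.
    order = sorted(range(len(predicted)), key=lambda i: predicted[i],
                   reverse=rank_high)
    total_hits = sum(is_hit)
    points = [(0.0, 0.0)]
    found = 0
    for n, i in enumerate(order, start=1):
        found += is_hit[i]
        points.append((100.0 * n / len(order), 100.0 * found / total_hits))
    return points

# Made-up data: four validation examples, cutoff 5, hits are values above it.
curve = cumulative_response([0.9, 0.8, 0.3, 0.1], [10, 2, 8, 1], cutoff=5)
print(curve)
```

Each point above the diagonal, such as 50 percent of the hits within the first 25 percent of the data, indicates an enriched top of the ranking.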
The Lift curve shows, for each top N percent (x-axis), how many times better (y-axis) than random the model is performing. If the lift curve goes through (25,3), the model places 3 times as many hits in the top 25 percent as a random ranker would.
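The lift at a given depth can be read as the hit rate within the top N percent divided by the overall hit rate. The following minimal sketch assumes the examples have already been sorted by predicted value and reduced to hit/non-hit flags; `lift_curve` is a hypothetical helper, not part of the tool.

```python
def lift_curve(hits_in_ranked_order):
    """Return (percent of data, lift) points.

    hits_in_ranked_order: booleans, best-ranked example first.
    Assumes the list contains at least one hit.
    """
    total = len(hits_in_ranked_order)
    base_rate = sum(hits_in_ranked_order) / total  # hit rate of a random ranker
    points = []
    found = 0
    for n, hit in enumerate(hits_in_ranked_order, start=1):
        found += hit
        hit_rate_in_top = found / n
        points.append((100.0 * n / total, hit_rate_in_top / base_rate))
    return points

# Made-up ranking of 12 examples with 4 hits, 3 of them in the top 25 percent.
ranking = [True, True, True, False, False, True,
           False, False, False, False, False, False]
for percent, lift in lift_curve(ranking):
    print(f"top {percent:5.1f}%: lift {lift:.2f}")
```

In this made-up ranking the curve passes through approximately (25, 3), matching the example in the text, and it always ends at a lift of 1 once all data is included.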
The ROC (Receiver Operating Characteristic) curve shows the fraction of non-hits (x-axis: false alarms) you have to include in the top N if you want to obtain a particular fraction of hits (y-axis). Ideally, the curve would go straight up from (0,0) to (0,100). The diagonal again represents the performance of the trivial random ranker. As with the two previous curves, the larger the area between the red curve and the blue one, the better the predictive model. For the ROC curve, this is summarized by the area under the curve (0 ≤ AUC ≤ 1; 0.74 in the example).
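The construction of the ROC curve and its area can be sketched as follows. The helper names are hypothetical, ties between predicted values are ignored, and both axes are expressed as fractions between 0 and 1 (rather than percentages) so that the area falls in the 0 to 1 range quoted above. The area is computed with the trapezoidal rule, which is one common choice and not necessarily what the tool uses.

```python
def roc_points(hits_in_ranked_order):
    """Return (fraction of non-hits, fraction of hits) points.

    hits_in_ranked_order: booleans, best-ranked example first.
    Assumes at least one hit and at least one non-hit.
    """
    n_hits = sum(hits_in_ranked_order)
    n_non_hits = len(hits_in_ranked_order) - n_hits
    points = [(0.0, 0.0)]
    hits_seen = non_hits_seen = 0
    for hit in hits_in_ranked_order:
        if hit:
            hits_seen += 1
        else:
            non_hits_seen += 1
        points.append((non_hits_seen / n_non_hits, hits_seen / n_hits))
    return points

def area_under_curve(points):
    """Trapezoidal area under a curve given as sorted (x, y) points."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))

# Made-up rankings: a perfect one and a partly mixed one.
print(area_under_curve(roc_points([True, True, False, False])))
print(area_under_curve(roc_points([True, False, True, False])))
```

A perfect ranking yields an area of 1, while a random ranking is expected to give about 0.5, the area under the diagonal.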
For the categorical case, the "Cumulative response", "Lift", and "ROC" curves shown above are provided for each category.
For instance, in the above panel, the curves for class 'positive' are shown.
The Confusion matrix shows mis-classification details. The statistics for the correct predictions are shown on the diagonal. Each row in the confusion matrix corresponds to an actual category and shows the number of examples from that category that are classified into the different bins. For instance, in the above case, there are 20 examples from class 'negative' that are correctly predicted, and 17 cases that are assigned to the wrong bin 'positive'.
Based on the confusion matrix, a list of scores (both overall and per class) can be computed. Those are shown in the two middle tables.
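To illustrate how scores follow from a confusion matrix, the sketch below builds one from actual and predicted labels and derives accuracy (overall) and precision and recall (per class). Which scores the panel actually lists is not stated here, so these three are assumed as common examples, and the function names are hypothetical. The 'negative' row (20 correct, 17 wrong) is taken from the text; the 'positive' row is invented purely for illustration.

```python
from collections import Counter

def confusion_matrix(actual, predicted, labels):
    """Rows are actual categories, columns are predicted categories."""
    counts = Counter(zip(actual, predicted))
    return [[counts[(a, p)] for p in labels] for a in labels]

def accuracy(matrix):
    """Overall score: share of examples on the diagonal."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total

def precision_recall(matrix, k):
    """Per-class scores for the category at index k."""
    true_positives = matrix[k][k]
    predicted_as_k = sum(row[k] for row in matrix)  # column total
    actually_k = sum(matrix[k])                     # row total
    precision = true_positives / predicted_as_k if predicted_as_k else 0.0
    recall = true_positives / actually_k if actually_k else 0.0
    return precision, recall

labels = ["negative", "positive"]
# 'negative' row from the text; 'positive' row (5 wrong, 30 correct) is made up.
actual = ["negative"] * 37 + ["positive"] * 35
predicted = (["negative"] * 20 + ["positive"] * 17
             + ["negative"] * 5 + ["positive"] * 30)

matrix = confusion_matrix(actual, predicted, labels)
print(matrix)
print(accuracy(matrix), precision_recall(matrix, labels.index("positive")))
```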
© 2002-2010 DTAI, K.U.Leuven. All rights reserved.