Parameters

Distributions and data

Training data

Gold standard: training a model on the fully labeled data
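As a reference point, a minimal sketch of this gold-standard baseline, assuming scikit-learn and a synthetic dataset (the data and all names are illustrative):

```python
# Gold-standard baseline: train directly on the true labels y.
# In the PU setting y is hidden, so this only serves as the upper bound
# that the PU methods below are compared against.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=4000, n_informative=4, random_state=0)
oracle = LogisticRegression(max_iter=1000).fit(X, y)
```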

Postprocessing

Scaling the probabilities of a non-traditional classifier

Does this work in theory, i.e., if the correct probabilities $\Pr(s=1|x)$ are found?

Yes!

Does the non-traditional classifier work?

Not necessarily
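A minimal sketch of this postprocessing route, assuming the SCAR setting in which $\Pr(y=1|x)=\Pr(s=1|x)/c$ with labeling frequency $c=\Pr(s=1|y=1)$ (Elkan & Noto); the synthetic data and names such as `g` and `c_hat` are illustrative:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=4000, n_informative=4, random_state=0)
c = 0.4                                                # Pr(s=1 | y=1)
s = ((y == 1) & (rng.random(y.size) < c)).astype(int)  # only some positives are labeled

# Non-traditional classifier: predicts Pr(s=1 | x) instead of Pr(y=1 | x).
g = LogisticRegression(max_iter=1000).fit(X, s)

# Estimate c as the average of g on labeled positives (Elkan & Noto's e1
# estimator; ideally computed on a held-out validation set).
c_hat = g.predict_proba(X[s == 1])[:, 1].mean()

# Postprocessing: scale Pr(s=1 | x) by 1/c and clip into [0, 1].
p_y = np.clip(g.predict_proba(X)[:, 1] / c_hat, 0.0, 1.0)
```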

Moving the decision threshold

It works in theory (with correct $\Pr(s=1|x)$)

But what if the non-traditional classifier does not predict correct probabilities? Then this does not help either. The classifier is biased towards the negative class because there were more negative examples during training.
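A minimal sketch of threshold moving under the same illustrative setup as above: since $\Pr(y=1|x)>0.5$ exactly when $\Pr(s=1|x)>c/2$, the non-traditional classifier is kept as-is and only the decision threshold moves:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=4000, n_informative=4, random_state=0)
c = 0.4
s = ((y == 1) & (rng.random(y.size) < c)).astype(int)

g = LogisticRegression(max_iter=1000).fit(X, s)
c_hat = g.predict_proba(X[s == 1])[:, 1].mean()

# Pr(y=1|x) > 0.5  <=>  Pr(s=1|x)/c > 0.5  <=>  Pr(s=1|x) > c/2,
# so move the threshold on the non-traditional classifier from 0.5 to c/2.
y_hat = (g.predict_proba(X)[:, 1] > c_hat / 2).astype(int)
```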

Preprocessing: per-class reweighting

Does this work in theory?

The probabilities are not correct, but the decision threshold is expected to be correct

Does this work in practice?

Again, our model cannot predict the correct probabilities and is biased towards the negative examples, because that part of the space was more clearly negative. So now not only the probabilities but also the decision threshold is incorrect.
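One concrete way to realize the per-class reweighting, hedged as a sketch: give the labeled class the weight $(2-c)/c$, so that the reweighted posterior crosses $0.5$ exactly where the unweighted $\Pr(s=1|x)$ crosses $c/2$; the probabilities come out distorted, but the decision threshold is the intended one (same illustrative setup as above, with $c$ assumed already estimated):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=4000, n_informative=4, random_state=0)
c = 0.4
s = ((y == 1) & (rng.random(y.size) < c)).astype(int)

# Upweighting the s=1 class by (2-c)/c puts the 0.5 decision boundary of the
# reweighted model where Pr(s=1|x) = c/2, i.e., where Pr(y=1|x) = 0.5.
g_w = LogisticRegression(max_iter=1000,
                         class_weight={0: 1.0, 1: (2 - c) / c}).fit(X, s)
y_hat = g_w.predict(X)  # default 0.5 threshold on the reweighted model
```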

Preprocessing: per-instance reweighting, using each instance's probability of being labeled

Does this work in theory?

Yes!

Does this work in practice?

Again we have the problem of bias in learning the non-traditional classifier, which prevents us from weighting the instances correctly. Additionally, wherever the non-traditional classifier predicts $\Pr(s=1|x)>c$, the weighting formula $\Pr(y=1|s=0,x)=\frac{1-c}{c}\frac{\Pr(s=1|x)}{1-\Pr(s=1|x)}$ yields probabilities greater than 1, which is impossible, so we need to cut them off at 1.

Together this makes the method work less well in practice than in theory. Still, the resulting model is better than the class-weighted one, which the theory also led us to expect.
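A minimal sketch of the per-instance reweighting of Elkan & Noto, under the same illustrative setup: labeled examples count as positives with weight 1, and each unlabeled example enters the training set twice, as a positive with weight $\Pr(y=1|s=0,x)$ (clipped at 1, as discussed) and as a negative with the complementary weight:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=4000, n_informative=4, random_state=0)
c = 0.4
s = ((y == 1) & (rng.random(y.size) < c)).astype(int)

g = LogisticRegression(max_iter=1000).fit(X, s)
c_hat = g.predict_proba(X[s == 1])[:, 1].mean()

X_lab, X_unl = X[s == 1], X[s == 0]
p_s = np.clip(g.predict_proba(X_unl)[:, 1], 1e-12, 1 - 1e-12)
# w(x) = Pr(y=1|s=0,x) = ((1-c)/c) * Pr(s=1|x) / (1 - Pr(s=1|x));
# wherever g predicts Pr(s=1|x) > c this exceeds 1, hence the clipping.
w = np.clip((1 - c_hat) / c_hat * p_s / (1 - p_s), 0.0, 1.0)

X_train = np.vstack([X_lab, X_unl, X_unl])
y_train = np.concatenate([np.ones(len(X_lab)),
                          np.ones(len(X_unl)), np.zeros(len(X_unl))])
weights = np.concatenate([np.ones(len(X_lab)), w, 1.0 - w])
clf = LogisticRegression(max_iter=1000).fit(X_train, y_train,
                                            sample_weight=weights)
```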

Preprocessing: per-instance weighting (risk minimization)

Does this work in theory?

Yes!

Does this work in practice?

Yes! But the model needs to be able to handle negative weights, which is not always the case. ==> The method can be modified to handle this edge case.
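A minimal numpy sketch of this risk-minimization view, assuming the class prior $\pi=\Pr(y=1)$ is known and using the logistic loss: rewriting the risk on negatives via the unlabeled data puts a negative weight $-\pi$ on (a copy of) the labeled positives (du Plessis et al.); clamping the resulting negative-class risk term at zero is the usual modification for the edge case (the non-negative risk estimator of Kiryo et al.):

```python
import numpy as np

def logistic_loss(z):
    # l(z) = log(1 + exp(-z)), computed stably
    return np.logaddexp(0.0, -z)

def pu_risk(f_pos, f_unl, pi, non_negative=True):
    """PU risk of decision scores f(x) on labeled positives and unlabeled data."""
    risk_pos = pi * logistic_loss(f_pos).mean()
    # Risk on negatives, rewritten with the unlabeled set; the subtracted term
    # is the negative weight -pi on the labeled positives.
    risk_neg = logistic_loss(-f_unl).mean() - pi * logistic_loss(-f_pos).mean()
    if non_negative:
        risk_neg = max(risk_neg, 0.0)  # clamp: handles the negative-weight edge case
    return risk_pos + risk_neg
```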