

It's easy to generate more confusion matrices to get a feel for how the two ways of calculating accuracy differ, such as with this R code:
c.matrix <- rbind(c(1, 0),
                  c(1, 1))  # rows: true class; columns: predicted class
sum(diag(c.matrix)) / sum(c.matrix)  # "overall" proportion correct
first.row  <- c.matrix[1,1] / (c.matrix[1,1] + c.matrix[1,2])  # class 1 recall
second.row <- c.matrix[2,2] / (c.matrix[2,1] + c.matrix[2,2])  # class 2 recall
(first.row + second.row) / 2  # "balanced" proportion correct
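The same calculation generalizes to any number of classes: balanced accuracy is just the mean of the per-class recalls, i.e. the diagonal divided by the row sums. A minimal sketch (the balanced.accuracy function name is mine, and it assumes rows of the confusion matrix are the true classes, as above):
balanced.accuracy <- function(cm) {
  mean(diag(cm) / rowSums(cm))  # mean of per-class recalls
}
balanced.accuracy(c.matrix)  # 0.75, matching the two-row calculation above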
So, should we use balanced accuracy instead of overall accuracy? Yes, it's probably better to use balanced accuracy when there's just one test set and it isn't balanced. I tend to be extremely skeptical about interpreting classification results when the training set is not balanced, and would want to investigate a lot more before deciding that balanced accuracy reliably compensates for unbalanced training sets. However, it's probably fine to use balanced accuracy with unbalanced test sets in situations like cross-classification, where a classifier is trained once on a balanced training set (e.g., one person's dataset) and then tested once (e.g., on another person's dataset). Datasets requiring cross-validation need to be fully balanced, because each testing set contributes to the training set in other folds.
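To see why an unbalanced test set is a problem for overall accuracy, consider a degenerate classifier that always guesses the majority class: overall accuracy looks good while balanced accuracy stays at chance. A quick illustration (the 90/10 split is arbitrary; balanced.accuracy is the helper sketched above):
lazy.matrix <- rbind(c(90, 0),   # 90 majority-class examples, all called majority
                     c(10, 0))   # 10 minority-class examples, also called majority
sum(diag(lazy.matrix)) / sum(lazy.matrix)  # overall: 0.9, looks impressive
balanced.accuracy(lazy.matrix)             # balanced: 0.5, i.e. chance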
For more, see Brodersen, Kay H., Cheng Soon Ong, Klaas E. Stephan, and Joachim M. Buhmann. 2010. "The balanced accuracy and its posterior distribution." In Proceedings of the 20th International Conference on Pattern Recognition (ICPR). DOI: 10.1109/ICPR.2010.764
For cross-validation, rather than requiring balanced data, you just need stratified sampling, so that the proportions of the various classes in each fold remain approximately constant. In that way, the training folds and each testing fold statistically resemble the overall training data.
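As a sketch of what stratified fold assignment could look like in base R (the class labels and fold count here are made up for illustration):
set.seed(1)
labels <- factor(rep(c("a", "b"), c(80, 20)))  # unbalanced: 80 vs. 20 examples
n.folds <- 5
fold <- integer(length(labels))
for (cls in levels(labels)) {
  idx <- which(labels == cls)
  # deal fold numbers out within each class, so every fold gets ~the same class mix
  fold[idx] <- sample(rep(1:n.folds, length.out = length(idx)))
}
table(fold, labels)  # each fold should hold ~16 of "a" and ~4 of "b"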