I've posted before about permutation schemes for the group level; here's an explicit illustration for the case of leave-one-subject-out cross-validation, using a version of these conventions. I'm using "leave-one-subject-out" as shorthand for cross-validation in which partitioning is done on the participants, as illustrated here.
This diagram illustrates an fMRI dataset with two tasks and four participants, one example of each task per person. Since there are four participants there are four cross-validation folds; here, the arrows indicate that for the first fold the classifier is trained on data from subjects 2, 3, and 4, while subject 1's data makes up the test set. The overall mean accuracy is obtained by averaging the four fold accuracies (see this post for more explanation).
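Here's a minimal sketch of that cross-validation scheme in Python, using a toy random dataset (the names `X`, `y`, and `groups` are hypothetical, and scikit-learn's `LinearSVC` is just a stand-in classifier; substitute whatever classifier and data you actually use):

```python
import numpy as np
from sklearn.svm import LinearSVC
from sklearn.model_selection import LeaveOneGroupOut

rng = np.random.default_rng(0)
n_subjects, n_voxels = 4, 50
X = rng.normal(size=(n_subjects * 2, n_voxels))  # 2 examples per subject
y = np.tile([1, 2], n_subjects)                  # task labels: 1, 2, 1, 2, ...
groups = np.repeat(np.arange(n_subjects), 2)     # subject ID of each example

fold_accuracies = []
for train_idx, test_idx in LeaveOneGroupOut().split(X, y, groups):
    clf = LinearSVC().fit(X[train_idx], y[train_idx])  # train on 3 subjects
    fold_accuracies.append(clf.score(X[test_idx], y[test_idx]))  # test on the left-out one
overall_mean_accuracy = np.mean(fold_accuracies)  # the "grey oval" value
```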
How can we get a significance value for the overall mean accuracy (grey oval)? We can't simply do a t-test (or binomial test) over the accuracies obtained by leaving out each person (as is often done after within-subjects classification), because those accuracies are far from independent: every pair of folds shares most of its training data. We need a test for the final mean accuracy itself.
My strategy is to do dataset-wise permutation testing by flipping the labels within one or more participants for each permutation. In this case there are only four participants, so we can flip labels in just one or two of them, for 10 possible permutations (choose(4,1) + choose(4,2)); flipping all the subjects' labels would just recreate the original dataset (with the class names swapped), since this is dataset-wise permutation testing. Here, the task labels for participant 1 only are flipped (the first example becomes task 2; the second, task 1). The entire cross-validation then proceeds with each flipped-labels dataset.
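Continuing the sketch above, one way to enumerate those 10 flipping schemes and apply one of them (the helper `flip_labels` is hypothetical, not from the original post):

```python
from itertools import combinations

subjects = np.arange(4)
# all ways to pick one or two subjects whose labels get flipped
schemes = [set(c) for r in (1, 2) for c in combinations(subjects, r)]
assert len(schemes) == 10  # choose(4,1) + choose(4,2)

def flip_labels(y, groups, flip_set):
    """Swap task 1 <-> task 2 for every example of the flagged subjects."""
    y_perm = y.copy()
    mask = np.isin(groups, list(flip_set))
    y_perm[mask] = np.where(y[mask] == 1, 2, 1)
    return y_perm

# the diagram's example: flip participant 1 only (subject index 0)
y_perm = flip_labels(y, groups, {0})
```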
I follow the same strategy when there is more than one example of each class per person: when a person's labels are to be 'flipped' for a permutation, I relabel all of their class 1 examples as class 2 (and all class 2 examples as class 1). The number of possible permutations is limited by the number of participants in this scheme, but it grows rapidly as participants are added.
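Putting the pieces together, a sketch of the whole test (still using the hypothetical names from the snippets above): rerun the full cross-validation once per flipping scheme to build the null distribution, then rank the true overall mean accuracy against it.

```python
def loso_mean_accuracy(X, y, groups):
    """Leave-one-subject-out CV, returning the overall mean accuracy."""
    accs = []
    for train_idx, test_idx in LeaveOneGroupOut().split(X, y, groups):
        clf = LinearSVC().fit(X[train_idx], y[train_idx])
        accs.append(clf.score(X[test_idx], y[test_idx]))
    return np.mean(accs)

true_acc = loso_mean_accuracy(X, y, groups)
null_accs = [loso_mean_accuracy(X, flip_labels(y, groups, s), groups)
             for s in schemes]

# one-sided p-value: how often a permuted dataset does as well as the real one
p = (np.sum(np.array(null_accs) >= true_acc) + 1) / (len(null_accs) + 1)
```

With only 10 permutations the smallest attainable p-value is 1/11, which is one reason this scheme becomes much more useful as the number of participants grows.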
Anyone do anything different?