MVPA Meanderings: demo: permutation tests for within-subjects cross-validation

Tuesday, August 18, 2015

This R code demonstrates how to carry out a permutation test at the group level, when using within-subjects cross-validation. I describe permutation schemes in a pair of PRNI conference papers; see DOI:10.1109/PRNI.2013.44 for an introduction and single subjects, and this one for the group level, including the fully balanced within-subjects approach used in the demo. A blog post from a few years ago also describes some of the issues, using examples structured quite similarly to this demo.

For this demo I used part of the dataset from doi:10.1093/cercor/bhu327, which can be downloaded from the OSF site. I did a lot of temporal compression with this dataset, which is useful for the demo, since only about 18 MB of files need to be downloaded. Unfortunately, none of the analyses we did for the paper are quite suitable for demonstrating simple permutation testing with within-subjects cross-validation, so this demo performs a new analysis. The demo analysis is valid, just not really sensible for the paper's hypotheses (so don't be confused when you can't find it in the paper!).

The above figure is generated by the demo code, and shows the results of the test. The demo uses 18 subjects' data, and their null distributions are shown as blue histograms. The true-labeled accuracy for each person is plotted as a red line, and listed in the title, along with its p-value, calculated from the shown null distribution (the best-possible p-value, 1/2906, rounds to 0).

The dataset used for the demo has no missing data: each person has six runs, with four examples (two of each class) in each run. Thus, I can use a single set of relabelings for the permutation test, carrying out the relabelings and classifications in each person individually (since it's within-subjects cross-validation), but with the null distribution for each person built from the same relabelings. Using the same relabelings in each person allows the group-level null distribution (green, in the image above) to be built from the across-subjects average accuracy for each relabeling. In a previous post I called this fully-balanced strategy "single corresponding stripes", illustrated with the image below; see that post (or the demo code) for more detail.
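The shared-relabeling scheme can be sketched in a few lines. This is only an illustration in Python (the demo itself is R code), and nearly everything in it is an assumption: the data are synthetic noise with a small class shift, the classifier is a simple nearest-class-mean rule, labels are shuffled within each run, and only 200 random relabelings are drawn rather than enumerating all possible ones as the demo does. The names `cv_accuracy`, `relabelings`, `subject_nulls`, and `group_null` are made up for this sketch.

```python
import numpy as np

rng = np.random.default_rng(0)
n_subjects, n_runs, n_voxels = 18, 6, 20     # n_voxels is an arbitrary choice
run_labels = np.array([0, 0, 1, 1])          # two examples of each class per run
runs = np.repeat(np.arange(n_runs), run_labels.size)
true_labels = np.tile(run_labels, n_runs)

def cv_accuracy(data, labels):
    """Leave-one-run-out accuracy with a nearest-class-mean classifier."""
    correct = 0
    for test_run in range(n_runs):
        train, test = runs != test_run, runs == test_run
        # class means estimated from the training runs only
        means = np.stack([data[train & (labels == c)].mean(axis=0) for c in (0, 1)])
        dists = ((data[test][:, None, :] - means[None, :, :]) ** 2).sum(axis=2)
        correct += (dists.argmin(axis=1) == labels[test]).sum()
    return correct / labels.size

# Synthetic subjects: noise plus a small class-dependent shift (illustrative only)
subjects = [rng.normal(size=(true_labels.size, n_voxels)) + 0.5 * true_labels[:, None]
            for _ in range(n_subjects)]

# ONE set of relabelings, shuffled within each run, reused for every subject
n_perms = 200
relabelings = np.array([
    np.concatenate([rng.permutation(run_labels) for _ in range(n_runs)])
    for _ in range(n_perms)
])

# Rows: subjects; columns: relabelings (column j uses the same labels in everyone)
subject_nulls = np.array([[cv_accuracy(d, lab) for lab in relabelings]
                          for d in subjects])
group_null = subject_nulls.mean(axis=0)      # across-subjects mean per relabeling
true_acc = np.array([cv_accuracy(d, true_labels) for d in subjects])
```

Because column `j` of `subject_nulls` comes from the same relabeling in every subject, averaging down the columns gives one group-level null value per relabeling, which is the idea behind the "single corresponding stripes" picture.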

The histogram for the across-subjects means (green histogram; black column) is narrower than the individual subjects' histograms. This is sensible: for any particular permutation relabeling, one person might have a high accuracy, and another person a low accuracy; averaging the values together gives a value closer to chance. Rephrased, each individual has at least one permutation with very low (0.2) accuracy (as can be seen in the blue histograms). But different relabelings produced that low accuracy in each person, so the lowest group-mean accuracy was 0.4.
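The narrowing is easy to reproduce with simulated numbers. The sketch below (Python, not the demo's R) assumes each subject's permuted-label accuracies are independent chance-level draws from 24 examples; that is a simplification, not a description of the real data, and the specific values it prints will not match the figure.

```python
import numpy as np

rng = np.random.default_rng(1)
n_subjects, n_perms, n_examples = 18, 2906, 24

# Assumed null: each accuracy is (number correct out of 24 at chance) / 24
subject_nulls = rng.binomial(n_examples, 0.5, size=(n_subjects, n_perms)) / n_examples
group_null = subject_nulls.mean(axis=0)   # mean across subjects, per relabeling

print("typical single-subject null SD:", subject_nulls.std(axis=1).mean())
print("group-mean null SD:            ", group_null.std())
print("lowest single-subject accuracy:", subject_nulls.min())
print("lowest group-mean accuracy:    ", group_null.min())
```

Under this independence assumption the spread of the group means shrinks by roughly the square root of the number of subjects, which is why extreme values that are common in individual subjects essentially never appear in the group null.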

The group mean of 0.69 was higher than all the permuted-label group means, giving a p-value of 1/2906 ≈ 0.0003 (all 2906 possible relabelings were run). The equivalent t-test is shown in the last panel, and also produces a very significant p-value.
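As a rough illustration of how a p-value like this comes out of a null distribution, here is a small Python sketch (again, not the demo's R code). It assumes the true labeling is counted as one of the 2906 entries in the null distribution, which is consistent with a best-possible p-value of 1/2906 but is my reading rather than something checked against the demo; `permutation_p` and the placeholder null values are hypothetical.

```python
import numpy as np

def permutation_p(true_value, null_values):
    """Proportion of the null distribution at or above the true-labeled value.

    null_values is assumed to include the true labeling as one of its entries,
    so the smallest possible result is 1 / len(null_values).
    """
    null_values = np.asarray(null_values, dtype=float)
    return np.sum(null_values >= true_value) / null_values.size

# Placeholder group null: 2905 permuted-label means below the true mean,
# plus the true-labeled mean itself (values are invented for illustration)
null = np.concatenate([np.full(2905, 0.5), [0.69]])
p = permutation_p(0.69, null)
print(round(p, 4))  # 0.0003
```

The same function applies to a single subject: pass that person's true-labeled accuracy and their own null distribution.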

4 comments:

Deeuu, May 27, 2016 at 9:48 AM
Hi,

Really enjoy reading your posts...please keep it up!

MVPA is not at all my area, but I'm interested in applying similar permutation tests to within-subject linear regressions.

I'm having a hard time understanding why cross-validation is being done here at all.

Can't you just do the fit, get the statistic (accuracy), and then permute the data and recompute the statistic under the null? Why drop some data out?

For my purposes, I wish to perform within-subject linear regression and run a permutation test to test the significance of r^2...rather than rely on the assumptions.

Ryan S, January 31, 2017 at 1:54 PM
Hi Jo,

First, I just wanted to thank you for these posts - sorting things out online I often find them the clearest explanations of things.

Second, I wanted to ask: where have you found it easiest to implement these methods in searchlights, especially the 'single-corresponding' permutations? I am likely to build some PyMVPA code in the meantime but was curious.

Thanks!
Ryan