Friday, September 25, 2015

How to do cross-validation? Have to partition on the runs?

People sometimes ask why many MVPA studies use leave-one-run-out cross-validation (partitioning on the runs), and if it's valid to set up the cross-validation in other ways, such as leaving out individual trials. Short answer: partitioning on the runs is a sensible default for single-subject analyses, but (as usual with fMRI) other methods are also valid in certain situations.

As I've written before, cross-validation is not a statistical test itself, but rather the technique by which we get the statistic we want to test (see this for a bit more on doing statistical testing). My default example statistic is accuracy from a linear SVM, but these issues apply to other statistics and algorithms as well.

Why is partitioning on the runs a sensible default? A big reason is that scanner runs (also called "sessions") are natural breaks in fMRI datasets: a run is the period in which the scanner starts, takes a bunch of images at intervals of one TR, then stops. Images from within a single scanner run are intrinsically more related to each other than to images from different runs, partly because they were collected closer together in time (so the subject's mental state and physical position should be more similar; single hemodynamic responses can last more than one TR), and because many software packages perform some corrections on each run individually (e.g., SPM motion correction usually aligns each volume to the first image within its run). The fundamental nature of temporal dependency is reflected in the terminology of many analysis programs, such as "chunks" in pyMVPA and "bricks" in afni.

So, since the fMRI data within each person is structured into runs, it makes sense to partition on the runs. It's not necessary to leave-one-run-out, but if you're not leaving some multiple number of runs out for the cross-validation, you should verify that your (not run-based) scheme is ok (more later). By "some multiple number of runs" I mean leaving out two, three, or more runs on each cross-validation fold. For example, if your dataset has 20 short runs you will likely have more stable (and significant) performance if you leave out two or four runs each fold rather than just one (e.g., Permutation Tests for Classification. AI Memo 2003-019). Another common reason to partition with multiple runs is if you're doing some sort of cross-classification, like training on all examples from one scanning day and testing on all examples from another scanning day. With cross-classification there might not be full cross-validation, such as if training on "novel" runs and testing on "practiced" runs is of interest for an experiment, but training on "practiced" runs and testing with "novel" runs is not. There's no problem with these cross-classification schemes, so long as one set of runs is used for training, and another (non-intersecting) set of runs for testing in an experiment-sensible (and ideally a priori) manner.

But it can sometimes be fine to do cross-validation with smaller bits of the dataset than the run, such as leaving out individual trials or blocks. The underlying criteria is the same, though: are there sensible reasons to think that the units you want to use for cross-validation are fairly independent? If your events are separated by less than 10 seconds or so (i.e., HRF duration) it's hard to argue that it'd be ok to put one event in the training set and the event after it into the testing set, since there will be so much "smearing" of information from the first event into the second (and doubly so if the event ordering is not perfectly balanced). But if your events are fairly well-spaced (e.g. a block design with 10 second action blocks separated by 20 seconds of rest), then yes, cross-validating on the blocks might be sensible, particularly if preprocessing is aimed at isolating the signal from individual blocks.

If you think that partitioning on trials or blocks, rather than full runs, is appropriate for your particular dataset and analysis, I suggest you include an explanation of why you think it's ok (e.g., because of the event timing), as well as a demonstration that your partitioning scheme did not positively bias the results. What demonstration is convincing necessarily varies with dataset and hypotheses, but could include "random runs" and control analyses. I described "random runs" in The impact of certain methodological choices ... (2011); the basic idea is to cross-validate on the runs, but then randomly assign the trials to new "runs", and do the analysis again (i.e., so that trials from the same run are mixed into the training and testing sets). If partitioning on the random runs and the real runs produce similar results, then mixing the trials didn't artificially increase performance, and so partitioning other than on the runs (like leaving out individual blocks) is probably ok. Control analyses are classifications that shouldn't work, such as using visual cortex to classify an auditory stimulus. If such negative control analyses aren't negative (can classify) in a particular cross-validation scheme, then that cross-validation scheme shouldn't be trusted (the training and testing sets are not independent, or some other error is occurring). But if the target analysis (e.g., classifying the auditory stimuli with auditory cortex) is successful while the control analysis (e.g., classifying the auditory stimuli with visual cortex) is not, and both use the same cross-validation scheme, then the partitioning was probably ok.

This post has focused on cross-validation when each person is analyzed separately. However, for completeness, I want to mention that another sensible way to partition fMRI data is on the people, such as with leave-one-subject-out cross-validation. Subjects are generally independent (and so suitable "chunks" for cross-validation), unless the design includes an unusual population (e.g., identical twins) or design (e.g., social interactions).

Friday, September 11, 2015

a nice "Primer on Pattern-Based Approaches to fMRI"

John-Dylan Haynes has a new MVPA review article out in Neuron, "A Primer on Pattern-Based Approaches to fMRI" (full citation below). I say "new" because he (and Rees) was the author of one of the first MVPA overview articles ("Decoding mental states from brain activity in humans"), back in 2006. As with the earlier article, this is a good introduction to some of the main methods and issues; I'll highlight a few things here, in no particular order.

At left is part of Figure 4, which nicely illustrates four different "temporal selection" options for event-related designs. The red and green indicate the two different classes of stimuli (cats and dogs), with the black line the fMRI signal in a single voxel across time.

I often refer to these as "temporal compression" options, but like "temporal selection" even better: we are "selecting" how to represent the temporal aspect of the data, and "compression" isn't necessarily involved.

John-Dylan Haynes (quite properly, in my opinion) recommends permutation tests (over binomial and t-tests) for estimating significance, noting that one of their (many) benefits is in detecting biases or errors in the analysis procedure.

Overall, I agree with his discussion of spatial selection methods (Figure 3 is a nice summary), and was intrigued by the mention of wavelet pyramids as a form of spatial filtering. I like the idea of spatial filtering to quantify the spatial scale at which information is present, but haven't run across a straightforward implementation (other than searchlight analysis); wavelet pyramids might need more investigation.

I also appreciate his caution against trying too many different classifiers and analysis procedures on the same dataset, pointing out that this can cause "circularity and overfitting". This problem is also sometimes referred to as having too many experimenter degrees of freedom: if you try enough versions of an analysis you can probably find one that's "significant." His advice to use nested cross-validation strategies (e.g., tuning the analysis parameters on a subset of the subjects) is solid, if sometimes difficult in practice because of the limited number of available subjects. The advice to use an analysis that "substantially decreases the number of free parameters" is also solid, if somewhat unsatisfying: I often advise people to use linear SVM, c=1 as the default analysis. While tuning parameters and testing other classifiers could conceivably lead to a higher accuracy, the risk of false positives from exploiting experimenter degrees of freedom is very real.

I also like his inclusion of RSA and encoding methods in this "primer". Too often we think of classifier-based MVPA, RSA, and encoding methods as unrelated techniques, but it's probably more accurate to think of them as "cousins," or falling along a continuum.

It's clear by now that I generally agree with his issue framing and discussion, though I do have a few minor quibbles, such as his description of MVPA methods as aimed at directly studying "the link between mental representations and corresponding multivoxel fMRI activity patterns". I'd assert that MVPA are also useful as a proxy for regional information content, even without explicit investigation of cognitive encoding patterns. But these are minor differences; I encourage you to take a look at this solid review article. Haynes JD (2015). A Primer on Pattern-Based Approaches to fMRI: Principles, Pitfalls, and Perspectives. Neuron, 87 (2), 257-70 PMID: 26182413