Tuesday, February 17, 2015

research blogging: concatenating vs. averaging timepoints

A little bit of the (nicely described) methods in Shen et al. 2014 ("Decoding the individual finger movements ..." citation below) caught my eye: they report better results when concatenating the images from adjacent time points instead of averaging (or analyzing each independently). The study was straightforward: classifying which finger (or thumb) did a button press. They got good accuracies classifying single trials, with both searchlights and anatomical ROIs. There's a lot of nice methodological detail, including how they defined the ROIs in individual participants, and enough description of the permutation testing to tell that they followed what I'd call a dataset-wise scheme (nice to see!).
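The "dataset-wise" permutation idea can be sketched with a toy example: draw one label permutation per iteration and carry it through the entire analysis, rather than re-permuting separately within folds. Everything below is hypothetical (the data, the stand-in nearest-centroid "accuracy", the number of permutations); it's only meant to show the scheme's structure, not Shen et al.'s actual pipeline.

```python
import numpy as np

rng = np.random.default_rng(0)
n_trials, n_voxels = 40, 100
X = rng.normal(size=(n_trials, n_voxels))  # toy "ROI patterns"
y = np.tile([0, 1, 2, 3], 10)              # four fingers, ten trials each

def toy_accuracy(X, y):
    # Stand-in for a full cross-validated classification analysis.
    centroids = np.stack([X[y == c].mean(axis=0) for c in np.unique(y)])
    pred = np.argmin(((X[:, None, :] - centroids[None]) ** 2).sum(-1), axis=1)
    return (pred == y).mean()

# Dataset-wise scheme: ONE relabeling per permutation, applied to the whole
# dataset, then the complete analysis is rerun on the relabeled data.
null_accs = [toy_accuracy(X, rng.permutation(y)) for _ in range(100)]
true_acc = toy_accuracy(X, y)
p_value = (1 + sum(a >= true_acc for a in null_accs)) / (1 + len(null_accs))
```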

But what I want to highlight here is a pretty minor part of the paper: during preliminary analyses they classified the button presses using individual images (i.e., single timepoints; the image acquired during one TR), the average of two adjacent images (e.g., averaging the images collected 3 and 4 TRs after a button press), and the concatenation of adjacent images (e.g., concatenating the images collected 3 and 4 TRs after the button press), and found the best results with concatenation (they don't specify how much better).

Concretely, concatenation sends more voxels to the classifier each time: if an ROI has 100 voxels, concatenating two adjacent images means that each example has 200 voxels (the 100 ROI voxels at timepoint 1 and the 100 ROI voxels at timepoint 2). The classifier doesn't "know" that this is actually 100 voxels at two timepoints; it "sees" 200 unique voxels. Shen et al. used linear SVM (C=1), which generally handles large numbers of voxels well; doubling ROI sizes might hurt the performance of other classifiers.
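The difference between the two schemes is easy to see with arrays. This is a minimal NumPy sketch with made-up data (the 100-voxel ROI and the two timepoints are just the example from the paragraph above): averaging keeps the feature count at 100, concatenation doubles it to 200.

```python
import numpy as np

rng = np.random.default_rng(0)
tp1 = rng.normal(size=100)  # ROI pattern at the first timepoint (one TR)
tp2 = rng.normal(size=100)  # ROI pattern at the adjacent timepoint

# Averaging: still one feature per voxel, so 100 features per example.
averaged = (tp1 + tp2) / 2

# Concatenating: each voxel contributes two features (one per timepoint),
# so the classifier "sees" 200 unique features per example.
concatenated = np.concatenate([tp1, tp2])

print(averaged.shape)      # (100,)
print(concatenated.shape)  # (200,)
```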

I haven't tried concatenating timepoints; my usual procedure is averaging (or fitting a HRF-type model). But I know others have also had success with concatenation; feel free to comment if you have any experience (good or bad).

Shen, G., Zhang, J., Wang, M., Lei, D., Yang, G., Zhang, S., & Du, X. (2014). Decoding the individual finger movements from single-trial functional magnetic resonance imaging recordings of human brain activity. European Journal of Neuroscience, 39(12), 2071-2082. DOI: 10.1111/ejn.12547

Saturday, February 14, 2015

hyperacuity with MVPA: a verdict yet?

A few years ago a debate started about whether MVPA hyperacuity is possible: can we pick up signals from sources smaller than an individual voxel? This topic popped up for me again recently, so this post organizes my notes, gives some impressions, and points out some of the key papers.

beginnings: V1 and grating orientations

In 2005 Kamitani & Tong and Haynes & Rees reported being able to use MVPA methods with fMRI data to detect the orientation of gratings in V1. The anatomy and function of early visual areas are better understood than those of most parts of the brain; we know that they are organized into columns, each sensitive to particular visual attributes. In V1, the orientation columns are known to be much smaller than (typical) fMRI voxels, so how could classification be possible?

Multiple groups, including Kamitani & Tong in their 2005 paper, suggested that this "hyperacuity" could be due to a voxel-level bias in the underlying architecture, whether in the distribution of orientation columns, the vasculature, or some combination of the two. The idea here is that, since columns are not perfectly evenly distributed in space, each voxel will end up with more columns of one orientation than another, and this subtle population-level bias is what's being detected in the MVPA.

does degrading the signal reduce hyperacuity?

From the beginning, the idea that hyperacuity was behind the detection of orientation was met with both excitement and skepticism. In 2010 Hans op de Beeck wrote a NeuroImage Comments and Controversy article which kicked off a series of papers trying to confirm (or not) hyperacuity by means of degrading the fMRI images. The logic is straightforward: if subtle biases within individual voxels are making the classification possible, degrading the images (blurring, adding noise, filtering) should dramatically reduce the classification accuracy.

op de Beeck (2010) smoothed images at different FWHM to degrade information at varying spatial scales, interpreting the results as suggesting that the apparent hyperacuity might actually be due to larger-scale patterns (spanning multiple voxels). Objections to this technique were raised, however, partly because smoothing's effect on information content is complex and difficult to interpret. Swisher et al. (2010) used spatial filters, rather than smoothing, to degrade the signal in very high-resolution images, and found that small (< 1 mm) scale information was present, and critical for classification accuracy. But the presence of small-scale signal in high-resolution images doesn't preclude the presence of larger (> 2 mm) scale information; indeed, larger-scale information was also found by Swisher et al. (2010). Filtering was also used by Freeman et al. (2011), who identified larger ("coarser") scale information about orientation in V1. Alink et al. (2013) also used filtering, along with more complex stimuli, finding information at a range of scales, but also cautioning that the numerous interactions and complications mean that filtering is not a perfect approach.
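The intuition behind the degradation logic can be sketched in a few lines. This is a toy illustration only (pure NumPy, a made-up 1D "voxel pattern", and a simple boxcar kernel standing in for the Gaussian/spatial-filter manipulations the papers actually used): the wider the smoothing kernel, the more fine-scale, voxel-to-voxel structure is washed out, which is exactly the structure a hyperacuity account says the classifier depends on.

```python
import numpy as np

rng = np.random.default_rng(1)
signal = rng.normal(size=200)  # fine-scale pattern along a line of "voxels"

def boxcar_smooth(x, width):
    # Moving-average (boxcar) smoothing; a crude stand-in for FWHM smoothing.
    kernel = np.ones(width) / width
    return np.convolve(x, kernel, mode="same")

# Wider kernels remove progressively finer spatial scales, shrinking the
# voxel-to-voxel variability that fine-scale biases would live in:
for width in (3, 7, 15):
    print(width, boxcar_smooth(signal, width).std())
```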

spiraling around

Recently, a set of studies have tried another approach: changing the stimuli (such as to spirals) to try to avoid potential confounds related to the visual properties of the stimuli. These debates (e.g., Freeman et al. 2013, Carlson 2014) get too much into the details of visual processing for me to summarize here, but a new set of Comments and Controversy NeuroImage articles (Carlson & Wardle 2015; Clifford & Mannion 2015) suggests that using spiral stimuli won't be definitive, either.

my musings, and does this imply anything about MVPA?

Overall, I'm landing in the "bigger patterns, not hyperacuity" camp. I find the demonstrations of larger-scale patterns convincing, and a more plausible explanation of the signal, at least for human fMRI with ~ 3 mm voxels; it strikes me as equally reasonable that very small-scale patterns could dominate for high-resolution scanning in anesthetized animals (e.g., Swisher et al. 2010).

But do these debates imply anything for the usefulness of MVPA as a whole? Carlson and Wardle (2015) suggest that they do, pointing out that at this 10-year anniversary of the first papers suggesting the possibility of hyperacuity, we still haven't "determined the underlying source of information, despite our strong understanding of the physiology of early visual cortex." I wonder if this is because the best understanding of the physiology of early visual cortex is at the fine scale (neurons and columns), not the coarse scale (> 5 mm maps). I agree that interpreting the properties of individual voxels from MVPA is fraught with difficulty; interpreting the properties of groups of voxels is much more robust.

papers mentioned here, plus some other relevant papers