I really like this image and analogy for describing some of the distortions that can arise from searchlight analysis: a very small informative area ("the needle") can turn into a large informative area in the information map ("the haystack"). But the reverse is also possible: a large informative area can turn into a small one in the information map ("the haystack in the needle").
I copied this image from the poster Matthew Cieslak, Shivakumar Viswanathan, and Scott T. Grafton presented last year at SfN (poster 626.16, Fitting and Overfitting in Searchlights, SfN 2011). The current article covers some of the same issues as the poster, providing a mathematical foundation and detailed explanation.
They step through several proofs of information map properties, using reasonable assumptions. One result I'll highlight here is that the information map's representation of a fixed-size informative area will grow as searchlight radius increases (my phrasing, not theirs). Note that this (and the entire paper) describes the single-subject, not the group, level of analysis.
This fundamental 'growing' property is responsible for many of the strange things that can appear in searchlight maps, such as the edge effects I posted about here. As Viswanathan et al. point out in the paper, it also means that interpreting the number of voxels found significant in a searchlight analysis is fraught with danger: that number is affected by many factors other than the amount and location of informative voxels. They also show that just 430 properly-spaced informative voxels can cause the entire brain to be marked as informative in the information map, using 8 mm radius searchlights (a radius that is not particularly large in the literature).
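To build intuition for the 'growing' property, here's a back-of-envelope sketch. It is mine, not the paper's construction: a single informative voxel makes every searchlight that contains it informative, so in the information map it appears as a blob the size of the searchlight itself. The 3 mm isotropic voxel size and spherical searchlight shape are assumptions for illustration.

```python
# Back-of-envelope sketch (mine, not Viswanathan et al.'s construction): how many
# information-map voxels does ONE informative voxel light up, for a given radius?
# Assumes 3 mm isotropic voxels and spherical searchlights.
import numpy as np

voxel_size_mm = 3.0                      # assumed voxel dimension
for radius_mm in (6.0, 8.0, 10.0):
    r_vox = radius_mm / voxel_size_mm    # searchlight radius in voxel units
    n = int(np.ceil(r_vox))
    offsets = np.mgrid[-n:n + 1, -n:n + 1, -n:n + 1].reshape(3, -1).T
    in_sphere = (offsets ** 2).sum(axis=1) <= r_vox ** 2
    # every searchlight centered within the radius contains the informative voxel,
    # so that single voxel becomes a blob of this many voxels in the information map
    print(f"radius {radius_mm} mm -> {int(in_sphere.sum())} map voxels per informative voxel")
```

Multiplying the per-voxel blob size by the number of informative voxels gives a rough sense of how a few hundred well-spaced voxels could tile an entire brain mask.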
I recommend taking a look at this paper if you generate or interpret information maps via searchlight analysis, particularly if you have a mathematical bent. It nicely complements diagram- and description-based explanations of searchlight analysis (including, hopefully soon, my own). It certainly does not cover every aspect of information mapping, but it provides a solid foundation for those it does.
Viswanathan, S., Cieslak, M., & Grafton, S.T. (2012). On the geometric structure of fMRI searchlight-based information maps. arXiv: 1210.6317v1
Thursday, October 25, 2012
permuting searchlight maps: Stelzer
Now to the proposals in Stelzer, not just their searchlight shape!
This is a dense methodological paper, laying out a way (and rationale) to carry out permutation tests for group-level classifier-based searchlight analysis (linear svm). This is certainly a needed topic: as pointed out in the article, the assumptions behind t-tests are violated in searchlight analysis, and using the binomial is also problematic (they suggest that it is too lenient, which strikes me as plausible).
Here's my interpretation of what they propose:
1. Generate 100 permuted searchlight maps for each person. You could think of all the possible label (i.e. class, stimulus type, whatever you're classifying) rearrangements as forming a very large pool. Pick 100 different rearrangements for each person and do the searchlight analysis with each rearrangement. (The permuted searchlight analysis must be done exactly as the real one was - same cross-validation scheme, etc.)
2. Generate 100,000 averaged group searchlight maps. Each group map is made by picking one permuted map from each person (out of the 100 made for each person in step 1) and averaging the values voxel-wise. In other words, stratified sampling with replacement.
3. Do a permutation test at each voxel, calculating the accuracy corresponding to a p = 0.001 threshold. In other words, at each voxel you record the 100th biggest accuracy after sorting the 100,000 accuracies generated in step 2. (100/100,000 = 0.001)
4. Threshold the 100,000 permuted group maps and the one real-labeled group map using the voxel-wise thresholds calculated in step 3. Now the group maps are binary (pass the threshold or not).
5. Apply a clustering algorithm to all the group maps. They clustered voxels only if they shared a face. I don't think they used a minimum cluster size, but rather called unconnected voxels clusters of size 1 voxel. (This isn't really clear to me.) (Steps 2-5 are sketched in code after this list.)
6. Count the number of clusters by size in each of the 100,000 permuted maps and the 1 real map. (This gives counts like 10 clusters with 30 voxels in map #2004, etc.)
7. Generate the significance of the real map's clusters using the counts made in step 6. I think they calculated the significance for each cluster size separately then did FDR, but it's not obvious to me ("Cluster-size statistics" section towards the end of "Materials and Methods").
8. Done! The voxels passing step 7 are significant at the cluster level, corrected for multiple comparisons (Figure 3F of the paper). The step 4 threshold map can be used for uncorrected p-values (Figure 3E of the paper).
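To make my reading of the pipeline concrete, here's a minimal sketch of steps 2 through 5 on toy data. The array names, toy sizes, and simulated accuracies are mine, not the authors'; this is not their code, just one way the scheme could be implemented with numpy and scipy.

```python
# Minimal sketch (not the authors' code) of steps 2-5 above, on toy data.
# Assumes the per-subject permuted searchlight maps from step 1 are stored as
# accuracies in a (n_subjects x n_perms x n_voxels) array.
import numpy as np
from scipy import ndimage

rng = np.random.default_rng(0)
n_subj, n_perm, n_vox = 12, 100, 200      # toy sizes; a real mask has tens of thousands of voxels
n_group = 100_000                         # number of group-average null maps (step 2)
p_vox = 0.001                             # voxel-wise threshold (step 3)

# toy stand-ins for real data: chance is 0.5
perm_maps = rng.normal(0.5, 0.05, size=(n_subj, n_perm, n_vox))   # step 1 output
real_map = rng.normal(0.52, 0.05, size=n_vox)                     # real-labeled group-average map

# step 2: each group null map averages one randomly-chosen permutation map per subject
picks = rng.integers(0, n_perm, size=(n_group, n_subj))
null_group = np.stack([perm_maps[np.arange(n_subj), picks[i]].mean(axis=0)
                       for i in range(n_group)])                  # slow but simple; chunk for real use

# step 3: voxel-wise accuracy threshold corresponding to p = 0.001
# (approximately the 100th-largest of the 100,000 null group means at each voxel)
thresh = np.quantile(null_group, 1 - p_vox, axis=0)

# step 4: binarize the real map (for the cluster-size null, each permuted group
# map would be binarized the same way)
real_binary = real_map > thresh

# step 5: cluster the surviving voxels; this 1D toy map stands in for a 3D volume.
# For a real 3D mask, ndimage.label's default structure connects voxels sharing
# a face, matching the clustering described in the paper.
labels, n_clusters = ndimage.label(real_binary)
cluster_sizes = np.bincount(labels.ravel())[1:]
print(f"{int(real_binary.sum())} voxels pass the voxel-wise threshold, in {n_clusters} cluster(s)")
```

Steps 6 and 7 would repeat the thresholding and clustering on each of the 100,000 null group maps and compare the real map's cluster sizes against those counts.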
Most of this strikes me as quite reasonable. I've actually previously implemented almost this exact procedure (minus the cluster thresholding) on a searchlight dataset (not linear svms).
The part that makes me twitch the most is step 2: turning the 100 maps for each person into 100,000 group-average maps. I've been wanting to post about this anyway in the context of my ROI-based permutation testing example. But in brief, what makes me uncomfortable is the way 100 maps turn into 100,000. Why not just calculate 5 for each person? 5^12 >> 100,000 (they had 12 subjects in some of the examples). Somehow 100 for each person feels more properly random than 5 for each person, but how many are really needed to properly estimate the variation? I will expand on this more (and give a few alternatives), hopefully somewhat soon.
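Just to put numbers on the 5^12 >> 100,000 point (plain arithmetic, not anything from the paper):

```python
# Number of distinct group-average maps available when drawing one permutation
# per subject, for different numbers of per-subject permutations (12 subjects).
n_subjects = 12
for per_subject in (5, 10, 100):
    print(f"{per_subject} per subject -> {per_subject ** n_subjects:.3g} possible group maps")
print(f"group maps actually drawn: {100_000:.3g}")
```

Even 5 per subject allows roughly 2.4e8 distinct combinations, far more than the 100,000 group maps actually drawn; the open question is how well either choice captures the true variability across permutations.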
The other thing that makes me wonder is the leniency. They show (e.g. Figure 11) that many more voxels are called significant with their method than with a t-test, claiming that this is closer to the truth. This relates to my concern about how to combine over subjects: using 100,000 group maps allows very small p-values. But if the 100,000 aren't as variable as they should be, those p-values will be smaller than they should be, overstating significance.
Stelzer, J., Chen, Y., & Turner, R. (2012). Statistical inference and multiple testing correction in classification-based multi-voxel pattern analysis (MVPA): Random permutations and cluster size control. NeuroImage DOI: 10.1016/j.neuroimage.2012.09.063
UPDATE (30 October): We discussed this paper in a journal club, and a coworker pointed out that the authors do explain the choice of 100 permutations per person, in Figure 8 and the section "Undersampling of the permutation space". They made a dataset with one searchlight and many examples (80, 120, 160), then varied the number of permutations calculated for each individual (10, 100, 1000, 10,000). They then made 100,000 group "maps" as before (my step 2), drawing from each set of single-subject permutations. Figure 8 shows the resulting histograms: the curves for 100, 1000, and 10,000 individual permutations are quite similar, which they use as the rationale for running 100 permutations for each person (my step 1).
I agree that this is a reasonable way to choose a number of each-person permutations, but I'm still not entirely comfortable with the way different permutation maps are combined. I'll explain and show this more in a separate post.
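As a sketch of how one could probe this directly, here's a toy simulation in the spirit of their Figure 8 (mine, not the authors' analysis): simulate chance-level accuracies for a single searchlight, build group-mean null distributions from different numbers of per-subject permutations, and compare the upper tails. Treating the per-subject accuracies as independent normal draws is a strong simplification.

```python
# Toy check (mine, not the paper's): how much does the upper tail of the
# group-mean null distribution change with the number of per-subject permutations?
import numpy as np

rng = np.random.default_rng(1)
n_subj, n_group = 12, 100_000
# stand-in for each subject's "full" permutation distribution of chance accuracies
full = rng.normal(0.5, 0.05, size=(n_subj, 10_000))

for per_subject in (10, 100, 1000):
    subset = full[:, :per_subject]                       # keep only this many per subject
    picks = rng.integers(0, per_subject, size=(n_group, n_subj))
    vals = subset[np.arange(n_subj)[None, :], picks]     # one draw per subject per group map
    group_means = vals.mean(axis=1)
    print(f"{per_subject:>5} per subject: 99.9th percentile of group means = "
          f"{np.quantile(group_means, 0.999):.4f}")
```

If the 99.9th percentiles barely move between 10, 100, and 1000 per-subject permutations, that is the kind of evidence Figure 8 offers for settling on 100.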
searchlight shapes: Stelzer
This is the first of what will likely be a series of posts on a paper in press at NeuroImage:
Stelzer, J., et al., Statistical inference and multiple testing correction in classification-based multi-voxel pattern analysis (MVPA). NeuroImage (2012), http://dx.doi.org/10.1016/j.neuroimage.2012.09.063
There is a lot in this paper, touching some of my favorite topics (permutation testing, using the binomial, searchlight analysis, Malin's 'random' searchlights).
But in this post I'll just highlight the searchlight shapes used in the paper. They're given in this sentence: "The searchlight volumes to these diameters were 19 (D=3), 57 (D=5), 171 (D=7), 365 (D=9), and 691 (D=11) voxels, respectively." The authors don't list the software they used; I suspect it was custom matlab code.
Here I'll translate the first few sizes to match the convention I used in the other searchlight shape posts:
diameter | radius | number of surrounding voxels | notes |
---|---|---|---|
3 | 1 | 18 | This looks like my 'edges or faces touch' searchlight. |
5 | 2 | 56 | This has more voxels than the 'default' searchlight, but less than my two-voxel radius searchlight. Squinting at Figure 1 in the text, I came up with the shape below. |
Here's the searchlight from Figure 1, and my blown-up version for a two-voxel radius searchlight.
It looks like they added plus signs to the outer faces of a three-by-three-by-three cube (27 + 6 x 5 = 57 voxels). This doesn't follow any of my iterative rules, but perhaps would result from fitting a particular sphere-type rule.
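For anyone who wants to play with this, here's a quick way to see which Euclidean-distance cutoffs produce which voxel counts; it's my own exploration, not the authors' code, and I'm not claiming this is how they actually defined their searchlights.

```python
# Count voxels whose centers lie within a Euclidean distance cutoff of the
# center voxel, for a sweep of cutoffs. One way to see which "sphere" rules
# could produce volumes like the 19- and 57-voxel searchlights in the paper.
import numpy as np

n = 4                                     # half-width of the candidate cube
offsets = np.mgrid[-n:n + 1, -n:n + 1, -n:n + 1].reshape(3, -1).T
sq_dist = (offsets ** 2).sum(axis=1)

for d2 in sorted(set(sq_dist.tolist())):
    if d2 > 12:
        break
    print(f"squared distance <= {d2:2d}: {int((sq_dist <= d2).sum()):3d} voxels (center included)")
```

The counts include the center voxel, so subtract one to compare with the "number of surrounding voxels" column in the table above.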
Tuesday, October 23, 2012
many options: frightening results
If you need a good scare this Halloween season, I suggest reading On the plurality of (methodological) worlds: estimating the analytic flexibility of fMRI experiments.
This is not an MVPA paper, but I have no doubt its conclusions are just as relevant to MVPA. Joshua Carp took one fMRI dataset and constructed no less than 34,560 significance maps (i.e. mass-univariate group statistical maps), deriving from 6,912 analysis pipelines (e.g. smoothing or not, slice-time correction or not) and various statistical choices (e.g. FDR or RFT corrections).
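As a toy illustration of how quickly defensible options multiply into thousands of pipelines (the specific factors and levels below are hypothetical, not Carp's actual design):

```python
# Hypothetical example of pipeline explosion: a handful of reasonable analysis
# choices multiply into many distinct pipelines. These factors and levels are
# made up for illustration; they are not Carp's actual design.
from itertools import product

choices = {
    "slice_time_correction": ["on", "off"],
    "motion_regressors": ["none", "6 params", "24 params"],
    "smoothing_fwhm_mm": [0, 4, 8, 12],
    "temporal_filter": ["none", "high-pass"],
    "threshold_correction": ["RFT", "FDR", "uncorrected"],
}
pipelines = list(product(*choices.values()))
print(f"{len(pipelines)} pipelines from just {len(choices)} choices")   # 2*3*4*2*3 = 144
```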
The scary part is that these various analysis choices are all reasonable and in use, but produced different results. To quote from his conclusion, "While some research outcomes were relatively stable across analysis pipelines, others varied widely from one pipeline to another. Given the extent of this variability, a motivated researcher determined to find significant activation in practically any brain region will very likely succeed–as will another researcher determined to find null results in the same region."
For just one highlight, Figure 4 shows the peak voxels identified in the 6,912 pipelines, color-coded by the number of pipelines in which each was peak. The color bar maxes at 526: no voxel was peak in more than 526 of the 6,912 maps. But the peaks are not distributed randomly: they're grouped in anatomically sensible ways (which is good).
This particular map reinforces my bias towards ROI-based analyses: should we really be interpreting tiny blobs or coordinate locations when they can be so susceptible to being shifted by reasonable analysis choices?
I am reminded of Simmons et al.'s recommendations for describing results. We must be more disciplined and stringent about the sensitivity of our results to somewhat arbitrary choices, and more forgiving of less-than-perfect results when reviewing.
I certainly don't think that these results indicate that we should all give up, abandoning all fMRI analysis. But we should be even more skeptical about our results. Do they only appear in one 'magic' pipeline? Or do they more-or-less hold over perturbations in thresholds and processing?
Carp, J. (2012). On the Plurality of (Methodological) Worlds: Estimating the Analytic Flexibility of fMRI Experiments Frontiers in Neuroscience, 6 DOI: 10.3389/fnins.2012.00149
Ah, I see that Neuroskeptic also commented on this paper.
Monday, October 22, 2012
postdoc position: David Badre's lab
David Badre is looking for a postdoc with an MVPA bent. Here's the posting, if you're interested.
Thursday, October 18, 2012
some SfN reflections
I'm now back from SfN and sorting through my notes. It was great to see familiar people again, and to meet some people I only knew from email and articles.
I thought I'd share here a few of my MVPA-related impressions, in no particular order. These are of course personal and make no claim to be representative; please share if you found something different or think I missed a trend.
- RSA and searchlight analysis are quite popular. I can't remember seeing an analysis using only ROI-based MVPA. I saw several analyses combining searchlight and RSA (e.g. searching the brain for spheres with a particular RSA pattern).
- Linear classifiers (mostly svms) and Pearson correlation are very widely used. I saw a few nonlinear svms, but not many. Some posters included diagrams illustrating a linear svm, while others simply mentioned using MVPA, with the use of a linear svm being implicit.
- Feature selection ("voxel picking") is a major concern. Multiple people mentioned having no real idea which methods should be considered, much less knowing a principled, a priori way to choose a method for any particular analysis. This concern probably feeds into the popularity of searchlight methods.
- I saw quite a few efforts to relate (e.g. correlate) classification results with behavioral results and/or subject characteristics.
- Multiple studies did feature selection by choosing the X most active voxels (as determined by a univariate test on the BOLD), within the whole brain or particular regions.
Monday, October 8, 2012
me and MVPA at SfN
I'll be at SfN this weekend/next week. Contact me if you'd like to chat about anything fMRI analysis related. I also have a poster: 390.06, Monday morning; stop on by.
Some of us using MVPA are having an informal gathering Tuesday (16 October), 6:30 pm. Contact me for the details if you'd like to join.
By the way, anyone know what this year's SfN logo is supposed to be? I find it oddly disturbing.
Monday, October 1, 2012
searchlight shapes: BrainVoyager
Rainer Goebel kindly provided a description and images of the searchlight creation used in BrainVoyager:
"BrainVoyager uses the "sphere" approach (as in our original PNAS paper Kriegeskorte, Goebel, Bandettini 2006), i.e. voxels are considered in a cube neighborhood defined by the radius and in this neighborhood only those voxels are included in the "sphere" that have an Euclidean distance from the center of less than (or equal to) the radius of the sphere. From your blog, I think the resulting shapes in BrainVoyager are the same as for pyMVPA.
Note, however, that in BrainVoyager the radius is a float value (not an integer) and this allows to create "spheres" where the center layer has a single element on each side at cardinal axes (e.g. with radius 1, 2, 3, 4... voxels, see snapshot below) but also "compact" spheres as you seem to have used by setting the radius, e.g. to 1.6, 1.8, 2.6, 2.8, 3.6...). "
At right is an image Rainer generated showing radius 1.0 and 2.0 searchlights created in BrainVoyager.
I am intrigued by Rainer's comment that using non-integer radii will make more "compact" spheres; this underscores the need to be explicit in describing searchlight shape in methods sections. It appears that pyMVPA requires integer radii, but the Princeton MVPA Toolbox does not.
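Here's a small sketch of the rule as Rainer describes it: take the cube neighborhood defined by the radius and keep the voxels whose centers lie within that Euclidean distance of the center. This is my code, not BrainVoyager's, but it shows why a non-integer radius like 1.6 or 2.6 gives a more "compact" sphere than the integer radius just below it.

```python
# Sketch of the sphere rule described in the quote above (my code, not
# BrainVoyager's): voxels in the cube neighborhood whose centers lie within
# Euclidean distance <= radius of the center voxel are included.
import numpy as np

def sphere_size(radius):
    n = int(np.floor(radius))          # no voxel farther out can be within the radius
    offsets = np.mgrid[-n:n + 1, -n:n + 1, -n:n + 1].reshape(3, -1).T
    return int(((offsets ** 2).sum(axis=1) <= radius ** 2).sum())

for radius in (1.0, 1.6, 2.0, 2.6, 3.0, 3.6):
    print(f"radius {radius}: {sphere_size(radius)} voxels (center included)")
```

Radius 1.0 keeps only the six face-sharing neighbors (the "single element on each side at cardinal axes" Rainer mentions), while 1.6 also pulls in the edge-sharing voxels, hence the more compact-looking sphere.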
"BrainVoyager uses the "sphere" approach (as in our original PNAS paper Kriegeskorte, Goebel, Bandettini 2006), i.e. voxels are considered in a cube neighborhood defined by the radius and in this neighborhood only those voxels are included in the "sphere" that have an Euclidean distance from the center of less than (or equal to) the radius of the sphere. From your blog, I think the resulting shapes in BrainVoyager are the same as for pyMVPA.
Note, however, that in BrainVoyager the radius is a float value (not an integer) and this allows to create "spheres" where the center layer has a single element on each side at cardinal axes (e.g. with radius 1, 2, 3, 4... voxels, see snapshot below) but also "compact" spheres as you seem to have used by setting the radius, e.g. to 1.6, 1.8, 2.6, 2.8, 3.6...). "
At right is an image Rainer generated showing radius 1.0 and 2.0 searchlights created in BrainVoyager.
I am intrigued by Rainer's comment that using non-integer radii will make more "compact" spheres; non-interger radii underscores the need to be explicit in describing searchlight shape in methods sections. It appears that pyMVPA requires integer radii, but the Princeton MVPA Toolbox does not.