Friday, May 10, 2013

searchlight analysis interpretation: worst case scenario

I like searchlight analysis. It avoids some curse-of-dimensionality headaches, covers the whole brain, makes pretty pictures (brain blob maps!), and can be easier for people accustomed to mass-univariate analyses.

But if I like searchlight analysis, why write a paper about it with "pitfalls" in the title? Well, because things can go badly wrong. I do not at all want to imply that searchlight analysis should be abandoned! Instead, I think that searchlight analyses need to be interpreted cautiously; some common interpretations do not always hold. Am I just picking nits, or does this actually matter in applications?

Here's an example of how searchlight analyses are often interpreted and written up in application papers.

Suppose that the image at left is a slice of a group-level searchlight analysis results map for some task, showing the voxels surviving significance testing. These voxels form two large clusters, one posterior in region Y and the other a bit more lateral in region X (I'll just call them X and Y because I'm not great at anatomy and it really doesn't matter for the example - this isn't actually even searchlight data). We write up a paper, describing how we found significant clusters in the left X and Y which could correctly classify our task. Our interpretation is focused on possible contributions of areas X and Y to our task, drawing parallels to other studies talking about X and Y.

This is the sort of interpretation that I think is not supported by the searchlight analysis alone, and should not be made without additional lines of evidence.

Why? Because, while the X and Y regions could be informative for our task, the searchlight analysis itself does not demonstrate that these regions are more informative than others, or even that the voxel clusters themselves are informative (as implied in the interpretation). The voxel clusters found significant in the searchlight analysis may not be informative outside the context of the particular analysis we ran (i.e. the results can depend on our choice of classifier, distance metric, and group-level statistics).

The problem is the way my hypothetical interpretation shifts from the searchlight analysis itself to regions and clusters: the analysis found voxels which are at the centers of significant searchlights, but the interpretation is about the voxels and regions, not the searchlights. Unfortunately, it is not guaranteed that information significantly present at one spatial scale (the searchlights) will also be present at smaller scales (the voxels) or larger ones (the regions).
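A toy simulation can make this scale-dependence concrete. This is not real fMRI data and not the paper's analysis, just a minimal sketch (using scikit-learn's logistic regression as the classifier, with made-up signal sizes): a five-voxel "searchlight" whose center voxel is pure noise, while the four surrounding voxels each carry a weak signal. The searchlight as a whole classifies well, but its center voxel alone does not.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n_trials = 100
labels = np.repeat([0, 1], n_trials // 2)

# center voxel: noise only; surrounding 4 voxels: weak signal each
center = rng.normal(size=(n_trials, 1))
surround = rng.normal(size=(n_trials, 4)) + 0.8 * labels[:, None]
searchlight = np.hstack([center, surround])

clf = LogisticRegression(max_iter=1000)
acc_searchlight = cross_val_score(clf, searchlight, labels, cv=5).mean()
acc_center = cross_val_score(clf, center, labels, cv=5).mean()
print(f"searchlight: {acc_searchlight:.2f}, center voxel alone: {acc_center:.2f}")
```

Here the searchlight map would flag the center voxel as "significant", even though that voxel carries no task information at all; the information lives at the searchlight's scale, not the voxel's.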

Back to the hypothetical paper: classifying the task using only the voxels making up cluster X could fail (i.e. instead of a searchlight analysis we make an ROI from a significant cluster and classify the task with just those voxels). This is one of my worst-case scenarios: the interpretation that regions X and/or Y have task information is wrong.
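That follow-up check can be sketched in a few lines, again with simulated data rather than anything from a real study (the ROI size, classifier, and permutation scheme here are all arbitrary choices of mine): treat the voxels of "cluster X" as an ROI and classify the task with just those voxels, testing against chance by permutation. The ROI is simulated as pure noise, i.e. the worst case, where the cluster carries no task information on its own.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import permutation_test_score

rng = np.random.default_rng(1)
n_trials = 100
labels = np.repeat([0, 1], n_trials // 2)

# 20 "cluster X" voxels, noise only: the worst-case scenario
roi_data = rng.normal(size=(n_trials, 20))

clf = LogisticRegression(max_iter=1000)
acc, perm_accs, p = permutation_test_score(
    clf, roi_data, labels, cv=5, n_permutations=200, random_state=0)
print(f"ROI-only accuracy: {acc:.2f}, permutation p = {p:.3f}")
```

If the cluster-as-ROI accuracy comes out near chance like this, the region-level interpretation ("cluster X carries task information") is not supported, however significant the searchlight map looked.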

But that's not the only direction in which the interpretation could be wrong: X and Y could be informative, but a whole lot of other regions could also be informative, even more informative than X and Y. This is again a problem of shifting the scale of the interpretation away from the searchlights themselves: our searchlight analysis did not show that X and Y are the most significant clusters. One way of picturing this is if we did another searchlight analysis, this time using searchlights with the same number of voxels as Y: we could end up with a very different map (the center of Y will be informative, but many other voxels could also be informative, perhaps more than Y itself).
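One way to see this radius-dependence is a toy one-dimensional "searchlight" on simulated data (again my own illustrative setup, not the paper's: a strip of 30 voxels with signal only in voxels 10-14, classified with logistic regression). Running the same analysis with two different radii produces two different maps from identical underlying signal.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(2)
n_trials, n_voxels = 100, 30
labels = np.repeat([0, 1], n_trials // 2)

# toy 1D "brain": task signal only in voxels 10-14
data = rng.normal(size=(n_trials, n_voxels))
data[:, 10:15] += 0.5 * labels[:, None]

def searchlight_map(radius):
    """Cross-validated accuracy at each center voxel, using all
    voxels within `radius` positions of it."""
    clf = LogisticRegression(max_iter=1000)
    accs = []
    for c in range(n_voxels):
        lo, hi = max(0, c - radius), min(n_voxels, c + radius + 1)
        accs.append(cross_val_score(clf, data[:, lo:hi], labels, cv=5).mean())
    return np.array(accs)

small, large = searchlight_map(1), searchlight_map(5)
print("radius 1 centers above 0.6:", np.flatnonzero(small > 0.6))
print("radius 5 centers above 0.6:", np.flatnonzero(large > 0.6))
```

The larger searchlight tends to mark many more center voxels as informative, because any searchlight overlapping the signal strip can classify above chance; the "significant" map reflects the searchlight size as much as the signal's location.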

These are complex claims, and this post includes none of the supporting details and demonstrations found in the paper. My goal here is rather to highlight the sorts of searchlight analysis interpretations that the paper describes: the sorts of interpretations that are potentially problematic. But note the "potentially problematic", not "impossible"! The paper (and future posts) describe follow-up tests that can support interpretations like the one in my scenario, ways to show we're not in one of the worst cases.
