Linear classifiers require at least some individual voxels to have a bias: a difference in BOLD between the conditions. The fourth example from Kragel, Carter, & Huettel (Figure 2, clipped at right) shows a case in which linear classifiers fail but nonlinear classifiers succeed: in class A the informative voxels take the value x in half of the examples and -x in the other half (red and black squares), and 0 in all class B examples. The voxel's average value is thus 0 in both classes, so there is no overall bias, and linear classifiers perform at chance.
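A minimal noise-free sketch of this case may make the "no bias" point concrete. It assumes a single informative voxel with x = 1 and ten examples per class (values chosen purely for illustration), and it uses the difference in class means as a stand-in for the weight a simple linear rule (nearest centroid) would learn. That weight comes out as zero, while a nonlinear rule on the voxel's magnitude separates the classes perfectly.

```python
# Toy version of the Kragel et al. fourth example: one informative voxel.
# Assumed for illustration: x = 1.0, 10 examples per class, no noise.
X = 1.0
N_PER_CLASS = 10

# Class A: +x in half of the examples, -x in the other half.
class_a = [X if i % 2 == 0 else -X for i in range(N_PER_CLASS)]
# Class B: 0 in every example.
class_b = [0.0] * N_PER_CLASS


def mean(values):
    return sum(values) / len(values)


# Mean-difference weight, as a nearest-centroid (linear) rule would use.
linear_weight = mean(class_a) - mean(class_b)


def nonlinear_predict(value, threshold=X / 2):
    """Nonlinear rule: classify by magnitude, ignoring sign."""
    return "A" if abs(value) > threshold else "B"


examples = [(v, "A") for v in class_a] + [(v, "B") for v in class_b]
nonlinear_accuracy = sum(nonlinear_predict(v) == label for v, label in examples) / len(examples)

print(f"linear (mean-difference) weight: {linear_weight}")  # 0.0
print(f"nonlinear accuracy: {nonlinear_accuracy}")  # 1.0
```

With a zero weight, the linear rule has nothing to go on, which is the "no overall bias" situation described above.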

Here is another example of a case in which linear classifiers fail, showing the problem in a different way: you can't draw a line to separate the blue and pink points in the 2D graph. A nonlinear classifier could distinguish the blue and pink cases using a rule as simple as

*if voxel 1 is equal to voxel 2, then the blue class; otherwise, the pink class*.
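As a small check, here is a sketch assuming the four points sit at the corners (1, 1), (-1, -1) (blue) and (1, -1), (-1, 1) (pink); the real graph's coordinates may differ. It brute-forces a grid of lines and also applies the equality rule. The grid search only samples some lines, but no line can do better than 3 of 4: a perfect line w1·v1 + w2·v2 + b would need 2b > 0 from adding the two blue constraints and 2b ≤ 0 from adding the two pink ones.

```python
from itertools import product

# Assumed corner coordinates for the 2D plot:
# blue when voxel 1 equals voxel 2, pink otherwise.
points = [((1, 1), "blue"), ((-1, -1), "blue"), ((1, -1), "pink"), ((-1, 1), "pink")]


def linear_accuracy(w1, w2, b):
    """Accuracy of the line w1*v1 + w2*v2 + b = 0 (positive side -> blue)."""
    correct = 0
    for (v1, v2), label in points:
        predicted = "blue" if w1 * v1 + w2 * v2 + b > 0 else "pink"
        correct += predicted == label
    return correct / len(points)


# Brute-force search over a small grid of integer weights and offsets.
grid = range(-3, 4)
best_linear = max(linear_accuracy(w1, w2, b) for w1, w2, b in product(grid, repeat=3))


def equality_rule(v1, v2):
    """The nonlinear rule from the text."""
    return "blue" if v1 == v2 else "pink"


rule_accuracy = sum(equality_rule(v1, v2) == label for (v1, v2), label in points) / len(points)

print(f"best linear accuracy on the grid: {best_linear}")  # 0.75
print(f"equality-rule accuracy: {rule_accuracy}")  # 1.0
```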

So I fully agree that linear classifiers detect only linearly separable patterns, and I suggest that a clearer way to think about what a linear classifier finds is as a *pooling of biases over many voxels* (a way of combining weak signals), rather than as a "pattern".
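A minimal simulation of "pooling of biases", under made-up parameters (100 voxels, each with the same small class difference of 0.3 against unit Gaussian noise, 500 trials per class): a single voxel classifies barely above chance, while an equal-weight sum over all voxels, which is one simple linear classifier, does far better. The fixed seed and parameter values are assumptions for illustration only, not taken from any dataset.

```python
import random

# Hypothetical simulation parameters (not from any real dataset).
N_TRIALS = 500   # trials per class
N_VOXELS = 100
BIAS = 0.3       # class A mean minus class B mean, in every voxel
NOISE_SD = 1.0

rng = random.Random(0)  # fixed seed for reproducibility
trials = []
for label, shift in (("A", BIAS / 2), ("B", -BIAS / 2)):
    for _ in range(N_TRIALS):
        voxels = [rng.gauss(shift, NOISE_SD) for _ in range(N_VOXELS)]
        trials.append((label, voxels))


def accuracy(score):
    """Fraction of trials where score > 0 matches class A (class midpoint is 0)."""
    return sum((score(v) > 0) == (label == "A") for label, v in trials) / len(trials)


single_voxel_acc = accuracy(lambda v: v[0])  # one weakly biased voxel
pooled_acc = accuracy(sum)                   # equal-weight sum over all voxels

print(f"single voxel: {single_voxel_acc:.2f}, pooled: {pooled_acc:.2f}")
```

With these parameters, normal theory predicts roughly 0.56 for one voxel and 0.93 for the pooled sum, since the summed difference grows with the number of voxels while the summed noise grows only with its square root.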

But I do not agree that all this means we need to use both linear and nonlinear classifiers. Setting aside the methodological difficulties (including how the model comparisons should be done), would we *want* to detect nonlinear patterns in task-related BOLD data? And would such a pattern occur? Linear classifiers fail in cases like those shown here (i.e., XOR-type patterns) when the examples are exactly balanced, which I suspect does not happen in noisy fMRI data: even a small bias (such as having more of the examples at x and fewer at -x in the interactive example) will lead to some detection.
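To illustrate the imbalance argument, here is a noise-free sketch assuming a single voxel with class A values of +1 or -1 and class B values of 0, classified with a one-voxel nearest-centroid rule. For simplicity it is trained and tested on the same toy data, so the numbers only show the direction of the effect: with an exact 5/5 split the weight is zero and accuracy sits at chance, while a 7/3 split gives a nonzero weight and above-chance accuracy. In noisy data with held-out testing the gain would be smaller.

```python
# Noise-free toy example (assumed values): class A voxel is +1 or -1, class B is 0.


def mean(values):
    return sum(values) / len(values)


def nearest_centroid_accuracy(n_pos, n_neg):
    """Fit and score a one-voxel nearest-centroid (linear) rule on the same toy data."""
    class_a = [1.0] * n_pos + [-1.0] * n_neg
    class_b = [0.0] * (n_pos + n_neg)
    mu_a, mu_b = mean(class_a), mean(class_b)
    weight = mu_a - mu_b            # linear weight
    midpoint = (mu_a + mu_b) / 2    # decision threshold
    examples = [(v, "A") for v in class_a] + [(v, "B") for v in class_b]
    # Predict A on the positive side of the boundary; with weight 0, everything goes to B.
    correct = sum(("A" if weight * (v - midpoint) > 0 else "B") == label for v, label in examples)
    return correct / len(examples)


print(nearest_centroid_accuracy(5, 5))  # exactly balanced: 0.5 (chance)
print(nearest_centroid_accuracy(7, 3))  # 70/30 imbalance: 0.85
```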

Kragel, P., Carter, R., & Huettel, S. (2012). What Makes a Pattern? Matching Decoding Methods to Data in Multivariate Pattern Analysis. *Frontiers in Neuroscience, 6*. DOI: 10.3389/fnins.2012.00162
