I like that the recent article by Kragel, Carter, & Huettel points out what is detected by linear classifiers, but it didn't convince me that linear algorithms should be abandoned for fMRI MVPA (or MVPC, to use their acronym), or even that we should run nonlinear classifiers for comparison to linear results as general practice.
Linear classifiers require (at least some) individual voxels to have a bias; a difference in BOLD over the conditions.
The fourth example from Kragel, Carter, & Huettel (Figure 2, clipped at right) shows a case in which linear classifiers fail (but nonlinear succeed): in class A the informative voxels take value x in half of the examples and -x in the other half of the examples (red and black squares), and 0 in all class B examples. The average value for the voxel across all examples is thus 0, there is no overall bias, and linear classifiers perform at chance.
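To make the "no overall bias" point concrete, here is a minimal sketch of that situation for a single voxel. The value x = 1.0 and the 100 examples per class are arbitrary assumptions of mine, not values from the paper. It shows that the two class means are identical, while a simple nonlinear transform (squaring the voxel value) separates the classes completely.

```python
# Hypothetical single-voxel data; x = 1.0 and 100 examples per class are
# illustrative assumptions, not values taken from the paper.
x = 1.0
class_a = [x] * 50 + [-x] * 50   # half of the class A examples at +x, half at -x
class_b = [0.0] * 100            # every class B example at 0


def mean(values):
    return sum(values) / len(values)


# No bias: both class means are 0, so their difference is 0.
mean_difference = mean(class_a) - mean(class_b)

# Squaring is one possible nonlinear transform: class A becomes x**2, class B stays 0.
squared_a = [v ** 2 for v in class_a]
squared_b = [v ** 2 for v in class_b]
threshold = x ** 2 / 2
correct = sum(v > threshold for v in squared_a) + sum(v <= threshold for v in squared_b)
nonlinear_accuracy = correct / (len(class_a) + len(class_b))

print(mean_difference)     # 0.0
print(nonlinear_accuracy)  # 1.0
```

Any method that relies on a difference in mean activation between the classes has nothing to work with here, even though the voxel is perfectly informative once it is squared.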
Here is another example of a case in which linear classifiers fail, showing the problem in a different way: you can't draw a line to separate the blue and pink points in the 2D graph. A nonlinear classifier could distinguish the blue and pink cases, using a rule as simple as if voxel 1 is equal to voxel 2, then the blue class, otherwise, the pink class.
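The same point can be checked numerically. The sketch below assumes the simplest XOR-type layout (four points at the corners of the unit square, which is my guess at the arrangement and not necessarily the exact points in the graph). It brute-forces a coarse grid of linear decision rules and compares the best of them with the equality rule described above.

```python
from itertools import product

# Assumed XOR-type layout: blue (label 1) when voxel 1 equals voxel 2,
# pink (label 0) otherwise. The coordinates are illustrative.
examples = [((0, 0), 1), ((1, 1), 1), ((0, 1), 0), ((1, 0), 0)]


def linear_predict(w1, w2, b, v1, v2):
    """A linear rule: predict blue when the weighted sum exceeds zero."""
    return 1 if w1 * v1 + w2 * v2 + b > 0 else 0


# Try every combination of weights and bias on a coarse grid (-2.0 to 2.0, step 0.5).
grid = [i / 2 for i in range(-4, 5)]
best_linear_correct = max(
    sum(linear_predict(w1, w2, b, v1, v2) == label for (v1, v2), label in examples)
    for w1, w2, b in product(grid, repeat=3)
)


def equality_rule(v1, v2):
    """The nonlinear rule from the text: blue if the two voxels are equal."""
    return 1 if v1 == v2 else 0


rule_correct = sum(equality_rule(v1, v2) == label for (v1, v2), label in examples)

print(best_linear_correct)  # 3 of 4: no line gets all four points right
print(rule_correct)         # 4 of 4
```

The grid search is only a demonstration, not a proof, but the result matches the geometric argument: a line can put at most three of the four points on the correct side.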
So I fully agree that linear classifiers only detect linearly separable patterns, and suggest that a clearer way to think about what a linear classifier finds is a pooling of biases over many voxels (a way of combining weak signals), rather than a "pattern".
But I do not agree that all this means that we need to use both linear and nonlinear classifiers. Setting aside the methodological difficulties (including how the model comparisons should be done), would we want to detect nonlinear patterns in task-related BOLD data? And would such a pattern occur? Linear classifiers fail in cases like those shown here (i.e. XOR-type) when the examples are exactly balanced, which I suspect does not happen in noisy fMRI data: even a small bias (such as having more of the examples at x than at -x in the interactive example) will lead to some detection.
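A rough simulation of this last point, under assumptions that are entirely mine: a single voxel, Gaussian noise with standard deviation 1, x = 1, a 70/30 split between x and -x in class A, and a nearest-class-mean rule standing in for "a linear classifier". The function names are hypothetical. The sketch trains on one simulated dataset and tests on a second one, for both the balanced and the imbalanced case.

```python
import random


def make_data(rng, prop_positive, n_per_class, x, noise_sd):
    """Class A: a mix of +x and -x plus noise. Class B: noise around 0."""
    n_pos = round(prop_positive * n_per_class)
    signal_a = [x] * n_pos + [-x] * (n_per_class - n_pos)
    class_a = [s + rng.gauss(0, noise_sd) for s in signal_a]
    class_b = [rng.gauss(0, noise_sd) for _ in range(n_per_class)]
    return class_a, class_b


def nearest_mean_accuracy(prop_positive, n_per_class=2000, x=1.0, noise_sd=1.0, seed=0):
    """Fit a nearest-class-mean rule on training data, score it on new test data."""
    rng = random.Random(seed)
    train_a, train_b = make_data(rng, prop_positive, n_per_class, x, noise_sd)
    test_a, test_b = make_data(rng, prop_positive, n_per_class, x, noise_sd)

    mean_a = sum(train_a) / len(train_a)
    mean_b = sum(train_b) / len(train_b)
    midpoint = (mean_a + mean_b) / 2   # decision threshold between the class means
    a_is_higher = mean_a > mean_b      # which side of the threshold is class A

    correct_a = sum((v > midpoint) == a_is_higher for v in test_a)
    correct_b = sum((v > midpoint) != a_is_higher for v in test_b)
    return (correct_a + correct_b) / (len(test_a) + len(test_b))


balanced_accuracy = nearest_mean_accuracy(prop_positive=0.5)
imbalanced_accuracy = nearest_mean_accuracy(prop_positive=0.7)

print(f"balanced (50/50):   {balanced_accuracy:.3f}")
print(f"imbalanced (70/30): {imbalanced_accuracy:.3f}")
```

With these settings the balanced case should land near chance, while the imbalanced case should come out modestly but reliably above it. How large the effect is depends entirely on the assumed noise level and degree of imbalance, so this illustrates the argument rather than estimating what happens in real data.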
Kragel, P., Carter, R., & Huettel, S. (2012). What Makes a Pattern? Matching Decoding Methods to Data in Multivariate Pattern Analysis. Frontiers in Neuroscience, 6. DOI: 10.3389/fnins.2012.00162