Monday, May 14, 2018

mean pattern subtraction confusion

A grad student brought several papers warning against subtracting the mean pattern (and other types of normalization) before correlational analyses (including, but not only, RSA) to my attention. I was puzzled: Pearson correlation ignores additive and  multiplicative transformations, so how could it be affected by subtracting the mean? Reading closely, my confusion was from what exactly was meant by "subtracting the mean pattern"; it is still the case that not all forms of "normalization" before correlational MVPA are problematic.

The key is where and how the mean-subtraction and/or normalization is done. Using the row- and column-scaling terminology from Pereira, Mitchell, and Botvinick (2009) (datasets arranged with voxels in columns, examples (trials, frames, whatever) in rows): the mean pattern subtraction warned against in Walther et al. (2016) and Garrido et al. (2013) is a particular form of column-scaling, not row-scaling. (A different presentation of some of these same ideas is in Hebart and Baker (2018), Figure 5.)

Here's my illustration of two different types of subtracting the mean; code to generate these figures is below the jump. The original pane shows the "activity patterns" for each of five trials in a four-voxel ROI. The appearance of these five trials after each has been individual normalized (row-scaled) is in the "trial normalized" pane, while the appearance after the mean pattern was subtracted is at right (c.f. Figure 1D in Walther et al. (2016) and Figure 1 in Garrido et al. (2013)).

Note that Trial2 is Trial1+15; Trial3 is Trial1*2; Trial4 is the mirror image of Trial1; Trial5 has a shape similar to Trial4 but positioned near Trial1. (Example adapted from Figure 8.1 of Cluster analysis for researchers (1984) by H. Charles Romesburg.)

And here are matrix views of the Pearson correlation and Euclidean distance between each pair of trials in each version of the dataset:

The mean pattern subtraction did not change the patterns' appearance as much as the trial normalization did: the patterns are centered on 0, but the extents are the same (e.g., voxel2 ranges from 0 to 80 in the original dataset, -40 to 40 after mean pattern subtraction). The row (trial) normalized image has a much different appearance: centered on zero, but the patterns for trials 1, 2, and 3 are now identical, and the range is much less (e.g., voxel2 is now a bit less than -1 to a bit more than 1). Accordingly, the trial-normalized similarity matrix is identical to the original when calculated with correlation, but different when calculated with Euclidean distance; the mean pattern subtracted similarity matrix is identical to the original when calculated with Euclidean distance, but different when calculated with Pearson correlation.

This is another case where lack of clear terminology can lead to confusion; "mean pattern subtraction" and "subtracting the mean from each pattern" are quite different procedures and have quite different effects, depending on your distance metric (correlation or Euclidean).