Monday, May 14, 2018

mean pattern subtraction confusion

A grad student brought several papers warning against subtracting the mean pattern (and other types of normalization) before correlational analyses (including, but not only, RSA) to my attention. I was puzzled: Pearson correlation ignores additive and  multiplicative transformations, so how could it be affected by subtracting the mean? Reading closely, my confusion was from what exactly was meant by "subtracting the mean pattern"; it is still the case that not all forms of "normalization" before correlational MVPA are problematic.


The key is where and how the mean-subtraction and/or normalization is done. Using the row- and column-scaling terminology from Pereira, Mitchell, and Botvinick (2009) (datasets arranged with voxels in columns, examples (trials, frames, whatever) in rows): the mean pattern subtraction warned against in Walther et al. (2016) and Garrido et al. (2013) is a particular form of column-scaling, not row-scaling. (A different presentation of some of these same ideas is in Hebart and Baker (2018), Figure 5.)

Here's my illustration of two different types of subtracting the mean; code to generate these figures is below the jump. The original pane shows the "activity patterns" for each of five trials in a four-voxel ROI. The appearance of these five trials after each has been individual normalized (row-scaled) is in the "trial normalized" pane, while the appearance after the mean pattern was subtracted is at right (c.f. Figure 1D in Walther et al. (2016) and Figure 1 in Garrido et al. (2013)).

Note that Trial2 is Trial1+15; Trial3 is Trial1*2; Trial4 is the mirror image of Trial1; Trial5 has a shape similar to Trial4 but positioned near Trial1. (Example adapted from Figure 8.1 of Cluster analysis for researchers (1984) by H. Charles Romesburg.)


And here are matrix views of the Pearson correlation and Euclidean distance between each pair of trials in each version of the dataset:

The mean pattern subtraction did not change the patterns' appearance as much as the trial normalization did: the patterns are centered on 0, but the extents are the same (e.g., voxel2 ranges from 0 to 80 in the original dataset, -40 to 40 after mean pattern subtraction). The row (trial) normalized image has a much different appearance: centered on zero, but the patterns for trials 1, 2, and 3 are now identical, and the range is much less (e.g., voxel2 is now a bit less than -1 to a bit more than 1). Accordingly, the trial-normalized similarity matrix is identical to the original when calculated with correlation, but different when calculated with Euclidean distance; the mean pattern subtracted similarity matrix is identical to the original when calculated with Euclidean distance, but different when calculated with Pearson correlation.

This is another case where lack of clear terminology can lead to confusion; "mean pattern subtraction" and "subtracting the mean from each pattern" are quite different procedures and have quite different effects, depending on your distance metric (correlation or Euclidean).

R (knitr) code to reproduce the figures above.

 \documentclass{article}  
  \addtolength{\oddsidemargin}{-1.25in}  
  \addtolength{\evensidemargin}{-1.25in}  
  \addtolength{\textwidth}{2.5in}  
  \addtolength{\topmargin}{-.875in}  
  \addtolength{\textheight}{1.75in}  
 \begin{document}  
   
 \noindent This file was written by Jo Etzel (\texttt{jetzel@wustl.edu}) and published to the blog mvpa meanderings (\texttt{mvpa.blogspot.com}) in May 2018. It may be adapted for personal use, but should be cited rather than redistributed. \par  
 \vspace{0.2 cm}   
 \noindent Examples of pattern similarity adapted from Figure 8.1 of \underline{Cluster Analysis for Researchers} by H. Charles Romesburg (1984). Consider these the activation patterns for five trials, using a ROI with just four voxels. Further assume that reasonable preprocessing was done, followed by some sort of temporal compression (e.g., averaging volumes corresponding to the HRF or fitting a linear model and extracting the PEIs). The five trials' activation is plotted below, with voxels along the x-axis and activation along the y-axis. \par  
 \vspace{0.2 cm}   
 \noindent Note that Trial2 is Trial1+15; Trial3 is Trial1*2; Trial4 is the mirror image of Trial1; Trial5 was has a shape similar to Trial4 but positioned near Trial1. \par  
 \vspace{0.1 cm}   
 <<code1, cache=TRUE, echo=FALSE, dev='pdf', fig.height=2.5, fig.width=8, fig.align='center'>>=  
 layout(matrix(1:3, c(1,3)));  
 par(mar=c(2, 2, 2, 0.75), mgp=c(1.1, 0.2, 0), tcl=-0.3)   
 # mar: c(bottom, left, top, right) gives the number of lines of margin to be specified on the four sides of the plot. Default is c(5, 4, 4, 2) + 0.1.  
   
 library(scales); # for the colors (brewer_pal)  
 rm(list=ls());  
   
 data1 <- c(20,40,25,30); data2 <- c(35,55,40,45); # 1:4 are the Romesburg example dataset for figure 8.1.  
 data3 <- c(40,80,50,60); data4 <- c(20, 0,15,10);  
 data5 <- c(30,20,26,20);   
   
   
 # original  
 plot(x=0,y=0, col='white', xlim=c(0.75,4.25), ylim=c(0,85), xlab="", ylab="BOLD-derived activity", main="", cex.axis=0.9, xaxt='n');  
 mtext(side=3, text="original", line=0.1, cex=0.8);   
 axis(side=1, at=1:4, labels=paste0("voxel", 1:4), cex.axis=0.9);  
 lines(x=c(-1,5), y=rep(0,2));  
 lines(x=1:4, y=data1, col='black'); points(x=1:4, y=data1, col='black', pch=1);  
 lines(x=1:4, y=data2, col='slateblue'); points(x=1:4, y=data2, col='slateblue', pch=2);  
 lines(x=1:4, y=data3, col='brown2'); points(x=1:4, y=data3, col='brown2', pch=3);  
 lines(x=1:4, y=data4, col='chartreuse4'); points(x=1:4, y=data4, col='chartreuse4', pch=4);  
 lines(x=1:4, y=data5, col='darkgrey'); points(x=1:4, y=data5, col='darkgrey', pch=5);  
 legend('topleft', legend=paste0("Trial",1:5), pch=1:5, col=c('black', 'slateblue', 'brown2', 'chartreuse4', 'darkgrey'), lty='solid', bty='n', cex=0.8);  
   
   
 # individually trial normalized  
 plot(x=0,y=0, col='white', xlim=c(0.75,4.25), ylim=c(-2,2), xlab="", ylab="BOLD-derived activity", main="", cex.axis=0.9, xaxt='n');  
 mtext(side=3, text="trial normalized", line=0.1, cex=0.8);   
 axis(side=1, at=1:4, labels=paste0("voxel", 1:4), cex.axis=0.9);  
 lines(x=c(-1,5), y=rep(0,2));  
 lines(x=1:4, y=scale(data1), col='black'); points(x=1:4, y=scale(data1), col='black', pch=1);  
 lines(x=1:4, y=scale(data2), col='slateblue'); points(x=1:4, y=scale(data2), col='slateblue', pch=2);  
 lines(x=1:4, y=scale(data3), col='brown2'); points(x=1:4, y=scale(data3), col='brown2', pch=3);  
 lines(x=1:4, y=scale(data4), col='chartreuse4'); points(x=1:4, y=scale(data4), col='chartreuse4', pch=4);  
 lines(x=1:4, y=scale(data5), col='darkgrey'); points(x=1:4, y=scale(data5), col='darkgrey', pch=5);  
 legend('topright', legend=paste0("Trial",1:5), pch=1:5, col=c('black', 'slateblue', 'brown2', 'chartreuse4', 'darkgrey'), lty='solid', bty='n', cex=0.8);  
   
   
 # mean pattern subtracted: average BOLD at each voxel, subtract from each trial. mean(c(1,3)) and mean(c(3,2))  
 #c(1- mean(c(1,3)), 3-mean(c(3,2)));  
 tmp.tbl <- rbind(data1, data2, data3, data4, data5);  
 vox.means <- apply(tmp.tbl, 2, mean);  
 plot(x=0,y=0, col='white', xlim=c(0.75,4.25), ylim=c(-45,45), xlab="", ylab="BOLD-derived activity", main="", cex.axis=0.9, xaxt='n');  
 mtext(side=3, text="mean pattern subtracted", line=0.1, cex=0.8);   
 axis(side=1, at=1:4, labels=paste0("voxel", 1:4), cex.axis=0.9);  
 lines(x=c(-1,5), y=rep(0,2));  
 lines(x=1:4, y=data1-vox.means, col='black');     points(x=1:4, y=data1-vox.means, col='black', pch=1);  
 lines(x=1:4, y=data2-vox.means, col='slateblue');   points(x=1:4, y=data2-vox.means, col='slateblue', pch=2);  
 lines(x=1:4, y=data3-vox.means, col='brown2');    points(x=1:4, y=data3-vox.means, col='brown2', pch=3);  
 lines(x=1:4, y=data4-vox.means, col='chartreuse4');  points(x=1:4, y=data4-vox.means, col='chartreuse4', pch=4);  
 lines(x=1:4, y=data5-vox.means, col='darkgrey');   points(x=1:4, y=data5-vox.means, col='darkgrey', pch=5);  
 legend('topleft', legend=paste0("Trial",1:5), pch=1:5, col=c('black', 'slateblue', 'brown2', 'chartreuse4', 'darkgrey'), lty='solid', bty='n', cex=0.8);  
   
   
 @  
   
 \vspace{0.3 cm}  
 \noindent \textbf{Pearson correlation} between each pair of trials, shown as a similarity matrix. The matrix for the ``trial normalized'' transformation matches the original, but the ``mean pattern subtracted'' does not. \par  
 <<code2, cache=TRUE, echo=FALSE, dev='pdf', fig.height=2, fig.width=7.75, fig.align='center'>>=  
 layout(matrix(1:4, c(1,4)));  
 par(mar=c(2, 2, 1.75, 0.75), mgp=c(1.1, 0.2, 0), tcl=-0.3);  
 # mar: c(bottom, left, top, right) gives the number of lines of margin to be specified on the four sides of the plot. Default is c(5, 4, 4, 2) + 0.1.  
   
 cond.ids <- paste0("Trial", 1:5);  
 clr.scale <- brewer_pal("div", palette="RdBu")(11);  
 pal <- colorRampPalette(clr.scale);  
 centers <- seq(from=0, to=1, length.out=5); # midpoints of the boxes  
   
 make.RSA.plot <- function(in.tbl, ttl) { # in.tbl <- rbind(data1, data2, data3, data4, data5); ttl <- "original";  
  cor.tbl <- array(NA, c(nrow(in.tbl), nrow(in.tbl)));  
  for (i in 1:nrow(in.tbl)) {  
   for (j in 1:nrow(in.tbl)) { cor.tbl[i,j] <- cor(in.tbl[i,], in.tbl[j,]); }  
  }  
    
  plt.matrix <- matrix(cor.tbl, nrow=5, ncol=5)  
  plt.matrix <- plt.matrix[,5:1]; # so plot has the usual diagonal direction  
  image(plt.matrix, col=pal(40), zlim=c(-1,1), main="", xlab="", ylab="", axes=FALSE, useRaster=TRUE);  
  mtext(side=3, text=ttl, line=0.1, cex=0.8);   
  axis(side=1, at=centers, labels=cond.ids, cex.axis=0.8, las=1, lwd.ticks=0)  
  axis(side=2, at=centers, labels=rev(cond.ids), cex.axis=0.8, las=1, lwd.ticks=0)  
    
  for (i in 1:nrow(cor.tbl)) {  
   for (j in 1:nrow(cor.tbl)) { text(x=centers[i], y=centers[6-j], labels=round(cor.tbl[i,j],2), col='white', cex=0.8); }  
  }  
  box();  
 }  
   
 do.tbl <- rbind(data1, data2, data3, data4, data5)  
 make.RSA.plot(do.tbl, "original");   
   
 do.tbl <- rbind(t(scale(data1)), t(scale(data2)), t(scale(data3)), t(scale(data4)), t(scale(data5)));  
 make.RSA.plot(do.tbl, "trial normalized");   
   
 tmp.tbl <- rbind(data1, data2, data3, data4, data5);  
 vox.means <- apply(tmp.tbl, 2, mean);  
 do.tbl <- rbind(data1-vox.means, data2-vox.means, data3-vox.means, data4-vox.means, data5-vox.means)  
 make.RSA.plot(do.tbl, "mean pattern subtracted");   
   
 # legend in the last spot  
 plot(x=0, y=0, col='white', ylab="", xlab="", main="", bty='n', xaxt='n', yaxt='n')  
 legend('center', fill=rev(pal(11)), legend=rev(seq(from=-1, to=1, length.out=11)), bty='n', cex=0.9, title="Pearson correlation");  
   
 @  
   
   
 \vspace{0.3 cm}  
 \noindent \textbf{Euclidean distance} between each pair of trials, shown as a similarity matrix. The matrix for the ``mean pattern subtracted'' transformation matches the original, but the ``trial normalized'' does not. \par  
 <<code3, cache=TRUE, echo=FALSE, dev='pdf', fig.height=2, fig.width=7.75, fig.align='center'>>=  
 layout(matrix(1:4, c(1,4)));  
 par(mar=c(2, 2, 1.75, 0.75), mgp=c(1.1, 0.2, 0), tcl=-0.3);  
 # mar: c(bottom, left, top, right) gives the number of lines of margin to be specified on the four sides of the plot. Default is c(5, 4, 4, 2) + 0.1.  
   
 cond.ids <- paste0("Trial", 1:5);  
 clr.scale <- brewer_pal("seq", palette="Blues")(9);  
 pal <- colorRampPalette(clr.scale);  
 centers <- seq(from=0, to=1, length.out=5); # midpoints of the boxes  
   
 make.RSA.plot <- function(in.tbl, ttl) { # in.tbl <- rbind(data1, data2, data3, data4, data5); ttl <- "original";  
  cor.tbl <- array(NA, c(nrow(in.tbl), nrow(in.tbl)));  
  for (i in 1:nrow(in.tbl)) {  
   for (j in 1:nrow(in.tbl)) { cor.tbl[i,j] <- dist(rbind(in.tbl[i,], in.tbl[j,])); }  
  }  
    
  plt.matrix <- matrix(cor.tbl, nrow=5, ncol=5)  
  plt.matrix <- plt.matrix[,5:1]; # so plot has the usual diagonal direction  
  image(plt.matrix, col=pal(40), zlim=c(0,105), main="", xlab="", ylab="", axes=FALSE, useRaster=TRUE);  
  mtext(side=3, text=ttl, line=0.1, cex=0.8);   
  axis(side=1, at=centers, labels=cond.ids, cex.axis=0.8, las=1, lwd.ticks=0)  
  axis(side=2, at=centers, labels=rev(cond.ids), cex.axis=0.8, las=1, lwd.ticks=0)  
    
  for (i in 1:nrow(cor.tbl)) {  
   for (j in 1:nrow(cor.tbl)) { text(x=centers[i], y=centers[6-j], labels=round(cor.tbl[i,j],2), col='black', cex=0.8); }  
  }  
  box();  
 }  
   
 do.tbl <- rbind(data1, data2, data3, data4, data5)  
 make.RSA.plot(do.tbl, "original");   
   
 do.tbl <- rbind(t(scale(data1)), t(scale(data2)), t(scale(data3)), t(scale(data4)), t(scale(data5)));  
 make.RSA.plot(do.tbl, "trial normalized");   
   
 tmp.tbl <- rbind(data1, data2, data3, data4, data5);  
 vox.means <- apply(tmp.tbl, 2, mean);  
 do.tbl <- rbind(data1-vox.means, data2-vox.means, data3-vox.means, data4-vox.means, data5-vox.means)  
 make.RSA.plot(do.tbl, "mean pattern subtracted");   
   
 # legend in the last spot  
 plot(x=0, y=0, col='white', ylab="", xlab="", main="", bty='n', xaxt='n', yaxt='n')  
 legend('center', fill=pal(9), legend=seq(from=0, to=105, length.out=9), bty='n', cex=0.9, title="Euclidean distance");  
   
 @  
   
 \end{document}  

No comments:

Post a Comment