## Tuesday, January 15, 2019

### RSA: how to describe with a single number? - update 2

This post is another entry in the occasional series about RSA matrix quantification; the last one described two common methods: one based on differences (mean subtraction; the "contrast" method) and the other on Kendall's tau-a. Another common method is to use Pearson correlation.

I've thought of correlation as a very different quantification metric from mean subtraction. However, Michael Freund, a graduate student in the CCP lab, pointed out that the two are connected: if the RSA matrix is normalized in the right way (centered, then vector-normalized), quantification by mean subtraction is equivalent to correlation (the two scores differ only by a constant factor). This post has examples to illustrate this equivalence, as well as how the two methods (mean subtraction without scaling, and the correlation-type methods) differ in what they measure. Code for the examples and figures is at the end of this post.
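
A short derivation sketches why this works. It assumes the setup of the code at the end of the post: z is the vector of Fisher-z transformed lower-triangle cells (n = 6), y is the 0/1 reference vector with n_1 = 2 target cells and n_0 = 4 other cells, and w is the mean-zero "weight version" of y (w_k = 1/n_1 for target cells and -1/n_0 otherwise, i.e., the commented-out `w` in the startup code). The notation is mine, not from the original code.

```latex
\begin{aligned}
\text{diff}(z) &= \frac{1}{n_1}\sum_{k:\,y_k=1} z_k \;-\; \frac{1}{n_0}\sum_{k:\,y_k=0} z_k \;=\; w^\top z \;=\; w^\top z_c,
  \qquad z_c = z - \bar{z}\,\mathbf{1} \quad \Big(\text{because } \textstyle\sum_k w_k = 0\Big)\\
\text{corr}(z, y) &= \frac{w^\top z_c}{\lVert z_c \rVert\,\lVert w \rVert}
  \qquad \Big(\text{because } w = \tfrac{n}{n_1 n_0}\,(y - \bar{y}\,\mathbf{1})\Big)\\
\tilde{z} &= \frac{z_c}{\lVert z_c \rVert} \qquad (\text{vnorm scaling})\\
\text{diff}(\tilde{z}) &= \frac{w^\top z_c}{\lVert z_c \rVert} = \lVert w \rVert\,\text{corr}(z, y),
  \qquad \lVert w \rVert = \sqrt{\tfrac{1}{n_1} + \tfrac{1}{n_0}} = \sqrt{0.75} \approx 0.866\\
\text{corr}(\tilde{z}, y) &= \text{corr}(z, y)
\end{aligned}
```

So after centering and vector normalization the difference score is the correlation multiplied by a constant that depends only on the reference matrix, and the normalization itself leaves the correlation unchanged.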

Here are the 10 example RSA matrices and the reference matrix used for quantification. In the reference matrix, we expect the grey cells (1, the "target" cells) to have higher correlation than the white cells (0). The numbers in the cells of each of the 10 example RSA matrices (e.g., from different participants) are Pearson correlations. Color scaling ranges from dark blue (1) to dark red (-1), with white for 0.
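
To make the lower-triangle encoding concrete, here is a minimal R sketch that rebuilds a symmetric 4x4 matrix from a six-element vector such as `y` and lists which condition pairs are the target cells. The helper name `lower_to_matrix` is hypothetical (not from the original code); it assumes the vector is in the column-wise order that `lower.tri()` uses, as in the post's code.

```r
# Hypothetical helper (not from the original post): rebuild a symmetric 4x4 RSA
# matrix from its lower triangle, given in the column-wise order lower.tri() uses:
# (b,a), (c,a), (d,a), (c,b), (d,b), (d,c).
lower_to_matrix <- function(v, labels = c("a", "b", "c", "d")) {
  m <- diag(1, length(labels))            # 1 on the diagonal
  m[lower.tri(m)] <- v                    # fill the lower triangle column-wise
  m[upper.tri(m)] <- t(m)[upper.tri(m)]   # mirror it into the upper triangle
  dimnames(m) <- list(labels, labels)
  m
}

y <- c(0, 1, 0, 0, 1, 0)      # reference matrix, lower triangle (from the post's code)
ref <- lower_to_matrix(y)
print(ref)

# Which condition pairs are the target (1) cells?
cells <- which(lower.tri(ref), arr.ind = TRUE)   # same order as the vector y
targets <- cells[y == 1, , drop = FALSE]
target.pairs <- paste(colnames(ref)[targets[, "col"]], rownames(ref)[targets[, "row"]], sep = "-")
target.pairs   # "a-c" "b-d"
```

With this ordering the target cells are a-c and b-d, the grey cells of the reference matrix figure.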

And here are the quantification scores for each matrix, calculated by each of the four methods. First, notice that the ordering of the quantification scores for the 10 example matrices is the same for difference and correlation quantification after vnorm scaling (diff.vnorm and corr.vnorm), and the same as for correlation quantification without scaling (corr). Calculating the differences without scaling (diff, black) gives a different ordering. This demonstrates the property that Michael Freund pointed out: the distinction between difference and correlation quantification isn't the metric itself but whether the RSA matrix is vector-normalized before quantification (see the `z.vnorm` lines in the `code3` chunk of the code).
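
The property can be checked numerically. This R sketch recomputes all four scores for the 10 examples (data copied from the code at the end of the post). `diff_score` and `vnorm` are my own helper names, and base R's `atanh()` stands in for `get.FTrz()`, since both compute Fisher's r-to-z. It then confirms that `diff.vnorm` equals `corr` times the constant sqrt(1/2 + 1/4) (about 0.866) and that `corr.vnorm` equals `corr`, so the three always order the examples identically.

```r
# The 10 example RSA matrices (lower triangles, Pearson r) and reference vector,
# copied from the code at the end of this post.
xs <- rbind(c(0.728, 0.777, 0.705, 0.709, 0.873, 0.705), c(0.728, 0.831, 0.705, 0.709, 0.831, 0.705),
            c(-0.728, 0.7, 0.5, -0.709, 0.8, 0.5), c(-0.524, 0.144, 0.487, 0.111, -0.024, 0.824),
            c(-0.571, -0.05, -0.269, -0.698, -0.1, -0.48), c(0.401, 0.306, 0.553, 0.117, 0.992, -0.446),
            c(0.2, 0.9, 0.1, 0.01, 0.83, 0.22), c(-0.104, 0.592, -0.07, 0.07, 0.773, 0.151),
            c(-0.39, 0.63, -0.794, -0.22, 0.55, -0.824), c(0.158, 0.935, 0.234, 0.94, 0.529, 0.245))
y <- c(0, 1, 0, 0, 1, 0)

diff_score <- function(z, y) mean(z[y == 1]) - mean(z[y == 0])     # mean subtraction
vnorm <- function(z) { zc <- z - mean(z); zc / sqrt(sum(zc^2)) }    # center, then unit length

z.all <- atanh(xs)   # Fisher r-to-z of every cell
scores <- t(apply(z.all, 1, function(z) c(
  diff       = diff_score(z, y),
  corr       = cor(z, y),
  diff.vnorm = diff_score(vnorm(z), y),
  corr.vnorm = cor(vnorm(z), y))))
round(scores, 3)

# diff.vnorm is corr times a constant that depends only on y (2 target, 4 other cells),
# and vnorm scaling leaves the correlation itself unchanged:
all.equal(scores[, "diff.vnorm"], sqrt(1/2 + 1/4) * scores[, "corr"])
all.equal(scores[, "corr.vnorm"], scores[, "corr"])
```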

So the quantification scores (and the associated interpretation: which example matrix is "best"?) differ between diff and corr (diff.vnorm and corr.vnorm order the examples the same way as corr, so I'll just use corr as shorthand for all three). But where do the differences come from?

The two methods agree that example 4 is worst (the lowest quantification score of these ten). This strikes me as reasonable: neither of the two "target" cells (with 1 in the reference matrix) is among the two cells with the highest correlation, so example 4 doesn't match the reference at all.
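
A quick way to check this kind of statement is to rank the cells of each example and look up where the target cells fall. The R sketch below does this for examples 1, 4 and 9 (values copied from the post's code); a perfect match to the reference would put the two target cells at ranks 1 and 2.

```r
# Lower triangles (Pearson r) of examples 1, 4 and 9, copied from the post's code
ex <- list(ex1 = c(0.728, 0.777, 0.705, 0.709, 0.873, 0.705),
           ex4 = c(-0.524, 0.144, 0.487, 0.111, -0.024, 0.824),
           ex9 = c(-0.39, 0.63, -0.794, -0.22, 0.55, -0.824))
y <- c(0, 1, 0, 0, 1, 0)   # reference: 1 marks a target cell

# Rank of each target cell among the six cells (1 = highest correlation).
target.ranks <- sapply(ex, function(x) rank(-x)[y == 1])
target.ranks   # ex1: 2, 1   ex4: 3, 5   ex9: 1, 2
```

In example 4 the target cells come 3rd and 5th, while in examples 1 and 9 they are the top two.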

More interesting are examples 1 and 9. diff ranks 9 best and 1 tied for second-worst, while corr ranks 9 fourth-best and 1 a similar fifth-best. Looking at the example matrices, in both 9 and 1 the two target cells have higher correlation than all the other cells, but the range of values is much larger in 9 (the non-target cells have negative correlations) than in 1 (where all cells have correlations between 0.705 and 0.873). This difference in variance contributes strongly to the diff score (so the two matrices get very different scores), but is "undone" by the vector normalization, so corr gives 1 and 9 similar scores. Examples 2 and 7 also illustrate this property.
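
One way to see where the variance enters: corr is the diff score divided by the spread (vector length) of the centered, Fisher-z transformed matrix, times a constant that depends only on the reference. The R sketch below computes all three quantities for examples 1 and 9, assuming their values from the post's code and its weight vector `w`; `decompose` is a hypothetical helper name.

```r
x1 <- c(0.728, 0.777, 0.705, 0.709, 0.873, 0.705)   # example 1
x9 <- c(-0.39, 0.63, -0.794, -0.22, 0.55, -0.824)   # example 9
y  <- c(0, 1, 0, 0, 1, 0)                           # reference matrix (lower triangle)
w  <- c(-0.25, 0.5, -0.25, -0.25, 0.5, -0.25)       # weight version of y, from the post's code

decompose <- function(x) {
  z  <- atanh(x)                 # Fisher r-to-z
  zc <- z - mean(z)              # centered
  c(diff   = sum(z * w),         # same as mean(z[y == 1]) - mean(z[y == 0])
    spread = sqrt(sum(zc^2)),    # vector length of the centered matrix
    corr   = cor(z, y))
}
res <- rbind(ex1 = decompose(x1), ex9 = decompose(x9))
round(res, 2)

# corr is diff divided by the spread and by the constant sqrt(sum(w^2)):
all.equal(res[, "corr"], res[, "diff"] / (res[, "spread"] * sqrt(sum(w^2))))
```

For these two examples diff is about 0.30 versus 1.40, but the spread is about 0.41 versus 1.82, so dividing by it brings corr to similar values (about 0.84 versus 0.89).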

I'll also point out examples 1 and 2, which the diff method gives the same quantification score, but which corr scores differently (2 better than 1). Why? Examples 1 and 2 are identical except for the two target cells, which have different values in 1 but the same value in 2: their average (calculated via Fisher's r-to-z). 1 and 2 get the same diff score because averaging the target cells gives the same number in both. Example 2 is much better than 1 with corr, however, because having the same number in the target cells is a better match to the reference matrix, in which the target cells also all hold the same number (1).
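
A minimal R sketch of this construction, assuming example 1's values from the post's code: the target cells are replaced by their Fisher-z average (`tanh()` and `atanh()` are the same transforms as `get.FTzr()` and `get.FTrz()`). Here the average is kept unrounded (about 0.8310914; the post rounds it to 0.831). The diff score does not change, but corr rises: for a fixed target mean, the centered matrix is shortest when the target cells are equal, so the denominator of the correlation shrinks.

```r
x1 <- c(0.728, 0.777, 0.705, 0.709, 0.873, 0.705)   # example 1
y  <- c(0, 1, 0, 0, 1, 0)                           # reference matrix (lower triangle)

# Replace both target cells with their average, calculated in Fisher z space
x2 <- x1
x2[y == 1] <- tanh(mean(atanh(x1[y == 1])))

diff_score <- function(x) { z <- atanh(x); mean(z[y == 1]) - mean(z[y == 0]) }
corr_score <- function(x) cor(atanh(x), y)

c(diff.ex1 = diff_score(x1), diff.ex2 = diff_score(x2))   # the same
c(corr.ex1 = corr_score(x1), corr.ex2 = corr_score(x2))   # ex2 higher
```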

So, which to use? If you want the size of the correlations to matter (9 better than 1), use diff (the difference method without scaling). If you want the highest scores to go to matrices whose target cells all have the same correlation (2 better than 1), use corr (or either of the methods after scaling). If you just want higher correlation in the target cells, without requiring them to be equal, use diff.
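
If you want to apply this choice in your own analysis, a hypothetical helper like the one below (not from the original post) computes either score from a lower-triangle vector of Pearson correlations and a 0/1 reference vector. Applied to examples 1, 2 and 9 (values from the post's code), it reproduces the trade-off: diff puts 9 far ahead with 1 and 2 essentially tied, while corr puts 2 first.

```r
# Hypothetical helper: quantify one RSA matrix (lower triangle of Pearson r)
# against a 0/1 reference vector of the same length and order.
quantify_rsa <- function(x, y, method = c("diff", "corr")) {
  method <- match.arg(method)
  stopifnot(length(x) == length(y), all(y %in% c(0, 1)), any(y == 1), any(y == 0))
  z <- atanh(x)                                   # Fisher r-to-z
  switch(method,
         diff = mean(z[y == 1]) - mean(z[y == 0]),  # size of the target-vs-other gap matters
         corr = cor(z, y))                          # match to the reference's pattern matters
}

y  <- c(0, 1, 0, 0, 1, 0)
ex <- list(ex1 = c(0.728, 0.777, 0.705, 0.709, 0.873, 0.705),
           ex2 = c(0.728, 0.831, 0.705, 0.709, 0.831, 0.705),
           ex9 = c(-0.39, 0.63, -0.794, -0.22, 0.55, -0.824))
s <- sapply(c("diff", "corr"), function(m) sapply(ex, quantify_rsa, y = y, method = m))
round(s, 3)
```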


This is R (knitr) code for the example and figures.

\documentclass{article}
\begin{document}

<<startup, echo=FALSE, message=FALSE, warning=FALSE>>=

# code written by Joset A. Etzel (jetzel@wustl.edu) and may be adapted, provided this source is cited.
# first posted on mvpa.blogspot.com 15 January 2019
# Cognitive Control & Psychopathology Lab, Psychological & Brain Sciences, Washington University in St. Louis (USA)

library(scales); # for the colors (brewer_pal)
library(xtable);  # for table formatting

rm(list=ls());

get.FTrz <- function(in.val) { return(.5 * log((1+in.val)/(1-in.val))); }  # do Fisher's r-to-z transformation
get.FTzr <- function(in.val) { return((exp(2*in.val)-1)/(exp(2*in.val)+1)); } # do Fisher's z-to-r transformation

# 10 examples (e.g., individual RSA matrices), as rows in a table for ease of looping.
# Each row of xs is the lower triangle of an RSA matrix for an individual; the numbers in each cell are Pearson correlations.
xs <- rbind(c(0.728, 0.777, 0.705, 0.709, 0.873, 0.705), c(0.728, 0.831, 0.705, 0.709, 0.831, 0.705),
c(-0.728, 0.7, 0.5, -0.709, 0.8, 0.5), c(-0.524, 0.144, 0.487, 0.111, -0.024, 0.824),
c(-0.571, -0.05, -0.269, -0.698, -0.1, -0.48), c(0.401, 0.306, 0.553, 0.117, 0.992, -0.446),
c(0.2, 0.9, 0.1, 0.01, 0.83, 0.22), c(-0.104, 0.592, -0.07, 0.07, 0.773, 0.151),
c(-0.39, 0.63, -0.794, -0.22, 0.55, -0.824), c(0.158, 0.935, 0.234, 0.94, 0.529, 0.245));

# calculating the average of the "good" cells for example 2:
# get.FTzr(mean(c(get.FTrz(0.777), get.FTrz(0.873))));  # [1] 0.8310914

y <- c(0, 1, 0, 0, 1, 0);  # reference matrix (lower triangle, same order as the xs)
# w <- c(-0.25, 0.5, -0.25, -0.25, 0.5, -0.25);  # weight version of reference matrix y

# plotting parameters
centers <- seq(from=0, to=1, length.out=4); # midpoints of the boxes
cond.ids <- c("a", "b", "c", "d");  # trial labels
clr.scale <- brewer_pal("div", palette="RdBu")(11);
pal <- colorRampPalette(clr.scale);

@

\noindent \texttt{RSAmatrixQuantification.rnw} \par
\noindent code written by Joset A. Etzel (\texttt{jetzel@wustl.edu}) on 15 January 2019 and released on \texttt{mvpa.blogspot.com}. \par
\vspace{0.2 cm}
\noindent Comparing how 10 different RSA matrices are quantified with correlation (\texttt{.corr}) or differences (\texttt{.diff}), with or without scaling (\texttt{.vnorm}). The same reference matrix is used in all cases, as shown below; we expect the grey cells (\texttt{1}) to have higher correlation than the white (\texttt{0}). \par

<<code1, echo=FALSE, dev='pdf', fig.height=1.5, fig.width=7, fig.align="center">>=
layout(matrix(1:5, nrow=1));  # five panels in one row
par(mar=c(1.5, 1.25, 1.75, 0.75), mgp=c(1.1, 0.2, 0), tcl=-0.3);
# mar: c(bottom, left, top, right) gives the number of lines of margin on the four sides of the plot. Default is c(5, 4, 4, 2) + 0.1.

plt.matrix <- matrix(0.25, nrow=4, ncol=4);
diag(plt.matrix) <- 1;  # identity
plt.matrix[3,1] <- 0.5; plt.matrix[1,3] <- 0.5;
plt.matrix[4,2] <- 0.5; plt.matrix[2,4] <- 0.5;
plt.matrix <- plt.matrix[,4:1]; # so plot has the usual diagonal direction
image(plt.matrix, col=c("white", "grey", "black"), breaks=c(0,0.3,0.75,1.2), main="", xlab="", ylab="", axes=FALSE, useRaster=TRUE);
mtext(side=3, text="Reference matrix", line=0.2, cex=0.7);
axis(side=1, at=centers, labels=cond.ids, cex.axis=0.9, las=1, lwd.ticks=0)
axis(side=2, at=centers, labels=rev(cond.ids), cex.axis=0.9, las=1, lwd.ticks=0)
# centers[1],[1] is 0,0: lower left corner
text(x=centers[1], y=centers[1], labels="0", cex=1); text(x=centers[2], y=centers[2], labels="0", cex=1);
text(x=centers[1], y=centers[3], labels="0", cex=1); text(x=centers[3], y=centers[1], labels="0", cex=1);
text(x=centers[1], y=centers[2], labels="1", cex=1); text(x=centers[2], y=centers[1], labels="1", cex=1);
box();

plot(x=0, y=0, col='white', ylab="", xlab="", main="", bty='n', xaxt='n', yaxt='n'); # blank plot to hold legend
legend('left', fill=pal(9), legend=seq(from=-1, to=1, length.out=9), bty='n', cex=0.9);

@

\vspace{0.2 cm}
\noindent The 10 example RSA matrices (e.g., from different participants). Numbers in cells are Pearson correlations. Color scaling ranges from dark blue (1) to dark red (-1), with white for 0, as shown above. \par
\vspace{0.1 cm}
<<code2, echo=FALSE, dev='pdf', fig.height=1.5, fig.width=7, fig.align="center">>=
layout(matrix(1:5, nrow=1));  # five panels in one row
par(mar=c(1.5, 1.25, 1.75, 0.75), mgp=c(1.1, 0.2, 0), tcl=-0.3);
# mar: c(bottom, left, top, right) gives the number of lines of margin to be specified on the four sides of the plot. Default is c(5, 4, 4, 2) + 0.1.

for (i in 1:10) {  #i <- 1;
tmp <- matrix(1, 4, 4);  # 1 on the diagonal
tmp[lower.tri(tmp)] <- xs[i,];  # lower triangle, filled column-wise in the same order as xs
# mirror the lower triangle; assigning xs[i,] to upper.tri() directly would swap the a-d and b-c cells,
# since upper.tri() is also filled column-wise
tmp[upper.tri(tmp)] <- t(tmp)[upper.tri(tmp)];
plt.matrix <- tmp[,4:1]; # so plot has the usual diagonal direction
image(plt.matrix, col=pal(40), zlim=c(-1,1), main="", xlab="", ylab="", axes=FALSE, useRaster=TRUE);
mtext(side=3, text=paste("example", i), line=0.1, cex=0.6);
axis(side=1, at=centers, labels=cond.ids, cex.axis=0.9, las=1, lwd.ticks=0);
axis(side=2, at=centers, labels=rev(cond.ids), cex.axis=0.9, las=1, lwd.ticks=0);
for (r in 1:4) {  # write each cell's value; r and cc (not i) so the example counter isn't reused
for (cc in 1:4) {
text(x=centers[r], y=centers[cc], labels=round(plt.matrix[r,cc],3), cex=0.8);
}
}
box();
}

@

\vspace{0.2 cm}
\noindent Quantification scores for each example matrix, calculated in each of the four ways. \par
\vspace{0.1 cm}
<<code3, echo=FALSE, dev='pdf', fig.height=1.25, fig.width=8, fig.align='center'>>=
#layout(matrix(1:2, c(1,2)));
par(mar=c(2, 1, 0.1, 0.75), mgp=c(1.1, 0.2, 0), tcl=-0.3)
# mar: c(bottom, left, top, right) gives the number of lines of margin on the four sides of the plot. Default is c(5, 4, 4, 2) + 0.1.

# calculate all the quantification scores
quants.diff <- rep(NA, nrow(xs));  # blank vectors to hold the quantification scores
quants.corr <- rep(NA, nrow(xs));
quants.diff.vnorm <- rep(NA, nrow(xs));
quants.corr.vnorm <- rep(NA, nrow(xs));
for (i in 1:nrow(xs)) {  # i <- 1;
z <- get.FTrz(xs[i,]);  # Fisher's r-to-z of the row

# calculate the quantification scores, without scaling
quants.diff[i] <- mean(z[which(y == 1)]) - mean(z[which(y == 0)]);  # difference of means; (z %*% w) the same (w in startup code block; weight version of y)
quants.corr[i] <- cor(z, y);   # (z %*% w) / sqrt(sum((z - mean(z))^2)*sum((w - mean(w))^2)) the same

# do the vnorm scaling, adapted from Mike's code
tmp.vec <- z - mean(z);  # center the row (RSA matrix)
z.vnorm <- tmp.vec / sqrt(sum(tmp.vec^2));  # vector normalize

# calculate the quantification scores on the scaled RSA matrix
quants.diff.vnorm[i] <- mean(z.vnorm[which(y == 1)]) - mean(z.vnorm[which(y == 0)]);  # difference of means
quants.corr.vnorm[i] <- cor(z.vnorm, y);   # identical to quants.corr[i]: correlation is unchanged by centering and positive scaling
}

# plot the quantification scores
plot(x=quants.diff, y=rep(0,10), col='white', ylab="", xlim=c(-0.3, 1.4), ylim=c(-0.4,0.5), yaxt='n', xlab='quantification score', cex.lab=0.7, cex.axis=0.7)
text(quants.diff, rep(0.15,10), labels=1:10, cex=0.8);
text(quants.diff.vnorm, rep(0,10), labels=1:10, cex=0.8, col='blue');
text(quants.corr, rep(-0.15,10), labels=1:10, cex=0.8, col='green');
text(quants.corr.vnorm, rep(-0.3,10), labels=1:10, cex=0.8, col='red');

legend('top', fill=c("black", "blue", "green", "red"), legend=c("diff", "diff.vnorm", "corr", "corr.vnorm"), horiz=TRUE, bty='n', cex=0.7)

@

\vspace{0.2 cm}
\noindent The ordering of the 10 example matrices is the same for difference and correlation quantification after vnorm scaling (\texttt{diff.vnorm} and \texttt{corr.vnorm}), and the same as the no-scaling correlation quantification (\texttt{corr}). Calculating the differences without scaling (\texttt{diff}, black) gives a different ordering. \par

\end{document}