Friday, July 6, 2018

RSA: how to describe with a single number? - update

Some years ago I posted about quantifying RSA matrices with a single number. Since then, it seems that the field has settled on two methods for quantifying RSA matrices: one based on differences and the other based on Kendall's tau-a.

To illustrate the methods, suppose this is the RSA matrix for one person, and the corresponding "reference matrix"; the numbers in the cells are Pearson correlations. In this case, we expect the two cells with the same letter (2,F and 0,F; 2,P and 0,P) to have higher correlations than all the other cells (which mix letters).
The matrix is symmetrical, with 1s along the diagonal, so we'll only use the numbers from the lower triangle. We can "unwrap" these values into vectors, like this:
 x <- c(.728, .777, .705, .709, .873, .705);   # one person
 y <- c(   0,    1,    0,    0,    1,    0);   # reference

I previously called the difference-based method "averaging the triangles"; conceptually, we take the difference between the average of the two cells we expect to have a large value (cells with 1s in the Reference), and the average of the four cells we expect to have smaller values (0 in the Reference). This can also be thought of as using a contrast matrix, such as in Oedekoven, et. al (2017). Since you can't do math on correlations without going through the Fisher R-to-Z transform, here's an example of the calculation:
 get.FTrz <- function(in.val) { return(.5 * log((1+in.val)/(1-in.val))); }  # Fisher's r-to-z transformation  
 get.FTzr <- function(in.val) { return((exp(2*in.val)-1)/(exp(2*in.val)+1)); } # Fisher's z-to-r transformation  
 mean(get.FTrz(x[which(y == 1)])) - mean(get.FTrz(x[which(y == 0)]));
 # [1] 0.3006614  

(It's of course equivalent to calculate this by multiplying the sum of the y == 1 cells by 0.5 and y == 0 cells by 0.25.)

The Kendall's tau-a based method is recommended in Nili, et. al (2014). There's a function to calculate this in the DescTools R package; since it's rank-based, there's no need to do the Fisher transformation:
 # [1] 0.5333333  

Which method should you use? One consideration is if you want the magnitude of the numbers in the RSA matrix cells to matter, or just their relative sizes (ranks). Here, I made a RSA matrix with the same rank ordering as before (in x), but the difference between same-letter and different-letter cells are more extreme:
 x1 <- c(0.2, 0.9, 0.1, 0.01, 0.83, 0.22);  
 # [1] 0.5333333  # same  
 mean(get.FTrz(x1[which(y == 1)])) - mean(get.FTrz(x1[which(y == 0)]));
 # [1] 1.195997 # bigger  

The Kendall's tau-a value for x1 and y is exactly the same as x and y, but the mean difference is larger. Is it better to be sensitive to magnitude? Not necessarily. In practice, when I've calculated both methods on the same dataset I've found extremely similar results, which seems proper; it's hard to imagine a strong effect that would be only be detected by Kendall's tau-a.

UPDATE 12 December 2018: I removed transforming the difference scores back to correlations, leaving the differences as Fisher-transformed zs, since this seems easier for later interpretation and calculations.