To illustrate the methods, suppose this is the RSA matrix for one person, and the corresponding "reference matrix"; the numbers in the cells are Pearson correlations. In this case, we expect the two cells with the same letter (2,F and 0,F; 2,P and 0,P) to have higher correlations than all the other cells (which mix letters).
The matrix is symmetrical, with 1s along the diagonal, so we'll only use the numbers from the lower triangle. We can "unwrap" these values into vectors, like this:
x <- c(.728, .777, .705, .709, .873, .705); # one person
y <- c( 0, 1, 0, 0, 1, 0); # reference
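The "unwrapping" step can be sketched in base R with lower.tri(). This is only an illustration: the condition order (2,F; 2,P; 0,F; 0,P) and the names conds, rsa, ref, x.vec, and y.vec are assumptions made here, chosen because that order reproduces the x and y vectors above.

```r
# Assumed row/column order; the layout of the original matrices is a guess.
conds <- c("2,F", "2,P", "0,F", "0,P")

# Rebuild a symmetric RSA matrix from the six lower-triangle values.
rsa <- diag(4)                                   # 1s along the diagonal
dimnames(rsa) <- list(conds, conds)
rsa[lower.tri(rsa)] <- c(.728, .777, .705, .709, .873, .705)
rsa[upper.tri(rsa)] <- t(rsa)[upper.tri(rsa)]    # mirror into the upper triangle

# Reference matrix: 1 where the two conditions share a letter, 0 otherwise.
letter <- substr(conds, 3, 3)                    # "F", "P", "F", "P"
ref <- outer(letter, letter, "==") * 1

# lower.tri() extracts the cells column by column, skipping the diagonal.
x.vec <- rsa[lower.tri(rsa)]
y.vec <- ref[lower.tri(ref)]
```

Using the same lower.tri() mask on both matrices guarantees that the two vectors line up cell for cell.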
I previously called the difference-based method "averaging the triangles"; conceptually, we take the difference between the average of the two cells we expect to have a large value (cells with 1s in the Reference) and the average of the four cells we expect to have smaller values (0s in the Reference). This can also be thought of as using a contrast matrix, such as in Oedekoven et al. (2017). Since correlations should go through the Fisher r-to-z transform before being averaged or subtracted, here's an example of the calculation:
get.FTrz <- function(in.val) { return(.5 * log((1+in.val)/(1-in.val))); } # Fisher's r-to-z transformation
get.FTzr <- function(in.val) { return((exp(2*in.val)-1)/(exp(2*in.val)+1)); } # Fisher's z-to-r transformation
mean(get.FTrz(x[which(y == 1)])) - mean(get.FTrz(x[which(y == 0)]));
# [1] 0.3006614
(It's of course equivalent to calculate this as a weighted sum of the Fisher-transformed values: multiply the y == 1 cells by 0.5 and the y == 0 cells by -0.25, then add everything up.)
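As a quick check of that equivalence, here is a minimal sketch of the contrast-weight version. The names z, w, contrast.val, and mean.diff are introduced only for this illustration; atanh() is base R's inverse hyperbolic tangent, which is mathematically the same as the Fisher r-to-z transform.

```r
x <- c(.728, .777, .705, .709, .873, .705)   # one person
y <- c(0, 1, 0, 0, 1, 0)                     # reference

z <- atanh(x)   # Fisher r-to-z; same formula as 0.5 * log((1+r)/(1-r))

# Contrast weights: +1/2 for the two same-letter cells,
# -1/4 for the four different-letter cells (the weights sum to zero).
w <- ifelse(y == 1, 1 / sum(y == 1), -1 / sum(y == 0))

contrast.val <- sum(w * z)                        # weighted-sum version
mean.diff <- mean(z[y == 1]) - mean(z[y == 0])    # difference-of-means version
all.equal(contrast.val, mean.diff)
```

Writing the weights as 1/sum(y == 1) and -1/sum(y == 0) keeps the contrast balanced if the reference matrix has a different number of 1s and 0s.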
The Kendall's tau-a based method is recommended in Nili et al. (2014). There's a function to calculate this in the DescTools R package; since it's rank-based, there's no need to do the Fisher transformation:
library(DescTools); # provides KendallTauA
KendallTauA(x,y)
# [1] 0.5333333
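To see where that number comes from, here is a base R sketch of the tau-a definition: concordant pairs minus discordant pairs, divided by the total number of pairs. The function name tau.a is made up for this example, and this is not how DescTools implements it; it is just the textbook formula written out.

```r
x <- c(.728, .777, .705, .709, .873, .705)   # one person
y <- c(0, 1, 0, 0, 1, 0)                     # reference

tau.a <- function(a, b) {
  n <- length(a)
  pairs <- combn(n, 2)          # every pair of cells, one pair per column
  # +1 for a concordant pair, -1 for a discordant pair, 0 if tied in a or b
  s <- sign(a[pairs[1, ]] - a[pairs[2, ]]) * sign(b[pairs[1, ]] - b[pairs[2, ]])
  sum(s) / choose(n, 2)         # tau-a divides by ALL pairs, ties included
}

tau.a(x, y)
# [1] 0.5333333
```

Of the 15 pairs, the 8 that pair a same-letter cell with a different-letter cell are all concordant; the other 7 are tied in y and contribute nothing, so the result is 8/15. With only two 1s in the reference, 8/15 is the largest tau-a possible here.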
Which method should you use? One consideration is whether you want the magnitude of the numbers in the RSA matrix cells to matter, or just their relative sizes (ranks). Here, I made an RSA matrix in which both same-letter cells still rank above all four different-letter cells (as in x), but the difference between same-letter and different-letter cells is more extreme:
x1 <- c(0.2, 0.9, 0.1, 0.01, 0.83, 0.22);
KendallTauA(x1,y)
# [1] 0.5333333 # same
mean(get.FTrz(x1[which(y == 1)])) - mean(get.FTrz(x1[which(y == 0)]));
# [1] 1.195997 # bigger
The Kendall's tau-a value for x1 and y is exactly the same as for x and y, but the mean difference is larger. Is it better to be sensitive to magnitude? Not necessarily. In practice, when I've applied both methods to the same dataset I've found extremely similar results, which seems proper; it's hard to imagine a strong effect that would only be detected by Kendall's tau-a.
UPDATE 12 December 2018: I removed transforming the difference scores back to correlations, leaving the differences as Fisher-transformed zs, since this seems easier for later interpretation and calculations.