In this post I'm going to describe the statistic proposed in the paper, leaving the discussion of when (sorts of hypotheses, dataset structures) a MANOVA-type statistic might be most suitable for a (possible) later post. There's quite a bit more in the paper (and to the method) than what's summarized here!
MANOVA-related statistics have been used/proposed for searchlight analysis before, including Kriegeskorte's original paper and implementations in BrainVoyager and pyMVPA. From what I can tell (please let me know otherwise), this previous MANOVA-searchlights fit the MANOVA on the entire dataset at once: all examples/timepoints, no cross-validation. Allefeld and Haynes propose doing MANOVA-type searchlights a bit differently: “cross-validated MANOVA” and “standardized pattern distinctness”.
Most of the paper's equations review multivariate statistics and the MANOVA; the “cross-validated MANOVA” and “standardized pattern distinctness” proposed in the paper are in equations 14 to 17:
- Equation 14 is the equation for the Hotelling-Lawley Trace statistic, which Allefeld refers to as D ("pattern distinctness").
- Equation 15 shows how Allefeld and Haynes propose to calculate the statistic in a "cross-validated" way. Partitioning on the runs, they obtain a D for each partition by calculating the residual sum-of-squares matrix (E) and the first part of the H equation from the not-left-out-runs, but the second part of the H equation from the left-out run.
- Equation 16 averages the D from each "cross-validation" fold, then multiplies the average by a correction factor calculated from the number of runs, voxels, and timepoints.
- Finally, equation 17 is the equation for “standardized pattern distinctness”: dividing the value from equation 16 by the square root of the number of voxels in the searchlight.
The key part from the demo code is below. The dataset a matrix, with the voxels (for a single searchlight) in the columns and the examples (volumes) in the rows. There are two classes, "a" and "b". For simplicity, the "left-out run" is called "test" and the others "train", though this is not training and testing as meant in machine learning. train.key is a vector giving the class labels for each row of the training dataset.
For Hotelling's T2 we first calculate the "pooled" sample covariance matrix in the usual way, but using the training data only:
S123 <- ((length(which(train.key == "a"))-1) * var(train.data[which(train.key == "a"),]) + (length(which(train.key == "b"))-1) * var(train.data[which(train.key == "b"),])) / (length(which(train.key == "a")) + length(which(train.key == "b")) - 2);
To make the key equation more readable we store the total number of "a" and "b" examples:
a.count <- length(which(train.key == "a")) + length(which(test.key == "a"));
b.count <- length(which(train.key == "b")) + length(which(test.key == "b"));
and the across-examples mean vectors:
a.test.mean <- apply(test.data[which(test.key == "a"),], 2, mean);
b.test.mean <- apply(test.data[which(test.key == "b"),], 2, mean) ;
a.train.mean <- apply(train.data[which(train.key == "a"),], 2, mean);
b.train.mean <- apply(train.data[which(train.key == "b"),], 2, mean);
now we can calculate the Hotelling's T2 (D) for this "cross-validation fold" (note that solve(S123) returns the inverse of matrix S123):
((a.count*b.count)/(a.count+b.count)) * (t(a.train.mean-b.train.mean) %*% solve(S123) %*% (a.test.mean-b.test.mean));
The key is that, paralleling equation 15, the covariance matrix is computed from the training data, multiplied on the left by the mean difference vector from the training data, then on the right by the mean difference vector from the testing data.
Should we think of this way of splitting the Hotelling-Lawley Trace calculation as cross-validation? It is certainly similar: a statistic is computed on data subsets, then combined over the subsets. It feels different to me though, partly because the statistic is calculated from the "training" and "testing" sets together, and partly because I'm not used to thinking in terms of covariance matrices. I'd like to explore how the statistic behaves with different cross-validation schemes (e.g. partitioning on participants or groups of runs), and how it compares to non-cross-validated MANOVA. It'd also be interesting to compare the statistic's performance to those that don't model covariance, such as Gaussian Naive Bayes.
Interesting stuff; I hope this post helps you understand the procedure, and to keep us all thinking about the statistics we choose for our analyses.
Allefeld C, & Haynes JD (2014). Searchlight-based multi-voxel pattern analysis of fMRI by cross-validated MANOVA. NeuroImage, 89, 345-57 PMID: 24296330