MVPA Meanderings: more on the LD-t

In a previous post I wrote a bit about A Toolbox for Representational Similarity Analysis, and my efforts at figuring out the LD-t statistic. Nikolaus Kriegeskorte kindly pointed me to some additional information, and cleared up some of my confusion. Note that I'll be using the term "LD-t" in this post for consistency (and since it's short); it's a "cross-validated, normalized variation on the Mahalanobis distance", as phrased in Nili (2014).

First, the LD-t has been described and used previously (before Nili et al 2014), though (as far as I can tell) not in the context of representational similarity analysis (RSA). It is summarized in this figure (Figure S8a from Kriegeskorte et al. (2007 PNAS)); the paper's supplemental text has additional explanation ("Significance testing of ROI response-pattern differences" section). A bit more background is also on pages 69-71 of Niko's thesis.

To give a high-level picture, calculating the LD-t involves deriving a t-value from Fisher linear discriminant test set output. The linear discriminant is fit (weights calculated from the training data) with the standard algorithms, but using an error covariance matrix calculated from the residuals of fitting a GLM to the entire training dataset (time-by-voxel data matrix), which is then run through the Ledoit-Wolf Covariance Shrinkage Estimator function.

This isn't a procedure that can be dropped into an arbitrary MVPA workflow. For example, for my little classification demo I provided a single-subject 4D dataset, in which each of the 20 volumes is the temporally-compressed version (averaging, in this case) of an experimental block; the first ten class A, the second ten class B. The demo then uses cross-validation and an SVM to get an accuracy for distinguishing class A and B. It is trivial to replace SVM in that demo with LDA, but that will not produce the LD-t. One reason is that the LD-t procedure requires splitting the dataset into two parts (a single training and testing set), not arbitrary cross-validation schemes, but there are additional differences.

Here's a description of the procedure; many thanks to Carolina Ramirez for guiding me through the MATLAB! We're assuming a task-based fMRI dataset for a single person.

Perform "standard" fMRI preprocessing: motion correction, perhaps also slice-timing correction or spatial normalization. The values resulting from this could be output as a 4d data matrix of the same length as the raw data: one (preprocessed) image for each collected TR. We can also think of this as a (very large) time-by-voxel data matrix, where time is the TR.
Create design matrices (X) for each half of the dataset (A and B, using the terms from the Toolbox file fisherDiscrTRDM.m). These are as usual for fMRI mass-univariate analysis: the columns generally includes predictors for motion and linear trends as well as the experimental conditions, which have been convolved with a hemodynamic response function.
Create data matrices (Y) for each half of the dataset (training (A) and testing (B)), structured as usual for fMRI mass-univariate analysis, except including just the voxels making up the current ROI.
Now, we can do the LD-t calculations; this is function fishAtestB_optShrinkageCov_C, also from the Toolbox file fisherDiscrTRDM.m. First, fit the linear model to dataset A (training data):

eBa=inv(Xa'*Xa)*Xa'*Ya; % calculate betas
eEa=Ya-Xa*eBa; % calculate error (residuals) matrix

Use the training set residuals matrix (eEa) to estimate the error covariance, and apply the Ledoit-Wolf Covariance Shrinkage Estimator function to the covariance matrix.This is Toolbox file covdiag.m, and returns the the shrinkage estimate of the covariance matrix, Sa.
Use the inverse of that covariance matrix (invSa) and the training-set PEIs (eBa) to calculate the Fisher linear discriminant (C is a contrast matrix, made up of -1, 0, and 1 entries, corresponding to the design matrix). Fitting the discriminant function produces a set of weights, one for each voxel.

was=C'*eBa*invSa; % calculate linear discriminant weights

Now, project the test dataset (Yb) onto this discriminant: yb_was=Yb*was';
The final step is to calculate a t-value describing how well the discriminant separated the test set. t-values are a number divided by its standard error, which is the purpose of the final lines of the fishAtestB_optShrinkageCov_C function. The values are adjusted before the t-value calculation; it looks like this is intended to compensate for differing array dimensions (degrees of freedom). I don't fully understand each line of code, but here they are, for completeness:

invXTXb=inv(Xb'*Xb);
ebb_was=invXTXb*(Xb'*yb_was);
eeb_was=yb_was-Xb*ebb_was;
nDFb=size(yb_was,1)-size(Xb,2);
esb_was=diag(eeb_was'*eeb_was)/nDFb;
C_new=C(1:min([size(ebb_was,1),size(C,1)]),:);
ctb_was2=diag(C_new'*ebb_was);
se_ctb_was2=sqrt(esb_was.*diag(C_new'*invXTXb*C_new));
ts=ctb_was2./se_ctb_was2;

Once the t-value is calculated it can be converted to a p-value if desired, using the standard t-distribution.

This procedure is for a single subject. For group analysis, in Kriegeskorte et al. (2007) they did a group analysis by "concatenating the discriminant time courses of all subjects and fitting a composite design matrix with separate predictors for each subject." Nili et al. (2014, supplemental) suggests a different approach: averaging the LD-t RDMs cell-wise across people, then dividing by the square root of the number of subjects.

The statistic produced by these calculations is related to the cross-validated MANOVA described by Carsten Allefeld; I hope to compare them thoroughly in the future.

MVPA Meanderings

Thursday, August 21, 2014

more on the LD-t

3 comments: