MVPA Meanderings: March 2014

Friday, March 28, 2014

connectome workbench: working with volumes

I think of plotting surfaces when I think of the Connectome Workbench, but Workbench can do quite a few nice things with volumes as well. If you haven't already read my previous HCP tutorial posts, read those first, because I'm going to be skipping quite a bit of the introductory info. I'm using the most recent version of the Workbench, 0.84.

I'm very pleased with these screenshots in this post and will describe how I made them. The pictures show both a surface and volume-slices view of the accuracy map resulting from my searchlight demo.

Before firing up the Workbench, you'll need both the volumetric (NIfTI) and surface (*.shape.gii or *.funct.gii) versions of the file you want to plot, plus the corresponding anatomical underlays. In this case, the demo searchlight accuracy map is aligned to the MNI anatomical template, so we'll use the atlas_Conte69_74k_pals anatomy, as before. Thankfully, the atlas download includes both surface and volumetric versions, so that's ready to go. The demo volumetric NIfTI can be downloaded here, and the wb_command -volume-to-surface-mapping program will create the corresponding *shape.gii file, also as before.

Now that we have all the files we need,we can plot them in Workbench. I started a new .spec file based off the Conte69 anatomy so that I could skip loading the files I didn't need (e.g. borders) and save the ones I do need, but that's not essential (see here for explanation).

Once Workbench is open, we need to load in the images, both volume and surface, that we want to overlay, plus the volume anatomy, since that isn't included in the default Conte69_atlas-v2.LR.32k_fs_LR.wb.spec. These four files (searchlightAccuracies_rad2.nii.gz, Conte69_AverageT1w.nii.gz, searchlight_right.shape.gii, and searchlight_left.shape.gii) can all be loaded (as Volume Files and Metric Files) through repeated use of the File -> Open File dialog.

Finally, we can start making nice pictures. I found it convenient to work with just two Workbench tabs, one for the surface and one for the volume version of the dataset. You can close extra tabs by clicking the little red boxes at the top of each, and open new ones with File -> New Tab.

Go to the first tab and click the "All" radio button in the "View" part of the Toolbar. The window will change to show the surface of both hemispheres, but not the searchlight map; change the bottom two METRIC entries in the Overlay ToolBox to searchlight_right.shape.gii and searchlight_left.shape.gii. You can turn the brain around with the mouse to see the overlay (adjust the color scaling via the little wrench icon). Now, switch the top dropdown box to VOLUME searchlightAccuracies_rad2.nii.gz. This will plot the volume data inside the surface, projected onto planes (twist the brain around to see the planes). You can adjust the planes' location in the "Slice Indices/Coords" part of the Toolbar.

Go to the other tab and click the "Volume" radio button in the "View" part of the Toolbar. Set the bottom dropdown box to VOLUME Conte69_AverageT1w.nii.gz (the anatomic template), then the upper dropdown box to VOLUME searchlightAccuracies_ rad2.nii.gz.When both of these layers are checked On the view should look something like the right side part of this image.

Now we have two tabs open, one with a surface, and one with a volume. To view both side-by-side like in the screenshot, click View -> Enter Tile Tabs. There are a lot of neat things you can do in the "Tile Tabs" view; play around with it and check out the tutorial.

Another feature of the Workbench I really like for volumetric data is making "montages": views of many slices at once, like the top screenshot in this post. To make these, click the "M" button in the "Slice Plane" part of the Toolbar. You can then switch the number of slices shown by adjusting the boxes in the "Montage" part of the Toolbar, and where it starts showing slices in the "Slice Indices/Coords" part of the Toolbar. The montage view doesn't have to be axial slices, but any type - just click the buttons in the Slice Plane part of the toolbar.

So, this was an overview of what I thought was pretty nice when using the Workbench with volumetric images. I did find a few things frustrating, particularly the Yoking; I just couldn't make yoking work. ~~Also, volume-only montage view is rather like MRIcron ... but the view doesn't recenter on clicked coordinates; you adjust the position through the "Slice Indices/Coords" part of the Toolbar.~~ It would also be neat to plot horizontal lines on the surface when viewing data like in the top screenshot, where the horizontal lines indicate the slices shown in the volume montage. But, some nice features, and I'll probably be using the Workbench with volumetric data quite a bit more in the future.

UPDATE (7 May 2014): Tim Coalson told me how to get Workbench to recenter on clicked coordinates: depress the Volume ID button (yellow arrow) in the Information box (red arrow button makes it appear). He said they might make this the default in future versions of Workbench, which would help.

Note that if you click the red-arrow i button again (so it is popped-out, rather than depressed) the Information window won't keep appearing every time you click in the volume.

Tuesday, March 25, 2014

NIfTI, CIFTI, GIFTI in the HCP and Workbench: a primer

The HCP is releasing preprocessed data in both volumetric NIfTI and surface/volumetric CIFTI formats. Working with the HCP files, or doing much of anything with the Workbench, requires navigating through a plethora of .*.nii and .*.gii files. In this post I'll explain why we need all these files, and how they relate to each other. Disclaimer: I'm writing this as a primer from the viewpoint of someone familiar with volumetric fMRI data analysis; it is not at all a full description of everything the files can be used for. Also, though I'm referring to the HCP and Workbench, these file formats are used by other projects and software.

For a starting point, consider how we work with volumetric NIfTI files. Neuroimagers often think about volumetric NIfTI files as storing functional data in a 4d matrix (x,y,z, and time). Libraries such as oro.nifti make reading NIfTI files fairly easy: they create a 3d or 4d matrix of voxel values, plus a object with the header information.

While you can get an idea of the anatomy by looking at slices of the 4d functional data matrix, analyses generally rely on having a 3d matrix of anatomical data (binary mask of regions, anatomic scan, etc) perfectly aligned to the 4d functional data. So, the 4d NIfTI file doesn't contain everything we need: we get some alignment information out of the header (qfactor, etc), but also need the registered 3d anatomical data. For a concrete example, I had to provide two files for the little ROI-based analysis demo: the dataset (4d NIfTI with preprocessed BOLD) and the ROI mask (binary 3d NIfTI showing the voxels corresponding to the anatomical region of interest), plus stating that the dataset was normalized to the MNI anatomical atlas (so that we can overlay the data on the correct anatomical template).

Now, on to CIFTI. CIFTI-2 files follow the NIfTI-2 file format specification (CIFTI-2 is a "flavor" of NIfTI-2, so both use the *.nii file extension), and both consist of a data matrix and headers. In the case of the HCP data, the functional timecourses are in the data matrix part of *.dtseries.nii CIFTI files. Like NIfTI volume files, the CIFTI file contains information about where voxels are, though this information is stored in a different place (in the extension containing the CIFTI XML). But, paralleling how you need an anatomic file to figure out exactly where the voxels in a volumetric NIfTI lie, you need other files (not just the CIFTI) to tell you where the surface vertices lie, and how they're connected (the "triangles", etc). Aside: While I wrote "surface vertices" in this paragraph, note that the HCP CIFTIs store both surface vertices (for the cortical sheet) and volumetric voxels (for sub-cortical structures).

These "other files" are not a single file but multiple; as many as necessary. Having all of these files is akin to having multiple ROI files available for an analysis: you won't use each ROI in each analysis, just the ones corresponding to the anatomical area (or whatever) you need for a particular test. The "other files" for the HCP are not just ROIs, but can also be underlying anatomy at different inflation levels, maps of tissue types, etc.

For example, at left is a screenshot showing some of the "other files" provided for each HCP person in the released datasets. These files are from /100307_Q3/MNINonLinear/Native/: the maps are in subject space. Many files with similar names are in /100307_Q3/MNINonLinear/fsaverage_LR32k/: maps of the same structures/types, but aligned to the MNI template anatomy (specifically, the 32k Conte69 mesh, see page 112 of Glasser, et. al 2013).

And now we're encountering GIFTI files: many of the "other files" are in GIFTI format, with the extension .*.gii. The naming of the "other files" (the last bit before the .gii) in the HCP tends to follow the CARET conventions, and gives a hint as to what sort of information they contain:

*.surf.gii, "gifti surface files", contain only vertex coordinates and triangles (which vertices are connected). The HCP *.surf.gii files are mostly structures that you might want to overlay data onto, such as 100307.L.inflated.native.surf.gii (left hemisphere, inflated) and 100307.L.midthickness.native.surf.gii.(left hemisphere, not inflated at all, but rather halfway through the thickness of the cortical ribbon).

*.func.gii and *.shape.gii, "metric files", contain data values for every vertex. Essentially, these are data arrays whose indices correspond to a surface file - you need a matching surface file to know where in the brain to put the data stored in a metric file. For example, a metric file from the HCP release is 100307.L.corrThickness.native.shape.gii: the cortical thickness at each vertex.

For an example of how these files work together, my tutorial on plotting a NIfTI image with the Workbench uses the wb_command -volume-to-surface-mapping program to create .shape.gii files aligned to Conte69.*.midthickness.32k_fs_LR.surf.gii. The data from the volumetric NIfTI (e.g. searchlight accuracies at each voxel) is stored (by vertex) in .shape.gii files, but a shape.gii file by itself isn't enough to plot the data properly on a surface: you need an aligned .surf.gii file as well. Paralleling how you need an aligned anatomy to properly overlay a volumetric NIfTI ROI, you need an aligned surf.gii to know how to properly locate the data from a metric file.

Whew! Hopefully this primer helps explain why so many files are released with the HCP data, and a bit about how they work together. For additional information see the Workbench Glossary, as well as Glasser, et. al 2013. If you've found any references particularly useful that I haven't already linked to, please send them along and I'll add links.

I want to end this post with a BIG thank you to Tim Coalson, who patiently (and repeatedly) walked me through these file types and how they relate to each other.

UPDATE 17 November 2016: Emma Robinson describes the various surface files, including the MSM related ones.

Tuesday, March 11, 2014

Allefeld 2014: Searchlight-based multi-voxel pattern analysis of fMRI by cross-validated MANOVA

A recent paper by Carsten Allefeld and John-Dylan Haynes, "Searchlight-based multi-voxel pattern analysis of fMRI by cross-validated MANOVA (see full citation below), caught my eye. The paper advocates using a MANOVA-related statistic for searchlight analysis instead of classification-based statistics (like linear SVM accuracy). Carsten implemented the full procedure for SPM8 and Matlab; the code is available on his website.

In this post I'm going to describe the statistic proposed in the paper, leaving the discussion of when (sorts of hypotheses, dataset structures) a MANOVA-type statistic might be most suitable for a (possible) later post. There's quite a bit more in the paper (and to the method) than what's summarized here!

MANOVA-related statistics have been used/proposed for searchlight analysis before, including Kriegeskorte's original paper and implementations in BrainVoyager and pyMVPA. From what I can tell (please let me know otherwise), this previous MANOVA-searchlights fit the MANOVA on the entire dataset at once: all examples/timepoints, no cross-validation. Allefeld and Haynes propose doing MANOVA-type searchlights a bit differently: “cross-validated MANOVA” and “standardized pattern distinctness”.

Most of the paper's equations review multivariate statistics and the MANOVA; the “cross-validated MANOVA” and “standardized pattern distinctness” proposed in the paper are in equations 14 to 17:

Equation 14 is the equation for the Hotelling-Lawley Trace statistic, which Allefeld refers to as D ("pattern distinctness").
Equation 15 shows how Allefeld and Haynes propose to calculate the statistic in a "cross-validated" way. Partitioning on the runs, they obtain a D for each partition by calculating the residual sum-of-squares matrix (E) and the first part of the H equation from the not-left-out-runs, but the second part of the H equation from the left-out run.
Equation 16 averages the D from each "cross-validation" fold, then multiplies the average by a correction factor calculated from the number of runs, voxels, and timepoints.
Finally, equation 17 is the equation for “standardized pattern distinctness”: dividing the value from equation 16 by the square root of the number of voxels in the searchlight.

To understand the method a bit better I coded up a two-class version in R, using the same toy dataset as my searchlight demo. Note that this is a minimal example to show how the "cross-validation" works, not necessarily what would be useful for an actual analysis, and not showing all parts of the procedure.

The key part from the demo code is below. The dataset a matrix, with the voxels (for a single searchlight) in the columns and the examples (volumes) in the rows. There are two classes, "a" and "b". For simplicity, the "left-out run" is called "test" and the others "train", though this is not training and testing as meant in machine learning. train.key is a vector giving the class labels for each row of the training dataset.

For Hotelling's T2 we first calculate the "pooled" sample covariance matrix in the usual way, but using the training data only:
S123 <- ((length(which(train.key == "a"))-1) * var(train.data[which(train.key == "a"),]) + (length(which(train.key == "b"))-1) * var(train.data[which(train.key == "b"),])) / (length(which(train.key == "a")) + length(which(train.key == "b")) - 2);

To make the key equation more readable we store the total number of "a" and "b" examples:
a.count <- length(which(train.key == "a")) + length(which(test.key == "a"));
b.count <- length(which(train.key == "b")) + length(which(test.key == "b"));

and the across-examples mean vectors:
a.test.mean <- apply(test.data[which(test.key == "a"),], 2, mean);
b.test.mean <- apply(test.data[which(test.key == "b"),], 2, mean) ;
a.train.mean <- apply(train.data[which(train.key == "a"),], 2, mean);
b.train.mean <- apply(train.data[which(train.key == "b"),], 2, mean);

now we can calculate the Hotelling's T2 (D) for this "cross-validation fold" (note that solve(S123) returns the inverse of matrix S123):
((a.count*b.count)/(a.count+b.count)) * (t(a.train.mean-b.train.mean) %*% solve(S123) %*% (a.test.mean-b.test.mean));

The key is that, paralleling equation 15, the covariance matrix is computed from the training data, multiplied on the left by the mean difference vector from the training data, then on the right by the mean difference vector from the testing data.

Should we think of this way of splitting the Hotelling-Lawley Trace calculation as cross-validation? It is certainly similar: a statistic is computed on data subsets, then combined over the subsets. It feels different to me though, partly because the statistic is calculated from the "training" and "testing" sets together, and partly because I'm not used to thinking in terms of covariance matrices. I'd like to explore how the statistic behaves with different cross-validation schemes (e.g. partitioning on participants or groups of runs), and how it compares to non-cross-validated MANOVA. It'd also be interesting to compare the statistic's performance to those that don't model covariance, such as Gaussian Naive Bayes.

Interesting stuff; I hope this post helps you understand the procedure, and to keep us all thinking about the statistics we choose for our analyses.

Allefeld C, & Haynes JD (2014). Searchlight-based multi-voxel pattern analysis of fMRI by cross-validated MANOVA. NeuroImage, 89, 345-57 PMID: 24296330

Wednesday, March 5, 2014

"not esoteric mathematic theory"

This has to be one of the best introductions to a section on the assumptions for a statistical test I've ever seen:

"All parametric statistical procedures are inferential procedures (i.e., they make inferences about populations). Mathematics and logic dictate that inferences be based on assumptions, and so like any other parametric statistical technique, the MANOVA has assumptions with which scientists must be concerned. These assumptions are not esoteric mathematic theory but conditions of the data that must be assessed (and hopefully satisfied) before trying to interpret the results of a MANOVA."

page 253 of Reading and Understanding Multivariate Statistics by Grimm &Yarnold (1995), "Multivariate Analysis of Variance" chapter by Kevin P. Weinfurt.