Given my occasional forays into surface analysis (and lack of conviction that it's necessarily a good idea for fMRI), I was intrigued by some of the analyses in a recent paper from Dave Langers ("Assessment of tonotopically organised subdivisions in human auditory cortex using volumetric and surface-based cortical alignments").
For task fMRI, the most transparent way to compare different analysis methods seems to be to use a task that's fairly well defined and anatomically predictable. Motion and primary senses are probably most tractable; for example, a task that has blocks of finger-tapping and toe-wiggling should produce very strong signal, and be distinguishable in motor cortex. This gives a basis of comparison: with which method do we see the finger and toe signals most distinctly?
Langers (2014) investigated tonotopic maps in auditory cortex, which map the frequency of heard sounds, using volumetric and surface analysis. While tonotopic maps are not fully understood (see the paper for details!), this is a well-defined question for comparing surface and volumetric analysis of fMRI data: we know where the primary auditory cortex is located, and we know when people were listening to which sounds. He used a Hotelling's T2-related statistic for describing the "tonotopic gradient vectors" which reminds me a bit of cvMANOVA and the LD-t, but I'll just concentrate on the surface vs. volume aspects here.
This is Figure 2 from the paper, which gives a flowchart of the procedures for the surface and volume versions of the analysis. He mapped the final volumetric results onto a (group) surface to make it easier to compare the surface and volume results, but the preprocessing and statistics were carried out separately (and, it seems to me, reasonably): SPM8 for volumetric, FreeSurfer for surface. The fMRI voxels were small, just 1.5 x 1.5 x 1.5 mm, which is plausibly small enough to support surface analysis.
So, which did better, surface or volume? Well, to quote from the Discussion, "... the activation detected by the surface-based method was stronger than that according to the volumetric method. At the same time, the related statistical confidence levels were more or less the same." The image below is part of Figure 4 from the paper, showing a few of the group-level results (take a look at the full paper). As he observes, "The surface-based results had a noisier appearance than the volumetric ones, in particular for the frequency-related outcomes shown in Figure 4c–e."
So, in this paper, while the surface analysis seemed to result in better anatomic alignments between people, the activation maps were not clearer or more significant. I'd very much like to see more comparisons of this type (particularly with HCP processing, given its unique aspects), to see if this is a common pattern, or something unique to tonotopic maps in auditory cortex and this particular protocol.
Langers, D. (2014). Assessment of tonotopically organised subdivisions in human auditory cortex using volumetric and surface-based cortical alignments. Human Brain Mapping, 35(4), 1544-1561. DOI: 10.1002/hbm.22272
Thursday, November 19, 2015
Wednesday, November 11, 2015
positive control analyses and checking data quality
I've long advocated that much pain can be avoided if analysis of a new dataset is begun with positive control analyses: classifying (or whatever) something that should produce a very strong signal in easy-to-identify regions. Button-presses are often good controls (particularly if the button boxes were in different hands): when classifying whether the left hand or right hand was moved (totally ignoring the experimental conditions), do you find motor areas? Classifying the presence or absence of visual stimuli is also a good, strong control. Once a positive control is found, some of the analysis choices and data quality checks can be run on the control analysis and then carried over to the target analysis, reducing the chance of exploiting too many experimenter degrees of freedom.
Another reason to start with positive control analyses is simply to identify problems in the dataset. If the control analysis fails in a particular person, why? Were the event timings mislabeled? Movement too high? Preprocessing failed? I'd be very worried about interpreting the results of a subtle cognitive task in a person whose data is of too poor a quality to support classifying something as strong as hand movements.
The rest of this post is an example of what positive control analyses can look like, and how their results compare to measures of general dataset quality. Several practiCal fMRI posts were very useful for thinking about how to visualize the image quality, particularly this one describing temporal SNR and this one showing examples of high-quality images.
average the fMRI timeseries for each voxel
First, run-wise mean images. These are simply the average of the functional timeseries for each voxel, each run and person separately. I calculated these on the images after preprocessing, but before voxelwise normalization. This is evaluating the quality of the images as they "arrive" for MVPA; in this case, after motion-correction, slice-time correction, and spatial normalization to an anatomic template. We thus expect the slices to look fairly similar in all people (because of the normalization), basically like fuzzy anatomical images.
The images below show two slices (one coronal, one axial) of the mean fMRI image for four runs in six people from two datasets (people in rows, runs in columns). The first image shows a dataset with fairly decent spatial normalization, the second, not-so-good spatial normalization (the images should enlarge if clicked).
A dataset with fairly decent spatial normalization. While the images vary a bit (e.g., subject 17 is rather "pointy" compared to the others), they are all orientated correctly and capture the same brain structures.
A dataset with not-so-good spatial normalization. Note that each person is pretty consistently normalized with themselves (i.e., the images in the four runs within each person are similar), but the images vary quite a bit between people. For example, sub7's brain looks "short", and when viewed in 3d, is clearly tilted.
In my estimation, analysis should not proceed on this dataset: spatial normalization needs to be improved, or analysis should be performed in subject (native) space.
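The run-wise mean images described in this section can be sketched in a few lines. This is only a minimal illustration in Python with NumPy, not the knitr code used to make the images in this post. The synthetic arrays, their sizes, and the assumption that the array axes are ordered (x, y, z, time) are all hypothetical stand-ins for real preprocessed runs.

```python
import numpy as np

def runwise_mean_image(run_4d):
    """Average each voxel's timeseries: (x, y, z, time) in, (x, y, z) out."""
    run_4d = np.asarray(run_4d, dtype=float)
    if run_4d.ndim != 4:
        raise ValueError("expected a 4D array shaped (x, y, z, time)")
    return run_4d.mean(axis=-1)

# Synthetic stand-in for one person's four preprocessed runs (hypothetical sizes).
rng = np.random.default_rng(0)
runs = [rng.normal(loc=1000.0, scale=20.0, size=(8, 8, 6, 50)) for _ in range(4)]

# One mean image per run, calculated separately for each run.
mean_images = [runwise_mean_image(run) for run in runs]

# Pick one coronal and one axial slice from each mean image.
# Assumption: axis 1 runs anterior-posterior and axis 2 inferior-superior;
# real images need their orientation checked before slicing.
coronal = [img[:, img.shape[1] // 2, :] for img in mean_images]
axial = [img[:, :, img.shape[2] // 2] for img in mean_images]
```

With real data, each entry of `runs` would be the 4D array read from one run's preprocessed image file, and the slices would be plotted with people in rows and runs in columns.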
standard deviation of the fMRI timeseries for each voxel
As described by practiCal fMRI, images of the standard deviation of the fMRI timeseries are useful for spotting motion or other artifacts; see his post for more details. Basically, dimmer is better for these images, and we want to be able to see some brain structure. As with the mean images, these are simply calculating the standard deviation of each voxel's timeseries, separately within each run, using the post-preprocessing functional images. All image voxels were included, not just those in the brain mask, to allow spotting of blurry edges and ghosts.
The top part of this image shows the standard deviations. This follows the convention of the mean images: subjects in the rows, four runs in the columns, with coronal slices first, then axial slices, both of the same 3d image. All images have the same color scaling, so brightness can be compared between runs and people.
Subject 34 is the best of these three people: the images for the four runs are pretty equally dark, but the brain outline and structure are visible. Subject 37 has the first and second runs much brighter and blurrier than the third and fourth runs; the first run in subject 36 is also brighter and blurrier than the others. These runs had more movement artifacts, reflected here as higher standard deviation.
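The standard deviation images, and the related temporal SNR (mean divided by standard deviation over time), can be sketched the same way as the mean images. Again this is a hedged NumPy illustration rather than the code behind the figures. The two synthetic runs, the choice of the sample standard deviation (`ddof=1`), and the names `quiet_run` and `noisy_run` are assumptions made for the example.

```python
import numpy as np

def runwise_std_image(run_4d):
    """Standard deviation of each voxel's timeseries within one run."""
    run_4d = np.asarray(run_4d, dtype=float)
    # ddof=1 (sample standard deviation) is an assumption, not from the post.
    return run_4d.std(axis=-1, ddof=1)

def tsnr_image(run_4d):
    """Temporal SNR: voxelwise mean over time divided by standard deviation."""
    run_4d = np.asarray(run_4d, dtype=float)
    mean = run_4d.mean(axis=-1)
    sd = run_4d.std(axis=-1, ddof=1)
    out = np.zeros_like(mean)
    # Voxels with zero variance (e.g. empty background) are left at 0.
    np.divide(mean, sd, out=out, where=sd > 0)
    return out

# Synthetic stand-ins: one quiet run and one with extra variance,
# loosely mimicking a high-movement run (hypothetical values).
rng = np.random.default_rng(1)
shape = (8, 8, 6, 60)
quiet_run = rng.normal(1000.0, 10.0, size=shape)
noisy_run = rng.normal(1000.0, 40.0, size=shape)

# All voxels are kept (no brain mask), as in the images described above.
std_images = [runwise_std_image(run) for run in (quiet_run, noisy_run)]

# One shared upper limit so brightness is comparable across runs and people.
shared_vmax = max(img.max() for img in std_images)
```

Passing the same `shared_vmax` to every plot is what makes a brighter run interpretable as a noisier run, rather than as a rescaled one.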
The bottom part of this image is the accuracy from a positive control searchlight analysis in these same three people. In this case, the control analysis was classifying whether a particular image was from a cue or target/response trial segment, and we expect visual and motor areas to classify. (If you're curious, it was leave-one-run-out cross-validation within each person, linear SVM, c=1, 3-voxel radius searchlights, two balanced classes.) The overlay is color-scaled to show voxels with accuracy of 0.6 as red, 1 (perfect) as brightest yellow, not showing voxels with accuracy less than 0.6 (chance = 0.5). (I used knitr to make all the images in this post; see this demo for similar code.)
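To make the leave-one-run-out scheme concrete, here is a minimal sketch for a single searchlight. It is not the analysis code behind these images: to stay self-contained it swaps the linear SVM for a simple nearest-centroid classifier, and the data are synthetic. The function name, the 33-voxel searchlight size, the number of examples per run, and the size of the simulated signal are all hypothetical.

```python
import numpy as np

def leave_one_run_out_accuracy(X, y, runs):
    """Mean accuracy over folds, holding out one run per fold.

    X: (n_examples, n_voxels); y: class label and runs: run label per example.
    Uses a nearest-centroid classifier as a stand-in for a linear SVM.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    runs = np.asarray(runs)
    fold_accuracies = []
    for held_out in np.unique(runs):
        test = runs == held_out
        train = ~test
        classes = np.unique(y[train])
        # Class centroids are estimated from the training runs only.
        centroids = np.stack([X[train & (y == c)].mean(axis=0) for c in classes])
        # Euclidean distance from each test example to each centroid.
        dists = np.linalg.norm(X[test][:, None, :] - centroids[None, :, :], axis=2)
        predicted = classes[dists.argmin(axis=1)]
        fold_accuracies.append(np.mean(predicted == y[test]))
    return float(np.mean(fold_accuracies))

# Synthetic searchlight: 4 runs, two balanced classes in every run.
rng = np.random.default_rng(2)
n_runs, n_per_class, n_voxels = 4, 10, 33
runs = np.repeat(np.arange(n_runs), 2 * n_per_class)
y = np.tile(np.repeat([0, 1], n_per_class), n_runs)
X_noise = rng.normal(size=(runs.size, n_voxels))

# Add a strong class difference to some voxels, like a good positive control.
X_signal = X_noise.copy()
X_signal[y == 1, :10] += 2.0

acc_signal = leave_one_run_out_accuracy(X_signal, y, runs)
acc_noise = leave_one_run_out_accuracy(X_noise, y, runs)
```

In this toy case `acc_signal` should come out well above chance, while `acc_noise` should hover around 0.5. A full searchlight analysis repeats this calculation centered on every voxel and stores each accuracy in a 3d map for the overlay.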
The accuracies and standard deviation are consistent in these images: sub34 has the lowest standard deviation (and highest temporal SNR, though this isn't shown here) and highest classification accuracy; sub36 and sub37 have fewer high-classifying searchlights. The relationship between image quality in these diagnostic tests and control classification accuracy is not always this clear, but I have seen it pretty often, and it should exist; by definition, the control classification should succeed in people with decent image quality. If it does not, the dataset should be checked for errors, such as mislabeled event timing files.
There's no magic threshold for image quality, nor perfect strategy for recovering signal from high-movement runs. But I would be very hesitant to continue analyzing a person without clear signal in the control analysis, particularly if they stand out in the mean and standard deviation images.