Thursday, January 10, 2019

comparing fMRIPrep and HCP Pipelines: tSNR images

See this for the introduction to this series of posts. Yesterday I showed that the motion regressors produced by the two programs are nearly identical (as we'd hope, since they started with the same DICOMs!). But how do the preprocessed images compare? Short version: the preprocessed images are qualitatively similar in appearance (and standard QC images), with differences between people more obvious than differences between the pipeline. But only "similar", not "nearly identical" (which is how I described the motion regressors).

(Much) Longer version: The main goal of these preprocessing tests is to compare the GLM estimates (described a bit in the introduction and more later), not to perform QC on the images from the pipelines. This adds a complication, however, because the images that go into the GLMs are not exactly the same as the images that come out of the pipelines. It strikes me as most useful to look at the images as ready to go into the GLMs, since that's the final step at which things vary - identical GLMs were performed at every voxel and vertex.

We wanted to make the GLM estimate comparisons as direct and fair as possible, so aimed to do equivalent transformations of the surface and volume preprocessed images from each pipeline. If you think we missed that target, please let me know. We use afni for the after-preprocessing steps, as summarized here. First, both fMRIPrep and the HCP pipelines produce MNI-normalized volumes, but the bounding box is a bit different, so we used 3dresample to match the fMRIPrep images to the HCP, as well as to reorient to LPI. Both the fMRIPrep and HCP pipeline volumes are then smoothed (3dBlurToFWHM, -FWHM 4); surfaces are NOT smoothed with afni commands, since they were smoothed in the preprocessing pipelines. Finally, the timeseries at each voxel (or vertex) was scaled (in the usual way for afni, see this or this), using 3dcalc -a full.timeseries -b mean.timeseries -expr 'min(200, a/b*100)*step(a)*step(b)', where mean.timeseries is an image of the voxel (or vertex)-wise means, calculated with 3dTstat.


Single frames (this is #100 from the same run as in the previous post) from the ready-for-afni images look similar; an obvious difference is that the HCP pipelines mask the brain, while fMRIPrep does not. I have the color scaling the same in both images (90 to 100) and the intensity in the brain is similar (as it should be, given the scaling); the voxel under the crosshairs has the value 99.8 after fMRIPrep and 98.6 after the HCP pipelines.The frontal dropout (yellow arrow) is obvious in both, but looks like weird striping in the HCP image because of the masking; without masking it blends into the background. Spot-checking frames of the images I can't say that I've found one preprocessing consistently "better" than the other; by eye, sometimes one looks a bit better, sometimes the other.


Here's the same frame, from the surfaces. The HCP pipelines project to a different underlay surface anatomy than fMRIPrep, so the brain shapes are a bit different (fMRIPrep to fsaverage5, 10242 vertices/hemisphere; HCP with 32492 vertices/hemisphere). To be honest, I find these harder to interpret ... swirly colors.



If the single frames look similar, what about QC images? Well, by eye they also strike me as "different but similar", without an overwhelming preference for one being better than the other. The above are tSNR images of the same test run, all with the same color scaling.


Is the story in this whole series of comparison posts going to be "similar"? No, differences turn up in the GLM results, especially when summarizing within parcels ... stay tuned.

Note: the movement graphs and tSNR images were made with R knitr code. The full pdf (four tasks for this person) is here, and its source here. The paths/etc. won't work to compile the knitr (unless you're on our network!), but the plotting code and afni commands may prove useful; the sourced gifti plotting functions are here.

Wednesday, January 9, 2019

comparing fMRIPrep and HCP Pipelines: motion regressors

See this post for the introduction to this series of posts. Here, I'll describe and compare the motion regressors produced by the fMRIPrep and HCP Pipelines. Short version: the file formats are a bit different, but the estimated motion is nearly identical.

Long version: the first step is figuring out where the motion regressors are put by the two pipelines. If you kept the HCP file structure, there will be one Movement_Regressors.txt file for each preprocessed run. The files are always called Movement_Regressors.txt; you track which one goes with which run and person by the directory structure. For example, the file for subject 203418, run SternBas2_PA is /MINIMALLY_PREPROCESSED/203418/MNINonLinear/Results/tfMRI_SternBas2_PA/Movement_Regressors.txt.



The above image is the first bit of the file, read into R. It has 12 columns: x, y, z, roll, pitch, yaw, then their six derivatives (stated on page 75 of HCP_S900_Release_Reference_Manual.pdf). The rotation columns are in degrees, translation in mm. There are 600 rows in this particular text file, matching the number of volumes acquired in the run.


fMRIPrep outputs a file ending in _confounds.tsv for each run, with its location and name following the BIDS naming conventions. For example, the motion regressors file for the same subject and run as above is /sub-203418/fMRIPrep_output/fmriprep/sub-203418/ses-01/func/sub-203418_ses-01_task-Stern_acq-MB42p4PA_run-02_bold_confounds.tsv.

The above is the first bit of the file, read into R. As you can tell in the screenshot, this file has many more columns than the HCP version: 35 instead of 12 (the same number of rows, though: one per acquired volume). The columns are named, and the documentation gives details on each. The motion regressors I want are in six columns, called X, Y, Z, RotX, RotY, and RotZ. "RotX","RotY","RotZ" correspond to roll, pitch, and yaw, but are in radians; translation is in mm.

UPDATE (24 April 2019): fMRIPrep version 1.3.2 names the columns "trans_x", "trans_y", "trans_z", "rot_x", "rot_y", "rot_z".

Now I can plot both sets of motion parameters to see if they vary. I converted the fMRIPrep rotations to degrees, then plotted the six estimates from each program in the top of this figure. (TRs (1.2 ms) are along the x-axis; the vertical grey lines give 1-minute intervals.) As you hopefully can see (click the figure to enlarge), it's actually hard to tell that there are 12 lines in the top - those from fMRIPrep and the HCP pipelines are so close that it mostly looks like there are only 6 lines.
The lower frame has the FD version of the same two sets of motion estimates; HCP Pipelines in black and fMRIPrep in green (this is unfiltered FD, and our censoring threshold of 0.9 is shown in red). A little separation between the two FD lines is visible, but again, quite minimal.

I picked this run for this post because there's a bit of actual head motion, and if you zoom in around frame 525 you can see that there is a little separation between the translation estimates. I had to check a few runs to find one with any visible separation; in our 13 test people (with 8 runs each) this is pretty much as large as the discrepancy gets, which is why at this time I'm describing the motion estimates from fMRIPrep and the HCP pipeline as essentially identical.

Note: The figure above is from a knitr template that I made for doing these QC comparisons; see note at the end of this post.

Friday, January 4, 2019

comparing fMRIPrep and HCP Pipelines: introduction

This is the first in a series of posts describing a set of comparisons we've done between fMRIPrep and the HCP Pipelines: if the same task fMRI dataset is run through both pipelines, extracting surfaces and volumes from each, how do the eventual GLM results compare?

These sorts of comparisons are conceptually easy but tricky to design and time-consuming to complete. I submitted this as a poster for OHBM 2019, we will likely write this up as a paper, and  plan to release the code and images. But these blog posts are a start, and feedback is most welcome. Let me know if you want additional details or files and I can try to get them to you sooner than later.

roadmap
The final post also has some summary thoughts. An UPDATE (24 April 2019) describes the same analyses run with fMRIPrep version 1.3.2.

dataset

I am particularly interested in how the preprocessing would affect the statistics for the Dual Mechanisms of Cognitive Control (DMCC) project since that's the project I'm most involved with. I've previously shown bits from this dataset and described the acquisition parameters; briefly, each DMCC scanning session includes two runs each of four cognitive control tasks (AX-CPT, Cued task-switching, Sternberg, and Stroop). Task runs were each approximately 12 minutes, alternating AP/PA. All scanning was with a 3T SIEMENS Prisma scanner with a 32 channel head coil, 2.4 mm isotropic voxels, TR = 1.2 s, and an MB4 CMRR sequence.

The DMCC is a big dataset (currently around 80 people with at least 3 scanning sessions, and we're not done). To keep the preprocessing comparisons tractable I picked just 13 of the participants to include, requiring that they have complete Baseline session data (two runs of each of the four tasks), be unrelated, have acceptable movement (a qualitative judgement), and have visible motor activity in their single-subject Buttons GLMs (with HCP Pipeline preprocessing). This last is an effort for an unbiased QC estimate: in three of our tasks the participants push a response button, so GLMs using these button-pushes as the only events of interest (ignoring all of the cognitive conditions) should show motor activation if the signal quality is acceptable. I used the volumetric HCP Pipeline preprocessing for screening because that's what we've been using (and so have results for all people); that may introduce a bit of bias (favoring people that did well with the volumetric HCP Pipelines).

pipelines, GLMs, and parcels

These analyses use version 3.17.0 of the HCP Pipelines (adapted several years ago for the DMCC acquisitions) and fMRIPrep version 1.1.7; each pipeline began with the same DICOM images.

The same GLMs (event onsets, model types, etc.) were run on the images from each version (surface, volume) of each pipeline; all with AFNI. The "standard" DMCC GLMs are not particularly simple because the task runs are mixed (block and event), and we're fitting GLMs to model both "sustained" (block) and event-related activity. The GLMs use 3dDeconvolve and 3dREMLfit, with TENTzero (FIR-type) for the events (one knot for two TRs). Most of the GLM model parameters were dictated by the experimental design, but tweaking (e.g., how long to make the TENT knots, whether to include regressors for the block onsets and offsets) was done with the HCP pipeline preprocessed images.

To keep these preprocessing comparisons tractable at the GLM level, we only included Baseline session images, and focused on contrasting high and low cognitive control. Which trials correspond to high and low cognitive control varies across the four tasks, and we don't expect identical levels of difficulty or brain activity in the tasks. Regardless, high vs. low cognitive control conditions gives a fairly "apples to apples" basis, and is a reasonably strong effect.

Another complication for preprocessing comparisons is working out the anatomical correspondence: every voxel does not have a single vertex in the surface representation (and each vertex doesn't match a single voxel). While we had both the fMRIPrep and HCP Pipelines output MNI-normalized volumes (so volume-to-volume voxel-level correspondence), the two pipelines use different surface meshes (fsaverage5 vs. HCP), so there isn't an exact match between the vertices (see this thread, for example).

For these comparisons I deal with corresponding across the output formats by making qualitative assessments (e.g., if this GLM output statistic is thresholded the same, do the same areas tend to show up in the volumes and surfaces?), but more practically by summarizing the statistics within anatomic parcels. The parcellation from (Schaefer et al. 2017) is ideal for this: they released it in volume (MNI) and surface (both fsaverage5 and HCP) formats.We used the 400-area version here, allowing direct comparisons: a mean of parcel #6 can be computed in all four output images (HCP volume, HCP surface, fMRIPrep volume, fMRIPrep surface). The number of voxels or vertices making up each parcel varies between the volume and surfaces, of course.


UPDATE 10 January 2019: Here's the command actually used to run fMRIPrep; thanks Mitch Jeffers!
singularity run \
--cleanenv \
-B ${MOUNT}:/tmp \
/home/ccp_hcp/fmriprep/SingularityImages/fmriprep-1.1.7.simg \
--fs-license-file /tmp/.license/freesurfer/license.txt \
-w /tmp \
/tmp/${INPUT} /tmp/${OUTPUT} \
participant --participant_label ${SUBJECT} \
--output-space fsaverage5 \
-n-cpus 16 --omp-nthreads 4 \
--mem-mb 64000 -vvvv \
--use-plugin /tmp/plugin.yml



UPDATE 6 September 2019: The raw and preprocessed images of the dataset we used for these comparisons is now on openneuro, called DMCC13benchmark

UPDATE 4 January 2021: Corrected DMCC13benchmark openneuro links.

Friday, July 6, 2018

RSA: how to describe with a single number? - update

Some years ago I posted about quantifying RSA matrices with a single number. Since then, it seems that the field has settled on two methods for quantifying RSA matrices: one based on differences and the other based on Kendall's tau-a.

To illustrate the methods, suppose this is the RSA matrix for one person, and the corresponding "reference matrix"; the numbers in the cells are Pearson correlations. In this case, we expect the two cells with the same letter (2,F and 0,F; 2,P and 0,P) to have higher correlations than all the other cells (which mix letters).
The matrix is symmetrical, with 1s along the diagonal, so we'll only use the numbers from the lower triangle. We can "unwrap" these values into vectors, like this:
 x <- c(.728, .777, .705, .709, .873, .705);   # one person
 y <- c(   0,    1,    0,    0,    1,    0);   # reference

I previously called the difference-based method "averaging the triangles"; conceptually, we take the difference between the average of the two cells we expect to have a large value (cells with 1s in the Reference), and the average of the four cells we expect to have smaller values (0 in the Reference). This can also be thought of as using a contrast matrix, such as in Oedekoven, et. al (2017). Since you can't do math on correlations without going through the Fisher R-to-Z transform, here's an example of the calculation:
 get.FTrz <- function(in.val) { return(.5 * log((1+in.val)/(1-in.val))); }  # Fisher's r-to-z transformation  
 get.FTzr <- function(in.val) { return((exp(2*in.val)-1)/(exp(2*in.val)+1)); } # Fisher's z-to-r transformation  
 mean(get.FTrz(x[which(y == 1)])) - mean(get.FTrz(x[which(y == 0)]));
 # [1] 0.3006614  

(It's of course equivalent to calculate this by multiplying the sum of the y == 1 cells by 0.5 and y == 0 cells by 0.25.)

The Kendall's tau-a based method is recommended in Nili, et. al (2014). There's a function to calculate this in the DescTools R package; since it's rank-based, there's no need to do the Fisher transformation:
 KendallTauA(x,y)  
 # [1] 0.5333333  

Which method should you use? One consideration is if you want the magnitude of the numbers in the RSA matrix cells to matter, or just their relative sizes (ranks). Here, I made a RSA matrix with the same rank ordering as before (in x), but the difference between same-letter and different-letter cells are more extreme:
 x1 <- c(0.2, 0.9, 0.1, 0.01, 0.83, 0.22);  
 KendallTauA(x1,y)  
 # [1] 0.5333333  # same  
 mean(get.FTrz(x1[which(y == 1)])) - mean(get.FTrz(x1[which(y == 0)]));
 # [1] 1.195997 # bigger  

The Kendall's tau-a value for x1 and y is exactly the same as x and y, but the mean difference is larger. Is it better to be sensitive to magnitude? Not necessarily. In practice, when I've calculated both methods on the same dataset I've found extremely similar results, which seems proper; it's hard to imagine a strong effect that would be only be detected by Kendall's tau-a.

UPDATE 12 December 2018: I removed transforming the difference scores back to correlations, leaving the differences as Fisher-transformed zs, since this seems easier for later interpretation and calculations.

Tuesday, June 26, 2018

detrending and normalizing timecourses: afni 3dDetrend in R

UPDATE 27 February 2026: An expanded version of this demo is summarized in this post, full version at https://doi.org/10.17605/OSF.IO/NU324.


I've been using the afni 3dDetrend command on datasets lately, but realized that I wasn't sure of the exact way the normalization and detrending was calculated. To help me understand (and reproduce the results on files that aren't easily read by afni), this post gives R code to match 3dDetrend output, plus shows some examples of how various normalizing and detrending schemes change timecourses (there's more in the knitr and pdf than shown in this post; files below the jump).

First, we need some data. I picked a run from a handy preprocessed dataset (it doesn't really matter for the demo, but it's from multibandACQtests), then ran the image through afni's 3dDetrend command twice: once with the options -normalize -polort 0, then with -normalize -polort 2. This gives three images, and I picked a voxel at random and extracted its timecourse from each image. Code for this step starts at line 22 of the knitr file (below the jump), and text files with the values are included in the archive for this post.

Here is the voxel's "raw" timecourse: after preprocessing (the HCP pipelines, in this case), but before running 3dDetrend:

And here is the same voxel's timecourse, from the image after 3dDetrend -normalize -polort 2: centered on zero, with the shift of baseline around TR 50 reduced a bit.

From the afni documentation, 3dDetrend -normalize "Normalize[s] each output voxel time series; that is, make the sum-of-squares equal to 1." Note that this is not what R scale() does by default (which is to mean 0 and standard deviation 1; see the full demo for a comparison). Line 113 of the demo code does the afni-style normalization: (tc.raw - mean(tc.raw)) / sqrt(sum((tc.raw - mean(tc.raw))^2)), where tc.raw is a vector with the voxel's timecourse.

For the -polort 2 part, the afni documentation says: "Add Legendre polynomials of order up to and including 'ppp' in the list of vectors to remove." I used the functions from Gregor Gorjanc's blog post (thanks!) and the R orthopolynom package for this; the key part is line 140: residuals(lm(tc.raw ~ seq(tc.raw) + Legendre(x=seq(tc.raw), n=1) + Legendre(x=seq(tc.raw), n=2))). The two Legendre function terms (n=1 and n=2) are to match -polort 2; more (or fewer) terms can be included as desired.

Monday, June 4, 2018

tutorial: plotting GIfTI images in R (knitr)

NOTICE: this post was replaced by a newer version 13 March 2020. I will leave this post here, but strongly suggest you follow the new version instead.



In a previous tutorial I showed how to plot volumetric NIfTI images in R and knitr. Here, I give functions for plotting surface GIfTI brain images in R. If you're not familiar with surface images, this and this post may be useful. Please cite if you find this useful.

This uses the R gifti package (thank you, John Muschelli!) to read the GIfTI images, and the plot3D package for the plotting. It is possible to plot interactive 3d images in R (see for example, the gifti package vignette), but static images are more useful for the application I have in mind: batch-produced knitr files of surface (afni) GLM results.

Here is the first part of the knitr file produced in this tutorial; the full pdf can be downloaded here:

The "blobs" are F-statistics from a GLM targeting right-hand button pushes, so the peaks in left motor areas are sensible. The underlay surfaces are the "inflated" images from the HCP 1200 Subjects Group Average Data release; download here. The code requires you to have underlay surfaces and statistics to overlay in perfect agreement (the same number of nodes, same triangle connections, etc.), but this is a general requirement for surface plotting or analysis.


The gifti-plotting function I wrote displays the medial and lateral views, with hot colors for positive values and cool colors for negative. The function takes the positive and/or negative color scaling limits, with values closer to zero than the minimum value not shown, but those above the maximum plotted in the maximum color. This behavior is conventional for neuroimaging; for example, this is the same image, plotted in the Connectome Workbench with the same color scaling.

Full code for the demo is below the jump (or downloaded here). To run the demo you will need the two gifti overlay images, which can be downloaded here (left) and here (right), as well as the HCP 1200 underlay anatomies (see above). If you don't want to run the code within knitr the relevant functions can be separated from the latex sections, but you will need to set the image sizing and resolution.

Please let me know if you find this code useful, particularly if you develop any improvements!

UPDATE 20 July 2018: Fixed the plot.surface function to display the entire range (not truncated) in the lower right note.

UPDATE 7 September 2018: Another fix to plot.surface() to truncate the heat.colors() range so that the top values are plotted in yellow, rather than white, which wasn't very visible.

Also, Chris_Rorden posted a python and Surfice version on Neurostars.

UPDATE 9 November 2018: Added a section assigning arbitrary values to surface parcels and then plotting, matching this demo.

UPDATE 9 January 2019: Added a which.surface term to the plot.surface function, specifying whether plotting HCP or fsaverage5 surfaces. The default is HCP, for backward compatibility.

UPDATE 10 April 2019: Improved color ranges.

NOTICE: this post was replaced by a newer version 13 March 2020. I will leave this post here, but strongly suggest you follow the new version instead.



Monday, May 14, 2018

mean pattern subtraction confusion

A grad student brought several papers warning against subtracting the mean pattern (and other types of normalization) before correlational analyses (including, but not only, RSA) to my attention. I was puzzled: Pearson correlation ignores additive and  multiplicative transformations, so how could it be affected by subtracting the mean? Reading closely, my confusion was from what exactly was meant by "subtracting the mean pattern"; it is still the case that not all forms of "normalization" before correlational MVPA are problematic.


The key is where and how the mean-subtraction and/or normalization is done. Using the row- and column-scaling terminology from Pereira, Mitchell, and Botvinick (2009) (datasets arranged with voxels in columns, examples (trials, frames, whatever) in rows): the mean pattern subtraction warned against in Walther et al. (2016) and Garrido et al. (2013) is a particular form of column-scaling, not row-scaling. (A different presentation of some of these same ideas is in Hebart and Baker (2018), Figure 5.)

Here's my illustration of two different types of subtracting the mean; code to generate these figures is below the jump. The original pane shows the "activity patterns" for each of five trials in a four-voxel ROI. The appearance of these five trials after each has been individual normalized (row-scaled) is in the "trial normalized" pane, while the appearance after the mean pattern was subtracted is at right (c.f. Figure 1D in Walther et al. (2016) and Figure 1 in Garrido et al. (2013)).

Note that Trial2 is Trial1+15; Trial3 is Trial1*2; Trial4 is the mirror image of Trial1; Trial5 has a shape similar to Trial4 but positioned near Trial1. (Example adapted from Figure 8.1 of Cluster analysis for researchers (1984) by H. Charles Romesburg.)


And here are matrix views of the Pearson correlation and Euclidean distance between each pair of trials in each version of the dataset:

The mean pattern subtraction did not change the patterns' appearance as much as the trial normalization did: the patterns are centered on 0, but the extents are the same (e.g., voxel2 ranges from 0 to 80 in the original dataset, -40 to 40 after mean pattern subtraction). The row (trial) normalized image has a much different appearance: centered on zero, but the patterns for trials 1, 2, and 3 are now identical, and the range is much less (e.g., voxel2 is now a bit less than -1 to a bit more than 1). Accordingly, the trial-normalized similarity matrix is identical to the original when calculated with correlation, but different when calculated with Euclidean distance; the mean pattern subtracted similarity matrix is identical to the original when calculated with Euclidean distance, but different when calculated with Pearson correlation.

This is another case where lack of clear terminology can lead to confusion; "mean pattern subtraction" and "subtracting the mean from each pattern" are quite different procedures and have quite different effects, depending on your distance metric (correlation or Euclidean).