Friday, January 11, 2019

comparing fMRIPrep and HCP Pipelines: 20 parcels and summary

This post continues the previous; start with the introduction to this series.

The previous post showed the t-values for each parcel with the different preprocessings, but that's still a lot to absorb, and not all parcels are equally interesting. I thought it'd be useful to concentrate on the parcels with the largest high > low cognitive control effects in all four tasks, so I tried a simple threshold: which parcels have t > 1 (uncorrected) in all four tasks and all four preprocessing combinations? The twenty parcels above pass this test, and their anatomic locations are quite sensible for high > low cognitive control. This parcel-picking strategy is somewhat arbitrary, but seems reasonably unbiased.
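The threshold rule can be sketched in a few lines. This is only an illustration in Python (the analyses for these posts were done in R): the parcel names and t-values are made up, and `select_parcels` and `constant_cells` are hypothetical helpers, not code from the knitr.

```python
from itertools import product

# The four tasks and the four preprocessing combinations compared in these posts.
TASKS = ["Axcpt", "Cuedts", "Sternberg", "Stroop"]
PREPROCS = [("HCP", "surface"), ("HCP", "volume"),
            ("fMRIPrep", "surface"), ("fMRIPrep", "volume")]


def select_parcels(tvals, threshold=1.0):
    """Return the parcels whose t-value exceeds `threshold` in every one of
    the 16 task x preprocessing cells.

    `tvals` maps parcel name -> {(task, pipeline, space): t-value}.
    A missing cell raises KeyError rather than being silently skipped.
    """
    keep = []
    for parcel, by_cell in tvals.items():
        if all(by_cell[(task, pipe, space)] > threshold
               for task, (pipe, space) in product(TASKS, PREPROCS)):
            keep.append(parcel)
    return keep


def constant_cells(value):
    """Toy helper: the same t-value in all 16 cells."""
    return {(task, pipe, space): value
            for task, (pipe, space) in product(TASKS, PREPROCS)}


# Made-up data: parcel_B falls below the threshold in a single cell.
toy = {"parcel_A": constant_cells(2.0), "parcel_B": constant_cells(2.0)}
toy["parcel_B"][("Stroop", "HCP", "volume")] = 0.5

selected = select_parcels(toy)
print(selected)  # ['parcel_A']
```

Requiring every cell to pass means no single task or preprocessing drives the selection, which is the sense in which the rule is reasonably unbiased.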

The coefficients for each participant were shown in the previous post. To summarize those distributions, these scatterplots show the t-values for the 20 parcels (each plotting symbol is a unique parcel). The number of parcels on each side of the diagonal is listed in the corners. When analyzing surfaces, more parcels had higher t-values with fMRIPrep preprocessing in all four tasks, most prominently Cuedts. When analyzing volumes the story was mixed: an equal split of parcels in Axcpt, higher t-values with fMRIPrep in Cuedts and Sternberg, and higher t-values with HCP in Stroop. Comparing surface and volume within each preprocessing (second set of scatterplots), there were higher t-values in volumes in three of the four tasks for HCP, and in two of the four for fMRIPrep.
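The corner counts are just a tally of which side of the identity line each parcel falls on. A minimal Python sketch, with made-up t-values; which preprocessing goes on which axis is an assumption here, and `count_diagonal_sides` is a hypothetical helper:

```python
def count_diagonal_sides(x_tvals, y_tvals):
    """Count parcels above (y > x), below (y < x), and on (y == x) the diagonal.

    The two lists must hold t-values for the same parcels in the same order.
    """
    if len(x_tvals) != len(y_tvals):
        raise ValueError("need one x and one y t-value per parcel")
    pairs = list(zip(x_tvals, y_tvals))
    above = sum(1 for x, y in pairs if y > x)
    below = sum(1 for x, y in pairs if y < x)
    ties = len(pairs) - above - below
    return above, below, ties


# Made-up t-values for four parcels; assume HCP on the x-axis, fMRIPrep on the y-axis.
hcp_t = [1.5, 2.0, 3.1, 2.2]
fmriprep_t = [1.9, 1.8, 3.5, 2.2]

counts = count_diagonal_sides(hcp_t, fmriprep_t)
print(counts)  # (2, 1, 1)
```

With these axes, a parcel above the diagonal has the higher t-value with fMRIPrep.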

The t-values are a rough measure of effect size, but don't consider the entire distribution; another strategy is to fit mixed models, which allows the coefficients for each person to be included. These really can't be sensibly summarized in a few sentences; see the last few pages of this knitr (source here) for the output. But very briefly, I used this setup for the first model (R nlme code): lme(fixed=diff~surf.vol*hcp.fp*, random=list(,,, data=mm.tbl); where surf.vol was "surface" or "volume" and hcp.fp was "HCP" or "fMRIPrep";,, and labeled the subjects, parcels (20 shown above) and tasks. Consistent with the graphs above, this model had significant interactions of surf.vol:hcp.fp and Looking only at volumes, the hcp.fp effect was significant, with fMRIPrep > HCP. Within only surfaces there was still an interaction of hcp.fp and, so the dataset had to be subsetted further. In these smaller models, fMRIPrep surfaces > HCP surfaces in all tasks; fMRIPrep surfaces > fMRIPrep volumes in all but Cuedts. Here is the output from this reduced model for Axcpt; the other tasks are here.

summary thoughts

This has been a long series of posts, and I hope an interesting and useful one! I've included quite a few details in these posts, but not the full dataset; we do plan to make it available, but it is obviously quite large and not simple to share. It seems most useful to release the preprocessed 4D images along with the AFNI code and key model output (most of the post-AFNI model code is in the knitrs linked in these posts); please contact me if you'd like something specific.

My overall impression? fMRIPrep looks preferable for preprocessing, and surface analysis looks good. I was honestly hoping that surfaces wouldn't turn out so well, since I find the formats bothersome, the interpolation problematic, and the preprocessing time-consuming. Volumes are obviously required for subcortical areas, but for now, we will continue to run cortical surface GLMs.

There are of course many other comparisons that can be made, and I did some other analyses that aren't in these posts. I made a good-faith effort to set up the comparisons so that the final GLM statistics are as equivalent and unbiased as possible, but of course not everything can be made equal (e.g., there are more vertices per parcel with the HCP than the fMRIPrep preprocessing because of the different surface meshes).

It's hard to say how well these results will hold for other datasets; for example, I did not fit simple GLMs since the aim was to compare the DMCC's GLMs. Different acquisition parameters may influence the results quite a bit, particularly voxel size for surface analysis (at some larger sizes I would expect surface analysis to fail). I am very curious to hear about the results if anyone else tries comparisons like these, to see how typical they are. But for now, we're using fMRIPrep for new task fMRI experiments.


  1. To clarify, these group level t-statistics were themselves computed from individual subject t-statistics (i.e., statistics on a statistic, rather than a statistic computed on an individual subject effect size)?

  2. Not t-statistics on t-statistics; the values for individual subjects going into the mixed models here (and the four-columns plots in the previous post) were the difference (high coefficient - low coefficient) from their TENT GLMs for the particular "target" knot.