Wednesday, September 15, 2021

"The Dual Mechanisms of Cognitive Control Project": post-publication analyses

The recently-published "The Dual Mechanisms of Cognitive Control Project" paper (preprint publisher) describes the motivations and components of the project as a whole, but also contains several analyses of its task fMRI data. The supplemental information for the manuscript has the R (knitr) code and input files needed to generate the results and figures in the manuscript. 

The supplemental information now has more results than those included in the paper, however: versions of the analyses using different sets of participants, parcels, and estimating the HDR curves (GLMs), which I will briefly describe here.

For example, Figure 5 (below) shows the estimated HDR curves for each task (high - low control demand) in the Baseline (blue, BAS) and Reactive (green, REA) sessions. The grey shading marks the target; when we expect the control demand difference to be greatest. Of interest is that the estimates are greater in the target window for all tasks and sessions, with the Baseline session estimates larger than those from the Reactive session (see manuscript for more explanation and framing).

The top version (supplemental file) includes 80 participants in the estimates (some related), averages the estimates from a set of 35 parcels (from the Schaefer 400x7 parcellation) found to be particularly sensitive to DMCC tasks, and uses GLMs estimating one knot for every 2 TRs.

The second version (supplemental file) shares the theoretically interesting aspects: curves mostly peak in the target (grey) area, blue curves mostly above green. There are many differences, though: the second graph is from a post-publication analysis using the DMCC55B participants (55 unrelated people; 48 of whom are in both 55B and the 80-participant set), the set of 32 Schaefer 400x7 parcels approximating the Core Multiple Demand network (12 parcels are in both sets), and GLMs estimating one knot for every TR.

It is reassuring to see that the analysis results are generally consistent despite these fairly substantial changes to its inputs. Sometimes results can look great, but are due to a statistical fluke or overfitting; in these cases small changes to the analysis that shouldn't matter (e.g., removing or replacing several participants) often make large changes in the results. The opposite occurred here: fairly substantial changes to the parcels, participants, and (to a lesser extent) GLMs led to generally matching results.

The paper's osf site now contains results files for all the different ways to set up the analyses, within the "postPublication_coreMD" and "postPublication_1TRpK" subdirectories. The variations:

  • 80 or 55 participants. Files for analyses using the DMCC55B participants have a "DMCC55B" suffix; files for the original set of 80 participants has either no suffix or "DMCC80".
  • 35 or 32 parcels. The set of 35 parcels identified via DMCC data are referred to as the 35-megaparcel or "parcels35"; the 32 parcels approximating the core MD are referred to as "core32".
  • GLMs with 1 or 2 knots per TR. The original analyses all used GLMs with 2 TRs per knot ("2TRpK"). The 1 TR per knot GLMs are abbreviated "1TRpK", including in the file names.

Tuesday, September 7, 2021

approximately matching different parcellations

We want to use the the core MD (multiple demand) regions described in Assem et al., 2020 in some analyses but ran into a difficulty: Assem2020's core MD regions are defined in terms of HCP MMP1.0 (Multimodal Parcellation, MMP) parcels and the fsLR (HCP, fsLR32K surface), but for the project we wanted to use Schaefer et al., 2018 400x7 (400 parcels by 7 networks) parcellation and the fsaverage5 surface. Is it possible to approximate the core MD regions with a set of Schaefer 400x7 parcels? How to determine which set produces the best approximation? This post describes our answer, as well as the resulting set of Schaefer 400x7 parcels; its logic and code should be adaptable to other parcellations.

Surface calculations can be complex because vertices do not correspond to a fixed area in the same way that voxels do (e.g., a cubical "searchlight" of  eight voxels will have the same volume no matter which voxel it's centered on, but the surface area covered by the circle of a vertex's neighbors will vary across the brain according to the degree of folding at that vertex). I decided to work with parcellations defined in the same space (here, fsLR), and match at the vertex level. Matching at the vertex level has some implications, including that all vertices are equally important for determining the degree of correspondence between the parcellations; vertices are not weighted by the surface area of their neighborhood. This has the advantage of being independent of factors like surface inflation, but may not be sensible in all cases.

The approximation procedure is iterative, and uses the Dice coefficient to quantify how well two collections of vertices match; a larger Dice coefficient is better. This use was inspired by the parcellation comparisons in Lawrence et al., 2021, and I adapted their calculation code. The MMP parcellation is symmetric but the Schaefer is not, so each hemisphere was run separately.

Start: Make a vector (length 32,492 since fsLR) with 1s in vertices belonging to a core MD MMP parcel and 0 elsewhere. This vector does not change. 
List all Schaefer 400x7 parcels with one or more vertex overlapping a core MD MMP parcel. (This starts with the most complete possible coverage of core MD vertices with Schaefer parcels.)

Iterative Steps:
Step 1: Make a vector (also length 32,492) with 1s in the vertices of all the listed Schaefer parcels and 0 elsewhere. Calculate the Dice coefficient between the two vectors (core MD and Schaefer). 

For each listed Schaefer parcel, make a vector with 1s in the vertices of all BUT ONE of the listed Schaefer parcels and 0 elsewhere. Calculate the Dice coefficient between these two vectors (core MD and Schaefer-1 subset). 

Step 2: Compare the Dice coefficient of each subset to that of the entire list. Form a new list of Schaefer parcels, keeping only those whose removal made the fit worse (i.e., drop the parcel from the list if the Dice coefficient was higher without the parcel).

Repeat Steps 1 and 2 until removing any additional Schaefer parcel makes the fit worse.

Using this procedure, we obtained a set of 32 Schaefer 400x7 parcels (IDs 99, 127, 129, 130, 131, 132, 137, 140, 141, 142, 148, 163, 165, 182, 186, 300, 332, 333, 334, 335, 336, 337, 340, 345, 349, 350, 351, 352, 354, 361, 365, 387) as best approximating the core MD described by Assem, et al. 2020 (colors are just to show boundaries):

The approximation seems reasonable, and we plan to use this set of parcels (and the procedure that found them) in future analyses. 

The R (knitr) code to create the above figure, calculate Dice, and find the parcel set is in Schaefer400x7approxAssem2020.rnw, along with the compiled version.

Wednesday, September 1, 2021

I'm glad we're censoring our task fMRI timecourses for motion ...

Some years ago I wrote a series of posts describing how we settled on the FD > 0.9 censoring threshold we're using for the DMCC and some other task fMRI studies. While doing some routine checks yesterday a funny bit caught my eye, which I'll share here as an example of what happens with overt head motion during fMRI, and why censoring affected frames is a good idea.

The first thing that caught my eye was strange, non-anatomical, lines in the temporal standard deviation images for a single run:

(The above image shows slices through the temporal standard deviation image for three runs from the same participant and session; each row is a separate run. See this post for more explanation and code links.)

The stripy run's standard deviation images are bright and fuzzy around the edges, suggesting that head motion is a on the high side. I looked at the realignment parameters to investigate further (see this post for code and more explanation):

The overt head motion was quite low after the first minute and a half of the run (vertical grey lines are at one-minute intervals), but there were some big jumps in the realignment parameters (and red xs - frames marked for censoring for FD > 0.9) in the first two minutes, particularly around frame 100. (TR is 1.2 s.)

I opened the preprocessed image (via fmriprep; the image's suffix is "_acq-mb4PA_run-2_space-MNI152NLin2009cAsym_desc-preproc_bold.nii.gz") in Mango and stepped through the frames to see the motion directly. 

Most of the run looks nice and stable; blood flow is clearly visible in the large vessels. But there's a large distortion in frame 98 (19 in the subset), as shown below (motionExample.nii.gz on the blog osf site is frames 80:110 from the run, for a better view). 

The distorted appearance of frame 98 (center) is expected: head motion changes how the brain is positioned in the magnetic field, and it takes time for the BOLD to restabilize. (Frame 98 is the most obviously affected, but the adjacent also have some distortion.)

It would be bad to include frame 98 in our analyses: the brain areas are not where they should be. This type of error/artifact/weirdness is especially hard to identify in timeseries analysis: we can extract a complete timecourse for every voxel in this run, but its values in frame 98 do not correspond to the same sort of brain activity/BOLD as the rest of the run. Strange values in just one frame can skew the statistics summarizing the entire run, as happened here in the image at the top of this post: the stripes in the voxelwise standard deviation come from the stripes in frame 98.

But, all is actually fine: frame 98, along with several surrounding ones, are marked for censoring, and so will not be included in statistical analyses. The DMCC QC template code that generated the temporal standard deviation images ignores censoring, however, so included the distorted frames, which made the stripes.

To summarize, this post is not describing some sort of bug or error: it is well known that head motion disrupts fMRI signal, and that such frames should not be included in analyses; I am reassured that our censoring procedure properly identified the affected frames. Instead, this post is intended as an example of how a very short head motion can have a large effect on run-level summary statistics. In this case, no harm was done: the run-level statistic is for QC and purposely ignored censoring. But if this frame was included in real analyses the true task-related activity could be lost, shifted, or otherwise distorted; many fMRI phenomena have larger impact on BOLD than does task performance.

Finally, here's how the same three frames look on the surface. If you look closely you can see that the central sulcus disappears in frame 98 (marked with blue in frame 99 for contrast), though, as usual, I think strange images are easier to spot in the volume. (also via fmriprep, suffix acq-mb4PA_run-2_space-fsaverage5_hemi-L.func.gii, plotted in R.)