Monday, June 13, 2022

now available: DMCC55B rejected structurals and ratings

We received several requests for the rejected DMCC55B structural images, since comparing images of different quality can be useful for training. Thanks to the assistance of Rachel Brough, we have now released T1 and T2 images for the 13 DMCC55B people whose initial structural scans we rated as poor quality, as well as our ratings for both sets of images (the initial rejected images and the better repeated ones).

The rejected structural images (with session name “doNotUse”) and ratings are in a new sub-component (postPublication_rejectedStructurals) of the DMCC55B supplemental site, rather than with the released dataset on openneuro, to avoid confusion about which images should be used for processing (use the previously-released ones available at openneuro).

Wednesday, June 8, 2022

troubleshooting run-level failed preprocessing

Sometimes preprocessed images for a few runs are just ... wrong. (These failures can be hard to find without looking, which is one of the reasons I strongly suggest including visual summaries in your QC procedures; make sure you have a summary that works for your study population and run types.)

Here's an example of a run with "just wrong" preprocessing that I posted on neurostars: the images are all of the same person and imaging session, but one of the runs came out of the preprocessing seriously distorted.


And here's another: the images from the run at the lower right are clearly tilted and squashed compared to the others:

The above images are temporal means made after fmriprep preprocessing, including transformation to the MNI template anatomy (i.e., _space-MNI152NLin2009cAsym_desc-preproc_bold.nii.gz); see this post for more details and code.
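
If you would like to make similar images yourself, here is a minimal sketch of the temporal mean (and standard deviation) calculation in R, assuming the RNifti package and a hypothetical file name; the post linked above has the code we actually use.

library(RNifti)

fname <- "sub-xxxx_ses-xx_task-xxx_space-MNI152NLin2009cAsym_desc-preproc_bold.nii.gz"   # hypothetical name
img <- readNifti(fname)          # 4d array: i, j, k, frame

tmean <- apply(img, 1:3, mean)   # temporal mean: collapse over the frame dimension
tsd   <- apply(img, 1:3, sd)     # temporal standard deviation, computed the same way

writeNifti(tmean, "tmean.nii.gz", template=fname)   # reuse the input header and orientation
writeNifti(tsd, "tsd.nii.gz", template=fname)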

How to troubleshoot this type of partial preprocessing failure?

First, note that this is not due to the entire preprocessing pipeline failing: we have the expected set of derivative images for the participant, and everything looks fine for most runs. This suggests that the problem is not with the pipeline itself, but rather that something is unusual about the participant or those particular runs.

fmriprep (and I think most preprocessing pipelines?) is not fully deterministic: if you run the same script twice with the same input files and settings, the output images will not be exactly the same. They should be quite similar, but not identical. We have found that sometimes simply rerunning the preprocessing will correct the type of sporadic single-run normalization/alignment failures shown above.

If repeating the preprocessing doesn't fix the failure, or you want to investigate more before rerunning, I suggest checking the "raw" images (before preprocessing; as close to the images coming off the scanner as practicable) for oddities. Simply looking at all the images, comparing those from the run with failed preprocessing to those from the other (successful) runs in the same session, can often make the problem apparent.

Look at the actual functional images of the problematic run (e.g., by loading the DICOMs into Mango and viewing them as a movie): do you see anything strange? If the problematic run seems to have more-or-less the same orientation, movement, visible blood vessels, etc. as the non-problematic runs for the person, it is unlikely that the source of the problem is the functional run itself, and you should keep looking. (If the functional run itself is clearly unusual or full of artifacts, it is most likely simply unusable and should be marked as missing.)

If the functional run seems fine, look at all of the other images used in preprocessing, especially any fieldmaps and single-band reference (SBRef) images. Depending on your acquisition you may have one or more fieldmaps and SBRef images per run, or per set of runs. For the DMCC we use CMRR multiband sequences, so have an SBRef image for every functional run, plus a fieldmap each session. Both the fieldmaps and SBRef images are used in preprocessing, but differently, and if either has artifacts they will affect the preprocessing.

How an artifact in the fieldmap or SBRef affects the preprocessing can be difficult to predict; both can cause similar-looking failures. In the two examples above, the first was due to artifacts in the fieldmaps, the second in the SBRef.

This is a screen capture of three SBRef images from the session shown in the second example. The numbers identify the scans and are in temporal order; image 41 is the SBRef for the affected run (a "tilted" brain); 23 for the correctly-preprocessed run above it. There are dark bands in parts of scan 41 (red), and it looks a bit warped compared to the other two; below is how they look in coronal slices:


All SBRef images look rather odd (here, there's some expected ear dropout (yellow) and encoding direction stretching), but the key is to compare the images for runs in which preprocessing was successful (23 and 32) with those for which it failed (41). The SBRef for 41 is obviously different (red): black striping in the front, and extra stretching in the lowest slices. This striping and stretching in the SBRef (probably from movement) translated to tilting in the preprocessed output above.

What to do about it?

Ideally, you won't collect SBRef images or fieldmaps with strange artifacts; the scanner will work properly and participants will be still. If the images are checked during acquisition, it is often possible to repeat problematic scans or fix incorrect settings. This is of course the best solution!

However, sometimes an artifact or movement is not apparent during data collection, the scan can't be repeated, or you are working with an existing dataset and so are stuck with problematic scans. In these cases, I suggest doing something like in this post: look at all of the scans from the session (and other sessions if relevant) and try to determine the extent and source of the problem. 

In the case of the fieldmap artifacts, every fieldmap from the scanning session was affected, but fieldmaps for the same person from two other sessions were fine. We "fixed" the failed preprocessing by building a new BIDS dataset, swapping out bad fieldmaps for good ones and changing the filenames accordingly. Before trying this I checked that the person's head position, distortion, etc. were quite similar between the runs. I do not really recommend this type of image swapping; things can go badly wrong. But it is something to consider if you have similar artifact-filled and good images from the same person and acquisition. With the SBRef image we have another option: SBRef images are not required, so we could delete poor ones (here, scan 41) and repeat the preprocessing without them. 
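
For the curious, the swap itself only takes a few lines; this is a rough sketch with hypothetical paths and file names (not the code we actually ran), and the .json sidecars need matching edits.

# rough sketch with hypothetical paths/names; only attempt after confirming head position
# and distortion are closely matched between the sessions
bad.dir  <- "newBIDS/sub-xxxx/ses-bad/fmap/"    # session whose fieldmaps have artifacts
good.dir <- "newBIDS/sub-xxxx/ses-good/fmap/"   # session with clean fieldmaps

fnames <- list.files(good.dir, pattern="magnitude|phasediff|fieldmap")  # images and .json sidecars
for (fname in fnames) {   # copy each good file in, renamed to the target session
  file.copy(paste0(good.dir, fname),
            paste0(bad.dir, gsub("ses-good", "ses-bad", fname)), overwrite=TRUE)
}
# the IntendedFor fields in the copied .json sidecars must then be edited (e.g., with
# jsonlite) so they point at the functional runs of the session being preprocessed.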

Neither workaround (swapping or deleting) should be used lightly or often, but they have produced adequate-quality images for us in a few cases. To evaluate, I look very closely at the resulting preprocessed images and BOLD timecourses, both the anatomy and the positive control analysis (see also) results, comparing runs/sessions with normal and workaround preprocessing. For example, confirm that the changes in BOLD timed with your stimuli appear in the same visual parcels, with essentially the same shape, in runs with the different preprocessing. If different visual parcels show the stimulus-caused changes in BOLD depending on the preprocessing, the workaround was not successful.

Wednesday, September 15, 2021

"The Dual Mechanisms of Cognitive Control Project": post-publication analyses

The recently-published "The Dual Mechanisms of Cognitive Control Project" paper (preprint, publisher) describes the motivations and components of the project as a whole, but also contains several analyses of its task fMRI data. The supplemental information for the manuscript has the R (knitr) code and input files needed to generate its results and figures.

The supplemental information now has more results than those included in the paper, however: versions of the analyses using different sets of participants, different parcels, and different GLMs for estimating the HDR curves, which I will briefly describe here.

For example, Figure 5 (below) shows the estimated HDR curves for each task (high - low control demand) in the Baseline (blue, BAS) and Reactive (green, REA) sessions. The grey shading marks the target window, when we expect the control demand difference to be greatest. Of interest is that the estimates are largest in the target window for all tasks and sessions, with the Baseline session estimates larger than those from the Reactive session (see the manuscript for more explanation and framing).

The top version (supplemental file) includes 80 participants in the estimates (some of whom are related), averages the estimates from a set of 35 parcels (from the Schaefer 400x7 parcellation) found to be particularly sensitive to DMCC tasks, and uses GLMs estimating one knot for every 2 TRs.

The second version (supplemental file) shares the theoretically interesting aspects: the curves mostly peak in the target (grey) area, and the blue curves are mostly above the green. There are many differences, though: the second graph is from a post-publication analysis using the DMCC55B participants (55 unrelated people, 48 of whom are in both 55B and the 80-participant set), the set of 32 Schaefer 400x7 parcels approximating the core Multiple Demand network (12 parcels are in both sets), and GLMs estimating one knot for every TR.

It is reassuring to see that the analysis results are generally consistent despite these fairly substantial changes to its inputs. Sometimes results can look great, but are due to a statistical fluke or overfitting; in these cases small changes to the analysis that shouldn't matter (e.g., removing or replacing several participants) often make large changes in the results. The opposite occurred here: fairly substantial changes to the parcels, participants, and (to a lesser extent) GLMs led to generally matching results.

The paper's osf site now contains results files for all the different ways to set up the analyses, within the "postPublication_coreMD" and "postPublication_1TRpK" subdirectories. The variations:

  • 80 or 55 participants. Files for analyses using the DMCC55B participants have a "DMCC55B" suffix; files for the original set of 80 participants have either no suffix or a "DMCC80" suffix.
  • 35 or 32 parcels. The set of 35 parcels identified via DMCC data are referred to as the 35-megaparcel or "parcels35"; the 32 parcels approximating the core MD are referred to as "core32".
  • GLMs with 1 or 2 TRs per knot. The original analyses all used GLMs with 2 TRs per knot ("2TRpK"). The 1 TR per knot GLMs are abbreviated "1TRpK", including in the file names.

Tuesday, September 7, 2021

approximately matching different parcellations

We want to use the core MD (multiple demand) regions described in Assem et al., 2020 in some analyses, but ran into a difficulty: Assem2020's core MD regions are defined in terms of HCP MMP1.0 (Multimodal Parcellation, MMP) parcels and the fsLR surface (HCP, fsLR32k), but for the project we wanted to use the Schaefer et al., 2018 400x7 (400 parcels by 7 networks) parcellation and the fsaverage5 surface. Is it possible to approximate the core MD regions with a set of Schaefer 400x7 parcels? How can we determine which set produces the best approximation? This post describes our answer, as well as the resulting set of Schaefer 400x7 parcels; its logic and code should be adaptable to other parcellations.

Surface calculations can be complex because vertices do not correspond to a fixed area in the same way that voxels do (e.g., a cubical "searchlight" of eight voxels will have the same volume no matter which voxel it's centered on, but the surface area covered by the circle of a vertex's neighbors will vary across the brain according to the degree of folding at that vertex). I decided to work with parcellations defined in the same space (here, fsLR), and match at the vertex level. Matching at the vertex level has some implications, including that all vertices are equally important for determining the degree of correspondence between the parcellations; vertices are not weighted by the surface area of their neighborhood. This has the advantage of being independent of factors like surface inflation, but may not be sensible in all cases.

The approximation procedure is iterative, and uses the Dice coefficient to quantify how well two collections of vertices match: twice the number of vertices the two collections share, divided by the sum of their sizes; a larger Dice coefficient is better. This use was inspired by the parcellation comparisons in Lawrence et al., 2021, and I adapted their calculation code. The MMP parcellation is symmetric but the Schaefer is not, so each hemisphere was run separately.

Start: Make a vector (length 32,492, the number of vertices in an fsLR hemisphere) with 1s in vertices belonging to a core MD MMP parcel and 0 elsewhere. This vector does not change.
List all Schaefer 400x7 parcels with one or more vertices overlapping a core MD MMP parcel. (This starting list gives the most complete possible coverage of the core MD vertices with Schaefer parcels.)

Iterative Steps:
Step 1: Make a vector (also length 32,492) with 1s in the vertices of all the listed Schaefer parcels and 0 elsewhere. Calculate the Dice coefficient between the two vectors (core MD and Schaefer). 

For each listed Schaefer parcel, make a vector with 1s in the vertices of all BUT ONE of the listed Schaefer parcels and 0 elsewhere. Calculate the Dice coefficient between these two vectors (core MD and Schaefer-1 subset). 

Step 2: Compare the Dice coefficient of each subset to that of the entire list. Form a new list of Schaefer parcels, keeping only those whose removal made the fit worse (i.e., drop the parcel from the list if the Dice coefficient was higher without the parcel).

Repeat Steps 1 and 2 until removing any additional Schaefer parcel makes the fit worse.
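
In code the whole procedure is quite short. Here is a sketch in R for one hemisphere, assuming a logical vector marking the core MD vertices and a vector of Schaefer parcel IDs per vertex; the released file linked below is the authoritative version.

dice <- function(a, b) { 2 * sum(a & b) / (sum(a) + sum(b)) }   # a, b: logical vectors, one entry per vertex

# core.md: TRUE for vertices in a core MD MMP parcel (the fixed vector);
# schaefer: Schaefer 400x7 parcel ID for each vertex (0 for the medial wall)
approx.parcels <- function(core.md, schaefer) {
  keep <- unique(schaefer[core.md & schaefer > 0])   # start: every Schaefer parcel overlapping core MD
  repeat {
    full.dice <- dice(core.md, schaefer %in% keep)   # Step 1: fit of the full list
    drop.dice <- sapply(keep, function(p) dice(core.md, schaefer %in% setdiff(keep, p)))  # each parcel left out in turn
    if (all(drop.dice < full.dice)) { break }        # removing any parcel now makes the fit worse: done
    keep <- keep[drop.dice < full.dice]              # Step 2: keep only parcels whose removal hurt the fit
  }
  sort(keep)
}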


Using this procedure, we obtained a set of 32 Schaefer 400x7 parcels (IDs 99, 127, 129, 130, 131, 132, 137, 140, 141, 142, 148, 163, 165, 182, 186, 300, 332, 333, 334, 335, 336, 337, 340, 345, 349, 350, 351, 352, 354, 361, 365, 387) as best approximating the core MD described by Assem et al., 2020 (colors are just to show boundaries):


The approximation seems reasonable, and we plan to use this set of parcels (and the procedure that found them) in future analyses. 

The R (knitr) code to create the above figure, calculate Dice, and find the parcel set is in Schaefer400x7approxAssem2020.rnw, along with the compiled version.

Wednesday, September 1, 2021

I'm glad we're censoring our task fMRI timecourses for motion ...

Some years ago I wrote a series of posts describing how we settled on the FD > 0.9 censoring threshold we're using for the DMCC and some other task fMRI studies. While doing some routine checks yesterday, a funny bit caught my eye, which I'll share here as an example of what happens with overt head motion during fMRI, and why censoring affected frames is a good idea.

The first thing that caught my eye was strange, non-anatomical lines in the temporal standard deviation images for a single run:

(The above image shows slices through the temporal standard deviation image for three runs from the same participant and session; each row is a separate run. See this post for more explanation and code links.)

The stripy run's standard deviation images are bright and fuzzy around the edges, suggesting that head motion is on the high side. I looked at the realignment parameters to investigate further (see this post for code and more explanation):

The overt head motion was quite low after the first minute and a half of the run (vertical grey lines are at one-minute intervals), but there were some big jumps in the realignment parameters (and red xs, frames marked for censoring at FD > 0.9) in the first two minutes, particularly around frame 100. (TR is 1.2 s.)
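
For reference, here is a sketch of the FD calculation and censoring flag in R, assuming the six realignment parameters from an fmriprep confounds file (hypothetical name; translations in mm, rotations in radians); the censoring posts linked above have the code we actually use.

fd.threshold <- 0.9   # mm; the censoring threshold described in the earlier posts
head.radius <- 50     # mm; converts rotations (radians) to arc length at the head surface

mot <- read.delim("sub-xxxx_task-xxx_desc-confounds_timeseries.tsv")   # hypothetical fmriprep confounds file
trans <- as.matrix(mot[, c("trans_x", "trans_y", "trans_z")])
rot   <- as.matrix(mot[, c("rot_x", "rot_y", "rot_z")]) * head.radius  # radians to mm

fd <- c(0, rowSums(abs(diff(trans))) + rowSums(abs(diff(rot))))   # frame-to-frame displacement; first frame set to 0
censor <- which(fd > fd.threshold)   # frames to drop (the red xs in the plot above)
# fmriprep also writes a framewise_displacement column, which should closely match fd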

I opened the preprocessed image (via fmriprep; the image's suffix is "_acq-mb4PA_run-2_space-MNI152NLin2009cAsym_desc-preproc_bold.nii.gz") in Mango and stepped through the frames to see the motion directly. 

Most of the run looks nice and stable; blood flow is clearly visible in the large vessels. But there's a large distortion in frame 98 (19 in the subset), as shown below (motionExample.nii.gz on the blog osf site is frames 80:110 from the run, for a better view). 

The distorted appearance of frame 98 (center) is expected: head motion changes how the brain is positioned in the magnetic field, and it takes time for the BOLD to restabilize. (Frame 98 is the most obviously affected, but the adjacent frames also have some distortion.)

It would be bad to include frame 98 in our analyses: the brain areas are not where they should be. This type of error/artifact/weirdness is especially hard to identify in timeseries analysis: we can extract a complete timecourse for every voxel in this run, but its values in frame 98 do not correspond to the same sort of brain activity/BOLD as the rest of the run. Strange values in just one frame can skew the statistics summarizing the entire run, as happened here in the image at the top of this post: the stripes in the voxelwise standard deviation come from the stripes in frame 98.
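
To see how little it takes, here is a toy illustration with made-up numbers (not data from this run):

tc <- rnorm(560, mean=1000, sd=5)   # a well-behaved voxel timecourse: 560 frames around 1000
sd(tc)                              # about 5
tc[98] <- 1200                      # one motion-corrupted frame
sd(tc)                              # jumps to roughly 10, driven by that single frame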

But all is actually fine: frame 98, along with several surrounding frames, is marked for censoring, and so will not be included in statistical analyses. The DMCC QC template code that generated the temporal standard deviation images ignores censoring, however, so it included the distorted frames, which made the stripes.

To summarize, this post is not describing some sort of bug or error: it is well known that head motion disrupts the fMRI signal and that such frames should not be included in analyses; I am reassured that our censoring procedure properly identified the affected frames. Instead, this post is intended as an example of how a very brief head motion can have a large effect on run-level summary statistics. In this case, no harm was done: the run-level statistic is for QC and purposely ignores censoring. But if this frame were included in real analyses, the true task-related activity could be lost, shifted, or otherwise distorted; many fMRI phenomena have a larger impact on BOLD than task performance does.

Finally, here's how the same three frames look on the surface. If you look closely you can see that the central sulcus disappears in frame 98 (marked with blue in frame 99 for contrast), though, as usual, I think strange images are easier to spot in the volume. (also via fmriprep, suffix acq-mb4PA_run-2_space-fsaverage5_hemi-L.func.gii, plotted in R.)

Tuesday, June 29, 2021

DMCC55B supplemental as tutorial: positive control "ONs"-style timecourses

This is the sixth (and I believe final) post in a series describing the DMCC55B supplemental files. The first introduces the dataset, the second questionnaire data, the third motion, the fourth creating and checking temporal mean, standard deviation, and tSNR images, and the fifth a parcel-wise button-press classification analysis.

As discussed in the manuscript and the introduction to the previous post, "ONs" is a positive control analysis: one that is not of experimental interest, but has a very predictable outcome; so predictable that if the expected result is not found, further analysis should stop until the problem is resolved. "ONs" is shorthand for "on-task": contrasting BOLD during all task trials against baseline.

The example DMCC55B ONs analysis uses the Stroop task and direct examination of the parcel-average timecourses for HRF-like shape, rather than the GLMs we use for internal QC. This has the advantage of being very close to the data with few dependencies, but is perhaps only practical when there are many repetitions of short trials (such as in the DMCC Stroop). The DMCC Stroop task is a color-word version, with spoken responses. Contrasting task (all trials, not split e.g. by congruency) against baseline should thus show activity in visual, language, and motor (especially speech) areas, along with the typical "task positive" and "task negative" areas.

As for "buttons", both surface and volume versions of the analysis are included, and the results are similar. The first step of the analysis is the same as for buttons: normalize and detrend each vertex/voxel's timecourse using 3dDetrend (the first code block of controlAnalysis_prep.R and controlAnalysis_prep_volume.R processes both Sternberg and Stroop). The rest of the steps are brief enough to be in the knitr code chunks: make parcel-average timecourses with 3dROIstats, use the _events.tsv files to find the onset of each trial for each person, and average the timecourse across those events.
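
The heart of the averaging step looks something like this sketch (hypothetical file and column names; the knitr files described above have the full version):

TR <- 1.2       # seconds
n.win <- 14     # number of TRs to keep after each event onset

tc <- read.delim("stroop_parcelMeans.txt")                   # hypothetical: 3dROIstats output, frames in rows, parcels in columns
evts <- read.delim("sub-xxxx_task-Stroop_run-1_events.tsv")  # BIDS events file: onset (s), duration, trial_type, ...
onset.frames <- round(evts$onset / TR)                       # convert onsets from seconds to frame indices

# average one parcel's timecourse across all of the run's events
get.curve <- function(parcel.tc, onsets, n.win) {
  wins <- sapply(onsets, function(o) parcel.tc[(o + 1):(o + n.win)])   # one column per event
  rowMeans(wins, na.rm=TRUE)
}
curve <- get.curve(tc[, "p113"], onset.frames, n.win)   # "p113" is a hypothetical column name for parcel 113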

For example, here are the timecourses for two motor parcels (113 and 116 in the 1000-parcel, 17-network Schaefer parcellation), surface (top) and volume (bottom):

Each grey line is one participant, with the across-participants mean in blue. The x-axis is TRs (1.2 second TR; event onset at 0), and the y-axis is BOLD (parcel-average after the normalizing and detrending, averaged across all events for the person). As expected, the curves approximate the canonical HRF. Interestingly, the surface timecourses are taller than the volume ones, though the same parcels tend to show task-related activity (or not).

To summarize further and show all the parcels at once, I averaged the BOLD across the peak timepoints (yellowish band), which makes the areas clear in the group means:



Please let me know if you explore the DMCC55B dataset and/or this supplemental; I hope you find these files useful!

Thursday, June 24, 2021

DMCC55B supplemental as tutorial: positive control "buttons" classification analysis

This is the fifth post in a series describing the DMCC55B supplemental files. The first introduces the dataset, the second questionnaire data, the third motion, and the fourth creating and checking temporal mean, standard deviation, and tSNR images.

I very strongly recommend that "positive control" analyses be performed as part of every fMRI (really, most any) analysis. The idea is that these analyses check for the existence of effects that, if the dataset is valid, must be present (and so, if they are not detected, something is most likely wrong in the dataset or analysis, and analysis should not proceed until the issues are resolved). (Julia Strand's Error Tight site collects additional suggestions for minimizing errors in research and analysis.)

The DMCC55B supplemental includes two of my favorite positive control analyses, "buttons" and "ONs", which can be adapted to a wide variety of task paradigms. The "buttons" name is shorthand for analyzing the activity associated with the task responses (which often involve moving fingers to press buttons). Task responses like button presses are excellent targets for positive controls because the occurrence of movement can be objectively verified (unlike e.g., psychological states, whose existence is necessarily inferred), and hand motor activity is generally strong, focal, and located in a low g-factor area (i.e., with better fMRI SNR). Further, it is nearly always possible to design an analysis around the responses that is not tied to the experimental hypotheses. (To avoid circularity, control analyses must be independent of experimental hypotheses and have high face validity.)

The DMCC55B "buttons" example uses the Sternberg task. In the DMCC Sternberg task (Figure 3), responses are made with the right hand, pressing the button under either the first or second finger to indicate whether the Probe word was a member of the current list or not. The target hypotheses for Sternberg involve aspects such as response time, whether the response was correct, and brain activity changes during List memorization, while the hypothesis for the buttons control analysis is simply that brain activity in somatomotor areas should change due to the finger motion necessary to press the response button. Rephrased, the contrast of button presses against baseline should show somatomotor activation.

The example DMCC55B Sternberg buttons positive control analysis was implemented as ROI-based classification MVPA (linear svm, c=1, e1071 R libsvm interface), with averaging (not GLMs) for the temporal compression. I ran this on the surface (fsaverage5 giftis produced by fmriprep preprocessing), within each ROI defined by the 1000-parcel, 17-network Schaefer parcellation, with leave-one-subject-out cross-validation (55-fold). I do not generally advise leave-one-subject-out CV, especially with this many people, but used it here for simplicity; a more reasonable 11-fold CV version is also in the code.
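
Stripped to its core, the classification looks something like this sketch, assuming a data.frame with a subject ID column, a class column, and one column per vertex for a single parcel; the released scripts have the full pipeline, including the temporal compression.

library(e1071)   # svm() wraps libsvm

# dat: data.frame with columns sub.id and class ("press"/"not"), then one column per vertex in the parcel
run.loso <- function(dat) {
  subs <- unique(dat$sub.id)
  accs <- rep(NA, length(subs))    # one accuracy per left-out subject (fold)
  for (i in seq_along(subs)) {
    train <- dat[dat$sub.id != subs[i], ]
    test  <- dat[dat$sub.id == subs[i], ]
    fit <- svm(x=train[, -(1:2)], y=factor(train$class), kernel="linear", cost=1)
    accs[i] <- mean(predict(fit, test[, -(1:2)]) == test$class)
  }
  accs   # mean(accs) is the parcel's leave-one-subject-out accuracy
}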

The analysis code is split between two files. The first file, controlAnalysis_prep.R, is made up of consecutive code blocks to perform the analysis, while the second file, controlAnalysis_buttons.rnw, displays the results in tables and on brains (compiled controlAnalysis_buttons.pdf). (Aside: I will sometimes include analysis code in knitr documents (e.g., QC_SD_surface.rnw) if I judge the code is short and straightforward enough that the benefit of having everything in one file is greater than the drawback of increased code length and mixing of analysis in with results.) My intention is that the two controlAnalysis_buttons files together will serve as a "starter kit" for classification analysis, adaptable to many other datasets and applications.

The analysis starts at the top of controlAnalysis_prep.R; the files produced by this script are used to make the results shown in controlAnalysis_buttons.rnw. The code blocks should be run in sequence, as later blocks depend on output from earlier blocks.

  • The first code block uses AFNI's 3dDetrend to normalize and detrend each vertex's timecourse. At the time of writing, 3dDetrend does not accept gifti image inputs, so the script uses readGIfTI to read the files and write 3dDetrend-friendly "1D" text files. Aside: it's possible to avoid AFNI, implementing the normalize and detrend steps entirely in R. I prefer the AFNI function, however, to avoid introducing errors, and for clarity; using established functions and programs whenever possible is generally advisable.
  • The second code block reads the _events.tsv files, and finds matching sets of frames corresponding to button press and "not" (no button-press) events. Balance is very important to avoid biasing the classification: each "event" should be the same duration, and each run should have equal numbers of events of each type. Windows for each event (i.e., when event-related brain activity should be present in the BOLD) were set to start 3 seconds after the event and end 8 seconds after. This 3-8 second window is roughly the peak of the canonical HRF given the very short button press events; longer windows would likely be better for longer duration events. (A small worked example of the window-to-frame conversion follows this list.)
  • The third code block performs the temporal compression, writing one file for each person, run, class, and hemisphere. The files have one row for every vertex, and one column for each example (average of the 3:8 second windows found in the second code block), plus a column with the mean across examples. 
  • The final two code blocks run the classification, with leave-one-subject-out (fourth block) and leave-five-subjects-out (11-fold, fifth block) cross-validation. Each version writes a results file with one row per parcel, and a column for the accuracy of each cross-validation fold, plus the mean accuracy over folds. These are the files read by controlAnalysis_buttons.rnw and shown on brains for Figure 9.
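
As mentioned in the second bullet above, here is a small worked example of the window-to-frame conversion; the rounding convention is illustrative, so check the released code for the exact version.

TR <- 1.2      # seconds
onset <- 27.5  # a hypothetical button-press onset, in seconds

win.frames <- ceiling((onset + 3) / TR):floor((onset + 8) / TR)   # frames falling 3 to 8 seconds after the onset
# here, frames 26 to 29; the temporal compression step (third code block) averages the BOLD
# over these frames to make one example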

Implementation notes:

This code does MVPA with surface data, using text (vertices in rows) for the intermediate files. It is straightforward to identify which rows/vertices correspond to which parcels, since the parcellation is of the same type (fsaverage5, in this case) as the functional data (see code in the fourth block, plus the preceding comments). Surface searchlight analyses are far less straightforward than ROI/parcel-based ones, and I don't think they are generally advisable, due to the varying distances between vertices.

Relatively few changes are needed to perform this same analysis with volumes. The purpose and steps within each code block are the same, though the functions to retrieve the timeseries vary, and there is only one volumetric image per run instead of the surface pair (one per hemisphere). The volumetric version of the buttons classification is divided between controlAnalysis_prep_volume.R and controlAnalysis_buttons_volume.rnw.

As we would hope (since the same original data are used for both), largely the same parcels have the highest accuracy in the surface and volume versions of the MVPA, and the parcel-wise accuracies are positively correlated. The parcel-wise accuracies from the surface version are often, though not uniformly, a bit higher.