
Friday, December 6, 2024

"crescent" artifacts revisited

Back in 2018 I posted twice about "crescent" artifacts (first, second), and have kept a casual eye out for them since, wondering how much they might affect fMRI analysis results. 

The artifact isn't totally consistent across people, but when present it is very stable over time (i.e., across sessions several years apart) and bright in temporal standard deviation QC images. The artifact's location varies with encoding direction: front with PA encoding, rear with AP; the PA versions tend to be much brighter and more obvious than the AP.

Below is an example of the artifact, pointed out by the pink arrows. This is the temporal standard deviation image for sub-f8570ui from the DMCC55B dataset (doi:10.18112/openneuro.ds003465.v1.0.6), the first (left, AP encoding) and second (right, PA encoding) runs of Sternberg baseline (ses-wave1bas_task-Stern), after fmriprep preprocessing:

These two runs were collected sequentially within the same session, but the artifact is only visible in the PA encoding run (right). (Briefly, DMCC used a 3T Siemens Prisma, 32-channel headcoil, CMRR MB4, no in-plane acceleration, 2.4 mm iso, 1.2 s TR, alternating AP and PA encoding runs; details at openneuro, dataset description paper, and OSF sites, plus DMCC-tagged posts on this blog.)

In the previous "crescent" posts we speculated that these could be N/2 ghosts or related to incomplete fat suppression; I am now leaning away from the Nyquist ghost idea, because the crescents don't appear to line up with the most visible ghosts. (Some ghosts are a bit visible in the above image; playing with the contrast and looking in other slices makes the usual multiband-related sets of ghosts obvious, but none clearly intersect with the artifact.) It also seems odd that ghosts would be so much brighter and change their location with AP vs. PA encoding; I am no physicist, though!

link to cortex volume?

This week I gave three lab members a set of temporal standard deviation images (similar to the pair above) for 115 participants from the DMCC dataset. The participant images were in random order, and I asked the lab members to rate how confident they were that each showed the "crescent" artifact or not. My raters agreed that 34 participants showed the artifact, and 39 did not. (Ratings were mixed or less confident on the others; I asked them to decide quickly from single pairs of QC images, not investigate closely.)

We didn't measure external head size in the participants, but did run freesurfer during preprocessing, so I used its CortexVol and eTIV statistics as proxies (perhaps a different statistic would be better?). The group my colleagues rated as having the artifact tended to have smaller brains than those without:

If the appearance of this artifact is indeed somewhat related to head size, then it's logical that it would (as I've observed) generally be stable over time. DMCC's population was younger adults; it'd be interesting to see if there's a relationship with a wider range of head sizes.
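As an illustration of that group comparison, here is a minimal base R sketch; qc.df and its columns (rating, CortexVol) are hypothetical stand-ins for the freesurfer statistics and consensus artifact ratings, not names from the released code.

# hypothetical data frame: one row per participant, with the freesurfer
# CortexVol and eTIV values plus the consensus artifact rating
qc.sub <- droplevels(subset(qc.df, rating %in% c("yes", "no")));  # clearly-rated participants only

boxplot(CortexVol ~ rating, data=qc.sub, ylab="freesurfer CortexVol",
        main="cortex volume by crescent-artifact rating");
t.test(CortexVol ~ rating, data=qc.sub);   # simple two-group comparison of the means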

only with DMCC or its MB4 acquisition?

Is the artifact restricted to this particular acquisition or study? It doesn't appear to be specific to the DMCC: I've checked a few DMCC participants who later took part in other studies, and each shows the artifact (or its absence) consistently across all of the datasets.

To see if it's restricted to the MB4 acquisition, I looked at a few images from a different study, which also has adult participants, a 3T Prisma, 32-channel headcoil, 2.4 mm iso voxels, and CMRR sequences, but with MB6, a 0.8 s TR, and PA encoding for all runs. Below are standard deviation images for three different people from this MB6 study, one run each of the same task, after fmriprep preprocessing. (I chose these three because of the artifacts; not all are so prominent.)

Since this study has all PA runs I can't directly compare artifacts across the encoding directions, but there are clearly some "crescents", and more sets of rings than typical with MB4 (which makes sense for MB6). The rings are especially obvious in person #3; some of these appear to be full-brain ghosts. I suspect the artifacts would be much clearer in subject space; I haven't looked (I'm not directly involved in the study). But a substantial minority of these participants' standard deviation images resemble #1, whose artifacts strike me as quite similar to the "crescents" in some DMCC MB4 PA images.

but does it matter?

Not all artifacts that look strange in QC images actually change task-related BOLD statistics enough to be a serious concern. (Of course, how much is too much totally depends on the particular case!) I suspect that this artifact does matter for our analyses, though, both because of where it falls in the brain and because it affects BOLD enough to be visible by eye in some cases.

The artifact's most prominent frontal location with PA runs puts it uncomfortably close to regions of interest for most of my colleagues, and is one reason I have advised shifting to all AP encoding for new studies I'm involved with. Preprocessing, motion, transformation to standard space, and spatial smoothing blur the artifact across a wider area, hopefully diluting its effect. But the artifact's location is somewhat consistent across participants, and it is present in a sizable enough minority (a third, perhaps, in the datasets I've looked at) that it seems possible it could reduce signal quality in our target ROIs.

As for showing that it does indeed affect task-related BOLD enough to matter, so far I mostly have qualitative impressions. For example, below left is the standard deviation of one DMCC person's PA Sternberg run, with the cursors on the artifact. The right side is from averaging together (after voxel-wise normalizing and detrending) the frames following button presses with the right hand. Squinting, the statistical image is brighter in sensible motor-related grey matter areas, marked with green. But the "crescent" may also be faintly visible, as pointed out in pink.

I can imagine quantitative tests, such as comparing the single-run GLM output images (so separating AP and PA encoding runs) between the groups of participants with and without the artifact. Differences in the estimates for parcels/searchlights/areas overlapping the artifact would be suggestive, particularly if the estimates vary with encoding direction and participant subgroup (with-artifact or without).

thoughts?

I'm curious, have you seen similar? Do you think this artifact is from N/2 ghosting, incomplete fat suppression, or something else? (What should I call it? "Crescent" is visually descriptive, but not standard. 😅) Seem reasonable it could be related to head size? And that it can significantly affect BOLD? Other reactions? Thanks! (And we can chat about this at my OHBM 2025 poster, which will be on this topic.)

Monday, June 13, 2022

now available: DMCC55B rejected structurals and ratings

We received several requests for the rejected DMCC55B structural images, since comparing images of different quality can be useful for training. Thanks to the assistance of Rachel Brough, we have now released T1 and T2 images for the 13 DMCC55B people whose initial structural scans we rated of poor quality, as well as our ratings for both sets of images (the initial rejected images and better repeated ones). 

The rejected structural images (with session name “doNotUse”) and ratings are in a new sub-component (postPublication_rejectedStructurals) of the DMCC55B supplemental site, rather than with the released dataset on openneuro, to avoid confusion about which images should be used for processing (use the previously-released ones available at openneuro).

Wednesday, September 15, 2021

"The Dual Mechanisms of Cognitive Control Project": post-publication analyses

The recently-published "The Dual Mechanisms of Cognitive Control Project" paper (preprint publisher) describes the motivations and components of the project as a whole, but also contains several analyses of its task fMRI data. The supplemental information for the manuscript has the R (knitr) code and input files needed to generate the results and figures in the manuscript. 

The supplemental information now has more results than those included in the paper, however: versions of the analyses using different sets of participants, parcels, and GLMs for estimating the HDR curves, which I will briefly describe here.

For example, Figure 5 (below) shows the estimated HDR curves for each task (high - low control demand) in the Baseline (blue, BAS) and Reactive (green, REA) sessions. The grey shading marks the target window, when we expect the control demand difference to be greatest. Of interest is that the estimates are greater in the target window for all tasks and sessions, with the Baseline session estimates larger than those from the Reactive session (see the manuscript for more explanation and framing).

The top version (supplemental file) includes 80 participants in the estimates (some related), averages the estimates from a set of 35 parcels (from the Schaefer 400x7 parcellation) found to be particularly sensitive to DMCC tasks, and uses GLMs estimating one knot for every 2 TRs.

The second version (supplemental file) shares the theoretically interesting aspects: curves mostly peak in the target (grey) area, blue curves mostly above green. There are many differences, though: the second graph is from a post-publication analysis using the DMCC55B participants (55 unrelated people; 48 of whom are in both 55B and the 80-participant set), the set of 32 Schaefer 400x7 parcels approximating the Core Multiple Demand network (12 parcels are in both sets), and GLMs estimating one knot for every TR.

It is reassuring to see that the analysis results are generally consistent despite these fairly substantial changes to its inputs. Sometimes results can look great, but are due to a statistical fluke or overfitting; in these cases small changes to the analysis that shouldn't matter (e.g., removing or replacing several participants) often make large changes in the results. The opposite occurred here: fairly substantial changes to the parcels, participants, and (to a lesser extent) GLMs led to generally matching results.

The paper's osf site now contains results files for all the different ways to set up the analyses, within the "postPublication_coreMD" and "postPublication_1TRpK" subdirectories. The variations:

  • 80 or 55 participants. Files for analyses using the DMCC55B participants have a "DMCC55B" suffix; files for the original set of 80 participants have either no suffix or "DMCC80".
  • 35 or 32 parcels. The set of 35 parcels identified via DMCC data are referred to as the 35-megaparcel or "parcels35"; the 32 parcels approximating the core MD are referred to as "core32".
  • GLMs with 1 or 2 knots per TR. The original analyses all used GLMs with 2 TRs per knot ("2TRpK"). The 1 TR per knot GLMs are abbreviated "1TRpK", including in the file names.

Tuesday, September 7, 2021

approximately matching different parcellations

We want to use the core MD (multiple demand) regions described in Assem et al., 2020 in some analyses but ran into a difficulty: Assem2020's core MD regions are defined in terms of HCP MMP1.0 (Multimodal Parcellation, MMP) parcels and the fsLR (HCP, fsLR32K) surface, but for the project we wanted to use the Schaefer et al., 2018 400x7 (400 parcels by 7 networks) parcellation and the fsaverage5 surface. Is it possible to approximate the core MD regions with a set of Schaefer 400x7 parcels? How can we determine which set produces the best approximation? This post describes our answer, as well as the resulting set of Schaefer 400x7 parcels; its logic and code should be adaptable to other parcellations.

Surface calculations can be complex because vertices do not correspond to a fixed area in the same way that voxels do (e.g., a cubical "searchlight" of  eight voxels will have the same volume no matter which voxel it's centered on, but the surface area covered by the circle of a vertex's neighbors will vary across the brain according to the degree of folding at that vertex). I decided to work with parcellations defined in the same space (here, fsLR), and match at the vertex level. Matching at the vertex level has some implications, including that all vertices are equally important for determining the degree of correspondence between the parcellations; vertices are not weighted by the surface area of their neighborhood. This has the advantage of being independent of factors like surface inflation, but may not be sensible in all cases.

The approximation procedure is iterative, and uses the Dice coefficient to quantify how well two collections of vertices match; a larger Dice coefficient is better. This use was inspired by the parcellation comparisons in Lawrence et al., 2021, and I adapted their calculation code. The MMP parcellation is symmetric but the Schaefer is not, so each hemisphere was run separately.

Start: Make a vector (length 32,492 since fsLR) with 1s in vertices belonging to a core MD MMP parcel and 0 elsewhere. This vector does not change. 
List all Schaefer 400x7 parcels with one or more vertices overlapping a core MD MMP parcel. (This starts with the most complete possible coverage of core MD vertices with Schaefer parcels.)

Iterative Steps:
Step 1: Make a vector (also length 32,492) with 1s in the vertices of all the listed Schaefer parcels and 0 elsewhere. Calculate the Dice coefficient between the two vectors (core MD and Schaefer). 

For each listed Schaefer parcel, make a vector with 1s in the vertices of all BUT ONE of the listed Schaefer parcels and 0 elsewhere. Calculate the Dice coefficient between these two vectors (core MD and Schaefer-1 subset). 

Step 2: Compare the Dice coefficient of each subset to that of the entire list. Form a new list of Schaefer parcels, keeping only those whose removal made the fit worse (i.e., drop the parcel from the list if the Dice coefficient was higher without the parcel).

Repeat Steps 1 and 2 until removing any additional Schaefer parcel makes the fit worse.
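To make the procedure concrete, here is a minimal base R sketch of one hemisphere's iteration; the variable names (core.md, schaefer) are hypothetical, not taken from the released code, and Schaefer400x7approxAssem2020.rnw should be consulted for the actual implementation.

dice <- function(a, b) { 2 * sum(a & b) / (sum(a) + sum(b)); }  # binary vectors in, Dice coefficient out

# core.md: length-32,492 vector, 1 for vertices in a core MD MMP parcel, 0 elsewhere (fixed)
# schaefer: same-length vector giving each vertex's Schaefer 400x7 parcel ID (0 if unassigned)
parcel.ids <- unique(schaefer[core.md == 1 & schaefer > 0]);  # start: every overlapping parcel

repeat {
  d.full <- dice(core.md, as.numeric(schaefer %in% parcel.ids));   # Step 1: Dice for the full list
  d.drop <- sapply(parcel.ids, function(p)                          # Dice with each parcel left out
                   dice(core.md, as.numeric(schaefer %in% setdiff(parcel.ids, p))));
  keep <- parcel.ids[d.drop < d.full];   # Step 2: keep parcels whose removal made the fit worse
  if (length(keep) == length(parcel.ids)) { break; }  # removing any remaining parcel now hurts: done
  parcel.ids <- keep;
}
parcel.ids   # the approximating set of Schaefer parcel IDs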


Using this procedure, we obtained a set of 32 Schaefer 400x7 parcels (IDs 99, 127, 129, 130, 131, 132, 137, 140, 141, 142, 148, 163, 165, 182, 186, 300, 332, 333, 334, 335, 336, 337, 340, 345, 349, 350, 351, 352, 354, 361, 365, 387) as best approximating the core MD described by Assem, et al. 2020 (colors are just to show boundaries):


The approximation seems reasonable, and we plan to use this set of parcels (and the procedure that found them) in future analyses. 

The R (knitr) code to create the above figure, calculate Dice, and find the parcel set is in Schaefer400x7approxAssem2020.rnw, along with the compiled version.

Tuesday, June 29, 2021

DMCC55B supplemental as tutorial: positive control "ONs"-style timecourses

This is the sixth (and I believe final) post in a series describing the DMCC55B supplemental files. The first introduces the dataset, the second questionnaire data, the third motion, the fourth creating and checking temporal mean, standard deviation, and tSNR images, and the fifth a parcel-wise button-press classification analysis.

As discussed in the manuscript and the introduction to the previous post, "ONs" is a positive control analysis: one that is not of experimental interest, but has a very predictable outcome; so predictable that if the expected result is not found, further analysis should stop until the problem is resolved. "ONs" is shorthand for "on-task": contrasting BOLD during all task trials against baseline.

The example DMCC55B ONs analysis uses the Stroop task and direct examination of the parcel-average timecourses for HRF-like shape, rather than the GLMs we use for internal QC. This has the advantage of being very close to the data with few dependencies, but is perhaps only practical when there are many repetitions of short trials (such as the DMCC Stroop). The DMCC Stroop task is a color-word version, with spoken responses. Contrasting task (all trials, not e.g., by congruency) against baseline should thus show activity in visual, language, and motor (especially speaking) areas: the typical "task positive" and "task negative" areas.

As for the "buttons", both a surface and volume version of the analysis are included, and the results are similar. The first step of the analysis is the same as for buttons: normalize and detrend each vertex/voxel's timecourse using 3dDetrend (the first code block of controlAnalysis_prep.R and controlAnalysis_prep_volume.R processes both Sternberg and Stroop). The rest of the steps are brief enough to be in the knitr code chunks: make parcel-average timecourses with 3dROIstats, use the _events.tsv files to find the onset of each trial for each person, and average the timecourse across those events. 

For example, here are the timecourses for two motor parcels (113 and 116 in the 1000-parcel, 17-network Schaefer parcellation), surface (top) and volume (bottom):

Each grey line is one participant, with the across-participants mean in blue. The x-axis is TRs (1.2 second TR; event onset at 0), and the y-axis is BOLD (parcel-average after the normalizing and detrending, averaged across all events for the person). As expected, the curves approximate the canonical HRF. Interestingly, the surface timecourses are taller than the volume, though the same parcels tend to show task-related activity (or not).

To summarize further and show all the parcels at once, I averaged the BOLD across the peak timepoints (yellowish band), which makes the areas clear in the group means:



Please let me know if you explore the DMCC55B dataset and/or this supplemental; I hope you find these files useful!

Thursday, June 24, 2021

DMCC55B supplemental as tutorial: positive control "buttons" classification analysis

This is the fifth post in a series describing the DMCC55B supplemental files. The first introduces the dataset, the second questionnaire data, the third motion, and the fourth creating and checking temporal mean, standard deviation, and tSNR images.

I very strongly recommend that "positive control" analyses be performed as part of every fMRI (really, most any) analysis. The idea is that these analyses check for the existence of effects that, if the dataset is valid, must be present (and so, if they are not detected, something is most likely wrong in the dataset or analysis, and analysis should not proceed until the issues are resolved). (Julia Strand's Error Tight site collects additional suggestions for minimizing errors in research and analysis.)

The DMCC55B supplemental includes two of my favorite positive control analyses, "buttons" and "ONs", which can be adapted to a wide variety of task paradigms. The "buttons" name is shorthand for analyzing the activity associated with the task responses (which often involve moving fingers to press buttons). Task responses like button presses are excellent targets for positive controls because the occurrence of movement can be objectively verified (unlike e.g., psychological states, whose existence is necessarily inferred), and hand motor activity is generally strong, focal, and located in a low g-factor area (i.e., with better fMRI SNR). Further, it is nearly always possible to design an analysis around the responses that is not tied to the experimental hypotheses. (To avoid circularity, control analyses must be independent of experimental hypotheses and have high face validity.)

The DMCC55B "buttons" example uses the Sternberg task. In the DMCC Sternberg task (Figure 3) responses are made with the right hand, pressing the button under either the first or second finger to indicate whether the Probe word was a member of the current list or not. The target hypotheses for Sternberg involve aspects such as response time, whether the response was correct, and brain activity changes during List memorization; while the hypothesis for the buttons control analysis is simply that brain activity in somatomotor areas should change due to the finger motion necessary to press the response button. Rephrased, the contrast of button presses against baseline should show somatomotor activation.

The example DMCC55B Sternberg buttons positive control analysis was implemented as ROI-based classification MVPA (linear svm, c=1, e1071 R libsvm interface), with averaging (not GLMs) for the temporal compression. I ran this on the surface (fsaverage5 giftis produced by fmriprep preprocessing), within each ROI defined by the 1000-parcel, 17-network Schaefer parcellation, with leave-one-subject-out cross-validation (55-fold). I do not generally advise leave-one-subject-out CV, especially with this many people, but used it here for simplicity; a more reasonable 11-fold CV version is also in the code.

The analysis code is split between two files. The first file, controlAnalysis_prep.R, is made up of consecutive code blocks to perform the analysis, while the second file, controlAnalysis_buttons.rnw, displays the results in tables and on brains (compiled controlAnalysis_buttons.pdf). (Aside: I will sometimes include analysis code in knitr documents (e.g., QC_SD_surface.rnw) if I judge the code is short and straightforward enough that the benefit of having everything in one file is greater than the drawback of increased code length and mixing of analysis in with results.) My intention is that the two controlAnalysis_buttons files together will serve as a "starter kit" for classification analysis, adaptable to many other datasets and applications.

The analysis starts at the top of controlAnalysis_prep.R; the files produced by this script are used to make the results shown in controlAnalysis_buttons.rnw. The code blocks should be run in sequence, as later blocks depend on output from earlier blocks.

  • The first code block uses AFNI's 3dDetrend to normalize and detrend each vertex's timecourse. At the time of writing, 3dDetrend does not accept gifti image inputs, so the script uses readGIfTI to read the files and write 3dDetrend-friendly "1D" text files. Aside: it's possible to avoid AFNI, implementing the normalize and detrend steps entirely in R. I prefer the AFNI function, however, to avoid introducing errors, and for clarity; using established functions and programs whenever possible is generally advisable.
  • The second code block reads the _events.tsv files, and finds matching sets of frames corresponding to button press and "not" (no button-press) events. Balance is very important to avoid biasing the classification: each "event" should be the same duration, and each run should have equal numbers of events of each type. Windows for each event (i.e., when event-related brain activity should be present in the BOLD) were set to start 3 seconds after the event, and end 8 seconds after. This 3-8 second window is roughly the peak of the canonical HRF given the very short button press events; longer windows would likely be better for longer duration events.
  • The third code block performs the temporal compression, writing one file for each person, run, class, and hemisphere. The files have one row for every vertex, and one column for each example (average of the 3:8 second windows found in the second code block), plus a column with the mean across examples. 
  • The final two code blocks run the classification, with leave-one-subject-out (fourth block) and leave-five-subjects-out (11-fold, fifth block) cross-validation. Each version writes a results file with one row per parcel, and a column for the accuracy of each cross-validation fold, plus the mean accuracy over folds. These are the files read by controlAnalysis_buttons.rnw and shown on brains for Figure 9.
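For a flavor of the classification step, here is a minimal base R sketch of the leave-one-subject-out loop for a single parcel; the data frame layout (dat, with sub.id, class, and vertex columns) is hypothetical, and controlAnalysis_prep.R should be consulted for the actual implementation.

library(e1071);   # R interface to libsvm

# dat: one row per example for this parcel; columns sub.id, class ("button" or "not"),
# and one column per vertex holding the temporally-compressed BOLD
sub.ids <- unique(dat$sub.id);
accs <- rep(NA, length(sub.ids));     # one cross-validation fold per left-out participant
for (i in seq_along(sub.ids)) {
  train <- dat[dat$sub.id != sub.ids[i], ];
  test  <- dat[dat$sub.id == sub.ids[i], ];
  train$class <- factor(train$class);
  fit <- svm(class ~ ., data=train[, names(train) != "sub.id"],
             kernel="linear", cost=1, scale=FALSE);   # linear svm, c=1
  accs[i] <- mean(predict(fit, newdata=test) == test$class);  # proportion correct for this fold
}
mean(accs)   # parcel accuracy: mean over the 55 folds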

Implementation notes:

This code does MVPA with surface data, using text (vertices in rows) for the intermediate files. It is straightforward to identify which rows/vertices correspond to which parcels since the parcellation is of the same type (fsaverage5, in this case) as the functional data (see code in the fourth block, plus the preceding comments). Surface searchlight analyses are far less straightforward than ROI/parcel-based ones, and I don't think they are generally advisable, due to the varying distances between vertices.

Relatively few changes are needed to perform this same analysis with volumes. The purpose and steps within each code block are the same, though the functions to retrieve the timeseries vary, and there is only one volumetric image per run instead of the surface pair (one per hemisphere). The volumetric version of the buttons classification is divided between controlAnalysis_prep_volume.R and controlAnalysis_buttons_volume.rnw.

As we would hope (since the same original data are used for both), largely the same parcels have the highest accuracy in the surface and volume versions of the MVPA, and the parcel-wise accuracies are positively correlated. The parcel-wise accuracies from the surface version are often, though not uniformly, a bit higher.



Friday, June 18, 2021

DMCC55B supplemental as tutorial: basic fMRI QC: temporal mean, SD, and tSNR

This is the fourth post in a series describing the DMCC55B supplemental files. The first introduces the dataset, the second questionnaire data, and the third motion. Here I'll highlight another basic fMRI QC practice: creating and checking temporal mean, standard deviation, and tSNR images. For background on these types of QC images I suggest starting with this post, especially the multiple links in the first sentence.

For DMCC55B, we preprocessed the images as both volume and surface, so the QC was also both volume and surface. I generally find QC easier to judge with unmasked volumes, but if surface analysis is planned, the surface representations must be checked as well.

During DMCC data collection we create a QC summary file for each individual participant (example here), with volumes, surfaces, and motion (realignment parameters) for every run. A first QC check for (unmasked) volume images is straightforward: do they look like brains? This ("got brains?") check doesn't apply to surfaces, since any valid gifti image will be brain-shaped. Instead, our first QC pass for the surface images is to check the temporal mean for "tiger stripes" around the central sulcus. Below are the vertex-wise means for Bas Stern AP from the example, with the blue lines pointing to the stripes. If these stripes are not visible, something likely went wrong with the preprocessing.


For ease of group comparison, the DMCC55B QC supplemental files are split by type rather than participant: QC_SD_surface.pdf for vertex-wise temporal standard deviation, QC_tSNR_surface.pdf for vertex-wise tSNR; QC_SD_volume.pdf and QC_tSNR_volume.pdf for the corresponding voxel-wise images. With eight runs for each of 55 participants these QC summary files are still rather long, but the consistent scaling (same intensity used for all participants when plotting) makes it possible to scroll through and spot trends and oddities. Such group review can be very beneficial, especially before embarking on statistical analyses.

For example, the SD images for some participants/runs are much brighter and fuzzier than others (compare f2157me and f2499cq on page 8). Broadly, the more the temporal SD images resemble maps of brain vessel density, the better the quality, though of course the resolution will vary with fMRI acquisition parameters (e.g., in the 2.4 mm isotropic DMCC images the Circle of Willis should be clearly visible). Artifacts can also be spotted, such as the ghostly "crescents" in some participants' PA images (e.g., f5001ob).

Implementation notes:

The .rnw file with each (knitr) pdf has the R code to create the temporal statistic images, as well as to produce the summary .pdfs. The startup code chunk of each document has the code to make the images (from the 4d niftis and giftis produced by fmriprep), while the later chunk plots the images.


The volumetric code uses the AFNI 3dTstat and 3dcalc functions (called from R using system2()) to make the temporal mean, standard deviation, and tSNR images, which are written out as NIfTI images. Saving the statistical images is useful for compilation speed, but more importantly, so that they can be examined in 3d if closer checking is needed. I strongly suggest using existing functions from a well-established program (such as AFNI) whenever possible, for clarity and to avoid introducing bugs.
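For instance, one way the volumetric statistics can be made (a sketch with a hypothetical file name, not the exact released commands):

in.fname <- "run1_bold.nii.gz";   # hypothetical preprocessed 4d NIfTI

system2("3dTstat", args=c("-mean",  "-prefix", "Tmean.nii.gz", in.fname));   # temporal mean
system2("3dTstat", args=c("-stdev", "-prefix", "Tsd.nii.gz",   in.fname));   # temporal standard deviation
system2("3dcalc", args=c("-a", "Tmean.nii.gz", "-b", "Tsd.nii.gz",
                         "-expr", "'a/b'", "-prefix", "tSNR.nii.gz"));       # tSNR: mean/sd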

The volumetric images are plotted with base R image() rather than with my volumetric plotting function, because I wanted to plot one slice through each plane. Thus, the _volume 55B supplementals can also serve as an example of simple NIfTI image plotting. (This is my general volume-plotting knitr tutorial.) 

Finally, there is a comment in the volume .rnw files that the input images are already in LPI orientation. I strongly suggest transforming NIfTI images to LPI immediately after preprocessing (e.g., with 3dresample), if they are not already, as this often avoids left/right flipping. 
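For example (hypothetical file names), the reorientation can be done with a single call:

system2("3dresample", args=c("-orient", "LPI", "-prefix", "img_LPI.nii.gz", "-inset", "img.nii.gz"));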



Unlike the volumes, the surface QC code calculates the vertex-wise mean and standard deviation in R, and plots them with my gifti plotting functions. Many AFNI functions can work with giftis, but these calculations (mean, standard deviation) are so simple, and plotting surfaces from a vector so straightforward, that I decided to save the statistic images as text rather than gifti.
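A minimal sketch of that surface calculation, assuming the gifti R package and a preprocessed fsaverage5 run (the file names are hypothetical):

library(gifti);
gii <- readGIfTI("run1_hemi-L_bold.func.gii");    # hypothetical fmriprep-style file name
bold <- do.call(cbind, gii$data);                 # vertices in rows, frames in columns
v.mean <- rowMeans(bold);                         # vertex-wise temporal mean
v.sd <- apply(bold, 1, sd);                       # vertex-wise temporal standard deviation
write.table(data.frame(mean=v.mean, sd=v.sd), "QC_stats_L.txt", row.names=FALSE);  # save as text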

Friday, June 4, 2021

DMCC55B supplemental as tutorial: basic fMRI QC: motion

This is the third post in a series describing the DMCC55B supplemental files. The first introduces the dataset, and the second the questionnaire data. Here I'll highlight the first in the set of basic fMRI QC files: looking at the realignment parameters (head motion). 

While we use an FD threshold for censoring during the GLMs, I think plots with the six separate parameters are more useful for QC. These plots make up the bulk of QC_motion.pdf; one for each functional run included in DMCC55B. Chunk code5 of QC_motion.rnw creates these plots directly from the _desc-confounds_regressors.tsv produced during fmriprep preprocessing (note: newer versions name this file _desc-confounds_timeseries.tsv, but the columns used here are unchanged), and is relatively straightforward to adapt to other datasets; the code chunks between startup and code5 produce group summary statistics and can be omitted.


Above is the top of page 5 of the compiled file, including the motion plot for the first run. This shows very little overt or apparent motion, just a slight wiggle around frame 400 (8 minutes; grey lines are 1 minute intervals), which suggests the participant shifted during the break between the second and third task blocks (green horizontal lines). This run would be given an "A" (top) grade in our qualitative assessment.

f1659oa has extremely low overt head motion across this run, but clear apparent motion (breathing), primarily in the dark blue trans_y line.
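The gist of such a plot (a simplified sketch, not chunk code5 itself; the file name is hypothetical and the y-axis limits illustrative):

conf <- read.delim("sub-xxxx_task-Stern_run-1_desc-confounds_regressors.tsv", na.strings="n/a");

plot(x=0, y=0, type='n', xlim=c(1, nrow(conf)), ylim=c(-2, 2),
     xlab="frame", ylab="translation (mm) / rotation (rad)");
trans.cols <- c("trans_x", "trans_y", "trans_z");
rot.cols <- c("rot_x", "rot_y", "rot_z");
for (i in 1:3) {
  lines(conf[, trans.cols[i]], col=c("blue", "darkblue", "cyan")[i]);
  lines(conf[, rot.cols[i]], col=c("red", "darkred", "orange")[i]);
}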

R note: These motion (and the other) plots are made with base R graphics in a knitr document. The "stacked" 6-parameter and FD plots are made by using three par() commands within each iteration of the loop (chunk code5): first the size and margins for the upper plot; second the size and margins of the lower plot, with new=TRUE; finally, setting new=FALSE to finish and reset for the next run. 
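A stripped-down sketch of that stacking (illustrative par() values; conf as read in the sketch above):

par(fig=c(0, 1, 0.35, 1), mar=c(2, 3, 1, 1));            # upper region: realignment parameters
plot(conf$trans_x, type='l', xlab="", ylab="mm / rad");   # (add the other five lines as above)

par(fig=c(0, 1, 0, 0.35), mar=c(3, 3, 0.5, 1), new=TRUE); # lower region, drawn on the same page
plot(conf$framewise_displacement, type='l', xlab="frame", ylab="FD");

par(new=FALSE);   # reset so the next run's plots start on a fresh page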

Wednesday, June 2, 2021

DMCC55B supplemental as tutorial: questionnaire data

The previous post introduces the DMCC55B dataset, with pointers to the dataset, documentation, and why you might want to work with it. As mentioned at the end of that post, the description manuscript is accompanied by files intended to be both an introduction to working with DMCC55B data specifically, and more general tutorials for some common analyses. This series of posts will describe the DMCC55B supplemental files in their tutorial aspect and methodological details, starting with the questionnaire data.

Separately from the fMRI sessions, DMCC participants complete up to 28 individual difference questionnaires (described in the "Behavioral Session" section of the manuscript and the DMCC NDA site). Sharing questionnaire data is difficult, since different groups often use different versions of the same questionnaires, and critical details like response ranges and scoring algorithms may not be stated. Several projects are working on standard formats for questionnaire data; as an NIH-funded project the DMCC is using the format developed by the NIMH Data Archive (NDA). For DMCC55B, the questionnaire data is released under derivatives/questionnaires/, with one (NDA-format) csv file per questionnaire.

The behavioralSession_individualDifferences.rnw file contains (knitr) code to read these NDA-format csv files, then calculate and display both group and single-subject summary statistics for each. This code should be useful for anyone reading data from NDA-format questionnaire files, though with the warning that frequent consultation with the NDA Data Dictionary is required. There is unfortunately quite a bit of variability between the structure definitions, even in fundamentals such as the code used for missing values. behavioralSession_individualDifferences.rnw does not calculate all possible summary statistics, only those currently used in the DMCC.
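A minimal sketch of the reading step (hypothetical usage; this assumes the usual NDA csv layout, in which a one-line structure name/version header sits above the column names and so is skipped):

dospert <- read.csv("dospert01_DMCC55B.csv", skip=1, stringsAsFactors=FALSE);

summary(dospert$rt_1);                 # one item's responses; check the NDA Data Dictionary
table(dospert$rt_1, useNA="ifany");    # for valid ranges and the missing-value code first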

Here is an example of the summaries in the compiled behavioralSession_individualDifferences.pdf, for the DOSPERT (Domain-Specific Risk-Taking) questionnaire. The NDA names this questionnaire dospert01, so that name is also used for the DMCC55B derivative files. In the case of dospert01, we were able to match our questions with those in the Dictionary, so both individual item responses and summary scores are included. 

Above left is a small bit of the released dospert01_DMCC55B.csv file. The contents are in accord with the NDA Data Dictionary (above right), except for the first few columns: GUIDs, HCP IDs, sex, and other restricted/sensitive information is omitted. We can use the Dictionary definitions to interpret the other columns, however; for example, the first participant's answer of 7 to rt_1 corresponds to "Extremely Likely" for the item "Admitting that your tastes are different from those of a friend."

Scoring the DOSPERT produces five measures, which are reported as group means (above left) and for individual participants (above right); behavioralSession_individualDifferences.pdf presents similar summaries for each scored questionnaire.

Caution: behavioralSession_individualDifferences.rnw has code to parse NDA-format csv files and produce group score summaries. However, its scoring code and the Dictionary definitions should be carefully reviewed before use with non-DMCC datasets, to make sure the desired calculations are being performed.

an introduction to DMCC55B

It's hard to believe this is my first post of 2021! I've done a few updates to existing posts during the last few months, but not the detailed new ones I've been planning; hopefully this will be the first of a small "flurry" of new posts.

Here, I'm happy to announce the first big public release of Dual Mechanisms of Cognitive Control (DMCC) project data, "DMCC55B", together with a detailed description and example analyses. We've previously released smaller parts of the DMCC (DMCC13benchmark), which has been very useful for methods exploration, but isn't really enough data for detailed analysis of task-related activity or individual differences. A wide range of analyses are possible with DMCC55B, and we hope the documentation is both clear and detailed enough to make its use practical.

A few highlights: DMCC55B is data from 55 unrelated young adults. Each participant performed four cognitive control tasks (Stroop, AX-CPT, Cued Task-Switching, and Sternberg Working Memory) while undergoing moderately high-resolution fMRI scanning (MB4, 2.4 mm isotropic voxels, 1.2 s TR). There are two runs of each task (one with AP encoding, the other PA), each approximately 12 minutes long (about 1.5 hours of task fMRI per person). There are also scores for the participants on 28 state and trait self-report questionnaires, as well as finger photoplethysmograph and respiration belt recordings collected during scanning.

This manuscript is intended to be the practical introduction to DMCC55B, providing details such as the file format of the questionnaire data, the order in which task stimuli were presented, fMRI preprocessing (fmriprep output is included), and how Stroop responses were collected and scored. The manuscript also contains links and references to materials used for the main DMCC project (e.g., eprime task presentation scripts, code to extract reaction times from the Stroop audio recordings), which may be of interest or use to others, but likely only in a few specialized instances. A separate manuscript (preprint; accepted version) describes the wider DMCC project, including the theoretical basis for the task design. If some information is confusing or missing, please let me know!

Last, but most definitely not least, I want to highlight the "supplemental" accompanying DMCC55B. This is designed to perform and summarize some standard behavioral and quality control analyses, with the intention of the files serving both as an introduction to working with DMCC data (e.g., how do I obtain the onset time of all AX trials?) and as analysis tutorials (e.g., of a parcel-wise classification MVPA with surface data). Currently, the best introduction to this material is in the DMCC55B manuscript and the files themselves. The supplemental files are primarily knitr (R and LaTeX); they call afni functions (e.g., 3dTstat) directly when needed, and are entirely "untidy" base R, including base R graphics. (The last bit refers to the different "flavors" of R programming; I only rarely have need to visit the Tidyverse.)
 
 UPDATE 13 September 2021: Added a link to the published version of the DMCC overview manuscript.

Wednesday, November 11, 2020

DMCC GLMs: afni TENTzero, knots, and HDR curves, #2

This post is the second in a series, which begins with this one. In that first post I included examples of the estimated HDR curves for a positive control analysis, one of which is copied here for reference.

What are the axis labels for the estimated HDR curves? How did we generate them? What made us think about changing them? 

At one level the (curve) axis label question has a short answer: the y-axes are average (across subjects) beta coefficient estimates, and the x-axes are knots. But this answer leads to another question: how do knots translate into time? Answering this question is more involved, because we can't translate knots to time without knowing both the TR and the afni 3dDeconvolve command. The TR is easy: 1.2 seconds. Listing the relevant afni commands is also easy:

3dDeconvolve \  
 -local_times \  
 -x1D_stop \  
 -GOFORIT 5 \  
 -input 'lpi_scale_blur4_tfMRI_AxcptBas1_AP.nii.gz lpi_scale_blur4_tfMRI_AxcptBas2_PA.nii.gz' \  
 -polort A \  
 -float \  
 -censor movregs_FD_mask.txt \  
 -num_stimts 3 \  
 -stim_times_AM1 1 f1031ax_Axcpt_baseline_block.txt 'dmBLOCK(1)' -stim_label 1 ON_BLOCKS \  
 -stim_times 2 f1031ax_Axcpt_baseline_blockONandOFF.txt 'TENTzero(0,16.8,8)' -stim_label 2 ON_blockONandOFF \  
 -stim_times 3 f1031ax_Axcpt_baseline_allTrials.txt 'TENTzero(0,21.6,10)' -stim_label 3 ON_TRIALS \  
 -ortvec motion_demean_baseline.1D movregs \  
 -x1D X.xmat.1D \  
 -xjpeg X.jpg \  
 -nobucket  
 
3dREMLfit \
 -matrix X.xmat.1D \
 -GOFORIT 5 \
 -input 'lpi_scale_blur4_tfMRI_AxcptBas1_AP.nii.gz lpi_scale_blur4_tfMRI_AxcptBas2_PA.nii.gz' \
 -Rvar stats_var_f1031ax_REML.nii.gz \
 -Rbuck STATS_f1031ax_REML.nii.gz \
 -fout \
 -tout \
 -nobout \
 -verb

Unpacking these commands is not so easy, but I will try to do so here for the most relevant parts; please comment if you spot anything not quite correct in my explanation.

First, the above commands are for the Axcpt task, baseline session, subject f1031ax (aside: their data, including the event timing text files named above, is in DMCC13benchmark). The 3dDeconvolve command has the part describing the knots and HDR estimation; I included the 3dREMLfit call for completeness, because the plotted estimates are from the Coef sub-bricks of the STATS image (produced by -Rbuck).

Since this is a control GLM in which all trials are given the same label there are only three stimulus time series: BLOCKS, blockONandOFF, and TRIALS (this is a mixed design; see introduction). The TRIALS part is what generates the estimated HDR curves plotted above, as defined by TENTzero(0,21.6,10).

We chose TENTzero instead of TENT for the HDR estimation because we do not expect anticipatory activity (the trial responses should start at zero) in this task. (see afni message board, e.g., here and here) To include the full response we decided to model the trial duration plus at least 14 seconds (not "exactly" 14 seconds because we want the durations to be a multiple of the TENT duration). My understanding is that it's not a problem if more time than needed is included in the duration; if too long the last few knots should just approximate zero. I doubt you'd often want to use a duration much shorter than that of the canonical HRF (there's probably some case when that would be useful, but for the DMCC we want to model the entire response).

From the AFNI 3dDeconvolve help

'TENT(b,c,n)': n parameter tent function expansion from times b..c after stimulus time [piecewise linear] [n must be at least 2; time step is (c-b)/(n-1)]

 

You can also use 'TENTzero' and 'CSPLINzero', which means to eliminate the first and last basis functions from each set. The effect of these omissions is to force the deconvolved HRF to be zero at t=b and t=c (to start and end at zero response). With these 'zero' response models, there are n-2 parameters (thus for 'TENTzero', n must be at least 3).

The first TENTzero parameter is straightforward: b=0 to start at the event onset.

For the trial duration (c), we need to do some calculations. Here, the example is the DMCC Axcpt task, which has a trial duration of 5.9 seconds, so the modeled duration should be at least 14 + 5.9 = 19.9 seconds. That's not the middle value in the TENTzero command, though.

For these first analyses we decided to have the TENTs span two TRs (1.2 * 2 = 2.4 seconds), in the (mostly) mistaken hope it would improve our signal/noise, and also to make the file sizes and GLM estimation speeds more manageable (which it does). Thus, c=21.6, the shortest multiple of our desired TENT duration greater than the modeled duration (19.9/2.4 = 8.29, rounded up to 9; 9*2.4 = 21.6). 

Figuring out n requires a bit of mind bending but less calculation; I think the slide below (#5 in afni06_decon.pdf) helps: in tent function deconvolution, the n basis functions divide the modeled time range into n-1 intervals. In the above calculations I rounded 8.29 up to 9; we want to model 9 intervals (of 2.4 s each). Thus, n = 10 basis functions gives us the needed n-1 = 9 intervals.

Recall that since we're using TENTzero, the first and last tent functions are eliminated. Thus, having n=10 means that we will get 10-2=8 beta coefficients ("knots") out of the GLM. These, finally, are the x-axes in the estimated GLM curves above: the knots at which the HDR estimates are made, of which there are 8 for Axcpt. 
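Collecting that arithmetic in a few lines of R (the Axcpt example):

TR <- 1.2;                  # seconds
knot.dur <- 2 * TR;         # each TENT spans two TRs in these GLMs: 2.4 s
trial.dur <- 5.9;           # Axcpt trial duration, seconds
model.dur <- trial.dur + 14;                    # 19.9 s: trial plus at least 14 s

n.intervals <- ceiling(model.dur / knot.dur);   # 9 intervals of 2.4 s each
c.param <- n.intervals * knot.dur;              # 21.6: the middle TENTzero parameter
n.param <- n.intervals + 1;                     # 10: basis functions (the third parameter)
n.betas <- n.param - 2;                         # 8: knots estimated, since TENTzero drops the first and last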

(Aside: 0-based afni can make some confusion for those of us more accustomed to 1-based systems, especially with TENTzero. The top plots put the value labeled as #0_Coef at 1; a point is added at (0, 0) since TENTzero sets the curves to start at zero. I could also have added a zero for the last knot (9, for this Axcpt example), but did not.)

These GLMs seem to be working fine, producing sensible HDR estimates. We've been using these results in many analyses (multiple lab members with papers in preparation!), including the fMRIprep and HCP pipeline comparisons described in previous posts. We became curious, though, whether we could set the knot duration to the TR, rather than twice the TR (as done here): would the estimated HDR then follow the trial timing even more precisely? In the next post(s) I'll describe how we changed the GLMs to have the knot duration match the TR, and a (hopefully interesting) hiccup we had along the way.

UPDATE 4 January 2021: Corrected DMCC13benchmark openneuro links.

Tuesday, November 10, 2020

DMCC GLMs: afni TENTzero, knots, and HDR curves, #1

One of the things we've been busy with during the pandemic-induced data collection pause is reviewing the DMCC (Dual Mechanisms of Cognitive Control; more descriptions here and here) project GLMs: did we set them up, and are we interpreting them, optimally?

We decided that the models were not quite right, and are rerunning all of them. This post (and its successors) will describe how the original GLMs were specified, why we thought a change was needed, what the change is, and how the new results look. The posts are intended to summarize and clarify the logic and process for myself, but also for others, as I believe many people find modeling BOLD responses (here, with afni 3dDeconvolve TENTs) confusing.

 

I won't just start explaining the DMCC GLMs, however, because they're complex: the DMCC has a mixed design (task trials embedded in long blocks), and we're estimating both event-related ("transient") and block-related ("sustained") responses. If you're not familiar with them, Petersen and Dubis (2012) review mixed block/event fMRI designs, and Figure 1 from Visscher et al. (2003) (left) illustrates the idea. In addition to the event and block regressors we have "blockONandOFF" regressors in the DMCC, which are intended to capture the transition between the control and task periods (see Dosenbach et al. (2006)).

We fit many separate GLMs for the DMCC, some as controls and others to capture effects of interest. For these posts I'll describe one of the control analyses (the "ONs") to make it a bit more tractable. The ON GLMs include all three effect types (event (transient), block (sustained), blockONandOFF (block start and stop)), but do not distinguish between the trial types; all trials (events) are given the same label ("on"). The ON GLMs are a positive control; they should show areas with changes in activation between task and rest (30 seconds of fixation cross before and after task blocks). For example, there should be positive, HRF-type responses in visual areas, because all of our task trials include visual stimuli.

Below is an example of the estimated responses from our original ONs GLMs (generated by R knitr code similar to this). The GLMs were fit for every voxel individually, then averaged within the Schaefer, et al. (2018) parcels (400x7 version); these are results for four visual parcels and volumetric preprocessing. These are (robust) mean and SEM over 13 participants (DMCC13benchmark). The results have multiple columns both because there are four DMCC tasks (Axcpt (AX-CPT), Cuedts (Cued task-switching), Stern (Sternberg working memory), and Stroop (color-word variant)), and because of the mixed design, so we generate estimates for both event/transient (lines) and sustained (blue bars) effects (blockONandOFF are not included here).

The line graphs show the modeled response at each time point ("knot"). Note that the duration (and so number of estimated knots) for each task varies; e.g., Stroop has the shortest trials and Stern the longest. (We set the modeled response to trial length + 14 seconds.)

The double-peak response shape for Axcpt, Cuedts, and Stern is expected, as the trials have two stimulus slides separated by a 4 second ITI; the Stroop response should resemble a canonical HRF since each trial has a single stimulus and is short. The task timing and some sample trials are shown below (this is Figure 1 in a preprint (accepted version) which has much more task detail; see also the DMCC55B description).


So far, so good: the estimated responses for each task have a sensible shape, reflecting both the HRF and task timing. But what exactly is along the x-axis for the curves? And how did we fit these GLMs, and then decide to change them a bit? ... stay tuned.

later posts in this series: #2

UPDATE 4 January 2021: Corrected DMCC13benchmark openneuro links.

Tuesday, December 3, 2019

comparing fMRIPrep and HCP Pipelines: Resting state matrix correlation

In the previous post I described comparing resting state pipelines by asking blinded people to match functional connectivity matrices by participant. As described in that post, while no one correctly matched all matrices (i.e., pairing up the hcp-Siegel and fmriprep-xcp versions for the 13 people in DMCC13benchmark), most correctly matched many of them, reassuring me that the two pipelines are producing qualitatively similar functional connectivity matrices.

In this post I show the results of correlating the functional connectivity matrices, on the suggestion of Max Bertolero (@max_bertolero, thanks!). Specifically, I turned the lower triangles of each connectivity matrix into a vector then correlated them pairwise. (In other words, "unwrapping" each functional connectivity matrix into a vector then calculating all possible pairwise correlations between the 26 vectors.)
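In code, the calculation is just a couple of lines (a sketch; fc.mats is a hypothetical list holding the 26 functional connectivity matrices):

vecs <- sapply(fc.mats, function(m) m[lower.tri(m)]);   # "unwrap": one column per matrix
cor.mat <- cor(vecs);    # 26x26 matrix of all pairwise correlations between the vectors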

The results are in the figure below. To walk you through it: participants are along the x-axis, matrix-to-matrix correlations  on the y-axis. Each person is given a unique plotting symbol, which is shown in black. The first column is for person 150423, whose plotting symbol is a circle. The correlation of 150423's hcp-Siegel and fmriprep-xcp matrices is about 0.75 (location of the black circle). The four colored sets of symbols are the correlation of 150423's hcp or fmriprep matrix with everyone else's hcp or fmriprep matrices, as listed along the top. For example, the bright blue column is listed as "hcp w/others' fp", and the highest symbol is a plus. Looking at the black symbols, the plus sign is 171330, so this means that the correlation of 150423's hcp-Siegel matrix with 171330's fmriprep-xcp matrix is about 0.63.
There are several interesting parts of the figure. First, the dark blue and dark red points tend to be a bit higher than the bright blue and pink: matrices derived from the same pipeline tend to have higher correlation than those from different pipelines, even from different people.

Next, most people's fmriprep and hcp matrices are more correlated than either are to anyone else's - the black symbols are usually above all of the colored ones. This is not the case for 3 people: 178950, DMCC8033964, and DMCC9478705. 203418 is also interesting, since their self-correlation is a bit lower than most, but still higher than the correlation with all the others.

Finally, I compared the impressions from this graph with those from the blinded matching (summarized in the table below), and they are fairly consistent, particularly for the hardest-to-match pairs: no one correctly matched 178950's or DMCC9478705's matrices, two of the people just listed as having their self-correlations in the midst of the others. Only one person correctly matched the third (DMCC8033964).

While the hardest-to-match pairs are easy to see in this graph, the ones most often correctly matched are more difficult to predict. For example, from the height of the self-correlation and distance to the other points I'd predict that 171330 and 393550 would be relatively hard to pair, but they were the most often correctly matched. 203418 has the fourth-lowest self-correlation, but was correctly matched by more than half the testers.

Note: the "correct matches" column is how many of the 9 people doing the blinded matching test correctly matched the person's matrices.
subject ID     correct pairing (hcp, fmriprep)   correlation of this person's matrices   correct matches
150423         h,q                               0.7518                                  4
155938         v,i                               0.7256                                  6
171330         l,y                               0.7643                                  4
178950         b,r                               0.6069                                  0
203418         x,d                               0.651                                   5
346945         w,j                               0.7553                                  4
393550         u,n                               0.7772                                  7
601127         c,m                               0.7152                                  1
849971         k,t                               0.7269                                  1
DMCC5775387    s,f                               0.767                                   6
DMCC6705371    a,p                               0.733                                   6
DMCC8033964    g,z                               0.5938                                  1
DMCC9478705    o,e                               0.5428                                  0


Together, both the blinded matching and correlation results reassure me that the two pipelines are giving qualitatively similar matrices, but clearly not identical, nor equally similar for all of the test people.


UPDATE 4 December 2019: Checking further, missing runs have a sensible impact on these results. These functional connectivity matrices were not calculated from the DMCC baseline session only (which holds the runs included in DMCC13benchmark), but rather by concatenating all resting state runs for the person in the scanning wave. Concatenating gives 30 minutes of resting state total when there are no missing runs or censoring.

Two subjects had missing resting state runs: DMCC8033964 is missing one (25 minutes total instead of 30), and DMCC9478705 is missing two (20 minutes). These are two of the lowest-correlation people, which is consistent with the expectation that longer scans make more stable functional connectivity matrices. It looks like the two pipelines diverge more when there are fewer input images, which isn't too surprising. 178950 didn't have missing runs but perhaps had more censoring or something; I don't immediately know how to extract those numbers from the output but expect the pipelines to vary in many details.

UPDATE 4 January 2021: Corrected DMCC13benchmark openneuro links.

Wednesday, November 6, 2019

comparing fMRIPrep and HCP Pipelines: Resting state blinded matching

The previous post has a bit of a literature review and my initial thoughts on how to compare resting state preprocessing pipeline output. We ended up trying something a bit different, which I will describe here. I actually think this turned out fairly well for our purposes, and has given me sufficient confidence that the two pipelines have qualitatively similar output.

First, the question. As described in previous (task-centric) posts, we began preprocessing the DMCC dataset with the HCP pipelines, followed by code adapted from Josh Siegel (2016) for the resting state runs. We are now switching to fmriprep for preprocessing, followed by the xcpEngine (fc-36p) for resting state runs. [xcp only supports volumes for now, so we used the HCP pipeline volumes as well.] Our fundamental concern is the impact of this processing switch: do the two pipelines give the "same" results? We could design clear benchmark tests for the task runs, but how to test the resting state analysis output is much less clear, so I settled on a qualitative test.

We're using the same DMCC13benchmark dataset as in the task comparisons, which has 13 subjects. Processing each with both pipelines gives 26 total functional connectivity matrices. Plotting the two matrices side-by-side for each person (file linked below) makes similarities easy to spot: it looks like the two pipelines made similar matrices. But we are very good at spotting similarities in paired images; are they actually similar?

The test: would blinded observers match up the 26 matrices by person (pairing the two matrices for each subject) or by something else (such as pipeline)? If observers can match most of the matrices by person, we have reason to think that the two pipelines are really producing similar output. (Side note: these sorts of tests are sometimes called Visual Statistical Inference and can work well for hard-to-quantify differences.)

For details, here's a functional connectivity matrix for one of the DMCC13benchmark subjects. There are 400 rows and columns in the matrix, since these were run within the Schaefer (2018) 400-parcel, 7-network ordering parcellation. This parcellation spans both hemispheres, the first 200 parcels on the left and the second 200 on the right. Both hemispheres are in the matrix figures (labeled and separated by black lines). The number of parcels in each network is not perfectly matched between hemispheres, so only the quadrants along the diagonal have square networks. The dotted lines separate the 7 networks: Visual, SomMot, DorsAttn, SalVentAttn, Limbic, Cont, Default (ordering and labels from Schaefer (2018)). Fisher r-to-z transformed correlations are plotted, with the color range from -1.9 (darkest blue) to 1.9 (darkest red); 0 is white.
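A sketch of how one such matrix can be built and plotted (hypothetical variable tc: a frames-by-400 matrix of parcel-average timecourses for one person and pipeline; the actual pipelines include nuisance regression and other steps not shown):

fc <- atanh(cor(tc));   # Pearson correlation between parcels, then Fisher r-to-z
diag(fc) <- 0;          # the diagonal (r = 1) is infinite after atanh
image(fc, zlim=c(-1.9, 1.9), col=colorRampPalette(c("blue", "white", "red"))(255));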
 
Interested in trying to match the matrices yourself? This pdf has the matrices in a random order, labeled with letters. I encourage you to print the pdf, cut the matrices apart, then try to pair them into the 13 people (we started our tests with the matrices in this alphabetical order). I didn't use a set instructional script or time limit, but encouraged people not to spend more than 15-20 minutes or so, and explained the aim similarly to this post. If you do try it, please note in the comments/email me your pairings or number correct (as well as your strategy and previous familiarity with these sorts of matrices) and I'll update the tally. The answers are listed on the last page of this pdf and below the fold on this post.

Several of us had looked at the matrices for each person side-by-side before trying the blind matching, and on that basis thought that it would be much easier to pair the people than it actually was. Matching wasn't impossible, though: one lab member successfully matched 9 of the 13 people (the most so far). At the low end, two other lab members only successfully matched 2 of the 13 people; three members matched 4, one 5, one 7, and one 8.

That it was possible for anyone to match the matrices for most participants reassures me that they are indeed more similar by person than pipeline: the two pipelines are producing qualitatively similar matrices when given the same input data. For our purposes, I find this sufficient: we were not attempting to determine if one version is "better" than the other, just if they are comparable.

What do you think? Agree that this "test" suggests similarity, or would you like to see something else? pdfs with the matrices blinded and not are linked here; let me know if you're interested in the underlying numbers or plotting code and I can share that as well; unpreprocessed images are already on OpenNeuro as DMCC13benchmark.

Assigned pairings (accurate and not) after the jump.

UPDATE 3 December 2019: Correlating the functional connectivity matrices is now described in a separate post

UPDATE 4 January 2021: Corrected DMCC13benchmark openneuro links.