Tuesday, June 29, 2021

DMCC55B supplemental as tutorial: positive control "ONs"-style timecourses

This is the sixth (and I believe final) post in a series describing the DMCC55B supplemental files. The first introduces the dataset, the second the questionnaire data, the third motion, the fourth creating and checking temporal mean, standard deviation, and tSNR images, and the fifth a parcel-wise button-press classification analysis.

As discussed in the manuscript and the introduction to the previous post, "ONs" is a positive control analysis: one that is not of experimental interest, but has a very predictable outcome; so predictable that if the expected result is not found, further analysis should stop until the problem is resolved. "ONs" is shorthand for "on-task": contrasting BOLD during all task trials against baseline.

The example DMCC55B ONs analysis uses the Stroop task and direct examination of the parcel-average timecourses for HRF-like shape, rather than the GLMs we use for internal QC. This has the advantage of being very close to the data with few dependencies, but is perhaps only practical when there are many repetitions of short trials (as in the DMCC Stroop). The DMCC Stroop task is a color-word version, with spoken responses. Contrasting task (all trials, not e.g., split by congruency) against baseline should thus show activity in visual, language, and motor (especially speech) areas; the typical "task positive" and "task negative" areas.

As for the "buttons", both a surface and volume version of the analysis are included, and the results are similar. The first step of the analysis is the same as for buttons: normalize and detrend each vertex/voxel's timecourse using 3dDetrend (the first code block of controlAnalysis_prep.R and controlAnalysis_prep_volume.R processes both Sternberg and Stroop). The rest of the steps are brief enough to be in the knitr code chunks: make parcel-average timecourses with 3dROIstats, use the _events.tsv files to find the onset of each trial for each person, and average the timecourse across those events. 
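To make the event-averaging step concrete, here is a minimal R sketch of the idea (not the supplemental's exact code); the file name, window length, and placeholder timecourse are illustrative assumptions:

    # Sketch: average one parcel's (normalized, detrended) timecourse across
    # all Stroop trial onsets. File name and window length are illustrative.
    tr <- 1.2;       # DMCC TR, seconds
    n.win <- 20;     # frames to keep after each event onset
    tc <- rnorm(500);   # placeholder parcel-average timecourse (e.g., from 3dROIstats)

    evts <- read.delim("sub-xxxx_task-Stroop_run-1_events.tsv");  # BIDS events file
    onsets <- floor(evts$onset / tr) + 1;             # convert onset seconds to frames
    onsets <- onsets[onsets + n.win <= length(tc)];   # drop windows past the run's end

    # one column per trial, then average across trials for the HRF-like curve
    wins <- sapply(onsets, function(f) { tc[f:(f + n.win - 1)]; });
    plot(0:(n.win - 1), rowMeans(wins), type='l', xlab="TR after onset", ylab="BOLD");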

For example, here are the timecourses for two motor parcels (113 and 116 in the 1000-parcel, 17-network Schaefer parcellation), surface (top) and volume (bottom):

Each grey line is one participant, with the across-participants mean in blue. The x-axis is TRs (1.2 second TR; event onset at 0), and the y-axis is BOLD (parcel-average after the normalizing and detrending, averaged across all events for the person). As expected, the curves approximate the canonical HRF. Interestingly, the surface timecourses are taller than the volume ones, though the same parcels tend to show task-related activity (or not) in both versions.

To summarize further and show all the parcels at once, I averaged the BOLD across the peak timepoints (yellowish band), which makes the areas clear in the group means:



Please let me know if you explore the DMCC55B dataset and/or this supplemental; I hope you find these files useful!

Thursday, June 24, 2021

DMCC55B supplemental as tutorial: positive control "buttons" classification analysis

This is the fifth post in a series describing the DMCC55B supplemental files. The first introduces the dataset, the second the questionnaire data, the third motion, and the fourth creating and checking temporal mean, standard deviation, and tSNR images.

I very strongly recommend that "positive control" analyses be performed as part of every fMRI (really, most any) analysis. The idea is that these analyses check for the existence of effects that, if the dataset is valid, must be present (and so, if they are not detected, something is most likely wrong in the dataset or analysis, and analysis should not proceed until the issues are resolved). (Julia Strand's Error Tight site collects additional suggestions for minimizing errors in research and analysis.)

The DMCC55B supplemental includes two of my favorite positive control analyses, "buttons" and "ONs", which can be adapted to a wide variety of task paradigms. The "buttons" name is shorthand for analyzing the activity associated with the task responses (which are often moving fingers to press buttons). Task responses like button presses are excellent targets for positive controls because the occurrence of movement can be objectively verified (unlike e.g., psychological states, whose existence is necessarily inferred), and hand motor activity is generally strong, focal, and located in a low g-factor area (i.e., with better fMRI SNR). Further, it is nearly always possible to design an analysis around the responses that is not tied to the experimental hypotheses. (To avoid circularity, control analyses must be independent of experimental hypotheses and have high face validity.)

The DMCC55B "buttons" example uses the Sternberg task. In the DMCC Sternberg task (Figure 3) responses are made with the right hand, pressing the button under either the first or second finger to indicate whether the Probe word was a member of the current list or not. The target hypotheses for Sternberg involve aspects such as response time, whether the response was correct, and brain activity changes during List memorization; while the hypothesis for the buttons control analysis is simply that brain activity in somatomotor areas should change due to the finger motion necessary to press the response button. Rephrased, the contrast of button presses against baseline should show somatomotor activation.

The example DMCC55B Sternberg buttons positive control analysis was implemented as ROI-based classification MVPA (linear svm, c=1, e1071 R libsvm interface), with averaging (not GLMs) for the temporal compression. I ran this on the surface (fsaverage5 giftis produced by fmriprep preprocessing), within each ROI defined by the 1000-parcel, 17-network Schaefer parcellation, with leave-one-subject-out cross-validation (55-fold). I do not generally advise leave-one-subject-out CV, especially with this many people, but used it here for simplicity; a more reasonable 11-fold CV version is also in the code.
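For orientation, here is a minimal sketch of one parcel's leave-one-subject-out classification. It assumes the parcel's examples are already in a data.frame d with a subject ID column (sub.id), a class column, and one column per vertex; these names and the data layout are illustrative assumptions, not the supplemental's exact code:

    library(e1071);    # R interface to libsvm

    # d: one row per example; columns sub.id, class ("press"/"not"), then vertices
    d$class <- factor(d$class);
    sub.ids <- unique(d$sub.id);
    accs <- rep(NA, length(sub.ids));
    for (i in seq_along(sub.ids)) {    # each fold leaves out one person
      train <- subset(d, sub.id != sub.ids[i], select=-sub.id);
      test  <- subset(d, sub.id == sub.ids[i], select=-sub.id);
      fit <- svm(class ~ ., data=train, kernel="linear", cost=1);  # linear svm, c=1
      accs[i] <- mean(predict(fit, test) == test$class);
    }
    mean(accs);    # mean accuracy across the folds for this parcel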

The analysis code is split between two files. The first file, controlAnalysis_prep.R, is made up of consecutive code blocks to perform the analysis, while the second file, controlAnalysis_buttons.rnw, displays the results in tables and on brains (compiled controlAnalysis_buttons.pdf). (Aside: I will sometimes include analysis code in knitr documents (e.g., QC_SD_surface.rnw) if I judge the code is short and straightforward enough that the benefit of having everything in one file is greater than the drawback of increased code length and mixing of analysis in with results.) My intention is that the two controlAnalysis_buttons files together will serve as a "starter kit" for classification analysis, adaptable to many other datasets and applications.

The analysis starts at the top of controlAnalysis_prep.R; the files produced by this script are used to make the results shown in controlAnalysis_buttons.rnw. The code blocks should be run in sequence, as later blocks depend on output from earlier blocks.

  • The first code block uses AFNI's 3dDetrend to normalize and detrend each vertex's timecourse. At the time of writing, 3dDetrend does not accept gifti image inputs, so the script uses readGIfTI to read the files and write 3dDetrend-friendly "1D" text files. Aside: it's possible to avoid AFNI, implementing the normalize and detrend steps entirely in R. I prefer the AFNI function, however, to avoid introducing errors, and for clarity; using established functions and programs whenever possible is generally advisable.
  • The second code block reads the _events.tsv files, and finds matching sets of frames corresponding to button press and "not" (no button-press) events. Balance is very important to avoid biasing the classification: each "event" should be the same duration, and each run should have equal numbers of events of each type. Windows for each event (i.e., when event-related brain activity should be present in the BOLD) were set to start 3 seconds after the event, and end 8 seconds after. This 3-8 second window is roughly the peak of the canonical HRF given the very short button press events; longer windows would likely be better for longer duration events.
  • The third code block performs the temporal compression, writing one file for each person, run, class, and hemisphere. The files have one row for every vertex, and one column for each example (average of the 3-8 second windows found in the second code block), plus a column with the mean across examples. (A minimal sketch of this compression step follows the list.)
  • The final two code blocks run the classification, with leave-one-subject-out (fourth block) and leave-five-subjects-out (11-fold, fifth block) cross-validation. Each version writes a results file with one row per parcel, and a column for the accuracy of each cross-validation fold, plus the mean accuracy over folds. These are the files read by controlAnalysis_buttons.rnw and shown on brains for Figure 9.
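As promised above, a minimal sketch of the window-and-average (temporal compression) step, assuming a vertices-by-frames matrix for one run (img) and an event onset in seconds; both names are illustrative, and the exact frame-indexing convention may differ from the supplemental's:

    tr <- 1.2;        # DMCC TR, seconds
    onset <- 45.6;    # example event onset (seconds) from the _events.tsv

    # frames spanning 3 to 8 seconds after the event onset
    do.frames <- round((onset + 3)/tr):round((onset + 8)/tr);

    # img: vertices x frames matrix (e.g., from the 3dDetrend-ed "1D" text file);
    # averaging over the window gives one example (column) for classification
    example <- rowMeans(img[, do.frames]);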

Implementation notes:

This code does MVPA with surface data, using text (vertices in rows) for the intermediate files. It is straightforward to identify which rows/vertices correspond to which parcels since the parcellation is of the same type (fsaverage5, in this case) as the functional data (see code in the fourth block, plus the preceding comments). Surface searchlight analyses are far less straightforward than ROI/parcel-based ones, and I don't think they are generally advisable, due to the varying distances between vertices.
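For example, a minimal sketch of the vertex-to-parcel lookup, assuming a Schaefer fsaverage5 annotation gifti in which each vertex's value is its parcel number (the file name is illustrative; the supplemental's own lookup code is in the fourth block):

    library(gifti);
    parc <- readGIfTI("Schaefer2018_1000Parcels_17Networks_fsaverage5_lh.label.gii")$data[[1]];
    p.vertices <- which(parc == 113);   # rows of the intermediate files in parcel 113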

Relatively few changes are needed to perform this same analysis with volumes. The purpose and steps within each code block are the same, though the functions to retrieve the timeseries vary, and there is only one volumetric image per run instead of the surface pair (one per hemisphere). The volumetric version of the buttons classification is divided between controlAnalysis_prep_volume.R and controlAnalysis_buttons_volume.rnw.

As we would hope (since the same original data are used for both), largely the same parcels have the highest accuracy in the surface and volume versions of the MVPA, and the parcel-wise accuracies are positively correlated. The parcel-wise accuracies from the surface version are often, though not uniformly, a bit higher.



Friday, June 18, 2021

DMCC55B supplemental as tutorial: basic fMRI QC: temporal mean, SD, and tSNR

This is the fourth post in a series describing the DMCC55B supplemental files. The first introduces the dataset, the second the questionnaire data, and the third motion. Here I'll highlight another basic fMRI QC practice: creating and checking temporal mean, standard deviation, and tSNR images. For background on these types of QC images I suggest starting with this post, especially the multiple links in the first sentence.

For DMCC55B, we preprocessed the images as both volume and surface, so the QC was also both volume and surface. I generally find QC easier to judge with unmasked volumes, but if surface analysis is planned, the surface representations must be checked as well.

During DMCC data collection we create a QC summary file for each individual participant (example here), with volumes, surfaces, and motion (realignment parameters) for every run. A first QC check for (unmasked) volume images is straightforward: do they look like brains? This ("got brains?") check doesn't apply to surfaces, since any valid gifti image will be brain-shaped. Instead, our first QC pass for the surface images is to check the temporal mean for "tiger stripes" around the central sulcus. Below are the vertex-wise means for Bas Stern AP from the example, with the blue lines pointing to the stripes. If these stripes are not visible, something likely went wrong with the preprocessing.


For ease of group comparison, the DMCC55B QC supplemental files are split by type rather than participant: QC_SD_surface.pdf for vertex-wise temporal standard deviation, QC_tSNR_surface.pdf for vertex-wise tSNR; QC_SD_volume.pdf and QC_tSNR_volume.pdf for the corresponding voxel-wise images. With eight runs for each of 55 participants these QC summary files are still rather long, but the consistent scaling (same intensity used for all participants when plotting) makes it possible to scroll through and spot trends and oddities. Such group review can be very beneficial, especially before embarking on statistical analyses.

For example, the SD images for some participants/runs are much brighter and fuzzier than others (compare f2157me and f2499cq on page 8). Broadly, the more the temporal SD images resemble maps of brain vessel density, the better the quality, though of course the resolution will vary with fMRI acquisition parameters (e.g., in the 2.4 mm isotropic DMCC images the Circle of Willis should be clearly visible). Artifacts can also be spotted, such as the ghostly "crescents" in some participants' PA images (e.g., f5001ob).

Implementation notes:

The .rnw file paired with each (knitr) pdf has the R code to create the temporal statistic images, as well as to produce the summary .pdfs. The startup code chunk of each document has the code to make the images (from the 4d niftis and giftis produced by fmriprep), while a later chunk plots the images.


The volumetric code uses the AFNI 3dTstat and 3dcalc functions (called from R using system2()) to make the temporal mean, standard deviation, and tSNR images, which are written out as NIfTI images. Saving the statistical images is useful for compilation speed, but more importantly, so that they can be examined in 3d if closer checking is needed. I strongly suggest using existing functions from a well-established program (such as AFNI) whenever possible, for clarity and to avoid introducing bugs.
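A minimal sketch of these calls (file names are illustrative; AFNI must be on the system path):

    in.fname <- "sub-xxxx_task-Stroop_run-1_bold.nii.gz";   # preprocessed 4d image

    system2("3dTstat", args=paste("-mean -prefix mean.nii.gz", in.fname));
    system2("3dTstat", args=paste("-stdev -prefix SD.nii.gz", in.fname));

    # tSNR: voxel-wise temporal mean divided by temporal standard deviation
    system2("3dcalc", args="-a mean.nii.gz -b SD.nii.gz -expr 'a/b' -prefix tSNR.nii.gz");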

The volumetric images are plotted with base R image() rather than with my volumetric plotting function, because I wanted to plot one slice through each plane. Thus, the _volume 55B supplementals can also serve as an example of simple NIfTI image plotting. (This is my general volume-plotting knitr tutorial.) 
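A minimal sketch of the plotting idea, here reading the image with the RNifti package (the supplemental may read its images differently; the file name and slice are illustrative):

    library(RNifti);
    img <- readNifti("tSNR.nii.gz");    # 3d array of voxels
    # one axial slice, greyscale, without axes; k=30 is illustrative
    image(img[,,30], col=gray(0:64/64), xaxt='n', yaxt='n', useRaster=TRUE);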

Finally, there is a comment in the volume .rnw files that the input images are already in LPI orientation. I strongly suggest transforming NIfTI images to LPI immediately after preprocessing (e.g., with 3dresample), if they are not already, as this often avoids left/right flipping. 
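For instance (file names illustrative):

    system2("3dresample", args="-orient LPI -prefix out_LPI.nii.gz -inset in.nii.gz");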



Unlike the volumes, the surface QC code calculates the vertex-wise mean and standard deviation in R, and plots them with my gifti plotting functions. Many AFNI functions can work with giftis, but these calculations (mean, standard deviation) are so simple, and plotting surfaces from a vector so straightforward, that I decided to save the images as text rather than gifti.
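A minimal sketch of those calculations, assuming a fsaverage5 functional gifti whose darrays are the frames (the file name is illustrative):

    library(gifti);
    gii <- readGIfTI("sub-xxxx_task-Stroop_run-1_hemi-L_bold.func.gii");
    img <- do.call(cbind, gii$data);    # vertices x frames matrix

    v.mean <- apply(img, 1, mean);      # vertex-wise temporal mean
    v.sd <- apply(img, 1, sd);          # vertex-wise temporal standard deviation
    v.tsnr <- v.mean / v.sd;            # vertex-wise tSNR
    # save as plain text (one row per vertex) for later plotting
    write.table(data.frame(mean=v.mean, SD=v.sd, tSNR=v.tsnr), "QC_L.txt");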

Friday, June 4, 2021

DMCC55B supplemental as tutorial: basic fMRI QC: motion

This is the third post in a series describing the DMCC55B supplemental files. The first introduces the dataset, and the second the questionnaire data. Here I'll highlight the first in the set of basic fMRI QC files: looking at the realignment parameters (head motion). 

While we use an FD threshold for censoring during the GLMs, I think plots of the six separate parameters are more useful for QC. These plots make up the bulk of QC_motion.pdf, one for each functional run included in DMCC55B. Chunk code5 of QC_motion.rnw creates these plots directly from the _desc-confounds_regressors.tsv produced during fmriprep preprocessing (note: newer versions name this file _desc-confounds_timeseries.tsv, but the columns used here are unchanged), and is relatively straightforward to adapt to other datasets; the code chunks between startup and code5 produce group summary statistics and can be omitted.
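A minimal sketch of reading and plotting the six parameters (the file name, colors, and axis limits are illustrative; the column names are standard fmriprep output):

    fname <- "sub-xxxx_task-Stroop_run-1_desc-confounds_regressors.tsv";
    conf <- read.delim(fname, na.strings="n/a");   # fmriprep codes missings as "n/a"

    cols <- c("trans_x","trans_y","trans_z","rot_x","rot_y","rot_z");
    clrs <- c("red","darkblue","green","salmon","skyblue","olivedrab");
    plot(conf[,cols[1]], type='l', col=clrs[1], ylim=c(-2,2),   # illustrative limits
         xlab="frame (TR)", ylab="translation (mm) or rotation (rad)");
    for (i in 2:6) { lines(conf[,cols[i]], col=clrs[i]); }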


Above is the top of page 5 of the compiled file, including the motion plot for the first run. This shows very little overt or apparent motion, just a slight wiggle around frame 400 (8 minutes; grey lines are 1 minute intervals), which suggests the participant shifted during the break between the second and third task blocks (green horizontal lines). This run would be given an "A" (top) grade in our qualitative assessment.

f1659oa has extremely low overt head motion across this run, but clear apparent motion (breathing), primarily in the dark blue trans_y line.

R note: These motion (and the other) plots are made with base R graphics in a knitr document. The "stacked" 6-parameter and FD plots are made by using three par() commands within each iteration of the loop (chunk code5): first the size and margins for the upper plot; second the size and margins of the lower plot, with new=TRUE; finally, setting new=FALSE to finish and reset for the next run. 
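A minimal sketch of that layout logic (fig coordinates, margins, and the placeholder plots are illustrative):

    par(fig=c(0, 1, 0.35, 1), mar=c(1, 3, 1, 0.5));    # upper plot: six parameters
    plot(rnorm(100), type='l');                        # placeholder upper plot
    par(fig=c(0, 1, 0, 0.35), mar=c(2, 3, 0.5, 0.5), new=TRUE);   # lower plot: FD
    plot(abs(rnorm(100)), type='l');                   # placeholder lower plot
    par(new=FALSE);    # reset before the next run's pair of plots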

Wednesday, June 2, 2021

DMCC55B supplemental as tutorial: questionnaire data

The previous post introduces the DMCC55B dataset, with pointers to the dataset, documentation, and why you might want to work with it. As mentioned at the end of that post, the description manuscript is accompanied by files intended to be both an introduction to working with DMCC55B data specifically, and more general tutorials for some common analyses. This series of posts will describe the DMCC55B supplemental files, both their tutorial aspects and their methodological details, starting with the questionnaire data.

Separately from the fMRI sessions, DMCC participants complete up to 28 individual difference questionnaires (described in the "Behavioral Session" section of the manuscript and the DMCC NDA site). Sharing questionnaire data is difficult, since different groups often use different versions of the same questionnaires, and critical details like response ranges and scoring algorithms may not be stated. Several projects are working on standard formats for questionnaire data; as an NIH-funded project the DMCC is using the format developed by the NIMH Data Archive (NDA). For DMCC55B, the questionnaire data is released under derivatives/questionnaires/, with one (NDA-format) csv file per questionnaire.

The behavioralSession_individualDifferences.rnw file contains (knitr) code to read these NDA-format csv files, then calculate and display both group and single-subject summary statistics for each. This code should be useful for anyone reading data from NDA-format questionnaire data files, though with the warning that frequent consultation with the NDA Data Dictionary is required. There is unfortunately quite a bit of variability between the structure definitions, even in fundamentals such as the codes used for missing values. behavioralSession_individualDifferences.rnw does not calculate all possible summary statistics, only those currently used in the DMCC.
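A minimal sketch of the reading step, assuming the standard two-row NDA header (first row the structure short name and version, second row the column names):

    fname <- "dospert01_DMCC55B.csv";
    d <- read.csv(fname, skip=1, stringsAsFactors=FALSE);   # skip the name/version row
    # check the Data Dictionary for each column's type and missing-value code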

Here is an example of the summaries in the compiled behavioralSession_individualDifferences.pdf, for the DOSPERT (Domain-Specific Risk-Taking) questionnaire. The NDA names this questionnaire dospert01, so that name is also used for the DMCC55B derivative files. In the case of dospert01, we were able to match our questions with those in the Dictionary, so both individual item responses and summary scores are included. 

Above left is a small bit of the released dospert01_DMCC55B.csv file. The contents are in accord with the NDA Data Dictionary (above right), except for the first few columns: GUIDs, HCP IDs, sex, and other restricted/sensitive information are omitted. We can use the Dictionary definitions to interpret the other columns, however; for example, the first participant's answer of 7 to rt_1 corresponds to "Extremely Likely" for the item "Admitting that your tastes are different from those of a friend."

Scoring the DOSPERT produces five measures, which are reported as group means (above left) and for individual participants (above right); behavioralSession_individualDifferences.pdf presents similar summaries for each scored questionnaire.

Caution: behavioralSession_individualDifferences.rnw has code to parse NDA-format csv files and produce group score summaries. However, its scoring code and the Dictionary definitions should be carefully reviewed before using it with non-DMCC datasets, to make sure the desired calculations are being performed.

an introduction to DMCC55B

It's hard to believe this is my first post of 2021! I've done a few updates to existing posts during the last few months, but not the detailed new ones I've been planning; hopefully this will be the first of a small "flurry" of new posts.

Here, I'm happy to announce the first big public release of Dual Mechanisms of Cognitive Control (DMCC) project data, "DMCC55B", together with a detailed description and example analyses. We've previously released smaller parts of the DMCC (DMCC13benchmark), which has been very useful for methods exploration, but isn't really enough data for detailed analysis of the task-related activity or individual differences. A wide range of analyses are possible with DMCC55B, and we hope the documentation is both clear and detailed enough to make its use practical.

A few highlights: DMCC55B is data from 55 unrelated young adults. Each participant performed four cognitive control tasks (Stroop, AX-CPT, Cued Task-Switching, and Sternberg Working Memory) while undergoing moderately high-resolution fMRI scanning (MB4, 2.4 mm isotropic voxels, 1.2 s TR). Each person completed two runs of each task (one with AP encoding, the other PA) of approximately 12 minutes each, for about 1.5 hours of task fMRI per person. There are also scores for the participants on 28 state and trait self-report questionnaires, as well as finger photoplethysmograph and respiration belt recordings collected during scanning.

This manuscript is intended to be the practical introduction to DMCC55B, providing details such as the file format of the questionnaire data, the order in which task stimuli were presented, the fMRI preprocessing (fmriprep output is included), and how Stroop responses were collected and scored. The manuscript also contains links and references to materials used for the main DMCC project (e.g., eprime task presentation scripts, code to extract reaction times from the Stroop audio recordings), which may be of use to others, though likely only in specialized instances. A separate manuscript (preprint; accepted version) describes the wider DMCC project, including the theoretical basis for the task design. If some information is confusing or missing, please let me know!

Last, but most definitely not least, I want to highlight the "supplemental" accompanying DMCC55B. This is designed to perform and summarize some standard behavioral and quality control analyses, with the intention of the files serving both as an introduction to working with DMCC data (e.g., how do I obtain the onset time of all AX trials?) and as analysis tutorials (e.g., of a parcel-wise classification MVPA with surface data). Currently, the best introduction to this material is in the DMCC55B manuscript and the files themselves. The supplemental files are primarily knitr (R and LaTeX); they call AFNI functions (e.g., 3dTstat) directly when needed, and are entirely "untidy" base R, including base R graphics. (The last bit refers to the different "flavors" of R programming; I only rarely have need to visit the Tidyverse.)
 
 UPDATE 13 September 2021: Added a link to the published version of the DMCC overview manuscript.