Thursday, June 24, 2021

DMCC55B supplemental as tutorial: positive control "buttons" classification analysis

This is the fifth post in a series describing the DMCC55B supplemental files. The first introduces the dataset, the second describes the questionnaire data, the third the participant motion, and the fourth creating and checking the temporal mean, standard deviation, and tSNR images.

I very strongly recommend that "positive control" analyses be performed as part of every fMRI (really, most any) analysis. The idea is that these analyses check for the existence of effects that, if the dataset is valid, must be present (and so, if they are not detected, something is most likely wrong in the dataset or analysis, and analysis should not proceed until the issues are resolved). (Julia Strand's Error Tight site collects additional suggestions for minimizing errors in research and analysis.)

The DMCC55B supplemental includes two of my favorite positive control analyses, "buttons" and "ONs", which can be adapted to a wide variety of task paradigms. The "buttons" name is shorthand for analyzing the activity associated with the task responses (which often involve moving fingers to press buttons). Task responses like button presses are excellent targets for positive controls because the occurrence of movement can be objectively verified (unlike, e.g., psychological states, whose existence is necessarily inferred), and hand motor activity is generally strong, focal, and located in a low g-factor area (i.e., with better fMRI SNR). Further, it is nearly always possible to design an analysis around the responses that is not tied to the experimental hypotheses. (To avoid circularity, control analyses must be independent of the experimental hypotheses and have high face validity.)

The DMCC55B "buttons" example uses the Sternberg task. In the DMCC Sternberg task (Figure 3) responses are made with the right hand, pressing the button under either the first or second finger to indicate whether the Probe word was a member of the current list or not. The target hypotheses for Sternberg involve aspects such as response time, whether the response was correct, and brain activity changes during List memorization; while the hypothesis for the buttons control analysis is simply that brain activity in somatomotor areas should change due to the finger motion necessary to press the response button. Rephrased, the contrast of button presses against baseline should show somatomotor activation.

The example DMCC55B Sternberg buttons positive control analysis was implemented as ROI-based classification MVPA (linear SVM, c=1, via the e1071 R interface to libsvm), with averaging (not GLMs) for the temporal compression. I ran this on the surface (fsaverage5 giftis produced by fmriprep preprocessing), within each ROI defined by the 1000-parcel, 17-network Schaefer parcellation, with leave-one-subject-out cross-validation (55-fold). I do not generally advise leave-one-subject-out CV, especially with this many people, but used it here for simplicity; a more reasonable 11-fold CV version is also in the code.
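To make the classification step concrete, here is a minimal sketch of one parcel's leave-one-subject-out loop with e1071. The data layout (a parcel.data data.frame with sub.id and class columns followed by one column per vertex) and the function name are assumptions for illustration; controlAnalysis_prep.R has the actual implementation.

  library(e1071)   # svm(): R interface to libsvm

  # parcel.data: one row per example; columns: sub.id, class ("button" or "not"),
  # then one column per vertex in this parcel (layout assumed for illustration)
  run.svm.loso <- function(parcel.data) {
    sub.ids <- unique(parcel.data$sub.id)
    accs <- rep(NA, length(sub.ids))    # one accuracy per fold (left-out subject)
    for (i in seq_along(sub.ids)) {
      train <- parcel.data[parcel.data$sub.id != sub.ids[i], ]
      test  <- parcel.data[parcel.data$sub.id == sub.ids[i], ]
      # linear SVM, c=1, as described above; scale=FALSE assumes the timecourses
      # were already normalized (check the supplemental code for the settings used)
      fit <- svm(x=train[, -(1:2)], y=factor(train$class),
                 kernel="linear", cost=1, scale=FALSE)
      accs[i] <- mean(predict(fit, test[, -(1:2)]) == test$class)
    }
    data.frame(fold=sub.ids, accuracy=accs)   # fold-wise accuracies for this parcel
  }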

The analysis code is split between two files. The first, controlAnalysis_prep.R, is made up of consecutive code blocks that perform the analysis, while the second, controlAnalysis_buttons.rnw, displays the results in tables and on brains (compiled controlAnalysis_buttons.pdf). (Aside: I will sometimes include analysis code in knitr documents (e.g., QC_SD_surface.rnw) if I judge the code short and straightforward enough that the benefit of having everything in one file outweighs the drawbacks of increased code length and mixing analysis in with results.) My intention is that the two controlAnalysis_buttons files together will serve as a "starter kit" for classification analyses, adaptable to many other datasets and applications.

The analysis starts at the top of controlAnalysis_prep.R; the files produced by this script are used to make the results shown in controlAnalysis_buttons.rnw. The code blocks should be run in sequence, as later blocks depend on output from earlier blocks.

  • The first code block uses AFNI's 3dDetrend to normalize and detrend each vertex's timecourse. At the time of writing, 3dDetrend does not accept gifti image inputs, so the script uses readGIfTI to read the files and write 3dDetrend-friendly "1D" text files (a sketch of this conversion follows the list). Aside: it is possible to avoid AFNI by implementing the normalization and detrending entirely in R. I prefer the AFNI function, however, to avoid introducing errors and for clarity; using established functions and programs whenever possible is generally advisable.
  • The second code block reads the _events.tsv files and finds matching sets of frames corresponding to button-press and "not" (no button press) events. Balance is very important to avoid biasing the classification: each "event" should be the same duration, and each run should have equal numbers of events of each type. Windows for each event (i.e., when event-related brain activity should be present in the BOLD) were set to start 3 seconds after the event and end 8 seconds after (the second sketch following the list illustrates the window calculation). This 3-8 second window roughly covers the peak of the canonical HRF given the very short button press events; longer windows would likely be better for longer-duration events.
  • The third code block performs the temporal compression, writing one file for each person, run, class, and hemisphere. The files have one row for every vertex, and one column for each example (the average of a 3-8 second window found in the second code block), plus a column with the mean across examples.
  • The final two code blocks run the classification, with leave-one-subject-out (fourth block) and leave-five-subjects-out (11-fold, fifth block) cross-validation. Each version writes a results file with one row per parcel, and a column for the accuracy of each cross-validation fold, plus the mean accuracy over folds. These are the files read by controlAnalysis_buttons.rnw and shown on brains for Figure 9.
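For concreteness, here is a minimal sketch of the gifti-to-text conversion described in the first code block; the file names, the -polort order, and the exact 3dDetrend invocation are assumptions for illustration (controlAnalysis_prep.R has the settings actually used).

  library(gifti)   # readGIfTI()

  # read one hemisphere's fsaverage5 timeseries gifti (file name hypothetical)
  gii <- readGIfTI("sub-0001_task-Stern_hemi-L_space-fsaverage5_bold.func.gii")
  tc <- do.call(cbind, gii$data)    # one row per vertex, one column per frame

  # write a plain-text version for AFNI: vertices in rows, frames in columns
  write.table(tc, "sub-0001_Stern_L.1D", row.names=FALSE, col.names=FALSE)

  # normalize and detrend with AFNI; whether the input needs AFNI's transpose
  # selector (') depends on the orientation of the text file
  system2("3dDetrend", args=c("-normalize", "-polort", "2",
                              "-prefix", "sub-0001_Stern_L_detrend.1D",
                              "sub-0001_Stern_L.1D"))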
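And here is a sketch of the windowing and averaging performed by the second and third code blocks, assuming a 1.2 second TR and an _events.tsv with onset (in seconds) and response columns; the TR value and column names are assumptions to adapt to your own dataset.

  TR <- 1.2    # seconds per frame (assumed; use your acquisition's TR)

  evs <- read.delim("sub-0001_task-Stern_run-1_events.tsv")   # file name hypothetical
  onsets <- evs$onset[evs$response != "none"]   # button-press events (column names assumed)

  # indices of the frames acquired 3 to 8 seconds after an event onset
  get.window <- function(onset.sec, n.frames) {
    frame.secs <- (0:(n.frames - 1)) * TR     # acquisition time of each frame
    which(frame.secs >= onset.sec + 3 & frame.secs <= onset.sec + 8)
  }

  # tc: vertices x frames matrix of detrended timecourses (previous sketch);
  # averaging the frames in each event's window gives one example (column) per event
  examples <- sapply(onsets, function(on) rowMeans(tc[, get.window(on, ncol(tc)), drop=FALSE]))

The actual code also builds matched "not" windows and equalizes the number of examples of each class within every run, as described in the second bullet above.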

Implementation notes:

This code does MVPA with surface data, using text files (vertices in rows) as the intermediate format. It is straightforward to identify which rows/vertices correspond to which parcels since the parcellation is of the same type (fsaverage5, in this case) as the functional data (see the code in the fourth block, plus the preceding comments). Surface searchlight analyses are far less straightforward than ROI/parcel-based ones, and, due to the varying distances between vertices, I don't think they are generally advisable.
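For example, something along these lines selects the rows for one parcel; the parcellation file name is hypothetical, and the fourth code block has the actual implementation.

  library(gifti)

  # fsaverage5 Schaefer parcellation for the left hemisphere (file name hypothetical)
  atlas <- readGIfTI("Schaefer2018_1000Parcels_17Networks_fsaverage5_L.label.gii")
  labels <- atlas$data[[1]]     # one integer parcel label per vertex; 0 = unassigned

  p.id <- 45                    # arbitrary example parcel number
  parcel.rows <- which(labels == p.id)   # row indices into the vertex-by-frame matrices
  parcel.tc <- tc[parcel.rows, ]         # timecourses for just this parcel's vertices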

Relatively few changes are needed to perform this same analysis with volumes. The purpose of and steps within each code block are the same, though the functions to retrieve the timeseries differ, and there is only one volumetric image per run instead of a pair of surface files (one per hemisphere). The volumetric version of the buttons classification is divided between controlAnalysis_prep_volume.R and controlAnalysis_buttons_volume.rnw.
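For instance, the volumetric retrieval could look something like this sketch, which uses the RNifti package (one option among several); the file names are hypothetical, and the parcellation is assumed to be on the same grid as the BOLD image.

  library(RNifti)   # readNifti()

  bold  <- readNifti("sub-0001_task-Stern_run-1_space-MNI_desc-preproc_bold.nii.gz")
  atlas <- readNifti("Schaefer2018_1000Parcels_17Networks_MNI.nii.gz")

  p.id <- 45
  vox <- which(atlas == p.id)                              # linear indices of this parcel's voxels
  bold.2d <- matrix(as.vector(bold), ncol=dim(bold)[4])    # reshape the 4d image to voxels x frames
  parcel.tc <- bold.2d[vox, ]                              # parallels the surface version's vertex rows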

As we would hope (since the same original data are used for both), largely the same parcels have the highest accuracy in the surface and volume versions of the MVPA, and the parcel-wise accuracies are positively correlated. The parcel-wise accuracies from the surface version are often, though not uniformly, a bit higher.


