Tuesday, December 15, 2020

DMCC13benchmark: public again

Apologies: I've had to temporarily delete DMCC13benchmark from OpenNeuro. We plan to upload a new version ASAP. The images and associated data in the new version will be the same, and I will keep the same dataset name.

The problem was with the subject ID codes: some of the DMCC participants were members of the Young Adult HCP, and we released DMCC13benchmark with those HCP ID codes. In consultation with members of the HCP we decided that it would be most appropriate to release data with unique ID codes instead. The subject key (linking the subject IDs released on openneuro to the actual HCP ID codes) is available via the HCP ConnectomeDatabase for people who have accepted the HCP data use terms (ConnectomeDB00026, titled "DMCC (Dual Mechanisms of Cognitive Control) subject key").

Again, I apologize for any inconvenience. We are working to upload a corrected version of DMCC13benchmark soon. Please contact me with questions or if you need something sooner; I will add links to this post when DMCC13benchmark is again public.

UPDATE 4 January 2021: The corrected version of DMCC13benchmark is now available as openneuro dataset ds003452, doi: 10.18112/openneuro.ds003452.v1.0.1. We ask anyone using the dataset to use this version (all subject IDs starting with "f") rather than the previous. Every file should be identical between the versions, excepting the subject IDs.

I again apologize for any inconvenience this change may have caused you, and urge you to contact me with any questions or concerns.


Wednesday, November 11, 2020

DMCC GLMs: afni TENTzero, knots, and HDR curves, #2

This post is the second in a series, which begins with this post. In that first post I included examples of the estimated HDR curves for a positive control analysis, one of which is copied here for reference. 

What are the axis labels for the estimated HDR curves? How did we generate them? What made us think about changing them? 

At the first level the (curve) axis label question has a short answer: y-axes are average (across subjects) beta coefficient estimates, x-axes are knots. But this answer leads to another question: how do knots translate into time? Answering this question is more involved, because we can't translate knots to time without knowing both the TR and the afni 3dDeconvolve command. The TR is easy: 1.2 seconds. Listing the relevant afni commands is also easy:

3dDeconvolve \  
 -local_times \  
 -x1D_stop \  
 -GOFORIT 5 \  
 -input 'lpi_scale_blur4_tfMRI_AxcptBas1_AP.nii.gz lpi_scale_blur4_tfMRI_AxcptBas2_PA.nii.gz' \  
 -polort A \  
 -float \  
 -censor movregs_FD_mask.txt \  
 -num_stimts 3 \  
 -stim_times_AM1 1 f1031ax_Axcpt_baseline_block.txt 'dmBLOCK(1)' -stim_label 1 ON_BLOCKS \  
 -stim_times 2 f1031ax_Axcpt_baseline_blockONandOFF.txt 'TENTzero(0,16.8,8)' -stim_label 2 ON_blockONandOFF \  
 -stim_times 3 f1031ax_Axcpt_baseline_allTrials.txt 'TENTzero(0,21.6,10)' -stim_label 3 ON_TRIALS \  
 -ortvec motion_demean_baseline.1D movregs \  
 -x1D X.xmat.1D \  
 -xjpeg X.jpg \  
3dREMLfit \
 -matrix X.xmat.1D \
 -input 'lpi_scale_blur4_tfMRI_AxcptBas1_AP.nii.gz lpi_scale_blur4_tfMRI_AxcptBas2_PA.nii.gz' \
 -Rvar stats_var_f1031ax_REML.nii.gz \
 -Rbuck STATS_f1031ax_REML.nii.gz \
 -fout \
 -tout \
 -nobout \

Unpacking these commands is not so easy, but I will try to do so here for the most relevant parts; please comment if you spot anything not quite correct in my explanation.

First, the above commands are for the Axcpt task, baseline session, subject f1031ax (aside: their data, including the event timing text files named above, is in DMCC13benchmark). The 3dDeconvolve command has the part describing the knots and HDR estimation; I included the 3dREMLfit call for completeness, because the plotted estimates are from the Coef sub-bricks of the STATS image (produced by -Rbuck).

Since this is a control GLM in which all trials are given the same label there are only three stimulus time series: BLOCKS, blockONandOFF, and TRIALS (this is a mixed design; see introduction). The TRIALS part is what generates the estimated HDR curves plotted above, as defined by TENTzero(0,21.6,10).

We chose TENTzero instead of TENT for the HDR estimation because we do not expect anticipatory activity (the trial responses should start at zero) in this task. (see afni message board, e.g., here and here) To include the full response we decided to model the trial duration plus at least 14 seconds (not "exactly" 14 seconds because we want the durations to be a multiple of the TENT duration). My understanding is that it's not a problem if more time than needed is included in the duration; if too long the last few knots should just approximate zero. I doubt you'd often want to use a duration much shorter than that of the canonical HRF (there's probably some case when that would be useful, but for the DMCC we want to model the entire response).

From the AFNI 3dDeconvolve help

'TENT(b,c,n)': n parameter tent function expansion from times b..c after stimulus time [piecewise linear] [n must be at least 2; time step is (c-b)/(n-1)]


You can also use 'TENTzero' and 'CSPLINzero',which means to eliminate the first and last basis functions from each set. The effect of these omissions is to force the deconvolved HRF to be zero at t=b and t=c (to start and and end at zero response). With these 'zero' response models, there are n-2 parameters (thus for 'TENTzero', n must be at least 3).

The first TENTzero parameter is straightforward: b=0 to start at the event onset.

For the trial duration (c), we need to do some calculations. Here, the example is the DMCC Axcpt task, which has a trial duration of 5.9 seconds, so the modeled duration should be at least 14 + 5.9 = 19.9 seconds. That's not the middle value in the TENTzero command, though.

For these first analyses we decided to have the TENTs span two TRs (1.2 * 2 = 2.4 seconds), in the (mostly) mistaken hope it would improve our signal/noise, and also to make the file sizes and GLM estimation speeds more manageable (which it does). Thus, c=21.6, the shortest multiple of our desired TENT duration greater than the modeled duration (19.9/2.4 = 8.29, rounded up to 9; 9*2.4 = 21.6). 

Figuring out n requires a bit of mind bending but less calculation; I think the slide below (#5 in afni06_decon.pdf) helps: in tent function deconvolution, the n basis functions are divided into n-1 intervals. In the above calculations I rounded 8.29 up to 9; we want to model 9 intervals (of 2.4 s each). Thus,  n = 10 basis functions gives us the needed n-1 = 9 intervals.

Recall that since we're using TENTzero, the first and last tent functions are eliminated. Thus, having n=10 means that we will get 10-2=8 beta coefficients ("knots") out of the GLM. These, finally, are the x-axes in the estimated GLM curves above: the knots at which the HDR estimates are made, of which there are 8 for Axcpt. 

(Aside: 0-based afni can make some confusion for those of us more accustomed to 1-based systems, especially with TENTzero. The top plots put the value labeled as #0_Coef at 1; a point is added at 0=0 since TENTzero sets the curves to start at zero. I could also have added a zero for the last knot (9, for this Axcpt example), but did not.)

These GLMs seem to be working fine, producing sensible HDR estimates. We've been using these results in many analyses (multiple lab members with papers in preparation!), including the fMRIprep and HCP pipeline comparisons described in previous posts. We became curious, though, if we could set the knot duration to the TR, rather than twice the TR (as done here): would the estimated HDR then follow the trial timing even more precisely? In the next posts(s) I'll describe how we changed the GLMs to have the knot duration match the TR, and a (hopefully interesting) hiccup we had along the way.

UPDATE 4 January 2021: Corrected DMCC13benchmark openneuro links.

Tuesday, November 10, 2020

DMCC GLMs: afni TENTzero, knots, and HDR curves, #1

One of the things we've been busy with during the pandemic-induced data collection pause is reviewing  the DMCC (Dual Mechanisms of Cognitive Control; more descriptions here and here) project GLMs: did we set them up and are we interpreting them optimally? 

We decided that the models were not quite right, and are rerunning all of them. This post (and its successors) will describe how the original GLMs were specified, why we thought a change was needed, what the change is, and how the new results look. The posts are intended to summarize and clarify the logic and process for myself, but also for others, as I believe many people find modeling BOLD responses (here, with afni 3dDeconvolve TENTs) confusing.


I won't just start explaining the DMCC GLMs however, because they're complex: the DMCC has a mixed design (task trials embedded in long blocks), and we're estimating both event-related ("transient") and block-related ("sustained") responses. If you're not familiar with them, Petersen and Dubis (2012) review mixed block/event fMRI designs, and Figure 1 from Visscher, et. al (2003) (left) illustrates the idea. In addition to the event and block regressors we have "blockONandOFF" regressors in the DMCC, which are intended to capture the transition between the control and task periods (see Dosenbach et. al (2006)).

We fit many separate GLMs for the DMCC, some as controls and others to capture effects of interest. For these posts I'll describe one of the control analyses (the "ONs") to make it a bit more tractable. The ON GLMs include all three effect types (event (transient), block (sustained), blockONandOFF (block start and stop)), but do not distinguish between the trial types; all trials (events) are given the same label ("on"). The ON GLMs are a positive control; they should show areas with changes in activation between task and rest (30 seconds of fixation cross before and after task blocks). For example, there should be positive, HRF-type responses in visual areas, because all of our task trials include visual stimuli.

Below is an example of the estimated responses from our original ONs GLMs (generated by R knitr code similar to this). The GLMs were fit for every voxel individually, then averaged within the Schaefer, et al. (2018) parcels (400x7 version); these are results for four visual parcels and volumetric preprocessing. These are (robust) mean and SEM over 13 participants (DMCC13benchmark). The results have multiple columns both because there are four DMCC tasks (Axcpt (AX-CPT), Cuedts (Cued task-switching), Stern (Sternberg working memory), and Stroop (color-word variant)), and because of the mixed design, so we generate estimates for both event/transient (lines) and sustained (blue bars) effects (blockONandOFF are not included here).

The line graphs show the modeled response at each time point ("knot"). Note that the duration (and so number of estimated knots) for each task varies; e.g., Stroop has the shortest tasks and Stern the longest. (We set the modeled response to trial length + 14 seconds.)

The double-peak response shape for Axcpt, Cuedts, and Stern is expected, as the trials have two stimuli slides separated by a 4 second ITI; the Stroop response should resemble a canonical HRF since each trial has a single stimulus and is short. The task timing and some sample trials are shown below (this is  Figure 1 in a preprint which has much more task detail). 

So far, so good: the estimated responses for each task have a sensible shape, reflecting both the HRF and task timing. But what exactly is along the x-axis for the curves? And how did we fit these GLMs, and then decide to change them a bit? ... stay tuned.

later posts in this series: #2

UPDATE 4 January 2021: Corrected DMCC13benchmark openneuro links.

Monday, July 27, 2020

overlaying in Connectome Workbench 1.4.2

This post answers a question posted on my Workbench introductory tutorial: Rui asked how to visualize the common regions for two overlays. I've found two ways of viewing overlap.

The first is to reduce the Opacity (circled in red) of the top-most Layer. In this screen capture I have a blueish ROI on top of a yellow one. (I scribbled the colors over their corresponding rows in the Layers toolbox; blue is listed above - over - the yellow. Click the Layers On and off if you're not sure which is which.) Compare the appearance of the ROIs when the Opacity is set to 1.0 for both (left) versus 0.7 for the top (blue, right): the borders of both ROIs are visible on the right side, and the blue is less opaque all over (not just in the area that overlaps).

The second method is to set one (or both) ROIs to Outline (border) mode. Here, I set the top (blue) ROI to "Outline Only" in the Overlay and Map Settings dialog box (click the little wrench button to bring up the dialog).

Tuesday, June 16, 2020

tip: running FSL commands from within R

I very much believe that analyses should use existing/established programs/packages/functions as much as possible, preferably, the original version of the program in a transparent way. But I also want to preserve a record of the analysis, and be able to rerun it if needed, while minimizing the chance of difficult-to-identify errors like of clicking the wrong setting in a GUI.

I'm increasingly convinced that a good strategy is to do file management-type things in the script,  build the needed (FSL, afni, etc.) command, then send the command directly to the target program (FLS, afni, etc.), rather than use a specialized interface package. There are at least two big advantages to building the actual commands: first, few (if any) dependencies or new syntax are introduced, second, the executed commands are immediately and transparently recoverable. Note that this advice is for analysis scripts, not full-scale pipelines or software.

For example, I needed to run FSL flirt to register around twenty images. This is the only step requiring FSL, and my script is in R. I have a few options, including the FLIRT GUI, generating a text file of the flirt commands and running it at the command line, typing the commands directly at the command line, or running fsl from within R using wrapper functions (e.g., fslr).

This code shows the solution I'm advocating: using R functions for the file-wrangling and command-building parts, and then using system2 to "send" the command to fsl for running. This strategy can be used not only with fsl, but also afni, wb_command, and other command-line programs.

 fsl.path <- "/usr/local/pkg/fsl6.0/bin/";   # path to fsl flirt
 in.path <- "/data/";  # path to the input images 
 ref.path <- "/data/";  # path to the reference images  
 out.path <- "/scratch3/registrations/";   # where to write files  
 sub.ids <- c(102008, 107321, 814649, 849971);  # subject IDs
 for (sid in 1:length(sub.ids)) {   # sid <- 1;   
  # make the file names
  fname.in <- paste0(in.path, sub.ids[sid], "/T1w/T1w_acpc_dc_restore.nii.gz");   
  fname.ref <- paste0(ref.path, sub.ids[sid], "/T1w/T1w_acpc_dc_restore.nii.gz");    
  fname.out <- paste0(out.path, sub.ids[sid], "_T1w.nii.gz");  
  fname.mat <- paste0(out.path, sub.ids[sid], "_T1w.txt");  
  # check if the input files exist (and output files do not), then run fsl if so
  if (file.exists(fname.in) & file.exists(fname.ref) & !file.exists(fname.out) & !file.exists(fname.mat)) {  
   system2(paste0(fsl.path, "flirt"), args=paste("-in", fname.in, "-ref", fname.ref, "-out", fname.out, "-omat", fname.mat, "-dof 6"),  
       stdout=TRUE, env="FSLOUTPUTTYPE=NIFTI_GZ");  

The system2 function "sends" the command to fsl and can be used to check what was done:
  • paste0(fsl.path, "flirt") shows which program was run and its location on disk.
  • args= has the options sent to the flirt program, as a string. Copying out just the paste(...) part and running it in R will print it for inspection (example below).
  • stdout=TRUE tells R to print fsl's messages as it runs (useful to spot any errors)
  • env="FSLOUTPUTTYPE=NIFTI_GZ" sets the fsl environment variable FSLOUTPUTTYPE to NIFTI_GZ. Depending on your system and fsl installation you may need to set more (or no) environment variables.
Specifically, for the first participant (sid 1), the code generates this, which could be run without R:
/usr/local/pkg/fsl6.0/bin/flirt -in /data/102008/T1w/T1w_acpc_dc_restore.nii.gz -ref /data/102008/T1w/T1w_acpc_dc_restore.nii.gz -out /scratch3/registrations/102008_T1w.nii.gz -omat /scratch3/registrations/102008_T1w.txt -dof 6

Wednesday, May 13, 2020

"Pattern similarity analyses of frontoparietal task coding" update

A recently-published paper of ours (preprint; "Pattern Similarity Analyses of FrontoParietal Task Coding: Individual Variation and Genetic Influences") uses correlational MVPA/RSA-type methods to look at heritability effects in HCP task fMRI data. A primary motivation was methods development for the DMCC, which we're analyzing for patterns in individuals and twin pairs across time.

Since publishing, the HCP has updated their list of subjects with major issues, including some in our analyses. I wondered if these problematic subjects could have affected the results, so reran the analyses without them (advertisement: this sort of update is drastically easier if you use knitr or another report-generating system for your results). I'm pleased to say that the omitting these flagged subjects had a negligible impact on the results - it turns out that I'd already excluded most of them from the key analyses because of missing behavioral data.

Both the published and updated versions of the supplemental are on the osf site for the paper. Versions of my favorite figure before and after omitting the flagged participants are after the jump.

Wednesday, April 22, 2020

observation: variability from running fmriprep multiple times

UPDATE 13 May 2020: I'm now quite sure that the high variability wasn't due to the multiple fmriprep runs, but rather the way I picked the training/testing examples. I'm leaving this post here, but put most of it "below the fold" since my initial interpretation was incorrect. I will eventually release code and more details.

UPDATE 24 April 2020: I realized that there was at least one other possible big contributing factor to the difference between the runs: randomness in how I chose the not-button examples. I'll fix this and see what happens to the SVM results. I previously found extremely similar GLM results between fmriprep versions 1.1.2 and 1.3.2, so the variability here is very possibly only from my randomness in setting up the classification.