It's time to adjust my mental model of the fMRI signal: there's a lot more correlation with movement in the timecourses after preprocessing than I'd expected. That movement
really affects fMRI is not at all new, of course, and is why including the motion
regressors as covariates in GLMs is standard. But I'd pictured that after preprocessing (assuming it went well and included realignment and spatial normalization) the correlation with movement left in the voxel timecourses should be pretty low (something like normally distributed, centered on 0, ranging between -0.2 to 0.2), without much spatial structure (i.e., maybe some ringing, but fairly uniformly present over the brain). Asking around, I think this is a fairly common mental model, but it looks to be quite wrong.
For exploring, I simply used
afni's 3dTcorr1D program to correlate the timecourse of every voxel in several preprocessed task fMRI datasets with each of the six motion regressors (generated during preprocessing). 3dTcorr1D makes an image with 6 entries in the 4th dimension (sub-brick, in afni-speak), one for each of the 6 motion columns; the value in each voxel the Pearson correlation between that voxel's timecourse and the movement column. I plotted these correlations on brains, and made histograms to summarize the distribution.
The correlations are much higher than I expected, even in people with
very little movement. Here's an example; more follow. Below are the motion regressors from single task run (about 12.5 minutes long; HCP-style preprocessing;
MB8 protocol), the correlation with each motion regressor, and
a (density-style) histogram of the voxel-wise correlations. Color scaling for this and all brain images is from 1 (hottest) to -1 (coolest), not showing correlations between -0.25 and 0.25.
If my expectation (correlations normally distributed, centered on 0, ranging between -0.2 to 0.2) was right, there shouldn't be any color on these images at all, but there's clearly quite a bit: many voxels correlate around 0.5 with roll and pitch (4th and 5th brain rows are mostly "hot" colors), and around -0.5 with x, y, and z (first three rows mostly "cool" colors). There's some structure to the peak correlations (e.g., a hotter strip along the left side in slice 46), which may correspond with sulci or large vessels, but it's rather speckly. Note that this is a pretty low motion subject overall: less than 2 mm drift over the 12 minute run, and only 4 volumes marked for censoring (I didn't censor before the correlation).
Looking at other people and datasets, including from non-SMS acquisitions with larger voxels and longer TRs, it appears like correlations of 0.5 are pretty common: this isn't just some sort of weird effect that only shows up with high-resolution acquisitions. For another example, these are the histograms and motion regressors for four runs from one person included
in this study (acquired with 4 mm isotropic voxels; run duration 7.6 min, TR 2.5 sec, SPM preprocessing). The corresponding brain images are below the jump.
So, really visible motion (which at least sometimes is
linked to respiration) in the voxel activity timecourses (
such as here) is to be expected. Unfortunately, the correlation is not always (or even usually) uniform across the brain or grey matter, such as below (just the correlation with x and y translation). It also looks like very little (under a mm) motion is needed to induce large correlations.
What to do? Well, adjust our mental models of how much correlation with movement is left in the activation timeseries after preprocessing: there's quite a bit. I'll be exploring further, particularly isolating the task windows (since I work with task, not resting state, datsets): how are the correlations during tasks? I'm not at all sure that applying a global signal regression-type step would be beneficial, given the lack of homogeneity across the brain (though I know there are at least a few reports on using it with task data). Censoring high-movement trials (i.e., not including them) is likely sensible. Interestingly,
I've found similar MVPA performance in multiple cases with temporal compression by averaging and fitting a model (PEIs), which would not have been my guess looking at these correlation levels. Perhaps averaging across enough timepoints and trials balances out some of the residual motion effects? I am concerned, however, about respiration (and motion) effects remaining in the timecourses: it's clear that some people adjust their breathing to task timing, and we don't want to be interpreting a breath-holding effect as due to motivation.
Any other thoughts or experiences? Are you surprised by these correlation levels, or is it what you've already known?