Wednesday, April 22, 2020

observation: variability from running fmriprep multiple times

UPDATE 13 May 2020: I'm now quite sure that the high variability wasn't due to the multiple fmriprep runs, but rather the way I picked the training/testing examples. I'm leaving this post here, but put most of it "below the fold" since my initial interpretation was incorrect. I will eventually release code and more details.

UPDATE 24 April 2020: I realized that there is at least one other likely major contributor to the difference between the runs: randomness in how I chose the not-button examples. I'll fix this and see how the SVM results change. I previously found extremely similar GLM results between fmriprep versions 1.1.2 and 1.3.2, so the variability here may well come only from the randomness in how I set up the classification.
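If it's useful to anyone setting up something similar, here's a minimal sketch of the kind of fix I mean: sample the not-button examples once, with a fixed seed, and save the selection, so the same examples feed both versions of the classification. The column names, labels, and filenames below are placeholders for illustration, not the actual DMCC files or my analysis code.

```python
import numpy as np
import pandas as pd

def pick_not_button_examples(events, n_needed, seed=42):
    """Reproducibly sample 'not-button' rows from an events table (e.g., a
    BIDS _events.tsv read into a DataFrame). A fixed seed gives the same
    draw every run, so both preprocessing versions get identical examples.
    The 'trial_type'/'button_press' labels are made up for illustration."""
    not_button = events[events["trial_type"] != "button_press"]
    rng = np.random.default_rng(seed)                      # fixed seed
    idx = rng.choice(not_button.index.to_numpy(), size=n_needed, replace=False)
    chosen = not_button.loc[np.sort(idx)]
    chosen.to_csv("not_button_examples.csv", index=False)  # keep a record of the selection
    return chosen

# usage sketch (placeholder filename):
# events = pd.read_csv("sub-XX_task-Cuedts_events.tsv", sep="\t")
# not_button = pick_not_button_examples(events, n_needed=40)
```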
 
Here's another post in the category of "things that surprised me": not really a problem, nor anything new, but more variability than I expected. Background: for various file-management reasons we had to run DMCC13benchmark through fmriprep (1.3.2) twice for two people. Same BIDS input, etc. (even the same hardware, I think), just run twice. Some of the fmriprep components are not deterministic (see e.g. here and here), so a bit of variability between runs is expected.
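Quantifying the difference between the two preprocessed versions directly is straightforward; here's a rough nibabel/numpy sketch that correlates each voxel's timeseries between the two versions of the same run. The filenames are placeholders, and this isn't the exact QC code I used.

```python
import nibabel as nib
import numpy as np

# Placeholder filenames: the same preprocessed BOLD run from the two
# fmriprep executions (assumed to have identical dimensions).
a = nib.load("preproc_bold_fmriprepA.nii.gz").get_fdata()
b = nib.load("preproc_bold_fmriprepB.nii.gz").get_fdata()

# reshape to voxels x timepoints and correlate each voxel's two timeseries
a2 = a.reshape(-1, a.shape[-1])
b2 = b.reshape(-1, b.shape[-1])
ok = (a2.std(axis=1) > 0) & (b2.std(axis=1) > 0)   # skip flat (out-of-brain) voxels
ac = a2[ok] - a2[ok].mean(axis=1, keepdims=True)
bc = b2[ok] - b2[ok].mean(axis=1, keepdims=True)
r = (ac * bc).sum(axis=1) / np.sqrt((ac**2).sum(axis=1) * (bc**2).sum(axis=1))

print("median voxelwise correlation:", np.median(r))
print("proportion of voxels with r < .9:", np.mean(r < 0.9))
```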

Knowing that the preprocessing is not identical, I reran the QC and a few simple analyses on the second set of images to make sure that everything still looked ok. I expected a bit of change in shape along the edge of the brain (from the two runs of skull stripping, etc.), but not enough to change the parcel-level analysis results; hence my surprise when such changes did occur.

Here are standard deviation images for the same person and four runs (first row: the Cued task-switching task; second row: the Sternberg task; first three columns: the first (AP encoding) run; last three columns: the second (PA encoding) run), from the two times fmriprep was run (red numbers).
Flipping between the two images makes it easy to spot a number of small changes, such as the area circled in blue, but the images are quite similar overall.
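For reference, the standard deviation images are just the temporal standard deviation of each voxel over the frames of a preprocessed run. A minimal nibabel/numpy sketch (placeholder filename; not the exact code behind these figures):

```python
import nibabel as nib
import numpy as np

# Placeholder filename: any preprocessed 4D BOLD run.
img = nib.load("sub-XX_task-Stern_run-1_desc-preproc_bold.nii.gz")
sd = img.get_fdata().std(axis=-1)                 # temporal standard deviation, per voxel
nib.save(nib.Nifti1Image(sd, img.affine), "sd_image.nii.gz")
```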

One of my test analyses is classification-style, within individual people: was the person responding to the task (right-hand button press) or not? I expect motor/somatomotor parcels to classify above chance, but also some visual and frontoparietal ones (since the attention demands and visual input also vary between response and non-response periods). Not really relevant, but these are linear SVMs, C=1 with default scaling, chance = .5, Schaefer 400 parcels x 7 networks parcellation, simple averaging of the frames within each event (separately) for temporal compression, cross-validating on the tasks (Cuedts and Stern).
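In scikit-learn terms, the cross-task classification looks roughly like the sketch below. The variable names and data layout are illustrative, not my actual analysis code (which also handles the example selection discussed in the updates above).

```python
import numpy as np
from sklearn.svm import SVC

def cross_task_accuracy(X, y, task):
    """Leave-one-task-out classification: train a linear SVM (C=1) on the
    examples from one task, test on the held-out task, then average the
    accuracies. X is examples x parcels (temporally-compressed averages),
    y is the button/not-button label, task is the task label per example."""
    accs = []
    for held_out in np.unique(task):
        train = task != held_out
        test = task == held_out
        clf = SVC(kernel="linear", C=1.0)
        clf.fit(X[train], y[train])
        accs.append(clf.score(X[test], y[test]))
    return float(np.mean(accs))
```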
Above are the parcel-wise accuracies for the same person and the two preprocessing runs. Only parcels with an accuracy of .6 or better are colored, with hotter colors indicating higher accuracies (up to a maximum of .76 for the first preprocessing run and .71 for the second). Many parcels in the same areas have similar accuracies both times, but the two maps are not identical; even some of the highest-classifying parcels shift a bit between the two processing runs. Below are the same two maps, thresholded at .65:

Again, this is not necessarily a problem, and there's no reason to think that one set of results is closer to the truth than the other. But the amount of variability between presumed-equivalent preprocessing runs was greater than I expected, and may surprise you as well. I haven't yet run this analysis on the surface version, but plan to and will try to update this afterwards. The other person whose data was run twice shows a similar amount of variation.

2 comments:

  1. Have you run your SVM classification on the same fMRIPrep output twice? Just wondering if there's any randomness involved in that processing step that could help explain it.

    Replies
    1. Yes, that's possible as well; I suspect my error of not ensuring the same events were used in each version will prove the biggest factor.
