accuracy over timeThe main bit I want to highlight is Figure 4 ("Comparison between the model accuracy function and the canonical HRF in the range of 0–20 s after stimulus onset."):
Figure 4 shows the single-volume classification accuracy for each of the five participants (P1, P2, etc) and two stimulus modalities ("audio" and "ortho"), classifying whether the stimulus (in both modalities) was of a tool or an animal. They used a 1-second TR, and (I think) leave-one-run-out cross-validation, with six runs. They label the dashed horizontal line as "Chance Level" from a binomial test, which I think is rather lenient, but the accuracies are high enough that I won't quibble about significance (I'd just label 50% as chance, since it's two-class classification).
As the authors point out, the peak classification accuracy is closer to 7 seconds after onset in four of the participants, instead of 5, as in the default SPM HRF (thick red line). The lines are hard to see, but P5 (darkish purple) is the person with the most similarity between the canonical HRF and classification accuracy curves, for both modalities. The accuracy curves tend to be similar across modalities within participants, or, as they put it: "The main regularities were specific to participants, with correlations between the profiles of both conditions of 0.88 for P1, 0.72 for P2, 0.81 for P3, 0.91 for P4, and 0.96 for P5."
This is the nicest figure I've seen showing how accuracy increases over time and varies over people; the scanning parameters and experimental design let the authors calculate this at a higher resolution than usual. This figure will be going into my introduction-to-MVPA slides.
cross-modal commentsThe experiment had two sessions (I think on separate days, though this isn't clear): one in which images of hand tools and animals (same each modality) were accompanied by auditory labels ("audio") and the other, with written labels ("ortho"). Classification was of image category (tool or animal), and was done either with a single session's data (all audio or all ortho), or cross-session (cross-modal), training with audio and testing with ortho, or training with ortho and testing with audio. Classification accuracies were pretty high both times (figures 2 and 3), but higher within- than cross-modality, and the authors spend some time exploring and discussing possible reasons why. Here are a few more ideas.
A new experiment could be run, in which both audio and ortho stimuli are presented in each session (e.g. three runs of each on each day). Is cross-modal classification less accurate than within-modal, or is cross-session accuracy less than within-session accuracy? In other words, I expect analyses involving images collected on separate days to be less accurate than analyses with images collected within a single session, simply because time has passed (the scanner properties could change a bit, the person could have different hydration or movement, etc). It's not possible to sort out the session and modality effects in Akama 2012, since they're confounded.
Also, I wonder if the feature selection method (picking top 500 voxels by ANOVA ranking in training set) hurt the likelihood of finding strong cross-modal classification. A ROI-based analysis might be an appropriate alternative, particularly since Tool vs. Animal classification has been studied before.
Hiroyuki Akama, Brian Murphy, Li Na, Yumiko Shimizu, & Massimo Poesio (2012). Decoding semantics across fMRI sessions with different stimulus modalities: a practical MVPA study Frontiers in NeuroInformatics DOI: 10.3389/fninf.2012.00024