Changing how I generated the simulated data causes the null distributions to change a lot as the difference between the classes increases. These simulations are the same as the previous set (same seeds, number of examples, cross-validation schemes, relabeling schemes, linear svms, etc), except for how the simulated data was generated.
In these examples, higher "snr" corresponds to easier classification (so the dark lines have more signal). When permuting the training set labels only the null distribution changes dramatically (and very strangely), becoming wider as the snr increases.
What is the difference?In my original simulations the variance increased minimally with signal. I made the values for one voxel with the R code
tbl[,i] <- c(rnorm(40, mean=-0.15, sd=1), rnorm(40, mean=0.15, sd=1));
which generates 40 entries at random from a standard normal distribution with standard deviation 1 and mean -0.15 to be the 40 values for that voxel in class A, and 40 entries from a normal distribution with standard deviation 1 and mean 0.15 to be the values for that voxel in class B. (This is for 4 runs with 10 examples in each run, and a "bias" of 0.15.)
I set up the simulation for this post to mimic the description of the pyMVPA normal_feature_dataset function, which was used in the simulations on the pyMVPA message list. I kept the parameters similar to my previous simulation (all voxels have signal), but used this R code to generate data for one voxel:
temp.vox <- rnorm(40, mean=0, sd=1);
temp.vox2 <- temp.vox / pymvpa.snr;
temp.vox <- c(temp.vox, temp.vox2);
tbl[,i] <- temp.vox / max(abs(temp.vox));
Where pymvpa.snr is 0.3, 3, 10, or 30. In words, I choose 40 entries from a standard normal distribution with standard deviation 1 and mean 0 to be the 40 values for that voxel in class A. I then divided each of the 40 values by the signal-to-noise ratio (e.g. 0.3), and set those values for class B. Finally, I divided each value by the largest absolute value of any of the 80 values.
Dividing by a constant (the signal-to-noise) makes fairly large differences in the variance of the A and B samples as the constant changes, even after normalization, and I think that might be driving the "widening" of the null distribution with increased signal, and certainly makes large differences to the null distributions.
So ...If such effects of how datasets are simulated are widespread, we'll have to be much more careful in explaining how simulations were conducted, and perhaps try to decide on a standard. Which methods are actually closest to fMRI data? Or can we even determine a general case, given how much fMRI preprocessing can vary?
UPDATE: this is not an accurate description of the normal_feature_dataset function. See this post for details.