These examples are similar to the ones I posted last month: one person, two experiment conditions (classes), four runs, and four examples of each class in each run. I classify with a linear SVM (c=1), partitioning on the runs, with 100 voxels. The images show the weights from the fitted SVM, averaged over the four cross-validation folds.
In each case I generated random numbers for one class for each voxel. If the voxel is "uninformative" I copied that set of random numbers to the other class; if the voxel is "informative" I added a small number (the "bias") to the random numbers to form the other class. In other words, an uninformative voxel's value on the first class A example in run 1 is the same as on the first class B example in run 1. If the voxel is informative, the first class B example in run 1 equals the value of the first class A example plus the bias.
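For concreteness, here is a minimal numpy sketch of that generation scheme for a single run. The function and parameter names (and the bias of 0.5) are my own choices for illustration, not taken from the original code:

```python
import numpy as np

def make_run(n_informative=5, bias=0.5, n_voxels=100, n_examples=4, seed=0):
    """Generate one run: class A is random; class B copies A, then the
    first n_informative voxels get the bias added."""
    rng = np.random.default_rng(seed)
    class_a = rng.normal(size=(n_examples, n_voxels))
    class_b = class_a.copy()            # uninformative voxels: identical to A
    class_b[:, :n_informative] += bias  # informative voxels: A plus the bias
    return class_a, class_b

a, b = make_run()
# uninformative voxels match exactly; informative ones differ by the bias
print(np.allclose(a[:, 5:], b[:, 5:]))        # True
print(np.allclose(b[:, :5] - a[:, :5], 0.5))  # True
```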
I ran these three ways: with all the informative voxels identical (i.e. I generated one "informative" voxel, then copied it the necessary number of times); with all informative voxels equally informative (equal bias) but not identical; and with varying bias in the informative voxels (so they were neither identical nor equally informative).
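The three variants differ only in how the informative columns are filled in; a sketch of the class A / class B values for just those columns (again, names and the bias-range choice are mine):

```python
import numpy as np

def informative_block(variant, n_informative=5, bias=0.5, n_examples=4, seed=0):
    """Informative-voxel values for both classes under the three schemes."""
    rng = np.random.default_rng(seed)
    if variant == "identical":
        # generate one informative voxel, copy it n_informative times
        one = rng.normal(size=(n_examples, 1))
        block = np.repeat(one, n_informative, axis=1)
        biases = np.full(n_informative, bias)
    elif variant == "equal_bias":
        # distinct random values, but the same bias for every voxel
        block = rng.normal(size=(n_examples, n_informative))
        biases = np.full(n_informative, bias)
    elif variant == "varying_bias":
        # distinct random values and a different bias per voxel
        # (uniform around the nominal bias is an assumption on my part)
        block = rng.normal(size=(n_examples, n_informative))
        biases = rng.uniform(0.5 * bias, 1.5 * bias, size=n_informative)
    class_a = block
    class_b = block + biases  # broadcasting adds each voxel's own bias
    return class_a, class_b
```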
Running the code will let you generate graphs for each cross-validation fold and for however many informative voxels you wish; I'll show just a few here.
In the graph for 5 identical informative voxels, the informative voxels have by far the strongest weights. When there are 50 identical informative voxels they 'fade': their weights are smaller than those of the uninformative voxels.
Linear SVMs produce a weighted sum of the voxel values; a small weight on each is all that's needed when there are so many identically informative voxels.
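This weight-spreading can be seen directly in a simulation. The sketch below assumes scikit-learn and mirrors the setup above (4 runs, 4 examples per class per run, 100 voxels, linear SVM with c=1, leave-one-run-out folds); the function names and the bias of 1.0 are my own:

```python
import numpy as np
from sklearn.svm import SVC

def make_dataset(n_informative, bias=1.0, n_voxels=100, n_runs=4,
                 n_per_class=4, seed=0):
    """Four runs, four examples per class per run; the first n_informative
    voxels are identical copies of a single informative voxel."""
    rng = np.random.default_rng(seed)
    X, y, runs = [], [], []
    for run in range(n_runs):
        a = rng.normal(size=(n_per_class, n_voxels))
        a[:, :n_informative] = a[:, [0]]  # identical informative voxels
        b = a.copy()                      # uninformative voxels match class A
        b[:, :n_informative] += bias      # class B = class A + bias there
        X.append(np.vstack([a, b]))
        y += [0] * n_per_class + [1] * n_per_class
        runs += [run] * (2 * n_per_class)
    return np.vstack(X), np.array(y), np.array(runs)

def mean_fold_weights(n_informative):
    """Absolute SVM weights, averaged over leave-one-run-out folds."""
    X, y, runs = make_dataset(n_informative)
    folds = []
    for test_run in np.unique(runs):
        clf = SVC(kernel="linear", C=1.0)
        clf.fit(X[runs != test_run], y[runs != test_run])
        folds.append(clf.coef_[0])
    return np.abs(np.mean(folds, axis=0))

w5, w50 = mean_fold_weights(5), mean_fold_weights(50)
# per-voxel weight on the informative voxels shrinks as their number grows
print(w5[:5].mean(), w50[:50].mean())
```

The intuition: if k identical voxels each get weight v, the decision function sees their combined effect k·v while the L2 penalty pays k·v², so the cheapest way to achieve a given separation is to shrink each individual weight as k grows.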
The accuracy here is higher than with the 50 identical informative voxels, though the bias is the same in both cases.
The most striking thing I noticed in these images is the way the weights of the informative voxels shrink toward zero as the number of informative voxels increases. This could cause problems when voxels have highly similar timecourses: they will be weighted not by the information in each voxel alone, but as a function of that information and the number of voxels carrying a similar amount of it.