Wednesday, June 20, 2012

linear SVM weight behavior

Lately I've been thinking about how linear svm classification performance and weight maps look when the informative voxels are highly similar, since voxels can of course be highly correlated in fMRI. I made a few little examples that I want to share; R code for these examples is here.

These examples are similar to the ones I posted last month: one person, two experiment conditions (classes), four runs, four examples of each class in each run. Classifying with linear svm, c=1, partitioning on the runs, 100 voxels.Images are of the weights from the fitted svm; averaged over the four cross-validation folds.

In each case I generated random numbers for one class for each voxel. If the voxel is "uninformative" I copied the set of random numbers for the other class, if the voxel is "informative" I added a small number (the "bias") to the random numbers to form the other class. In other words, a non-informative voxel's value on the first class A example in run 1 is the same as the first class B example in run 1. If the voxel is informative, the first class B example in run 1 will be equal to the value of the first class A example plus the bias.

I ran these three ways: with all the informative voxels being identical (i.e. I generated one "informative" voxel than copied it the necessary number of times), with all informative voxels equally informative (equal bias) but not identical, and with varying bias in the informative voxels (so they were not identical or equally informative).

Running the code will let you generate graphs for each cross-validation fold and however many informative voxels you wish; I'll show just a few here.

In the graph for 5 identical informative voxels the informative voxels have by far the strongest weights, when there are 50 identical informative voxels they 'fade': their weights are less than the uninformative voxels.

Linear svms produce a weighted sum of the voxel values; a small weight on each is needed when there are so many identically informative voxels.
This does not happen when the voxels are equally informative, but not identical: the weights are largest (most negative, in this case) for the informative voxels (left side of the image).

The accuracy is higher than with the 50 identical informative voxels, though the bias is the same in both cases.

When the informative voxels are more variable the weight map is also more variable, with voxels with more bias having higher weights.


The most striking thing I noticed in these images is the way the weights of the informative voxels get closer to zero as the number of informative voxels increases. This could cause problems when voxels have highly similar timecourses - they won't be weighted in terms of the information in each, but rather as a function of the information in each and the number of voxels with a similar amount of information.

1 comment:

  1. I believe the best way to interpret weights to the problem domain is via the so called "Kohonen networks". They were originally proposed for unsupervised learning, but supervised versions exist now.