Tuesday, September 18, 2012

permutation testing: example dataset

I currently have a dataset that, while mostly straightforward to analyze, has some complications that arise frequently. In this post I'll sketch its critical aspects, to hopefully clarify the permutation testing issues.
  • The analysis is within-subjects (each person classified alone) with anatomical ROIs. The ROIs are reasonably small and independently defined; we will not do additional feature selection.
  • The subjects completed 12 scanning runs, with the stimuli reasonably-separated (time-wise), and randomly ordered.
  • There are various types of stimuli, but we are only interested in certain two-way classifications. 
  • We only want to include examples where the subject gave the correct response.
  • We will use linear support vector machines with c=1.
As frequently happens, there are not as many examples as I might prefer. The stimulus types were randomized and balanced across the experiment as a whole, but not such that each run was guaranteed to have the same number of examples of the types we're classifying. After removing the examples the subjects answered incorrectly, the imbalance is even worse: not every stimulus type is present in every run for every person.
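To make the imbalance concrete, here is a minimal sketch (not the actual analysis code) of tabulating usable examples per run and class after dropping incorrect trials. The trial representation and class labels ("a", "b") are invented for illustration.

```python
from collections import Counter

# toy trial list: (run number, stimulus class, answered correctly?)
trials = [
    (1, "a", True), (1, "a", True), (1, "b", False),
    (2, "a", True), (2, "b", True), (2, "b", True),
    (3, "b", True), (3, "b", True),   # run 3 has no usable class-a trials
]

# keep only correct-response trials, then count per (run, class) pair
kept = [(run, stim) for run, stim, correct in trials if correct]
counts = Counter(kept)

for run in sorted({r for r, _ in kept}):
    print(run, counts[(run, "a")], counts[(run, "b")])
```

Run 1 ends up with no class-b examples and run 3 with no class-a examples, which is exactly the situation that rules out simple partitioning on the runs.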

There are various ways to set up the cross-validation. Partitioning on the runs is not an option, since some runs don't have all the stimulus types. My goal was to find a partitioning scheme that is as simple as possible, separates the training and testing sets in time, and keeps the number of examples of each class in the training and testing sets as close to equal as possible.

I calculated the number of training and testing examples under several schemes, such as leave-four-runs-out and leave-three-runs-out, before settling on leave-six-runs-out. Specifically, I use the first six runs as the training set and the last six runs (in presentation order) as the testing set, then the reverse. Under this scheme, the smallest number of examples in a partition for any person is 6: not great, but I judged it workable. The largest minimum for any person is 25; for most people it is in the teens.
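The leave-six-runs-out scheme can be sketched as two folds, splitting on run membership; the run numbering (1–12) and the (run, example) tuples here are assumptions for illustration, not the actual data layout.

```python
def split_by_runs(examples, train_runs):
    """Partition (run, ...) tuples into train/test by run membership."""
    train = [ex for ex in examples if ex[0] in train_runs]
    test = [ex for ex in examples if ex[0] not in train_runs]
    return train, test

# toy data: two examples in each of runs 1-12
examples = [(run, f"ex{i}") for i, run in enumerate(list(range(1, 13)) * 2)]

# fold 1: train on runs 1-6, test on runs 7-12; fold 2: the reverse
train1, test1 = split_by_runs(examples, set(range(1, 7)))
train2, test2 = split_by_runs(examples, set(range(7, 13)))
```

Because the folds are defined by presentation order rather than random assignment, the training and testing sets stay separated in time, as described above.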

Even with leave-six-runs-out partitioning, the number of examples of the two classes in the training and testing sets is usually unequal, sometimes strikingly so. My usual work-around is to subset the larger class to equalize the number of examples in each case (e.g. if there are 12 examples of class a and 14 of class b in the training set, remove two of the class b examples). There are of course many, many ways to do the subsetting. My usual practice is to randomly generate 10 such subsets, run the classification on each, and average the resulting accuracies (e.g. calculate the accuracy after taking out the 1st and 3rd class b examples in the training set, then after taking out the 4th and 13th, and so on).
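The balancing loop might look like the following sketch. Everything here is hypothetical: run_classifier() is a placeholder standing in for training the linear SVM (c=1) on the balanced training set and returning test-set accuracy, and the example counts (12 vs. 14) are taken from the example above.

```python
import random

def balance(class_a, class_b, rng):
    """Randomly drop examples from the larger class until counts match."""
    n = min(len(class_a), len(class_b))
    return rng.sample(class_a, n), rng.sample(class_b, n)

def run_classifier(train_a, train_b):
    # placeholder: in practice, fit the linear SVM on this balanced
    # training set and return its accuracy on the testing set
    return 0.5

rng = random.Random(14)        # fixed seed so the subsets are reproducible
train_a = list(range(12))      # e.g. 12 training examples of class a
train_b = list(range(14))      # and 14 of class b

accuracies = []
for _ in range(10):            # 10 random balanced subsets
    a, b = balance(train_a, train_b, rng)
    accuracies.append(run_classifier(a, b))

mean_accuracy = sum(accuracies) / len(accuracies)
```

Averaging over the 10 subsets smooths out the luck of which particular class-b examples were dropped in any single subset.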
