Friday, May 24, 2013

Schapiro 2013: response from Anna Schapiro

Anna Schapiro, first author of "Neural representations of events arise from temporal community structure", kindly provided additional details (even a diagram!), addressing the questions I asked about her research in my previous post. She gave me permission to share extracts from her reply here, which we hope will be useful to others as well.

She said that my descriptions and guesses were basically accurate, including that they used the searchmight toolbox to perform the searchlight analysis, and so used cubical searchlights, that toolbox's default.

My first question in the previous post was, "Were the item pairs matched for time as well as number of steps?" Anna replied:
"We did at one point try to balance time as well as the number of steps between items. So we were averaging correlations between items that were both the same number of steps and the same amount of time apart. But I found that some of the time/step bins had very few items and that that made the estimates significantly more noisy. So I opted for the simpler step balance approach. Although it doesn't perfectly address the time issue, it also doesn't introduce any bias, so we thought that was a reasonable way to go."

about the node pairs

My second question was about how many correlation differences went into each average, or, more generally, which within-cluster and between-cluster pairs were correlated.

"Regarding choices of pairs, I think the confusion is that we chose one Hamiltonian path for each subject and used the forwards and backwards versions of that path throughout the scan (see the beginning of Exp 2 for an explanation of why we did this). Let's assume that that path is the outermost path through the graph, as you labeled in your post [top figure]. Then the attached figure [second figure] shows the within and between cluster comparisons that we used, with nodes indexed by rows and columns, and color representing the distance on the graph between the two nodes being compared. We performed the correlations for all node pairs that have the same color (i.e., were the same distance away on the graph) and averaged those values before averaging all the within or between cluster correlations."

Here's the node-numbered version of Figure 1 I posted previously, followed by the figure Anna created to show which node pairs would be included for this graph (the images should enlarge if you click on them).


Concretely, then, if the Hamiltonian path for a person were the one shown in Figure 1 (around the outside of the graph, as the nodes are numbered), correlations would be calculated for the node pairs marked in the second figure on each of the approximately 20 path traversals. That gives about 60 correlations per traversal, sorted into 30 within-cluster comparisons and 30 between-cluster comparisons ("about" 60 since some pairs might be omitted on any particular traversal, depending on where the path started and ended).
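To make the counting concrete, here is a small sketch (mine, not the authors' code) that enumerates the step-matched pairs, assuming the three clusters are the sets of five consecutive nodes and the Hamiltonian path runs around the outside of the graph as in Figure 1, so the node order is effectively a 15-node ring:

from itertools import combinations

n_nodes = 15
cluster = {node: node // 5 for node in range(n_nodes)}  # three clusters of five consecutive nodes

def ring_distance(a, b, n=n_nodes):
    # number of steps between a and b along the ring, in either direction
    d = abs(a - b)
    return min(d, n - d)

within, between = {}, {}
for a, b in combinations(range(n_nodes), 2):
    d = ring_distance(a, b)
    if d > 4:  # only pairs up to four steps apart enter the comparison
        continue
    bucket = within if cluster[a] == cluster[b] else between
    bucket.setdefault(d, []).append((a, b))

for d in range(1, 5):
    print(d, len(within[d]), len(between[d]))
# prints 12/3, 9/6, 6/9, and 3/12 pairs at distances 1 to 4,
# i.e. 30 within-cluster and 30 between-cluster pairs in total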

The 30 within-cluster correlations would be collapsed into a single number by averaging in two steps. First, the correlations for pairs of the same length would be averaged (e.g. the three length-4 within-cluster pairs: 11-15, 6-10, 1-5), giving four averages (one each for the length-1, length-2, length-3, and length-4 pairs). Second, these four averages would be averaged, giving a single within-cluster average (and, by the same procedure, a single between-cluster average). This two-step averaging somewhat reduces the influence of path-length imbalance (averaging all pairs together would put 12 length-1 pairs into the within-cluster comparison but only 3 into the between-cluster comparison), though it may not eliminate it completely. I wonder whether picking three pairs of each length to average (i.e. all of the length-4 within-cluster pairs but only a third of the length-1 within-cluster pairs) would change the outcome.
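Here is a sketch of that two-step collapse, continuing the hypothetical code above (corr is an assumed dictionary mapping each node pair to its pattern correlation on one traversal; again, this is my reading of the procedure, not the authors' code):

import numpy as np

def two_step_average(pairs_by_distance, corr):
    # step 1: average the correlations for pairs the same number of steps apart
    per_distance = [np.mean([corr[pair] for pair in pairs])
                    for pairs in pairs_by_distance.values()]
    # step 2: average the per-distance means, so each path length counts equally
    return np.mean(per_distance)

# the statistic fed into the group analysis would then be something like:
# within_minus_between = two_step_average(within, corr) - two_step_average(between, corr)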

group-level analysis

My third question was about how exactly the group analysis was performed. Anna replied:

"yes, we used a one-sample t-test on our within-between statistic values. In this scenario, randomise permutes the sign of the volumes for each subject. On each permutation, the entire volume for a particular subject is left alone or multiplied by -1. Then it looks for clusters of a certain size in this nonsense dataset. In the end it reports clusters in the true dataset are significantly more likely to be found than in these shuffled versions. I like using randomise because it preserves the spatial smoothness in every region of the brain in every subject. Searchlights may create smoothness that have a different character than other analyses, but we don't have to worry about it, since that smoothness is preserved in the null distribution in this permutation test."

I then asked, "Do you mean that it [randomise] permutes the sign of the differences for each person? So it is changing signs on the difference maps, not changing labels on the (processed) BOLD data then recalculating the correlations?", to which she replied that, "I feed randomise within-between maps, so yes, it's permuting the sign of the differences."
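For readers unfamiliar with sign-flipping, here is a simplified sketch of the idea: each permutation multiplies whole subject difference maps by +1 or -1, and the resulting null distribution gives corrected p-values. This is only an illustration of the logic, not what randomise itself computes (it uses a voxelwise max-t rather than cluster extent or TFCE, and the variable names and shapes are made up):

import numpy as np

rng = np.random.default_rng(0)

def sign_flip_max_t(diff_maps, n_perms=5000):
    # diff_maps: (n_subjects, n_voxels) array of within-minus-between maps
    n_subj = diff_maps.shape[0]
    def t_map(data):
        return data.mean(axis=0) / (data.std(axis=0, ddof=1) / np.sqrt(n_subj))
    true_t = t_map(diff_maps)
    max_null = np.empty(n_perms)
    for i in range(n_perms):
        flips = rng.choice([-1.0, 1.0], size=n_subj)  # flip entire subject maps
        max_null[i] = t_map(diff_maps * flips[:, None]).max()
    # corrected p-value: proportion of permutations whose max-t beats each voxel
    p_corrected = (max_null[None, :] >= true_t[:, None]).mean(axis=1)
    return true_t, p_corrected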


Thank you again, Anna Schapiro, for the helpful and detailed replies, and for allowing me to share our correspondence here!
