Understanding how goals control behavior is a question ripe for interrogation by new methods from machine learning. These methods require large, labeled datasets to train models. To annotate a large-scale image dataset with observed search fixations, we collected 16,184 fixations from people searching for either microwaves or clocks in a dataset of 4,366 images (MS-COCO). We then used this behaviorally annotated dataset and the machine learning method of inverse reinforcement learning (IRL) to learn target-specific reward functions and policies for these two target goals. Finally, we used these learned policies to predict the fixations of 60 new behavioral searchers (clock = 30, microwave = 30) in a disjoint test dataset of kitchen scenes depicting both a microwave and a clock (thus controlling for differences in low-level image contrast). We found that the IRL model predicted behavioral search efficiency and fixation-density maps, as assessed by multiple metrics. Moreover, reward maps from the IRL model revealed target-specific patterns that suggest not only attention guidance by target features but also guidance by scene context (e.g., fixations along walls when searching for clocks). Using machine learning and the psychologically meaningful principle of reward, it is possible to learn the visual features used in goal-directed attention control.
A common feature of many neuroscience datasets is the presence of hierarchical data structures, most commonly recordings of the activity of multiple neurons in multiple animals across multiple trials. Accordingly, the measurements constituting the dataset are not independent, even though the traditional statistical analyses often applied in such cases (e.g., Student's t-test) treat them as such. The hierarchical bootstrap has been shown to be an effective tool for accurately analyzing such data. Although it has been used extensively in the statistical literature, its use is not widespread in neuroscience, despite the ubiquity of hierarchical datasets. In this paper, we illustrate the intuitiveness and utility of this approach for analyzing hierarchically nested datasets. We use simulated neural data to show that traditional statistical tests can result in a false positive rate of over 45%, even when the nominal Type-I error rate is set at 5%. While summarizing data across non-independent points (or lower levels) can potentially fix this problem, this approach greatly reduces the statistical power of the analysis. The hierarchical bootstrap, when applied sequentially over the levels of the hierarchical structure, keeps the Type-I error rate within the intended bound and retains more statistical power than summarizing methods. We conclude by demonstrating the effectiveness of the method in two real-world examples, first analyzing singing data in male Bengalese finches (Lonchura striata var. domestica) and second quantifying changes in behavior under optogenetic control in flies (Drosophila melanogaster).
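The sequential resampling described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the three-level layout (animals, neurons, trials), the simulated effect sizes, and the helper names `make_group`, `resample_mean`, `hierarchical_bootstrap`, and `prob_greater` are all hypothetical, and the group comparison uses a simple paired fraction rather than any particular published statistic.

```python
import random
from statistics import fmean


def make_group(rng, n_animals=4, n_neurons=5, n_trials=10, offset=0.0):
    """Simulate nested data: each animal and each neuron adds its own random shift.

    All sizes and standard deviations are hypothetical choices for illustration.
    """
    group = []
    for _ in range(n_animals):
        animal_mean = rng.gauss(offset, 1.0)
        animal = []
        for _ in range(n_neurons):
            neuron_mean = rng.gauss(animal_mean, 1.0)
            animal.append([rng.gauss(neuron_mean, 1.0) for _ in range(n_trials)])
        group.append(animal)
    return group


def resample_mean(group, rng):
    """One hierarchical resample: animals, then neurons, then trials,
    each drawn with replacement at its own level."""
    values = []
    for animal in rng.choices(group, k=len(group)):
        for neuron in rng.choices(animal, k=len(animal)):
            values.extend(rng.choices(neuron, k=len(neuron)))
    return fmean(values)


def hierarchical_bootstrap(group, n_boot=1000, seed=0):
    """Return n_boot bootstrapped means of a nested group."""
    rng = random.Random(seed)
    return [resample_mean(group, rng) for _ in range(n_boot)]


def prob_greater(boot_a, boot_b):
    """Fraction of paired bootstrap samples in which b >= a (a simplification)."""
    return sum(b >= a for a, b in zip(boot_a, boot_b)) / len(boot_a)


# Example with simulated data (no real recordings involved).
data_rng = random.Random(42)
control = make_group(data_rng)
treated = make_group(data_rng, offset=0.5)
boot_control = hierarchical_bootstrap(control, n_boot=500, seed=1)
boot_treated = hierarchical_bootstrap(treated, n_boot=500, seed=2)
print("P(treated >= control):", prob_greater(boot_control, boot_treated))
```

Resampling the animals first means the spread of the bootstrapped means reflects between-animal variability, which is what a test that treats every trial as independent ignores.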
We recently reported the existence of fluctuations in neural signals that may permit neurons to code multiple simultaneous stimuli sequentially across time [1]. This required deploying a novel statistical approach to permit investigation of neural activity at the scale of individual trials. Here we present tests using synthetic data to assess the sensitivity and specificity of this analysis. We fabricated datasets to match each of several potential response patterns derived from single-stimulus response distributions. In particular, we simulated dual-stimulus trial spike counts that reflected fluctuating mixtures of the single-stimulus spike counts, stable intermediate averages, single-stimulus winner-take-all, or response distributions that were outside the range defined by the single-stimulus responses (such as summation or suppression). We then assessed how well the analysis recovered the correct response pattern as a function of the number of simulated trials and the difference between the simulated responses to each "stimulus" alone. We found excellent recovery of the mixture, intermediate, and outside categories (>97% correct), and good recovery of the single/winner-take-all category (>90% correct) when the number of trials was >20 and the single-stimulus response rates were 50 Hz and 20 Hz, respectively. Both larger numbers of trials and greater separation between the single-stimulus firing rates improved categorization accuracy. These results provide a benchmark, and guidelines for data collection, for use of this method to investigate coding of multiple items at the individual-trial time scale.
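The four simulated response patterns can be illustrated with a short sketch. It only generates synthetic spike counts and does not implement the statistical analysis itself. Several choices are assumptions made for illustration: Poisson-distributed counts, a 1 s counting window (so mean counts of 50 and 20 correspond to 50 Hz and 20 Hz), a 50/50 mixing probability, the midpoint as the intermediate rate, summation as the "outside" case, and the helper names `poisson` and `simulate_dual`.

```python
import math
import random
from statistics import fmean, pvariance


def poisson(rng, lam):
    """Draw one Poisson count with Knuth's multiplication method (fine for modest lam)."""
    threshold = math.exp(-lam)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def simulate_dual(pattern, mu_a, mu_b, n_trials, rng):
    """Simulate dual-stimulus spike counts for one hypothetical response pattern."""
    counts = []
    for _ in range(n_trials):
        if pattern == "mixture":
            # Each trial follows one single-stimulus distribution or the other.
            lam = mu_a if rng.random() < 0.5 else mu_b
        elif pattern == "intermediate":
            # Every trial is drawn at a stable rate between the two.
            lam = (mu_a + mu_b) / 2
        elif pattern == "single":
            # Winner-take-all: stimulus A always dominates.
            lam = mu_a
        elif pattern == "outside":
            # Summation, one example of a response outside the single-stimulus range.
            lam = mu_a + mu_b
        else:
            raise ValueError(f"unknown pattern: {pattern}")
        counts.append(poisson(rng, lam))
    return counts


rng = random.Random(0)
for pattern in ("mixture", "intermediate", "single", "outside"):
    counts = simulate_dual(pattern, 50, 20, 2000, rng)
    print(pattern, round(fmean(counts), 1), round(pvariance(counts), 1))
```

Under these assumptions the mixture and intermediate patterns share the same expected mean count but differ sharply in trial-to-trial variance, which is why distinguishing them requires looking at individual trials and not at trial averages.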