Application of the hierarchical bootstrap to multi-level data in neuroscience
Varun Saravanan, Gordon J Berman, Samuel J Sober
Neurons, behavior, data analysis and theory, 2020 (Epub 2020-07-21). Journal Article.
PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7906290/pdf/nihms-1630846.pdf
Abstract
Many neuroscience datasets have a hierarchical structure, most commonly recordings of the activity of multiple neurons in multiple animals across multiple trials. The measurements in such a dataset are therefore not independent, even though the traditional statistical analyses often applied to them (e.g., Student's t-test) treat them as such. The hierarchical bootstrap has been shown to be an effective tool for analyzing such data accurately. Although it has been used extensively in the statistical literature, it is not widely used in neuroscience, despite the ubiquity of hierarchical datasets. In this paper, we illustrate the intuitiveness and utility of this approach for analyzing hierarchically nested datasets. We use simulated neural data to show that traditional statistical tests can produce a false positive rate of over 45%, even when the Type-I error rate is set at 5%. Summarizing data across non-independent points (or lower levels) can potentially fix this problem, but it greatly reduces the statistical power of the analysis. The hierarchical bootstrap, when applied sequentially over the levels of the hierarchical structure, keeps the Type-I error rate within the intended bound and retains more statistical power than summarizing methods. We conclude by demonstrating the effectiveness of the method in two real-world examples: first, analyzing singing data in male Bengalese finches (Lonchura striata var. domestica), and second, quantifying changes in behavior under optogenetic control in flies (Drosophila melanogaster).
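The abstract describes applying the bootstrap "sequentially over the levels of the hierarchical structure". The following is a minimal sketch of that idea, not the authors' implementation. It assumes a three-level layout (animals, then neurons, then trials) stored as nested Python lists, and it uses the mean as the statistic of interest. The names `hierarchical_bootstrap_means`, `prob_greater`, and `simulate_group`, and all simulation parameters, are hypothetical and chosen only for illustration. Comparing two groups by pairing independent bootstrap replicates is also a simplifying assumption.

```python
import random
from statistics import fmean


def hierarchical_bootstrap_means(data, n_boot=1000, seed=0):
    """Bootstrap the grand mean of nested data.

    data: list of animals; each animal is a list of neurons;
          each neuron is a list of trial values (assumed layout).
    Returns a list of n_boot bootstrapped means.
    """
    rng = random.Random(seed)
    boot_means = []
    for _ in range(n_boot):
        values = []
        # Level 1: resample animals with replacement.
        for neurons in rng.choices(data, k=len(data)):
            # Level 2: resample neurons within each drawn animal.
            for trials in rng.choices(neurons, k=len(neurons)):
                # Level 3: resample trials within each drawn neuron.
                values.extend(rng.choices(trials, k=len(trials)))
        boot_means.append(fmean(values))
    return boot_means


def prob_greater(boot_a, boot_b):
    """Fraction of paired replicates in which group B's mean >= group A's."""
    return sum(b >= a for a, b in zip(boot_a, boot_b)) / len(boot_a)


def simulate_group(n_animals, n_neurons, n_trials, offset, rng):
    """Hypothetical generator: shared animal- and neuron-level shifts
    make trials from the same neuron or animal non-independent."""
    group = []
    for _ in range(n_animals):
        animal_shift = rng.gauss(0.0, 1.0)
        animal = []
        for _ in range(n_neurons):
            neuron_shift = rng.gauss(0.0, 1.0)
            animal.append([
                offset + animal_shift + neuron_shift + rng.gauss(0.0, 1.0)
                for _ in range(n_trials)
            ])
        group.append(animal)
    return group


if __name__ == "__main__":
    rng = random.Random(42)
    group_a = simulate_group(3, 10, 20, offset=0.0, rng=rng)
    group_b = simulate_group(3, 10, 20, offset=0.0, rng=rng)
    boot_a = hierarchical_bootstrap_means(group_a, n_boot=1000, seed=1)
    boot_b = hierarchical_bootstrap_means(group_b, n_boot=1000, seed=2)
    print("P(mean B >= mean A):", prob_greater(boot_a, boot_b))
```

Because animals are resampled first, each replicate carries the between-animal and between-neuron variability into the bootstrapped statistic, rather than treating every trial as an independent observation. In a sketch like this, a probability near 0 or 1 would suggest a difference between groups, while a value near 0.5 would not. The threshold for significance is left to the reader and is not taken from the source.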