Statistical Analysis of Data Repeatability Measures

IF 1.7 · Region 3 (Mathematics) · JCR Q1, Statistics & Probability · International Statistical Review · Pub Date: 2024-08-09 · DOI: 10.1111/insr.12591
Zeyi Wang, Eric Bridgeford, Shangsi Wang, Joshua T. Vogelstein, Brian Caffo

Abstract

The advent of modern data collection and processing techniques has seen the size, scale and complexity of data grow exponentially. A seminal step in leveraging these rich datasets for downstream inference is understanding which characteristics of the data are repeatable, that is, which aspects of the data can be identified under duplicated analyses. Yet the utility of traditional repeatability measures, such as the intra-class correlation coefficient, is limited in these settings. In recent work, novel data repeatability measures have been introduced for the context where a set of subjects is measured twice or more, including fingerprinting, rank sums and generalisations of the intra-class correlation coefficient. However, the relationships between these measures, and the best practices among them, remain largely unknown. In this manuscript, we formalise a novel repeatability measure, discriminability. We show that it is deterministically linked with the intra-class correlation coefficient under univariate random effect models and has the desired property of optimal accuracy for inferential tasks using multivariate measurements. Additionally, we review and systematically compare existing repeatability statistics with discriminability, using both theoretical results and simulations. We show that the rank sum statistic is deterministically linked to a consistent estimator of discriminability. The statistical power of permutation tests derived from these measures is compared numerically under Gaussian and non-Gaussian settings, with and without simulated batch effects. Motivated by both theoretical and empirical results, we provide methodological recommendations for each benchmark setting to serve as a resource for future analyses. We believe these recommendations will play an important role in improving repeatability in fields such as functional magnetic resonance imaging, genomics, pharmacology and more.
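The abstract does not spell out how discriminability is computed, so the following is only a minimal sketch under stated assumptions. It takes discriminability to be the probability that the distance between two measurements of the same subject is smaller than the distance from one of them to a measurement of a different subject. It estimates that probability by counting comparisons, with Euclidean distance and a strict inequality, and tests it by permuting subject labels. The function names are hypothetical and are not taken from the paper or its software. For the univariate Gaussian one-way random effects model, a short calculation gives D = 1/2 + arctan(ICC / sqrt((1 - ICC)(ICC + 3))) / pi; it is included here as an assumption of the sketch to illustrate what a deterministic link to the intra-class correlation can look like.

```python
import numpy as np


def discriminability(x, subject):
    """Fraction of comparisons in which a within-subject distance is
    strictly smaller than a between-subject distance (assumed definition)."""
    x = np.asarray(x, dtype=float)
    if x.ndim == 1:
        x = x[:, None]  # treat univariate data as one feature column
    subject = np.asarray(subject)
    # Pairwise Euclidean distances between all measurements.
    dist = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=-1)
    same = subject[:, None] == subject[None, :]
    wins, total = 0, 0
    for a in range(len(subject)):
        between = dist[a, ~same[a]]  # distances to other subjects
        for b in np.flatnonzero(same[a]):
            if b == a:
                continue  # skip the measurement itself
            wins += int(np.sum(dist[a, b] < between))
            total += between.size
    return wins / total


def permutation_pvalue(x, subject, n_perm=199, seed=0):
    """One-sided permutation p-value: subject labels are shuffled to
    build the null distribution of the statistic."""
    rng = np.random.default_rng(seed)
    subject = np.asarray(subject)
    observed = discriminability(x, subject)
    null = [discriminability(x, rng.permutation(subject)) for _ in range(n_perm)]
    exceed = sum(d >= observed for d in null)
    return (1 + exceed) / (1 + n_perm)


def simulate(n_subjects=30, n_visits=2, icc=0.6, seed=0):
    """Univariate Gaussian random effects data with unit total variance."""
    rng = np.random.default_rng(seed)
    subject = np.repeat(np.arange(n_subjects), n_visits)
    effect = rng.normal(0.0, np.sqrt(icc), n_subjects)
    noise = rng.normal(0.0, np.sqrt(1.0 - icc), subject.size)
    return effect[subject] + noise, subject


def gaussian_discriminability(icc):
    """Closed form under the Gaussian random effects model (assumption)."""
    return 0.5 + np.arctan(icc / np.sqrt((1.0 - icc) * (icc + 3.0))) / np.pi


if __name__ == "__main__":
    x, subject = simulate(n_subjects=100, icc=0.6)
    print("estimated discriminability:", discriminability(x, subject))
    print("Gaussian closed form:", gaussian_discriminability(0.6))
    print("permutation p-value:", permutation_pvalue(x, subject, n_perm=99))
```

On simulated data the estimate should land near the closed-form value as the number of subjects grows. The same estimator applies unchanged to multivariate measurements, because it depends only on pairwise distances.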
Source journal: International Statistical Review (Mathematics: Statistics & Probability)
CiteScore: 4.30
Self-citation rate: 5.00%
Articles published: 52
Review time: >12 weeks
About the journal: International Statistical Review is the flagship journal of the International Statistical Institute (ISI) and of its family of Associations. It publishes papers of broad and general interest in statistics and probability. The term Review is to be interpreted broadly. The types of papers that are suitable for publication include (but are not limited to) the following: reviews/surveys of significant developments in theory, methodology, statistical computing and graphics, statistical education, and application areas; tutorials on important topics; expository papers on emerging areas of research or application; papers describing new developments and/or challenges in relevant areas; papers addressing foundational issues; papers on the history of statistics and probability; white papers on topics of importance to the profession or society; and historical assessment of seminal papers in the field and their impact.