Distributed Statistical Analyses: A Scoping Review and Examples of Operational Frameworks Adapted to Health Analytics.

JMIR Medical Informatics (IF 3.1; Zone 3, Medicine; JCR Q2, Medical Informatics). Pub Date: 2024-11-14. DOI: 10.2196/53622
Félix Camirand Lemyre, Simon Lévesque, Marie-Pier Domingue, Klaus Herrmann, Jean-François Ethier
Publication type: Journal Article. JMIR Medical Informatics, volume 12, e53622.

Abstract

Background: Data from multiple organizations are crucial for advancing learning health systems. However, ethical, legal, and social concerns may restrict the use of standard statistical methods that rely on pooling data. Although distributed algorithms offer alternatives, they may not always be suitable for health frameworks.

Objective: This study aims to support researchers and data custodians in three ways: (1) providing a concise overview of the literature on statistical inference methods for horizontally partitioned data, (2) describing the methods applicable to generalized linear models (GLMs) and assessing their underlying distributional assumptions, and (3) adapting existing methods to make them fully usable in health settings.

Methods: A scoping review methodology was used for the literature mapping, from which methods presenting a methodological framework for GLM analyses with horizontally partitioned data were identified and assessed from the perspective of applicability in health settings. Statistical theory was used to adapt methods and derive the properties of the resulting estimators.

Results: From the review, 41 articles were selected and 6 approaches were extracted to conduct standard GLM-based statistical analysis. However, these approaches assumed evenly and identically distributed data across nodes. Consequently, statistical procedures were derived to accommodate uneven node sample sizes and heterogeneous data distributions across nodes. Workflows and detailed algorithms were developed to highlight information sharing requirements and operational complexity.

Conclusions: This study contributes to the field of health analytics by providing an overview of the methods that can be used with horizontally partitioned data, adapting these methods to the context of heterogeneous health data, and clarifying the workflows and quantities exchanged by the methods discussed. Further analysis of the confidentiality preserved by these methods is needed to fully understand the risk associated with the sharing of summary statistics.
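
The abstract does not spell out the algorithms it reviews, so the following is only a minimal sketch of one widely used strategy for fitting a GLM on horizontally partitioned data: a logistic regression fitted by Newton-Raphson, where each node shares only its gradient and Hessian contributions at the current coefficients and a coordinator sums them. It is not the authors' adapted procedure. The function names (`local_summaries`, `newton_step`, `fit_distributed`), the single-covariate model, and the toy data are all assumptions made for illustration.

```python
import math

def local_summaries(rows, beta):
    """Run at one node: gradient and (negative) Hessian pieces of the logistic
    log-likelihood at beta, computed from that node's rows only."""
    g0 = g1 = h00 = h01 = h11 = 0.0
    for x, y in rows:
        p = 1.0 / (1.0 + math.exp(-(beta[0] + beta[1] * x)))
        w = p * (1.0 - p)
        g0 += y - p
        g1 += (y - p) * x
        h00 += w
        h01 += w * x
        h11 += w * x * x
    return (g0, g1), (h00, h01, h11)

def newton_step(beta, grad, hess):
    """Run at the coordinator: solve the 2x2 system and update beta."""
    g0, g1 = grad
    h00, h01, h11 = hess
    det = h00 * h11 - h01 * h01
    d0 = (h11 * g0 - h01 * g1) / det
    d1 = (h00 * g1 - h01 * g0) / det
    return (beta[0] + d0, beta[1] + d1)

def fit_distributed(nodes, iterations=25):
    """Only summed summaries cross node boundaries; row-level data never move."""
    beta = (0.0, 0.0)
    for _ in range(iterations):
        summaries = [local_summaries(rows, beta) for rows in nodes]
        grad = tuple(sum(s[0][k] for s in summaries) for k in range(2))
        hess = tuple(sum(s[1][k] for s in summaries) for k in range(3))
        beta = newton_step(beta, grad, hess)
    return beta

# Hypothetical toy data: (covariate, binary outcome) pairs held by two nodes
# of deliberately uneven size.
node_a = [(0.0, 0), (1.0, 0), (2.0, 1), (3.0, 0)]
node_b = [(1.5, 1), (2.5, 1), (0.5, 0), (3.5, 1), (4.0, 1), (0.2, 1), (2.2, 0)]

beta_distributed = fit_distributed([node_a, node_b])
beta_pooled = fit_distributed([node_a + node_b])
print(beta_distributed, beta_pooled)
```

Because the log-likelihood is a sum over individuals, the summed node contributions equal the pooled gradient and Hessian, so this scheme reproduces the pooled estimate up to floating-point error regardless of how unevenly the rows are split. The cost is one exchange of summary statistics per iteration, and those summaries are exactly the kind of shared quantity whose confidentiality risk the conclusions flag as needing further analysis.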

Source journal
JMIR Medical Informatics (Medicine, Health Informatics)
CiteScore: 7.90
Self-citation rate: 3.10%
Articles published: 173
Review time: 12 weeks
About the journal: JMIR Medical Informatics (JMI, ISSN 2291-9694) is a top-rated, tier A journal which focuses on clinical informatics, big data in health and health care, decision support for health professionals, electronic health records, and eHealth infrastructures and implementation. It has a focus on applied, translational research, with a broad readership including clinicians, CIOs, engineers, industry and health informatics professionals. Published by JMIR Publications, publisher of the Journal of Medical Internet Research (JMIR), the leading eHealth/mHealth journal (Impact Factor 2016: 5.175), JMIR Med Inform has a slightly different scope (placing more emphasis on applications for clinicians and health professionals than on consumers/citizens, which is the focus of JMIR), publishes even faster, and also allows papers which are more technical or more formative than what would be published in the Journal of Medical Internet Research.