[Statistical methods for extremely unbalanced data in genome-wide association study (2)].

N Xie, W J Bi, Z W Zhang, F Shao, Y Y Wei, Y Zhao, R Y Zhang, F Chen
Chinese Journal of Epidemiology (中华流行病学杂志), vol. 46, no. 1, pp. 147-153. Published 2025-01-10. DOI: 10.3760/cma.j.cn112338-20240712-00422 (https://doi.org/10.3760/cma.j.cn112338-20240712-00422)

Abstract

Extremely unbalanced data are datasets in which the independent or dependent variables show severely imbalanced proportions. Such imbalance can cause classical test statistics to deviate from their theoretical distributions and makes type I error difficult to control. The growing availability of genome-wide resources from large population cohorts has increased the demand for efficient and accurate statistical methods for handling extremely unbalanced data, which in turn drives the development of statistical genetics methods. This paper introduces two correction methods widely used for extremely unbalanced data in current genome-wide association studies, Firth correction and saddlepoint approximation, and describes their effectiveness in controlling type I error as confirmed by simulation experiments. It then summarizes the software commonly used for extremely unbalanced genomic data, providing a theoretical reference and suggestions for the statistical analysis of such data in the future.
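To make the first of the two corrections named in the abstract concrete, the following is a minimal sketch of Firth-penalized logistic regression, fitted by Fisher scoring on the modified score function. It is not the authors' implementation, nor that of any GWAS software package. The function name `firth_logistic`, the toy genotype and phenotype arrays, and the iteration settings are all assumptions made for illustration. The toy data are deliberately separated (every variant carrier is a case), a situation in which the ordinary maximum likelihood estimate does not exist.

```python
import numpy as np


def firth_logistic(X, y, max_iter=100, tol=1e-10):
    """Illustrative Firth-penalized logistic regression (hypothetical helper).

    Iterates beta <- beta + I(beta)^{-1} U*(beta), where
    U*(beta) = X^T (y - p + h * (0.5 - p)) is the Firth-modified score
    and h holds the diagonal of the weighted hat matrix.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))   # fitted probabilities
        w = p * (1.0 - p)                       # logistic variance weights
        info = X.T @ (X * w[:, None])           # Fisher information X^T W X
        info_inv = np.linalg.inv(info)
        # Leverages h_i = w_i * x_i^T (X^T W X)^{-1} x_i
        h = w * np.einsum("ij,jk,ik->i", X, info_inv, X)
        score = X.T @ (y - p + h * (0.5 - p))   # Firth-modified score
        step = info_inv @ score
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta


# Assumed toy data: 20 non-carriers (2 cases) and 3 carriers (all cases).
g = np.array([0] * 20 + [1] * 3)                # genotype: carrier indicator
y = np.array([1] * 2 + [0] * 18 + [1] * 3)      # phenotype: case indicator
X = np.column_stack([np.ones_like(g), g])       # intercept + genotype
beta = firth_logistic(X, y)
print(beta)  # finite intercept and log odds ratio despite separation
```

For a single binary covariate such as this one, the Firth estimate should coincide with the log odds ratio obtained after adding 0.5 to each cell of the 2x2 table, which gives a simple check on the sketch. Production analyses would normally add step-halving and a penalized likelihood ratio test instead of relying on this bare iteration.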
