[Statistical methods for extremely unbalanced data in genome-wide association study (1)].

N Xie, W J Bi, Z W Zhang, F Shao, Y Y Wei, Y Zhao, R Y Zhang, F Chen
中华流行病学杂志 (Chinese Journal of Epidemiology), vol. 45, no. 11, pp. 1582-1589. Journal Article. Published 2024-11-10.
DOI: 10.3760/cma.j.cn112338-20240506-00235 (https://doi.org/10.3760/cma.j.cn112338-20240506-00235)
Citations: 0

Abstract

Extremely unbalanced data here refers to datasets in which the values of the independent or dependent variables are severely imbalanced in proportion, for example an extremely unbalanced case-control ratio, a very low disease incidence rate, heavily censored time-to-event data, or low-frequency or rare variants. In such scenarios, test statistics from classical methods such as the logistic regression model and the Cox proportional hazards regression model may deviate from their theoretical asymptotic distributions, inflating or deflating the type I error. As resources from large-scale population cohorts become more available and more widely explored in genome-wide association studies (GWAS), demand is growing for effective and accurate statistical approaches to handling extremely unbalanced data in both independent and non-independent samples. This study first introduces the classical statistical methods used in genetic statistics, and then uses simulation experiments to summarize how these methods fail on extremely unbalanced data, in order to draw researchers' attention to such data in GWAS.
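
The miscalibration described in the abstract can be explored with a small null simulation. The following is a minimal sketch, not the authors' simulation design: it assumes a fixed case-control sample, additively coded genotypes drawn independently of disease status, and a score test from an intercept-only logistic null model with a normal approximation for the p-value. The function names (`score_z`, `two_sided_p`, `empirical_type1_error`) and all parameter values are illustrative assumptions.

```python
import math
import random


def score_z(genotypes, phenotypes):
    """Score-test z statistic for a genotype under an intercept-only logistic null model."""
    n = len(genotypes)
    y_bar = sum(phenotypes) / n
    g_bar = sum(genotypes) / n
    # Score: sum_i G_i * (Y_i - y_bar); its null variance is
    # y_bar * (1 - y_bar) * sum_i (G_i - g_bar)^2.
    score = sum(g * (y - y_bar) for g, y in zip(genotypes, phenotypes))
    variance = y_bar * (1.0 - y_bar) * sum((g - g_bar) ** 2 for g in genotypes)
    if variance == 0.0:
        return 0.0  # monomorphic variant, or all cases / all controls: no test possible
    return score / math.sqrt(variance)


def two_sided_p(z):
    """Two-sided p-value from the asymptotic standard normal approximation."""
    return math.erfc(abs(z) / math.sqrt(2.0))


def empirical_type1_error(n_cases, n_controls, maf, alpha, n_reps, seed=0):
    """Fraction of null replicates rejected at level alpha (assumed simulation design)."""
    rng = random.Random(seed)
    phenotypes = [1] * n_cases + [0] * n_controls
    n = n_cases + n_controls
    rejections = 0
    for _ in range(n_reps):
        # Genotype = number of minor alleles (0, 1, or 2), independent of phenotype.
        genotypes = [(rng.random() < maf) + (rng.random() < maf) for _ in range(n)]
        if two_sided_p(score_z(genotypes, phenotypes)) < alpha:
            rejections += 1
    return rejections / n_reps


if __name__ == "__main__":
    # Illustrative settings only: same total sample size, balanced vs. 1:99 design.
    for label, n_cases, n_controls in [("balanced", 1000, 1000), ("unbalanced", 20, 1980)]:
        rate = empirical_type1_error(n_cases, n_controls, maf=0.01, alpha=0.01, n_reps=2000)
        print(f"{label}: empirical type I error = {rate:.4f} (nominal 0.01)")
```

Comparing the empirical rejection rate with the nominal level for the balanced and unbalanced designs shows whether the asymptotic approximation holds in each setting. The exact figures depend on the seed, sample size, allele frequency, and significance level chosen, and a stringent GWAS-scale threshold would need far more replicates than this sketch uses.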

Source journal: 中华流行病学杂志 (Chinese Journal of Epidemiology), Medicine (all)
CiteScore: 5.60
Self-citation rate: 0.00%
Articles published: 8981
Journal description: The Chinese Journal of Epidemiology, established in 1981, is an academic periodical covering epidemiology and related disciplines in China. Following the principle of integrating theory with practice, it mainly reports major progress in epidemiological research. Its columns include commentary, expert forum, original article, field investigation, disease surveillance, laboratory research, clinical epidemiology, basic theory or method, and review. The journal is indexed in more than ten major biomedical databases and index systems worldwide, including Scopus, PubMed/MEDLINE, PubMed Central (PMC), Europe PubMed Central, Embase, Chemical Abstract, the Chinese Science and Technology Paper and Citation Database (CSTPCD), the Chinese core journal essentials overview, the Chinese Science Citation Database (CSCD) core database, the Chinese Biological Medical Disc (CBMdisc), and the Chinese Medical Citation Index (CMCI). It is one of the core academic journals in preventive and basic medicine in China.