Small-group originating model: Optimized individual-level GWAS simulation featured by SLiM and using open-access data

Computational Biology and Chemistry (IF 2.6; Zone 4, Biology; JCR Q2, Biology) Pub Date: 2024-07-15 DOI: 10.1016/j.compbiolchem.2024.108147 URL: https://www.sciencedirect.com/science/article/pii/S147692712400135X
Citations: 0

Abstract

The development of analytical methods for genome-wide association studies (GWAS) has outpaced the evolution of simulation techniques and pipelines. This disparity underscores the need for simulation methods that can keep pace with the rapidly increasing scale of GWAS. The median sample size of GWAS over the past ten years has exceeded 50,000 individuals, so simulation tools must be able to generate data on a similar or larger scale. This paper introduces a novel method, the small-group originating (SGO) model, which uses the SLiM software to simulate individual-level GWAS data. Our standardized protocol generates tens of thousands of pseudo-individuals with millions of variants from small (30–90) open-access datasets.

Compared with the widely used resampling method in HapGen, SGO shows superior simulation efficiency for large samples (> 13,000) of unrelated individuals. This capability is particularly relevant given the trend toward larger GWAS, which requires tools that can simulate datasets of matching scale. Additionally, SGO provides customization options and can model dynamic life cycles and mating across generations, which makes it a promising alternative for GWAS simulations.

In a case study, sensitivity analyses of chromosome-level principal component analysis and kinship coefficient estimation were conducted. The results highlighted the poor robustness of chromosome-level quality control (QC) indices and the uneven distribution of population structure across chromosomes and ancestries, which argues for caution against relying solely on chromosome-level QC statistics.
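
To make the idea of a chromosome-level QC statistic concrete, the following is a minimal sketch, not the authors' pipeline. It assumes a simple GRM-style relatedness estimator (the mean standardized genotype cross-product, of which the kinship coefficient is one half); the abstract does not say which estimator the paper used. The function names and the toy data are hypothetical.

```python
import random


def grm_relatedness(g1, g2, freqs):
    """Average standardized genotype cross-product for one pair of individuals.

    g1, g2: allele counts (0, 1 or 2) per SNP; freqs: allele frequency per SNP.
    This estimator is an assumption of this sketch, not taken from the paper.
    """
    total, used = 0.0, 0
    for a, b, p in zip(g1, g2, freqs):
        if p <= 0.0 or p >= 1.0:
            continue  # skip monomorphic SNPs, which would divide by zero
        total += (a - 2 * p) * (b - 2 * p) / (2 * p * (1 - p))
        used += 1
    return total / used if used else float("nan")


def relatedness_by_chromosome(g1, g2, freqs, chrom_of_snp):
    """Apply the same estimator separately to the SNPs of each chromosome."""
    snps_on = {}
    for idx, chrom in enumerate(chrom_of_snp):
        snps_on.setdefault(chrom, []).append(idx)
    return {
        chrom: grm_relatedness(
            [g1[i] for i in idxs], [g2[i] for i in idxs], [freqs[i] for i in idxs]
        )
        for chrom, idxs in snps_on.items()
    }


# Toy data (hypothetical): two unrelated individuals, three chromosomes
# carrying very different numbers of SNPs.
rng = random.Random(0)
chrom_of_snp = [1] * 2000 + [2] * 500 + [3] * 50
freqs = [rng.uniform(0.05, 0.5) for _ in chrom_of_snp]
g1 = [(rng.random() < p) + (rng.random() < p) for p in freqs]
g2 = [(rng.random() < p) + (rng.random() < p) for p in freqs]

print("genome-wide:", round(grm_relatedness(g1, g2, freqs), 4))
for chrom, value in relatedness_by_chromosome(g1, g2, freqs, chrom_of_snp).items():
    print("chromosome", chrom, round(value, 4))
```

For an unrelated pair the expected value of this estimator is zero. Per-chromosome estimates rest on fewer SNPs than the genome-wide one, so they tend to scatter more widely around that expectation, which is one plausible reason a single-chromosome statistic can be less robust.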

With its flexible and efficient approach to generating pseudo GWAS data, our standardized SGO protocol is a valuable asset for method development, power analysis, and benchmarking in GWAS research. It is especially useful for meeting the demand for large-scale simulations that match the current and future scale of GWAS.

Source journal
Computational Biology and Chemistry (Biology / Computer Science, Interdisciplinary Applications)
CiteScore: 6.10
Self-citation rate: 3.20%
Articles published: 142
Review time: 24 days
Journal description: Computational Biology and Chemistry publishes original research papers and review articles in all areas of computational life sciences. High quality research contributions with a major computational component in the areas of nucleic acid and protein sequence research, molecular evolution, molecular genetics (functional genomics and proteomics), theory and practice of either biology-specific or chemical-biology-specific modeling, and structural biology of nucleic acids and proteins are particularly welcome. Exceptionally high quality research work in bioinformatics, systems biology, ecology, computational pharmacology, metabolism, biomedical engineering, epidemiology, and statistical genetics will also be considered. Given their inherent uncertainty, protein modeling and molecular docking studies should be thoroughly validated. In the absence of experimental results for validation, molecular dynamics simulations along with detailed free energy calculations, for example, should be used as complementary techniques to support the major conclusions. Submissions of premature modeling exercises without additional biological insights will not be considered. Review articles will generally be commissioned by the editors and should not be submitted to the journal without explicit invitation. However, prospective authors are welcome to send a brief (one to three pages) synopsis, which will be evaluated by the editors.