不重复使用遗传物质的混合个体的模拟。

IF 5.5 1区 生物学 Q1 BIOCHEMISTRY & MOLECULAR BIOLOGY Molecular Ecology Resources Pub Date : 2025-01-12 DOI:10.1111/1755-0998.14069
Eric C Anderson, Rachael M Giglio, Matthew G DeSaix, Timothy J Smyser
{"title":"不重复使用遗传物质的混合个体的模拟。","authors":"Eric C Anderson, Rachael M Giglio, Matthew G DeSaix, Timothy J Smyser","doi":"10.1111/1755-0998.14069","DOIUrl":null,"url":null,"abstract":"<p><p>While a best practice for evaluating the behaviour of genetic clustering algorithms on empirical data is to conduct parallel analyses on simulated data, these types of simulation techniques often involve sampling genetic data with replacement. In this paper we demonstrate that sampling with replacement, especially with large marker sets, inflates the perceived statistical power to correctly assign individuals (or the alleles that they carry) back to source populations-a phenomenon we refer to as resampling-induced, spurious power inflation (RISPI). To address this issue, we present gscramble, a simulation approach in R for creating biologically informed individual genotypes from empirical data that: (1) samples alleles from populations without replacement and (2) segregates alleles based on species-specific recombination rates. This framework makes it possible to simulate admixed individuals in a way that respects the physical linkage between markers on the same chromosome and which does not suffer from RISPI. This is achieved in gscramble by allowing users to specify pedigrees of varying complexity in order to simulate admixed genotypes, segregating and tracking haplotype blocks from different source populations through those pedigrees, and then sampling-using a variety of permutation schemes-alleles from empirical data into those haplotype blocks. We demonstrate the functionality of gscramble with both simulated and empirical data sets and highlight additional uses of the package that users may find valuable.</p>","PeriodicalId":211,"journal":{"name":"Molecular Ecology Resources","volume":" ","pages":"e14069"},"PeriodicalIF":5.5000,"publicationDate":"2025-01-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"gscramble: Simulation of Admixed Individuals Without Reuse of Genetic Material.\",\"authors\":\"Eric C Anderson, Rachael M Giglio, Matthew G DeSaix, Timothy J Smyser\",\"doi\":\"10.1111/1755-0998.14069\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p><p>While a best practice for evaluating the behaviour of genetic clustering algorithms on empirical data is to conduct parallel analyses on simulated data, these types of simulation techniques often involve sampling genetic data with replacement. In this paper we demonstrate that sampling with replacement, especially with large marker sets, inflates the perceived statistical power to correctly assign individuals (or the alleles that they carry) back to source populations-a phenomenon we refer to as resampling-induced, spurious power inflation (RISPI). To address this issue, we present gscramble, a simulation approach in R for creating biologically informed individual genotypes from empirical data that: (1) samples alleles from populations without replacement and (2) segregates alleles based on species-specific recombination rates. This framework makes it possible to simulate admixed individuals in a way that respects the physical linkage between markers on the same chromosome and which does not suffer from RISPI. This is achieved in gscramble by allowing users to specify pedigrees of varying complexity in order to simulate admixed genotypes, segregating and tracking haplotype blocks from different source populations through those pedigrees, and then sampling-using a variety of permutation schemes-alleles from empirical data into those haplotype blocks. We demonstrate the functionality of gscramble with both simulated and empirical data sets and highlight additional uses of the package that users may find valuable.</p>\",\"PeriodicalId\":211,\"journal\":{\"name\":\"Molecular Ecology Resources\",\"volume\":\" \",\"pages\":\"e14069\"},\"PeriodicalIF\":5.5000,\"publicationDate\":\"2025-01-12\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Molecular Ecology Resources\",\"FirstCategoryId\":\"99\",\"ListUrlMain\":\"https://doi.org/10.1111/1755-0998.14069\",\"RegionNum\":1,\"RegionCategory\":\"生物学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"BIOCHEMISTRY & MOLECULAR BIOLOGY\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Molecular Ecology Resources","FirstCategoryId":"99","ListUrlMain":"https://doi.org/10.1111/1755-0998.14069","RegionNum":1,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"BIOCHEMISTRY & MOLECULAR BIOLOGY","Score":null,"Total":0}
引用次数: 0

摘要

虽然评估遗传聚类算法在经验数据上的行为的最佳实践是对模拟数据进行并行分析,但这些类型的模拟技术通常涉及对遗传数据进行替换采样。在本文中,我们证明了用替换抽样,特别是用大标记集抽样,会使正确分配个体(或他们携带的等位基因)回到源种群的感知统计能力膨胀——我们将这种现象称为重采样诱导的虚假权力膨胀(RISPI)。为了解决这个问题,我们在R中提出了一种模拟方法gscramble,用于从经验数据中创建具有生物学信息的个体基因型,该方法:(1)从没有替代的种群中取样等位基因;(2)根据物种特异性重组率分离等位基因。这个框架使得以一种尊重同一染色体上标记之间的物理联系的方式模拟混合个体成为可能,并且不会受到RISPI的影响。这是通过允许用户指定不同复杂性的谱系来实现的,以便模拟混合基因型,通过这些谱系分离和跟踪来自不同来源人群的单倍型块,然后使用各种排列方案将经验数据中的等位基因取样到这些单倍型块中。我们用模拟和经验数据集演示了gscramble的功能,并强调了用户可能觉得有价值的软件包的其他用途。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
gscramble: Simulation of Admixed Individuals Without Reuse of Genetic Material.

While a best practice for evaluating the behaviour of genetic clustering algorithms on empirical data is to conduct parallel analyses on simulated data, these types of simulation techniques often involve sampling genetic data with replacement. In this paper we demonstrate that sampling with replacement, especially with large marker sets, inflates the perceived statistical power to correctly assign individuals (or the alleles that they carry) back to source populations-a phenomenon we refer to as resampling-induced, spurious power inflation (RISPI). To address this issue, we present gscramble, a simulation approach in R for creating biologically informed individual genotypes from empirical data that: (1) samples alleles from populations without replacement and (2) segregates alleles based on species-specific recombination rates. This framework makes it possible to simulate admixed individuals in a way that respects the physical linkage between markers on the same chromosome and which does not suffer from RISPI. This is achieved in gscramble by allowing users to specify pedigrees of varying complexity in order to simulate admixed genotypes, segregating and tracking haplotype blocks from different source populations through those pedigrees, and then sampling-using a variety of permutation schemes-alleles from empirical data into those haplotype blocks. We demonstrate the functionality of gscramble with both simulated and empirical data sets and highlight additional uses of the package that users may find valuable.

求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
Molecular Ecology Resources
Molecular Ecology Resources 生物-进化生物学
CiteScore
15.60
自引率
5.20%
发文量
170
审稿时长
3 months
期刊介绍: Molecular Ecology Resources promotes the creation of comprehensive resources for the scientific community, encompassing computer programs, statistical and molecular advancements, and a diverse array of molecular tools. Serving as a conduit for disseminating these resources, the journal targets a broad audience of researchers in the fields of evolution, ecology, and conservation. Articles in Molecular Ecology Resources are crafted to support investigations tackling significant questions within these disciplines. In addition to original resource articles, Molecular Ecology Resources features Reviews, Opinions, and Comments relevant to the field. The journal also periodically releases Special Issues focusing on resource development within specific areas.
期刊最新文献
Next-Generation Snow Leopard Population Assessment Tool: Multiplex-PCR SNP Panel for Individual Identification From Faeces. A Long-Term Ecological Research Data Set From the Marine Genetic Monitoring Program ARMS-MBON 2018-2020. Comparative Genomics Points to Ecological Drivers of Genomic Divergence Among Intertidal Limpets. Quantifying Bone Collagen Fingerprint Variation Between Species. Pangenomics Links Boll Weevil Divergence With Ancient Mesoamerican Cotton Cultivation.
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1