Systematic bias in malaria parasite relatedness estimation.

IF 2.2; CAS Zone 3 (Biology); JCR Q3 (GENETICS & HEREDITY); G3: Genes|Genomes|Genetics; Pub Date: 2025-05-08; DOI: 10.1093/g3journal/jkaf018
Somya Mehra, Daniel E Neafsey, Michael White, Aimee R Taylor
Citations: 0

Abstract

Genetic studies of Plasmodium parasites increasingly feature relatedness estimates. However, various aspects of malaria parasite relatedness estimation are not fully understood. For example, relatedness estimates based on whole-genome-sequence (WGS) data often exceed those based on sparser data types. Systematic bias in relatedness estimation is well documented in the literature geared towards diploid organisms, but largely unknown within the malaria community. We characterize systematic bias in malaria parasite relatedness estimation using three complementary approaches: theoretically, under a non-ancestral statistical model of pairwise relatedness; numerically, under a simulation model of ancestry; and empirically, using data on parasites sampled from Guyana and Colombia. We show that allele frequency estimates encode, locus-by-locus, relatedness averaged over the set of sampled parasites used to compute them. Plugging sample allele frequencies into models of pairwise relatedness can lead to systematic underestimation. However, systematic underestimation can be viewed as population-relatedness calibration, i.e., a way of generating measures of relative relatedness. Systematic underestimation is unavoidable when relatedness is estimated assuming independence between genetic markers. It is mitigated when relatedness is estimated using WGS data under a hidden Markov model (HMM) that exploits linkage between proximal markers. The extent of mitigation is unknowable when an HMM is fit to sparser data, but downstream analyses that use high relatedness thresholds are relatively robust regardless. In summary, practitioners can either resolve to use relative relatedness estimated under independence, or try to estimate absolute relatedness under an HMM. We propose various tools to help practitioners evaluate their situation on a case-by-case basis.
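The abstract's central mechanism is that sample allele frequencies absorb the average relatedness of the parasites used to compute them, which pulls plug-in estimates downward. A small simulation can illustrate it. This is a hypothetical sketch, not the authors' model, estimator, or code. It assumes haploid biallelic markers with no linkage (the independence setting) and a toy sampling scheme in which every pair in the sample has background relatedness r0. It also assumes a simple method-of-moments estimator built on P(match) = r + (1 - r)h, where h is the per-locus probability that two unrelated parasites carry the same allele. The names simulate_sample, sample_homozygosity, estimate_r and the parameters r0 and q are illustrative inventions.

```python
# Hypothetical illustration only; not the method or code from the paper.
import random


def simulate_sample(n, n_loci, r0, q, seed=1):
    """Hypothetical toy model of a sample of haploid parasites.

    Biallelic, unlinked loci. At each locus, with probability r0 every
    parasite carries one shared allele, so apart from the focal pair every
    pair in the sample has relatedness r0. Otherwise alleles are drawn
    independently from the population frequency f, and parasite 1 copies
    parasite 0's allele with probability q. The focal pair (0, 1) therefore
    has relatedness r0 + (1 - r0) * q.
    """
    rng = random.Random(seed)
    freqs, genotypes = [], []
    for _ in range(n_loci):
        f = rng.uniform(0.1, 0.9)  # population frequency of allele 1
        if rng.random() < r0:
            shared = int(rng.random() < f)
            alleles = [shared] * n
        else:
            alleles = [int(rng.random() < f) for _ in range(n)]
            if rng.random() < q:
                alleles[1] = alleles[0]
        freqs.append(f)
        genotypes.append(alleles)
    return freqs, genotypes


def sample_homozygosity(alleles):
    """Match probability implied by sample frequencies (n/(n-1) corrected).

    Equals the proportion of distinct pairs in the sample that carry the
    same allele, so it absorbs their average relatedness.
    """
    n = len(alleles)
    p = sum(alleles) / n
    raw = p * p + (1 - p) * (1 - p)
    return (n * raw - 1) / (n - 1)


def estimate_r(genotypes, i, j, homozygosity):
    """Method-of-moments relatedness assuming independent markers.

    Based on P(match at a locus) = r + (1 - r) * h, pooled over loci.
    """
    num = den = 0.0
    for alleles, h in zip(genotypes, homozygosity):
        match = 1.0 if alleles[i] == alleles[j] else 0.0
        num += match - h
        den += 1.0 - h
    return num / den


R0, Q = 0.3, 0.5
true_r = R0 + (1 - R0) * Q  # relatedness of the focal pair (0, 1)

freqs, genotypes = simulate_sample(n=50, n_loci=20000, r0=R0, q=Q)
true_h = [f * f + (1 - f) * (1 - f) for f in freqs]  # from population freqs
sample_h = [sample_homozygosity(a) for a in genotypes]  # from the sample

r_true_freqs = estimate_r(genotypes, 0, 1, true_h)
r_sample_freqs = estimate_r(genotypes, 0, 1, sample_h)
print(f"true r: {true_r:.2f}")
print(f"estimate with population frequencies: {r_true_freqs:.3f}")
print(f"estimate with sample frequencies:     {r_sample_freqs:.3f}")
```

Under this toy setup the first estimate should land near the true 0.65. The second should land near (r - r̄)/(1 - r̄) = (0.65 - 0.3)/(1 - 0.3) = 0.5, because at each locus the corrected sample homozygosity has expectation r̄ + (1 - r̄)h, where r̄ is the average pairwise relatedness among the sampled parasites (here essentially r0). Read this way, the downward shift is a calibration to the sample's average relatedness, i.e. a relative relatedness, in line with the abstract's framing. The HMM-based mitigation the abstract describes is not modeled here.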

Source journal
G3: Genes|Genomes|Genetics
Category: GENETICS & HEREDITY
CiteScore: 5.10
Self-citation rate: 3.80%
Articles published: 305
Review time: 3-8 weeks
Journal description: G3: Genes, Genomes, Genetics provides a forum for the publication of high-quality foundational research, particularly research that generates useful genetic and genomic information such as genome maps, single gene studies, genome-wide association and QTL studies, as well as genome reports, mutant screens, and advances in methods and technology. The Editorial Board of G3 believes that rapid dissemination of these data is the necessary foundation for analysis that leads to mechanistic insights. G3, published by the Genetics Society of America, meets the critical and growing need of the genetics community for rapid review and publication of important results in all areas of genetics. G3 offers the opportunity to publish the puzzling finding or to present unpublished results that may not have been submitted for review and publication due to a perceived lack of a potential high-impact finding. G3 has earned the DOAJ Seal, which is a mark of certification for open access journals, awarded by DOAJ to journals that achieve a high level of openness, adhere to Best Practice and high publishing standards.
Latest articles in this journal
Genome Dynamics and Chromosome Structural Variations in Histoplasma ohiense, a fungal pathogen of humans.
Bioinformatic, cell localization and phylogenetic analyses reveal a novel family of putative lipases in Tetrahymena thermophila.
Association mapping integrated with image-based phenomics identifies SNPs and candidate genes underpinning fruit morphology and development in chile pepper (Capsicum annum L.).
Haplotype graph analysis of PdR1 uncovers resistance diversity to Pierce's Disease in Vitis arizonica and its hybrids.
Identification of a novel mutant allele of the LIKE EARLY STARVATION 1 gene in rice (Oryza sativa L.) using QTL-seq.