A survey of methods and tools to detect recent and strong positive selection.

Journal of Biological Research-Thessaloniki (IF 1.9; Zone 3, Biology; JCR Q2, BIOLOGY) Pub Date: 2017-04-08 eCollection Date: 2017-12-01 DOI: 10.1186/s40709-017-0064-0
Pavlos Pavlidis, Nikolaos Alachiotis
Citations: 87

Abstract

Positive selection occurs when an allele is favored by natural selection. The frequency of the favored allele increases in the population and due to genetic hitchhiking the neighboring linked variation diminishes, creating so-called selective sweeps. Detecting traces of positive selection in genomes is achieved by searching for signatures introduced by selective sweeps, such as regions of reduced variation, a specific shift of the site frequency spectrum, and particular LD patterns in the region. A variety of methods and tools can be used for detecting sweeps, ranging from simple implementations that compute summary statistics such as Tajima's D, to more advanced statistical approaches that use combinations of statistics, maximum likelihood, machine learning etc. In this survey, we present and discuss summary statistics and software tools, and classify them based on the selective sweep signature they detect, i.e., SFS-based vs. LD-based, as well as their capacity to analyze whole genomes or just subgenomic regions. Additionally, we summarize the results of comparisons among four open-source software releases (SweeD, SweepFinder, SweepFinder2, and OmegaPlus) regarding sensitivity, specificity, and execution times. In equilibrium neutral models or mild bottlenecks, both SFS- and LD-based methods are able to detect selective sweeps accurately. Methods and tools that rely on LD exhibit higher true positive rates than SFS-based ones under the model of a single sweep or recurrent hitchhiking. However, their false positive rate is elevated when a misspecified demographic model is used to represent the null hypothesis. When the correct (or similar to the correct) demographic model is used instead, the false positive rates are considerably reduced. The accuracy of detecting the true target of selection is decreased in bottleneck scenarios. In terms of execution time, LD-based methods are typically faster than SFS-based methods, due to the nature of required arithmetic.
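
As a concrete illustration of the simplest class of tools mentioned above, the following is a minimal sketch of Tajima's D computed for one genomic window. It is not taken from the survey or from any of the four benchmarked programs; the function name `tajimas_d` and the input encoding (one string per sampled haplotype, '0' for the ancestral and '1' for the derived allele) are assumptions made for this example. The statistic compares two estimates of diversity, the mean number of pairwise differences and the number of segregating sites scaled by a harmonic sum, and normalizes the difference using Tajima's standard variance constants.

```python
import math


def tajimas_d(haplotypes):
    """Tajima's D for aligned haplotypes coded '0' (ancestral) / '1' (derived).

    Hypothetical helper for illustration; returns NaN if no site segregates.
    """
    n = len(haplotypes)
    if n < 4:
        # For n < 4 the variance constants are zero and D is undefined.
        raise ValueError("need at least 4 haplotypes")
    length = len(haplotypes[0])
    if any(len(h) != length for h in haplotypes):
        raise ValueError("haplotypes must have equal length")

    # Derived-allele count per column; keep only segregating sites.
    counts = [sum(h[j] == "1" for h in haplotypes) for j in range(length)]
    counts = [k for k in counts if 0 < k < n]
    s = len(counts)
    if s == 0:
        return float("nan")

    # Mean number of pairwise differences (pi), summed over sites.
    pi = sum(2.0 * k * (n - k) / (n * (n - 1)) for k in counts)

    # Constants of the normalizing variance.
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)

    # Difference between pi and Watterson's estimate (s / a1), normalized.
    return (pi - s / a1) / math.sqrt(e1 * s + e2 * s * (s - 1))


# Toy windows (made-up data): only singletons vs. only mid-frequency variants.
singletons = ["10000", "01000", "00100", "00010", "00001"] + ["00000"] * 5
balanced = ["11111"] * 5 + ["00000"] * 5
print(tajimas_d(singletons), tajimas_d(balanced))
```

The window of singletons yields a negative value and the window of intermediate-frequency variants a positive one. An excess of rare variants, and hence a negative D, is the kind of site frequency spectrum shift a sweep can leave behind, but population expansion produces it too, which is one reason the choice of demographic null model matters for the false positive rate.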

Source journal
CiteScore: 5.20
Self-citation rate: 0.00%
Articles published: 0
Review time: >12 weeks
About the journal: Journal of Biological Research-Thessaloniki is a peer-reviewed, open access, international journal that publishes articles providing novel insights into the major fields of biology. Topics covered in Journal of Biological Research-Thessaloniki include, but are not limited to: molecular biology, cytology, genetics, evolutionary biology, morphology, development and differentiation, taxonomy, bioinformatics, physiology, marine biology, behaviour, ecology and conservation.