FSTest: an efficient tool for cross-population fixation index estimation on variant call format files

IF 2.9 | Zone 4, Biology | JCR Q1, EDUCATION & EDUCATIONAL RESEARCH | Journal of Genetics | Pub Date: 2024-01-05 | DOI: 10.1007/s12041-023-01459-1
Citations: 0

Abstract

Fixation index (Fst) statistics provide critical insights into the evolutionary processes that shape the structure of genetic variation within and among populations. Fst statistics have been widely applied in population and evolutionary genetics to identify genomic regions targeted by selection pressures. The FSTest 1.3 software was developed to estimate four Fst statistics (Hudson, Weir and Cockerham, Nei, and Wright) from high-throughput genotyping or sequencing data. Here, we introduce FSTest 1.3 and compare its performance with two widely used software packages, VCFtools 0.1.16 and PLINK 2.0. Chromosome 1 variant data from 1000 Genomes Phase III for South Asian (n = 211) and African (n = 274) populations were used as an example case. The different Fst estimates were calculated for each single-nucleotide polymorphism (SNP) in a pairwise comparison of the South Asian and African populations, and the results of FSTest 1.3 were confirmed with VCFtools 0.1.16 and PLINK 2.0. Two sliding-window approaches, one based on a fixed number of SNPs and the other on a fixed number of base pairs (bp), were applied using FSTest 1.3 and VCFtools 0.1.16. Our results showed that regions with low-coverage genotypic data could lead to an overestimation of Fst in sliding-window analysis using a fixed number of bp. FSTest 1.3 could mitigate this problem by averaging Fst over consecutive SNPs along the chromosome. FSTest 1.3 allows direct analysis of VCF files with a small amount of code and can calculate Fst estimates for more than a million SNPs in a few minutes on a desktop computer. FSTest 1.3 is freely available at https://github.com/similab/FSTest.
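
The two windowing schemes contrasted in the abstract can be illustrated with a minimal Python sketch. This is not FSTest's own code or interface: the function names (hudson_fst, snp_windows, bp_windows) are hypothetical, the per-SNP formula is the commonly used Hudson estimator with a sample-size correction, and taking a plain mean of per-SNP values inside each window is an assumption made for illustration. Inputs are alternate-allele counts and total allele numbers per population, which would normally be tallied from the genotype columns of a VCF file.

```python
from typing import List, Optional, Tuple


def hudson_fst(ac1: int, an1: int, ac2: int, an2: int) -> Optional[float]:
    """Per-SNP Hudson Fst from alternate-allele counts (ac) and allele numbers (an).

    Returns None when the estimate is undefined (too few alleles, or the SNP is
    monomorphic for the same allele in both populations).
    """
    if an1 < 2 or an2 < 2:
        return None
    p1, p2 = ac1 / an1, ac2 / an2
    num = (p1 - p2) ** 2 - p1 * (1 - p1) / (an1 - 1) - p2 * (1 - p2) / (an2 - 1)
    den = p1 * (1 - p2) + p2 * (1 - p1)
    return num / den if den > 0 else None


def snp_windows(fst: List[float], size: int, step: int) -> List[float]:
    """Mean Fst over windows that each hold exactly `size` consecutive SNPs."""
    return [
        sum(fst[start:start + size]) / size
        for start in range(0, len(fst) - size + 1, step)
    ]


def bp_windows(
    pos: List[int], fst: List[float], size: int, step: int
) -> List[Tuple[int, int, float]]:
    """Mean Fst over windows that each span `size` base pairs.

    Returns (window_start, n_snps, mean_fst); windows without SNPs are skipped.
    The SNP count is reported so that sparsely covered windows are easy to spot.
    """
    out: List[Tuple[int, int, float]] = []
    if not pos:
        return out
    last = max(pos)
    start = 1  # VCF positions are 1-based
    while start <= last:
        vals = [f for p, f in zip(pos, fst) if start <= p < start + size]
        if vals:
            out.append((start, len(vals), sum(vals) / len(vals)))
        start += step
    return out


if __name__ == "__main__":
    # Toy data (invented for illustration): two nearby SNPs and one isolated SNP.
    positions = [100, 200, 5000]
    values = [0.1, 0.3, 0.9]
    print(snp_windows(values, size=2, step=1))
    print(bp_windows(positions, values, size=1000, step=1000))
```

In the toy data, the fixed-bp window starting at position 4001 contains a single SNP, so its mean is just that SNP's value; a fixed-SNP window always averages the same number of markers, which is the property the abstract credits with reducing overestimation in sparsely covered regions.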
