A hidden Markov-model for gene mapping based on whole-genome next generation sequencing data.

Statistical Applications in Genetics and Molecular Biology | Impact factor: 0.8 | Zone 4 (Mathematics) | JCR Q4 (Biochemistry & Molecular Biology) | Pub Date: 2015-02-01 | DOI: 10.1515/sagmb-2014-0007
Jürgen Claesen, Tomasz Burzykowski
Statistical Applications in Genetics and Molecular Biology, volume 14, issue 1, pages 21-34. Publication type: Journal Article.
Citations: 9

Abstract

The analysis of polygenic phenotypic characteristics, such as quantitative traits or inheritable diseases, requires reliable scoring of many genetic markers covering the entire genome. The advent of high-throughput sequencing technologies provides a new way to evaluate large numbers of single nucleotide polymorphisms (SNPs) as genetic markers. Combining these technologies with pooling of segregants, as performed in bulk segregant analysis, should, in principle, allow the simultaneous mapping of multiple genetic loci present throughout the genome. We propose a hidden Markov-model to analyze the marker data obtained by bulk segregant next generation sequencing. The model includes several states, each associated with a different probability of observing the same or a different nucleotide in an offspring as compared to the parent. Transitions between the molecular markers imply transitions between the states of the model. After estimating the transition probabilities and the state-related probabilities of nucleotide (dis)similarity, the most probable state for each SNP is selected. The most probable states can then be used to indicate which genomic regions are likely to contain trait-related genes. The application of the model is illustrated on data from a study of ethanol tolerance in yeast. The software is written in R; R functions, R scripts and documentation are available at www.ibiostat.be/software/bioinformatics.
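
The decoding step described in the abstract, selecting the most probable state for each SNP once the probabilities are known, can be sketched as follows. This is a minimal illustration in Python, not the authors' R software. Several things here are assumptions that the abstract does not confirm: the emission at each SNP is modelled as a binomial count of reads matching the parent, decoding uses the Viterbi algorithm (posterior decoding would be another option), and the three states, their match probabilities, the transition matrix and the toy read counts are all hypothetical. The parameters are fixed by hand rather than estimated from data.

```python
import math


def binom_logpmf(k, n, p):
    """Log-probability of k parent-matching reads out of n under Binomial(n, p)."""
    log_coeff = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return log_coeff + k * math.log(p) + (n - k) * math.log(1.0 - p)


def viterbi(matches, depths, match_probs, trans, init):
    """Return the most probable state path, one state index per SNP."""
    n_states = len(match_probs)
    # Log-score of the best path ending in each state at the first SNP.
    score = [
        math.log(init[s]) + binom_logpmf(matches[0], depths[0], match_probs[s])
        for s in range(n_states)
    ]
    back = []  # back[t - 1][s] is the best predecessor of state s at SNP t
    for k, n in zip(matches[1:], depths[1:]):
        new_score, pointers = [], []
        for s in range(n_states):
            best_prev = max(
                range(n_states),
                key=lambda r: score[r] + math.log(trans[r][s]),
            )
            pointers.append(best_prev)
            new_score.append(
                score[best_prev]
                + math.log(trans[best_prev][s])
                + binom_logpmf(k, n, match_probs[s])
            )
        score = new_score
        back.append(pointers)
    # Trace back from the best final state.
    state = max(range(n_states), key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return path[::-1]


# Hypothetical parameters: state 0 = no linkage (about half of the reads match
# the parent), state 1 = moderate linkage, state 2 = strong linkage.
match_probs = [0.5, 0.75, 0.95]
trans = [
    [0.90, 0.05, 0.05],
    [0.05, 0.90, 0.05],
    [0.05, 0.05, 0.90],
]
init = [1 / 3, 1 / 3, 1 / 3]

# Hypothetical toy data: parent-matching reads and total reads at 8 SNPs.
matches = [10, 11, 9, 19, 20, 19, 10, 9]
depths = [20] * 8

path = viterbi(matches, depths, match_probs, trans, init)
print(path)
```

On this toy input the three SNPs where nearly all reads match the parent are assigned to the strong-linkage state, which is how a run of such states would flag a candidate genomic region. In practice the transition and match probabilities would be estimated from the data, as the abstract describes, rather than set by hand.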

Source journal
Statistical Applications in Genetics and Molecular Biology (Biochemistry & Molecular Biology; Mathematical & Computational Biology)
Self-citation rate: 11.10%
Number of articles published: 8
Journal description: Statistical Applications in Genetics and Molecular Biology seeks to publish significant research on the application of statistical ideas to problems arising from computational biology. The focus of the papers should be on the relevant statistical issues but should contain a succinct description of the relevant biological problem being considered. The range of topics is wide and will include topics such as linkage mapping, association studies, gene finding and sequence alignment, protein structure prediction, design and analysis of microarray data, molecular evolution and phylogenetic trees, DNA topology, and database search strategies. Both original research and review articles will be warmly received.