A hidden Markov-model for gene mapping based on whole-genome next generation sequencing data.

Statistical Applications in Genetics and Molecular Biology. Pub Date: 2015-02-01. DOI: 10.1515/sagmb-2014-0007. IF 0.8; JCR Q4 (Biochemistry & Molecular Biology); Zone 4 (Mathematics).

Jürgen Claesen, Tomasz Burzykowski

Volume 14, issue 1, pages 21-34. Publication type: Journal Article.

Citations: 9

Abstract

The analysis of polygenic phenotypic characteristics, such as quantitative traits or inheritable diseases, requires reliable scoring of many genetic markers covering the entire genome. The advent of high-throughput sequencing technologies provides a new way to evaluate large numbers of single nucleotide polymorphisms (SNPs) as genetic markers. Combining these technologies with pooling of segregants, as performed in bulk segregant analysis, should in principle allow the simultaneous mapping of multiple genetic loci throughout the genome. We propose a hidden Markov model to analyze the marker data obtained by bulk segregant next-generation sequencing. The model includes several states, each associated with a different probability of observing the same or a different nucleotide in an offspring as compared to the parent. Transitions between molecular markers imply transitions between the states of the model. After the transition probabilities and the state-related probabilities of nucleotide (dis)similarity have been estimated, the most probable state for each SNP is selected. The most probable states can then be used to indicate which genomic regions are likely to contain trait-related genes. The application of the model is illustrated on data from a study of ethanol tolerance in yeast. The software is written in R; R functions, R scripts, and documentation are available at www.ibiostat.be/software/bioinformatics.
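The decoding step described in the abstract can be illustrated with a minimal sketch. This is not the authors' R software: the three state names, the per-state probabilities of observing the parental nucleotide, the transition matrix, the binomial emission model, and the function names are all assumptions chosen for illustration. The abstract also says the probabilities are estimated from the data, whereas this sketch fixes them by hand. It uses forward-backward posterior decoding as one common way to pick the most probable state at each SNP; the abstract does not say which decoding method the paper uses.

```python
import math

# Hypothetical three-state model: each state has its own probability that a
# read at a SNP shows the same nucleotide as the parent. Illustrative values,
# not estimates from the paper.
STATES = ["unlinked", "intermediate", "linked"]
P_SAME = [0.50, 0.75, 0.95]

# Hypothetical transition probabilities between adjacent SNPs (rows sum to 1).
TRANS = [
    [0.90, 0.08, 0.02],
    [0.05, 0.90, 0.05],
    [0.02, 0.08, 0.90],
]
START = [1 / 3, 1 / 3, 1 / 3]


def log_binom(k, n, p):
    """Log-probability of k 'same' reads out of n under Binomial(n, p)."""
    log_coef = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return log_coef + k * math.log(p) + (n - k) * math.log(1 - p)


def logsumexp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def posterior_decode(counts):
    """Return the most probable state index for each SNP.

    counts: list of (same, total) read counts per SNP, in genomic order.
    """
    n_states = len(STATES)
    idx = range(n_states)
    emit = [[log_binom(k, n, p) for p in P_SAME] for k, n in counts]
    log_trans = [[math.log(x) for x in row] for row in TRANS]

    # Forward pass: log P(observations up to t, state at t).
    fwd = [[math.log(START[s]) + emit[0][s] for s in idx]]
    for t in range(1, len(counts)):
        prev = fwd[-1]
        fwd.append([
            emit[t][s] + logsumexp([prev[r] + log_trans[r][s] for r in idx])
            for s in idx
        ])

    # Backward pass: log P(observations after t | state at t).
    bwd = [[0.0] * n_states for _ in counts]
    for t in range(len(counts) - 2, -1, -1):
        bwd[t] = [
            logsumexp([log_trans[s][r] + emit[t + 1][r] + bwd[t + 1][r]
                       for r in idx])
            for s in idx
        ]

    # The posterior is proportional to forward * backward; take the argmax.
    return [
        max(idx, key=lambda s, t=t: fwd[t][s] + bwd[t][s])
        for t in range(len(counts))
    ]


if __name__ == "__main__":
    # Made-up read counts: five SNPs near 50% 'same', then five near 95%.
    toy_counts = [(10, 20)] * 5 + [(19, 20)] * 5
    print([STATES[s] for s in posterior_decode(toy_counts)])
```

In this toy setup, a run of SNPs assigned to the hypothetical "linked" state would mark a candidate region for trait-related genes. A fuller treatment would estimate `P_SAME` and `TRANS` from the data, for example with the Baum-Welch algorithm, rather than fixing them.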


Source journal

Statistical Applications in Genetics and Molecular Biology (Biochemistry & Molecular Biology; Mathematical & Computational Biology)

Self-citation rate: 11.10%

Journal description: Statistical Applications in Genetics and Molecular Biology seeks to publish significant research on the application of statistical ideas to problems arising from computational biology. Papers should focus on the relevant statistical issues but should contain a succinct description of the biological problem being considered. The range of topics is wide and includes linkage mapping, association studies, gene finding and sequence alignment, protein structure prediction, design and analysis of microarray data, molecular evolution and phylogenetic trees, DNA topology, and database search strategies. Both original research and review articles will be warmly received.