
Statistical Applications in Genetics and Molecular Biology: Latest Articles

Optimizing weighted gene co-expression network analysis with a multi-threaded calculation of the topological overlap matrix.
IF 0.9; Zone 4 (Mathematics); Q3 Mathematics; Pub Date: 2021-11-09; DOI: 10.1515/sagmb-2021-0025
Min Shuai, Dongmei He, Xin Chen

Biomolecular networks are often assumed to be scale-free hierarchical networks. Weighted gene co-expression network analysis (WGCNA) treats gene co-expression networks as undirected scale-free hierarchical weighted networks. The WGCNA R software package stores a network as an adjacency matrix, then calculates the topological overlap matrix (TOM), and then identifies the modules (sub-networks), where each module is assumed to be associated with a certain biological function. The most time-consuming step of WGCNA is calculating the TOM from the adjacency matrix, which is done in a single thread. In this paper, the single-threaded TOM algorithm has been changed into a multi-threaded algorithm (the parameters are the default values of WGCNA). In the multi-threaded algorithm, Rcpp is used to let R call a C++ function, and C++ then uses OpenMP to start multiple threads that calculate the TOM from the adjacency matrix. On shared-memory multiprocessor systems, the calculation time decreases as the number of CPU cores increases. The algorithm of this paper can promote the application of WGCNA to large data sets, and help other research fields to identify sub-networks in undirected scale-free hierarchical weighted networks. The source code and usage instructions are available at https://github.com/do-somethings-haha/multi-threaded_calculate_unsigned_TOM_from_unsigned_or_signed_Adjacency_Matrix_of_WGCNA.
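
The row-wise independence that makes the TOM calculation easy to parallelize can be sketched as follows. This is a minimal illustration, not the authors' Rcpp/OpenMP code: it assumes the commonly cited unsigned TOM definition, (shared neighbours + a_ij) / (min(k_i, k_j) + 1 - a_ij), and the function names `connectivity`, `tom_row` and `tom_parallel` are hypothetical. In standard CPython, threads will not speed up this pure-Python arithmetic; the sketch only shows how rows can be handed to separate workers.

```python
from concurrent.futures import ThreadPoolExecutor


def connectivity(adj):
    """Row sums of the adjacency matrix, excluding the diagonal."""
    n = len(adj)
    return [sum(adj[i][u] for u in range(n) if u != i) for i in range(n)]


def tom_row(adj, k, i):
    """Row i of the unsigned topological overlap matrix (assumed formula)."""
    n = len(adj)
    row = [0.0] * n
    for j in range(n):
        if i == j:
            row[j] = 1.0
            continue
        # Shared-neighbour term: sum over u not in {i, j} of a_iu * a_uj.
        shared = sum(adj[i][u] * adj[u][j] for u in range(n) if u != i and u != j)
        row[j] = (shared + adj[i][j]) / (min(k[i], k[j]) + 1.0 - adj[i][j])
    return row


def tom_parallel(adj, workers=4):
    """Compute all TOM rows; each row depends only on read-only inputs."""
    k = connectivity(adj)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda i: tom_row(adj, k, i), range(len(adj))))


if __name__ == "__main__":
    # Hypothetical 3-gene adjacency matrix, symmetric with unit diagonal.
    adjacency = [
        [1.0, 0.8, 0.2],
        [0.8, 1.0, 0.5],
        [0.2, 0.5, 1.0],
    ]
    for tom in tom_parallel(adjacency, workers=2):
        print([round(value, 4) for value in tom])
```

Because every row reads the same adjacency matrix and writes only its own output, no locking is needed, which is the property a shared-memory OpenMP loop would also rely on.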

Citations: 5
A hierarchical Bayesian approach for detecting global microbiome associations.
IF 0.9; Zone 4 (Mathematics); Q3 Mathematics; Pub Date: 2021-11-01; DOI: 10.1515/sagmb-2021-0047; PMC PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9125803/pdf/
Farhad Hatami, Emma Beamish, Albert Davies, Rachael Rigby, Frank Dondelinger

The human gut microbiome has been shown to be associated with a variety of human diseases, including cancer, metabolic conditions and inflammatory bowel disease. Current approaches for detecting microbiome associations are limited by relying on specific measures of ecological distance, or only allowing for the detection of associations with individual bacterial species, rather than the whole microbiome. In this work, we develop a novel hierarchical Bayesian model for detecting global microbiome associations. Our method is not dependent on a choice of distance measure, and is able to incorporate phylogenetic information about microbial species. We perform extensive simulation studies and show that our method allows for consistent estimation of global microbiome effects. Additionally, we investigate the performance of the model on two real-world microbiome studies: a study of microbiome-metabolome associations in inflammatory bowel disease, and a study of associations between diet and the gut microbiome in mice. We show that we can use the method to reliably detect associations in real-world datasets with varying numbers of samples and covariates.

Citations: 0
Frontmatter
IF 0.9; Zone 4 (Mathematics); Q3 Mathematics; Pub Date: 2021-10-01; DOI: 10.1515/sagmb-2021-frontmatter3
Citations: 0
Frontmatter
IF 0.9; Zone 4 (Mathematics); Q3 Mathematics; Pub Date: 2021-02-01; DOI: 10.1515/sagmb-2021-frontmatter1
Citations: 0
Measuring evolutionary cancer dynamics from genome sequencing, one patient at a time
IF 0.9; Zone 4 (Mathematics); Q3 Mathematics; Pub Date: 2020-12-01; DOI: 10.1515/sagmb-2020-0075
G. Caravagna
Cancers progress through the accumulation of somatic mutations which accrue during tumour evolution, allowing some cells to proliferate in an uncontrolled fashion. This growth process is intimately related to latent evolutionary forces moulding the genetic and epigenetic composition of tumour subpopulations. Understanding cancer therefore requires understanding these selective pressures. The widespread adoption of next-generation sequencing technologies opens up the possibility of measuring molecular profiles of cancers at multiple resolutions, across one or multiple patients. In this review we discuss how cancer genome sequencing data from a single tumour can be used to understand these evolutionary forces, overviewing mathematical models and inferential methods adopted in the field of cancer evolution.
Citations: 1
Inferring dynamic gene regulatory networks with low-order conditional independencies – an evaluation of the method
IF 0.9; Zone 4 (Mathematics); Q3 Mathematics; Pub Date: 2020-12-01; DOI: 10.1515/sagmb-2020-0051
Hamda Ajmal, M. G. Madden
Over a decade ago, Lèbre (2009) proposed an inference method, G1DBN, to learn the structure of gene regulatory networks (GRNs) from high-dimensional, sparse time-series gene expression data. Their approach is based on the concept of low-order conditional independence graphs, which they extend to dynamic Bayesian networks (DBNs). They present results to demonstrate that their method yields better structural accuracy than the related Lasso and Shrinkage methods, particularly where the data are sparse, that is, where the number of time measurements n is much smaller than the number of genes p. This paper challenges these claims using a careful experimental analysis, showing that the GRNs reverse engineered from time-series data using the G1DBN approach are less accurate than claimed by Lèbre (2009). We also show that the Lasso method yields higher structural accuracy than the G1DBN method for graphs learned from the simulated data, particularly when the data are sparse ($n \ll p$). The Lasso method is also better than G1DBN at identifying the transcription factors (TFs) involved in the cell cycle of Saccharomyces cerevisiae.
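
One common way to quantify the structural accuracy of an inferred network against a known one is precision and recall over directed edges. The sketch below is illustrative only: the abstract does not state which metric the authors used, and the gene names and the function `edge_precision_recall` are hypothetical.

```python
def edge_precision_recall(true_edges, inferred_edges):
    """Precision and recall of inferred directed edges against a reference network."""
    true_set, inferred_set = set(true_edges), set(inferred_edges)
    true_positives = len(true_set & inferred_set)
    precision = true_positives / len(inferred_set) if inferred_set else 0.0
    recall = true_positives / len(true_set) if true_set else 0.0
    return precision, recall


if __name__ == "__main__":
    # Edges are (regulator, target) pairs; direction matters.
    reference = [("g1", "g2"), ("g2", "g3"), ("g3", "g1"), ("g1", "g4")]
    inferred = [("g1", "g2"), ("g2", "g3"), ("g4", "g1")]
    precision, recall = edge_precision_recall(reference, inferred)
    print(f"precision={precision:.3f} recall={recall:.3f}")
```

Here ("g4", "g1") counts as a false positive even though the reverse edge exists in the reference, because the edges are treated as directed.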
Citations: 1
Distinct characteristics of correlation analysis at the single-cell and the population level
IF 0.9; Zone 4 (Mathematics); Q3 Mathematics; Pub Date: 2020-08-19; DOI: 10.21203/rs.3.rs-42825/v1
Guoyu Wu, Yuchao Li
Correlation analysis is widely used in biological studies to infer molecular relationships within biological networks. Recently, single-cell analysis has drawn tremendous interest for its ability to obtain high-resolution molecular phenotypes. It turns out that there is little overlap between the co-expressed genes identified in single-cell-level investigations and those identified in population-level investigations. However, the nature of the relationship between correlations at the single-cell and population levels remains unclear. In this manuscript, we aimed to unveil the origin of the differences between the correlation coefficients at the single-cell level and those at the population level, and to bridge the gap between them. By developing formulations that link correlations at the single-cell and the population level, we illustrated that aggregated correlations could be stronger than, weaker than or equal to the corresponding individual correlations, depending on the variations and the correlations within the population. When the correlation within the population is weaker than the individual correlation, the aggregated correlation is stronger than the corresponding individual correlation. In addition, our data indicated that the aggregated correlation is more likely to be stronger than the corresponding individual correlation, and that gene pairs strongly correlated exclusively at the single-cell level were rare. Through a bottom-up approach to modelling interactions between molecules in a signaling cascade or in multi-regulator-controlled gene expression, we surprisingly found that the existence of an interaction between two components could not be excluded simply on the basis of their low correlation coefficients, suggesting that connectivity within biological networks derived solely from correlation analysis should be reconsidered. We also investigated the impact of technical random measurement errors on the correlation coefficients at the single-cell level and the population level. The results indicate that the aggregated correlation is relatively robust and less affected. Because of the heterogeneity among single cells, correlation coefficients calculated from single-cell-level data might differ from those calculated at the population level. Depending on the specific question being asked, proper sampling and normalization procedures should be applied before any conclusions are drawn.
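
A small worked example can make the three kinds of correlation concrete. The sketch below uses made-up numbers and assumes one reading of the abstract's terms: "individual" as the correlation over all single cells pooled together, "within the population" as the correlation inside each sample, and "aggregated" as the correlation between per-sample means. The authors' own formulations are not reproduced here, and the names `pearson` and `summarize` are hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def summarize(samples):
    """Return (pooled single-cell r, per-sample r list, r of sample means)."""
    pooled = [cell for sample in samples for cell in sample]
    pooled_r = pearson([x for x, _ in pooled], [y for _, y in pooled])
    within_r = [pearson([x for x, _ in s], [y for _, y in s]) for s in samples]
    mean_x = [sum(x for x, _ in s) / len(s) for s in samples]
    mean_y = [sum(y for _, y in s) / len(s) for s in samples]
    return pooled_r, within_r, pearson(mean_x, mean_y)


if __name__ == "__main__":
    # Each inner list is one sample; each tuple is (gene X, gene Y) in one cell.
    samples = [
        [(1.0, 3.0), (2.0, 2.0), (3.0, 1.0)],
        [(4.0, 6.0), (5.0, 5.0), (6.0, 4.0)],
        [(7.0, 9.0), (8.0, 8.0), (9.0, 7.0)],
    ]
    pooled_r, within_r, aggregated_r = summarize(samples)
    print(pooled_r, within_r, aggregated_r)
```

In this toy data set the two genes are anti-correlated inside every sample, moderately correlated over all pooled cells, and perfectly correlated across sample means, which matches the ordering described in the abstract under the assumed reading of its terms.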
Citations: 0
Accuracy and sensitivity of different Bayesian methods for genomic prediction using simulation and real data.
IF 0.9; Zone 4 (Mathematics); Q3 Mathematics; Pub Date: 2020-08-10; DOI: 10.1515/sagmb-2019-0007
Saheb Foroutaifar

The main objectives of this study were to compare the prediction accuracy of different Bayesian methods for traits with a wide range of genetic architectures using simulation and real data, and to assess the sensitivity of these methods to violations of their assumptions. For the simulation study, different scenarios were implemented based on two traits with low or high heritability, different numbers of QTLs and different distributions of QTL effects. For the real data analysis, a German Holstein dataset for milk fat percentage, milk yield, and somatic cell score was used. The simulation results showed that, with the exception of Bayes R, the methods were sensitive to changes in the number of QTLs and the distribution of QTL effects. Having a distribution of QTL effects similar to what a given Bayesian method assumes for estimating marker effects did not improve its prediction accuracy. The Bayes B method gave accuracy higher than or equal to that of the other methods. The real data analysis showed that, as in the simulated scenarios with a large number of QTLs, there was no difference between the accuracies of the different methods for any of the traits.

Citations: 1
Understanding hormonal crosstalk in Arabidopsis root development via emulation and history matching.
IF 0.9; Zone 4 (Mathematics); Q3 Mathematics; Pub Date: 2020-07-13; DOI: 10.1515/sagmb-2018-0053
Samuel E Jackson, Ian Vernon, Junli Liu, Keith Lindsey

A major challenge in plant developmental biology is to understand how plant growth is coordinated by interacting hormones and genes. To meet this challenge, it is important not only to use experimental data, but also to formulate a mathematical model. For the mathematical model to best describe the true biological system, it is necessary to understand the parameter space of the model, along with the links between the model, the parameter space and experimental observations. We develop sequential history matching methodology, using Bayesian emulation, to gain substantial insight into biological model parameter spaces. This is achieved by finding sets of acceptable parameters in accordance with successive sets of physical observations. These methods are then applied to a complex hormonal crosstalk model for Arabidopsis root growth. In this application, we demonstrate how an initial set of 22 observed trends reduces the volume of the set of acceptable inputs to a proportion of 6.1 × 10⁻⁷ of the original space. Additional sets of biologically relevant experimental data, each of size 5, reduce the size of this space by a further three and two orders of magnitude respectively. Hence, we provide insight into the constraints placed upon the model structure by, and the biological consequences of, measuring subsets of observations.
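
The idea of discarding parameter values wave by wave can be sketched with a toy example. This is a simplified illustration under stated assumptions, not the authors' method: it evaluates a cheap made-up model directly instead of a Bayesian emulator, uses a one-dimensional grid of candidates, and applies the conventional implausibility cutoff of 3. The names `implausibility` and `history_match` and all numbers are hypothetical.

```python
from math import sqrt


def implausibility(prediction, observed, variance):
    """Standardized distance between a model prediction and an observation."""
    return abs(observed - prediction) / sqrt(variance)


def history_match(candidates, waves, cutoff=3.0):
    """Apply each wave of observations in turn, dropping implausible candidates.

    Each wave is a list of (model, observed_value, total_variance) triples.
    Returns the surviving candidates and the surviving fraction after each wave.
    """
    remaining = list(candidates)
    fractions = []
    for wave in waves:
        remaining = [
            x for x in remaining
            if all(implausibility(model(x), z, var) <= cutoff for model, z, var in wave)
        ]
        fractions.append(len(remaining) / len(candidates))
    return remaining, fractions


if __name__ == "__main__":
    grid = [i / 10 for i in range(101)]  # candidate parameter values 0.0 .. 10.0
    waves = [
        [(lambda x: x * x, 25.0, 1.0)],    # wave 1: one coarse observation
        [(lambda x: 2 * x, 10.2, 0.01)],   # wave 2: one more precise observation
    ]
    kept, fractions = history_match(grid, waves)
    print(kept, fractions)
```

Each wave can only shrink the surviving set, so the fractions are non-increasing, which mirrors the successive volume reductions reported in the abstract.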

Citations: 4
Bivariate traits association analysis using generalized estimating equations in family data.
IF 0.9; Zone 4 (Mathematics); Q3 Mathematics; Pub Date: 2020-05-05; DOI: 10.1515/sagmb-2019-0030
Mariza de Andrade, Mauricio A Mazo Lopera, Nubia E Duarte

Genome-wide association studies (GWAS) are becoming fundamental to the arduous task of deciphering the etiology of complex diseases. The majority of the statistical models used to address gene-disease associations consider a single response variable. However, it is common for certain diseases, such as cardiovascular diseases, to have correlated phenotypes. GWAS typically sample unrelated individuals from a population, and shared familial risk factors are not investigated. In this paper, we propose to apply a bivariate model using family data that associates two phenotypes with a genetic region. Using generalized estimating equations (GEE), we model two phenotypes, either discrete, continuous or a mixture of the two, as a function of genetic variables and other important covariates. We incorporate the kinship relationships into the working matrix, extended to a bivariate analysis. The estimation method and the joint gene-set effect on both phenotypes are developed in this work. We also evaluate the proposed methodology with a simulation study and an application to real data.
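
To give a feel for what a kinship-aware working matrix for two traits might look like, the sketch below builds one as a Kronecker product of a 2 × 2 trait correlation matrix and a relatedness matrix (twice the kinship coefficients). The abstract does not specify the form of the working matrix, so this separable structure is an assumption made purely for illustration; the family, the trait correlation of 0.4 and the names `kronecker` and `working_correlation` are hypothetical.

```python
def kronecker(a, b):
    """Kronecker product of two matrices given as lists of lists."""
    return [
        [a_ij * b_kl for a_ij in a_row for b_kl in b_row]
        for a_row in a
        for b_row in b
    ]


def working_correlation(kinship, trait_corr):
    """Assumed bivariate working correlation: trait block structure times relatedness."""
    relatedness = [[2.0 * phi for phi in row] for row in kinship]
    traits = [[1.0, trait_corr], [trait_corr, 1.0]]
    # Rows and columns are ordered trait-major: all individuals for trait 1, then trait 2.
    return kronecker(traits, relatedness)


if __name__ == "__main__":
    # Kinship coefficients for two unrelated parents and their child (no inbreeding).
    kinship = [
        [0.50, 0.00, 0.25],
        [0.00, 0.50, 0.25],
        [0.25, 0.25, 0.50],
    ]
    for row in working_correlation(kinship, 0.4):
        print([round(value, 2) for value in row])
```

Under this assumed structure, the correlation between trait 1 in a parent and trait 2 in the child is the product of the trait correlation and their relatedness, 0.4 × 0.5 = 0.2.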

Citations: 0