Statistical Applications in Genetics and Molecular Biology: Latest Articles

Batch effect reduction of microarray data with dependent samples using an empirical Bayes approach (BRIDGE).
IF 0.9, Zone 4 (Mathematics), Q3 Mathematics, Pub Date: 2021-12-14, DOI: 10.1515/sagmb-2021-0020
Qing Xia, Jeffrey A Thompson, Devin C Koestler

Batch effects present challenges in the analysis of high-throughput molecular data and are particularly problematic in longitudinal studies where interest lies in identifying genes/features whose expression changes over time, but time is confounded with batch. While many methods to correct for batch effects exist, most assume independence across samples, an assumption that is unlikely to hold in longitudinal microarray studies. We propose Batch effect Reduction of mIcroarray data with Dependent samples usinG Empirical Bayes (BRIDGE), a three-step parametric empirical Bayes approach that leverages technical replicate samples profiled at multiple timepoints/batches, so-called "bridge samples", to inform batch-effect reduction/attenuation in longitudinal microarray studies. Extensive simulation studies and an analysis of a real biological data set were conducted to benchmark the performance of BRIDGE against both ComBat and longitudinal ComBat. Our results demonstrate that while all methods perform well in facilitating accurate estimates of time effects, BRIDGE outperforms both ComBat and longitudinal ComBat in the removal of batch effects in data sets with bridging samples and, perhaps as a result, was observed to have improved statistical power for detecting genes with a time effect. BRIDGE demonstrated competitive performance in batch-effect reduction of confounded longitudinal microarray studies, in both simulated and real data sets, and may serve as a useful preprocessing method for researchers conducting longitudinal microarray studies that include bridging samples.

Volume 20, issue 4-6, pp. 101-119. PDF (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9617207/pdf/nihms-1843789.pdf
Citations: 0
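
The bridge-sample idea in the BRIDGE abstract above can be illustrated with a deliberately simplified sketch. This is not the authors' three-step empirical Bayes procedure: it is a plain per-gene mean-shift correction, and the function name `bridge_shift_correct`, the dictionary layout, and the toy values are all assumptions made for illustration. The point it shows is that, because a bridge sample is the same specimen profiled in both batches, the difference between its two profiles can be attributed to batch rather than to time.

```python
def bridge_shift_correct(batch1, batch2, bridge_ids):
    """Location-only batch correction using bridge samples (illustrative only).

    batch1, batch2: dict mapping sample id -> list of per-gene expression values.
    bridge_ids: ids of specimens profiled in BOTH batches (technical replicates).
    Returns the per-gene shift and batch2 re-expressed on the batch1 scale.
    """
    n_genes = len(next(iter(batch1.values())))
    # Per-gene batch shift, estimated from the bridge samples only.
    shift = [
        sum(batch2[s][g] - batch1[s][g] for s in bridge_ids) / len(bridge_ids)
        for g in range(n_genes)
    ]
    # Subtract the estimated shift from every sample in the second batch.
    corrected = {
        s: [value - shift[g] for g, value in enumerate(values)]
        for s, values in batch2.items()
    }
    return shift, corrected


# Hypothetical toy data: two genes, two time-1 specimens re-profiled in batch 2.
batch1 = {"s1": [5.0, 2.0], "s2": [6.0, 3.0]}
batch2 = {"s1": [6.0, 1.5], "s2": [7.0, 2.5], "s1_t2": [8.0, 1.5]}

shift, corrected = bridge_shift_correct(batch1, batch2, ["s1", "s2"])
print(shift)               # per-gene batch shift estimated from the bridge samples
print(corrected["s1_t2"])  # time-2 sample after removing the estimated shift
```

A full empirical Bayes method would additionally shrink the per-gene estimates toward values pooled across genes and model scale differences and within-subject dependence, none of which this sketch attempts.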
Frontmatter
IF 0.9, Zone 4 (Mathematics), Q3 Mathematics, Pub Date: 2021-12-01, DOI: 10.1515/sagmb-2021-frontmatter4-6
Citations: 0
Optimizing weighted gene co-expression network analysis with a multi-threaded calculation of the topological overlap matrix.
IF 0.9, Zone 4 (Mathematics), Q3 Mathematics, Pub Date: 2021-11-09, DOI: 10.1515/sagmb-2021-0025
Min Shuai, Dongmei He, Xin Chen

Biomolecular networks are often assumed to be scale-free hierarchical networks. Weighted gene co-expression network analysis (WGCNA) treats gene co-expression networks as undirected, scale-free, hierarchical weighted networks. The WGCNA R software package stores a network as an adjacency matrix, calculates the topological overlap matrix (TOM) from it, and then identifies modules (sub-networks), where each module is assumed to be associated with a certain biological function. The most time-consuming step of WGCNA is the single-threaded calculation of the TOM from the adjacency matrix. In this paper, the single-threaded TOM algorithm has been changed into a multi-threaded algorithm (the parameters are the default values of WGCNA). In the multi-threaded algorithm, Rcpp was used to make R call a C++ function, and the C++ code then used OpenMP to start multiple threads that calculate the TOM from the adjacency matrix. On shared-memory multiprocessor systems, the calculation time decreases as the number of CPU cores increases. The algorithm of this paper can promote the application of WGCNA to large data sets and help other research fields identify sub-networks in undirected, scale-free, hierarchical weighted networks. The source code and usage instructions are available at https://github.com/do-somethings-haha/multi-threaded_calculate_unsigned_TOM_from_unsigned_or_signed_Adjacency_Matrix_of_WGCNA.

Volume 20, issue 4-6, pp. 145-153.
Citations: 5
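
To make the TOM step described in the abstract above concrete, here is a minimal sketch in pure Python. It assumes the commonly cited unsigned topological overlap formula, TOM_ij = (sum over u ≠ i, j of a_iu·a_uj + a_ij) / (min(k_i, k_j) + 1 − a_ij), with k_i the sum of row i excluding the diagonal; whether this matches the authors' C++ code term for term is an assumption. The function names `tom_row` and `tom` are hypothetical. Each row of the result depends only on the read-only adjacency matrix, which is what makes the row-wise split across threads straightforward.

```python
from concurrent.futures import ThreadPoolExecutor


def tom_row(adj, i):
    """Return row i of the unsigned topological overlap matrix."""
    n = len(adj)
    k_i = sum(adj[i][u] for u in range(n) if u != i)  # connectivity of node i
    row = [0.0] * n
    for j in range(n):
        if j == i:
            row[j] = 1.0  # convention: a node fully overlaps with itself
            continue
        k_j = sum(adj[j][u] for u in range(n) if u != j)
        # Shared-neighbour term: sum over u != i, j of a_iu * a_uj.
        shared = sum(adj[i][u] * adj[u][j] for u in range(n) if u != i and u != j)
        row[j] = (shared + adj[i][j]) / (min(k_i, k_j) + 1.0 - adj[i][j])
    return row


def tom(adj, workers=4):
    """Compute the full TOM, handing independent rows to a pool of threads."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda i: tom_row(adj, i), range(len(adj))))


# Hypothetical 3-node symmetric adjacency matrix with a zero diagonal.
adjacency = [
    [0.0, 0.5, 0.2],
    [0.5, 0.0, 0.4],
    [0.2, 0.4, 0.0],
]

for row in tom(adjacency, workers=2):
    print([round(value, 4) for value in row])
```

In CPython the global interpreter lock prevents threads from speeding up CPU-bound loops like this one, so the thread pool here only illustrates the partitioning; the speed-up reported in the abstract comes from native C++ threads started through OpenMP.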
A hierarchical Bayesian approach for detecting global microbiome associations.
IF 0.9, Zone 4 (Mathematics), Q3 Mathematics, Pub Date: 2021-11-01, DOI: 10.1515/sagmb-2021-0047
Farhad Hatami, Emma Beamish, Albert Davies, Rachael Rigby, Frank Dondelinger

The human gut microbiome has been shown to be associated with a variety of human diseases, including cancer, metabolic conditions and inflammatory bowel disease. Current approaches for detecting microbiome associations are limited by relying on specific measures of ecological distance, or only allowing for the detection of associations with individual bacterial species, rather than the whole microbiome. In this work, we develop a novel hierarchical Bayesian model for detecting global microbiome associations. Our method is not dependent on a choice of distance measure, and is able to incorporate phylogenetic information about microbial species. We perform extensive simulation studies and show that our method allows for consistent estimation of global microbiome effects. Additionally, we investigate the performance of the model on two real-world microbiome studies: a study of microbiome-metabolome associations in inflammatory bowel disease, and a study of associations between diet and the gut microbiome in mice. We show that we can use the method to reliably detect associations in real-world datasets with varying numbers of samples and covariates.

Volume 20, issue 3, pp. 85-100. PDF (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9125803/pdf/
Citations: 0
Frontmatter
IF 0.9, Zone 4 (Mathematics), Q3 Mathematics, Pub Date: 2021-10-01, DOI: 10.1515/sagmb-2021-frontmatter3
Citations: 0
Frontmatter
IF 0.9, Zone 4 (Mathematics), Q3 Mathematics, Pub Date: 2021-02-01, DOI: 10.1515/sagmb-2021-frontmatter1
Citations: 0
Measuring evolutionary cancer dynamics from genome sequencing, one patient at a time
IF 0.9, Zone 4 (Mathematics), Q3 Mathematics, Pub Date: 2020-12-01, DOI: 10.1515/sagmb-2020-0075
G. Caravagna
Abstract: Cancers progress through the accumulation of somatic mutations that accrue during tumour evolution, allowing some cells to proliferate in an uncontrolled fashion. This growth process is intimately related to latent evolutionary forces moulding the genetic and epigenetic composition of tumour subpopulations. Understanding cancer therefore requires understanding these selective pressures. The widespread adoption of next-generation sequencing technologies opens up the possibility of measuring molecular profiles of cancers at multiple resolutions, across one or multiple patients. In this review we discuss how cancer genome sequencing data from a single tumour can be used to understand these evolutionary forces, giving an overview of the mathematical models and inferential methods adopted in the field of cancer evolution.
Citations: 1
Inferring dynamic gene regulatory networks with low-order conditional independencies – an evaluation of the method
IF 0.9, Zone 4 (Mathematics), Q3 Mathematics, Pub Date: 2020-12-01, DOI: 10.1515/sagmb-2020-0051
Hamda Ajmal, M. G. Madden
Abstract: Over a decade ago, Lèbre (2009) proposed an inference method, G1DBN, to learn the structure of gene regulatory networks (GRNs) from high-dimensional, sparse time-series gene expression data. Their approach is based on the concept of low-order conditional independence graphs, which they extend to dynamic Bayesian networks (DBNs). They present results to demonstrate that their method yields better structural accuracy than the related Lasso and Shrinkage methods, particularly where the data is sparse, that is, where the number of time measurements n is much smaller than the number of genes p. This paper challenges these claims using a careful experimental analysis, showing that the GRNs reverse engineered from time-series data using the G1DBN approach are less accurate than claimed by Lèbre (2009). We also show that the Lasso method yields higher structural accuracy than the G1DBN method for graphs learned from the simulated data, particularly when the data is sparse ($n \ll p$). The Lasso method is also better than G1DBN at identifying the transcription factors (TFs) involved in the cell cycle of Saccharomyces cerevisiae.
Citations: 1
Distinct characteristics of correlation analysis at the single-cell and the population level
IF 0.9, Zone 4 (Mathematics), Q3 Mathematics, Pub Date: 2020-08-19, DOI: 10.21203/rs.3.rs-42825/v1
Guoyu Wu, Yuchao Li
Abstract: Correlation analysis is widely used in biological studies to infer molecular relationships within biological networks. Recently, single-cell analysis has drawn tremendous interest for its ability to obtain high-resolution molecular phenotypes. It turns out that there is little overlap between the co-expressed genes identified in single-cell-level investigations and those identified in population-level investigations. However, the nature of the relationship between correlations at the single-cell and population levels remains unclear. In this manuscript, we aimed to unveil the origin of the differences between the correlation coefficients at the single-cell level and those at the population level, and to bridge the gap between them. By developing formulations that link correlations at the single-cell and the population level, we illustrated that aggregated correlations could be stronger than, weaker than, or equal to the corresponding individual correlations, depending on the variations and the correlations within the population. When the correlation within the population is weaker than the individual correlation, the aggregated correlation is stronger than the corresponding individual correlation. In addition, our data indicated that the aggregated correlation is more likely to be stronger than the corresponding individual correlation, and that it was rare to find gene pairs strongly correlated exclusively at the single-cell level. Using a bottom-up approach to model interactions between molecules in a signaling cascade or in multi-regulator-controlled gene expression, we surprisingly found that the existence of an interaction between two components could not be excluded simply on the basis of their low correlation coefficients, suggesting a reconsideration of connectivity within biological networks that was derived solely from correlation analysis. We also investigated the impact of technical random measurement errors on the correlation coefficients at the single-cell level and the population level. The results indicate that the aggregated correlation is relatively robust and less affected. Because of the heterogeneity among single cells, correlation coefficients calculated from single-cell-level data might differ from those calculated at the population level. Depending on the specific question being asked, proper sampling and normalization procedures should be carried out before any conclusions are drawn.
Citations: 0
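
The claim in the abstract above, that correlations computed on aggregated (population-level) data can differ from correlations among individual cells, can be checked on a small hand-built example. The sketch below is not the authors' formulation; the data values and the helper `pearson` are hypothetical and chosen so the arithmetic is easy to follow. Within each population the two genes move in opposite directions from cell to cell, while the population averages of the two genes rise together.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical toy data: three populations of three cells, as (gene X, gene Y).
populations = [
    [(1.0, 3.0), (2.0, 2.0), (3.0, 1.0)],
    [(4.0, 6.0), (5.0, 5.0), (6.0, 4.0)],
    [(7.0, 9.0), (8.0, 8.0), (9.0, 7.0)],
]

# Individual level: correlation among the cells of each population.
within = [
    pearson([x for x, _ in cells], [y for _, y in cells]) for cells in populations
]

# Population level: correlation of the per-population averages.
mean_x = [sum(x for x, _ in cells) / len(cells) for cells in populations]
mean_y = [sum(y for _, y in cells) / len(cells) for cells in populations]
aggregated = pearson(mean_x, mean_y)

# All cells pooled together, ignoring which population they came from.
pooled = pearson(
    [x for cells in populations for x, _ in cells],
    [y for cells in populations for _, y in cells],
)

print(within, aggregated, pooled)
```

Here every within-population correlation is negative while the correlation of the averages is positive, so neither level can be read off from the other without knowing how variation is split within and between populations.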
Accuracy and sensitivity of different Bayesian methods for genomic prediction using simulation and real data.
IF 0.9, Zone 4 (Mathematics), Q3 Mathematics, Pub Date: 2020-08-10, DOI: 10.1515/sagmb-2019-0007
Saheb Foroutaifar

The main objectives of this study were to compare the prediction accuracy of different Bayesian methods for traits with a wide range of genetic architectures using simulation and real data, and to assess the sensitivity of these methods to violations of their assumptions. For the simulation study, different scenarios were implemented based on two traits with low or high heritability, different numbers of QTLs, and different distributions of QTL effects. For the real data analysis, a German Holstein dataset for milk fat percentage, milk yield, and somatic cell score was used. The simulation results showed that, with the exception of Bayes R, the methods were sensitive to changes in the number of QTLs and the distribution of QTL effects. Having a distribution of QTL effects similar to the one a given Bayesian method assumes when estimating marker effects did not improve its prediction accuracy. The Bayes B method gave accuracy higher than or equal to that of the other methods. The real data analysis showed that, as in the simulation scenarios with a large number of QTLs, there was no difference between the accuracies of the different methods for any of the traits.

Volume 19, issue 3.
Citations: 1