Generalized Measure of Dependency for Analysis of Omics Data

Journal of Data Mining in Genomics & Proteomics. Pub Date: 2016-01-01. DOI: 10.4172/2153-0602.1000183

Qihua Tan, Martin Tepel, Hans Christian Beck, Lars Melholt Rasmussen, Jacob v. B. Hjelmborg


Citations: 5

Abstract



J Data Mining Genomics Proteomics (JDMGP), an open access journal. ISSN: 2153-0602. Volume 7, Issue 1, 1000183.

As a popular measure of association, Pearson's correlation coefficient has been used frequently in omics data analysis, e.g., in feature selection during prediction model building with high-dimensional gene expression data [1] and proteomics data [2]. However, Pearson's correlation coefficient captures only linear relationships, which greatly limits its use in situations of nonlinear association. Statistical modeling of nonlinear patterns can be complicated [3] and requires intensive computation for high-dimensional data such as microarray or genome sequence data. In the analysis of omics data, high dimensionality means that there can be diverse patterns of dependence not limited to linearity. In this situation, generalized measures of association that are more adequate than Pearson's correlation and capable of capturing both linear and nonlinear relationships are needed. Recently, generalized correlation coefficients have been frequently discussed [4], and their application to large-scale genomic data has been illustrated through microarray gene expression time-course analysis [5].
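The limitation described in the abstract can be illustrated with a small sketch. The abstract does not say which generalized coefficient the authors use, so the example below swaps in distance correlation purely as a stand-in for "a measure that also responds to nonlinear dependence". The function names, the synthetic data, and the choice of measure are all assumptions made for illustration, not the paper's method.

```python
import numpy as np


def pearson(x, y):
    """Pearson's correlation coefficient via NumPy's correlation matrix."""
    return float(np.corrcoef(x, y)[0, 1])


def _centered_distances(v):
    """Pairwise absolute distances of a 1-D sample, double-centered."""
    d = np.abs(v[:, None] - v[None, :])
    return (d - d.mean(axis=0, keepdims=True)
              - d.mean(axis=1, keepdims=True) + d.mean())


def distance_correlation(x, y):
    """Sample distance correlation for 1-D samples (illustrative stand-in).

    Assumption: used here only as one example of a generalized
    dependence measure; the source abstract does not name its measure.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    a = _centered_distances(x)
    b = _centered_distances(y)
    dcov2 = (a * b).mean()                      # squared distance covariance
    denom = np.sqrt((a * a).mean() * (b * b).mean())
    if denom == 0.0:
        return 0.0
    return float(np.sqrt(max(dcov2, 0.0) / denom))


# Synthetic data: y depends on x deterministically, but not linearly.
x = np.linspace(-1.0, 1.0, 201)
y = x ** 2

r = pearson(x, y)               # close to zero: no linear trend
d = distance_correlation(x, y)  # clearly positive: dependence detected
print(f"Pearson r = {r:.3f}, distance correlation = {d:.3f}")
```

On this symmetric parabola Pearson's coefficient is essentially zero even though `y` is fully determined by `x`, while the stand-in measure is well above zero. For a purely linear relationship both quantities equal 1 in absolute value, which is the sense in which such a measure generalizes Pearson's correlation.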
