Generalized Measure of Dependency for Analysis of Omics Data

Qihua Tan, Martin Tepel, Hans Christian Beck, Lars Melholt Rasmussen, Jacob v. B. Hjelmborg
Journal of Data Mining in Genomics & Proteomics (J Data Mining Genomics Proteomics, JDMGP), ISSN 2153-0602, an open access journal. Volume 7, Issue 1, article 1000183, 2016. DOI: 10.4172/2153-0602.1000183 (https://doi.org/10.4172/2153-0602.1000183)
Citations: 5

Abstract

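As a popular measure of association, Pearson's correlation coefficient has been used frequently in omics data analysis, for example in the feature selection step of building prediction models from high-dimensional gene expression data [1] and proteomics data [2]. However, Pearson's correlation coefficient captures only linear relationships, which greatly limits its use when associations are nonlinear. Statistical modeling of nonlinear patterns can be complicated [3] and requires intensive computation for high-dimensional data such as microarray or genome sequence data. In omics data analysis, high dimensionality means that diverse patterns of dependence, not limited to linearity, can be present. This situation calls for generalized measures of association that are more adequate than Pearson's correlation and can capture both linear and nonlinear relationships. Recently, generalized correlation coefficients have been discussed frequently [4], and their application to large-scale genomic data has been illustrated through microarray gene expression time-course analysis [5].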
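The limitation of Pearson's correlation described in the abstract can be illustrated with a small sketch. It compares Pearson's coefficient with distance correlation, which is one well-known generalized dependence measure chosen here purely for illustration; the abstract does not say which generalized coefficients references [4] and [5] use, so this is an assumption and not the authors' method. The function names and the toy data are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson's correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def _centered_distances(v):
    """Pairwise |v_i - v_j| matrix, double-centered by row, column and grand means."""
    n = len(v)
    d = [[abs(a - b) for b in v] for a in v]
    row = [sum(r) / n for r in d]  # the matrix is symmetric, so row means equal column means
    grand = sum(row) / n
    return [[d[i][j] - row[i] - row[j] + grand for j in range(n)] for i in range(n)]


def distance_correlation(x, y):
    """Sample distance correlation (illustrative choice, O(n^2) time and memory)."""
    n = len(x)
    a, b = _centered_distances(x), _centered_distances(y)
    dcov2 = sum(a[i][j] * b[i][j] for i in range(n) for j in range(n)) / n ** 2
    dvar_x = sum(a[i][j] ** 2 for i in range(n) for j in range(n)) / n ** 2
    dvar_y = sum(b[i][j] ** 2 for i in range(n) for j in range(n)) / n ** 2
    denom = math.sqrt(dvar_x * dvar_y)
    return math.sqrt(max(dcov2, 0.0) / denom) if denom > 0 else 0.0


if __name__ == "__main__":
    # Toy data (hypothetical): a symmetric grid and a purely quadratic response.
    x = [i / 10 for i in range(-20, 21)]
    y = [a * a for a in x]
    print("Pearson:", pearson(x, y))
    print("Distance correlation:", distance_correlation(x, y))
```

On this toy input, Pearson's coefficient is close to zero even though y is fully determined by x, while the distance correlation is clearly positive. The quadratic cost per pair of variables also hints at why computation becomes a concern for high-dimensional omics data.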