Exploring homogeneity of correlation structures of gene expression datasets within and between etiological disease categories.

IF 0.8 4区 数学 Q4 BIOCHEMISTRY & MOLECULAR BIOLOGY Statistical Applications in Genetics and Molecular Biology Pub Date : 2014-12-01 DOI:10.1515/sagmb-2014-0003
Victor L Jong, Putri W Novianti, Kit C B Roes, Marinus J C Eijkemans
{"title":"Exploring homogeneity of correlation structures of gene expression datasets within and between etiological disease categories.","authors":"Victor L Jong,&nbsp;Putri W Novianti,&nbsp;Kit C B Roes,&nbsp;Marinus J C Eijkemans","doi":"10.1515/sagmb-2014-0003","DOIUrl":null,"url":null,"abstract":"<p><p>The literature shows that classifiers perform differently across datasets and that correlations within datasets affect the performance of classifiers. The question that arises is whether the correlation structure within datasets differ significantly across diseases. In this study, we evaluated the homogeneity of correlation structures within and between datasets of six etiological disease categories; inflammatory, immune, infectious, degenerative, hereditary and acute myeloid leukemia (AML). We also assessed the effect of filtering; detection call and variance filtering on correlation structures. We downloaded microarray datasets from ArrayExpress for experiments meeting predefined criteria and ended up with 12 datasets for non-cancerous diseases and six for AML. The datasets were preprocessed by a common procedure incorporating platform-specific recommendations and the two filtering methods mentioned above. Homogeneity of correlation matrices between and within datasets of etiological diseases was assessed using the Box's M statistic on permuted samples. We found that correlation structures significantly differ between datasets of the same and/or different etiological disease categories and that variance filtering eliminates more uncorrelated probesets than detection call filtering and thus renders the data highly correlated.</p>","PeriodicalId":48980,"journal":{"name":"Statistical Applications in Genetics and Molecular Biology","volume":"13 6","pages":"717-32"},"PeriodicalIF":0.8000,"publicationDate":"2014-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1515/sagmb-2014-0003","citationCount":"6","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Statistical Applications in Genetics and Molecular Biology","FirstCategoryId":"100","ListUrlMain":"https://doi.org/10.1515/sagmb-2014-0003","RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q4","JCRName":"BIOCHEMISTRY & MOLECULAR BIOLOGY","Score":null,"Total":0}
引用次数: 6

Abstract

The literature shows that classifiers perform differently across datasets and that correlations within datasets affect the performance of classifiers. The question that arises is whether the correlation structure within datasets differ significantly across diseases. In this study, we evaluated the homogeneity of correlation structures within and between datasets of six etiological disease categories; inflammatory, immune, infectious, degenerative, hereditary and acute myeloid leukemia (AML). We also assessed the effect of filtering; detection call and variance filtering on correlation structures. We downloaded microarray datasets from ArrayExpress for experiments meeting predefined criteria and ended up with 12 datasets for non-cancerous diseases and six for AML. The datasets were preprocessed by a common procedure incorporating platform-specific recommendations and the two filtering methods mentioned above. Homogeneity of correlation matrices between and within datasets of etiological diseases was assessed using the Box's M statistic on permuted samples. We found that correlation structures significantly differ between datasets of the same and/or different etiological disease categories and that variance filtering eliminates more uncorrelated probesets than detection call filtering and thus renders the data highly correlated.

查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
探索病原学疾病类别内部和之间基因表达数据集相关结构的同质性。
文献表明,分类器在不同的数据集上表现不同,数据集内的相关性影响分类器的性能。由此产生的问题是,数据集内的相关结构在不同疾病之间是否存在显著差异。在这项研究中,我们评估了六种病原学疾病类别数据集内部和之间相关结构的同质性;炎性、免疫性、感染性、退行性、遗传性和急性髓性白血病(AML)。我们还评估了过滤的效果;相关结构的检测调用和方差滤波。我们从ArrayExpress下载了符合预定义标准的微阵列数据集,最终获得了12个非癌性疾病数据集和6个AML数据集。数据集通过结合平台特定建议和上述两种过滤方法的通用程序进行预处理。使用排列样本的Box's M统计量评估病原学疾病数据集之间和内部相关矩阵的同质性。我们发现相同和/或不同病因疾病类别的数据集之间的相关性结构显着不同,方差过滤比检测调用过滤消除了更多不相关的问题集,从而使数据高度相关。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 去求助
来源期刊
Statistical Applications in Genetics and Molecular Biology
Statistical Applications in Genetics and Molecular Biology BIOCHEMISTRY & MOLECULAR BIOLOGY-MATHEMATICAL & COMPUTATIONAL BIOLOGY
自引率
11.10%
发文量
8
期刊介绍: Statistical Applications in Genetics and Molecular Biology seeks to publish significant research on the application of statistical ideas to problems arising from computational biology. The focus of the papers should be on the relevant statistical issues but should contain a succinct description of the relevant biological problem being considered. The range of topics is wide and will include topics such as linkage mapping, association studies, gene finding and sequence alignment, protein structure prediction, design and analysis of microarray data, molecular evolution and phylogenetic trees, DNA topology, and data base search strategies. Both original research and review articles will be warmly received.
期刊最新文献
When is the allele-sharing dissimilarity between two populations exceeded by the allele-sharing dissimilarity of a population with itself? Sparse latent factor regression models for genome-wide and epigenome-wide association studies Low variability in the underlying cellular landscape adversely affects the performance of interaction-based approaches for conducting cell-specific analyses of DNA methylation in bulk samples. AdaReg: data adaptive robust estimation in linear regression with application in GTEx gene expressions. Collocation based training of neural ordinary differential equations.
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1