Semi-automatic mining of correlated data from a complex database: Correlation network visualization

M. Lexa, Radovan Lapar
Published in: IEEE International Conference on Computational Advances in Bio and Medical Sciences (ICCABS), pp. 1-2, 2016-10-01
DOI: 10.1109/ICCABS.2016.7802783 (https://doi.org/10.1109/ICCABS.2016.7802783)
Citation count: 1

Abstract

In previous work we addressed the issue of frequent ad hoc queries in deeply structured databases. We wrote a library of functions, AutodenormLib.py, that issues the appropriate JOIN commands to denormalize an arbitrary subset of stored data for downstream processing, such as statistical analysis, visualization or machine learning. Here, we visualize the content of the Thalamoss biomedical database as a correlation network. The network is created by calculating pairwise correlations across all pairs of variables, whether numerical, ordinal or nominal. We then construct the network over the entire set of variables, clustering variables with similar effects to discover group relationships among the various biomedical characteristics. We use a semi-automatic procedure that makes the selection of all pairs possible, and we discuss the issues of dealing with different types of variables. Mixed types are handled either by limiting the analysis to numerical and ordinal variables, or by binning values into intervals. Knowledge extracted from the data in this way can be used to select variables for statistical models, or as markers of medically interesting conditions.
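
The binning route described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation and does not use AutodenormLib.py, whose API is not described here. It assumes the data have already been denormalized into one column per variable, that numerical variables are cut into equal-width intervals, that Cramér's V serves as the association measure for the binned pairs, and that clusters are taken as connected components of the thresholded network. The function names, the threshold of 0.5 and the toy table are all hypothetical.

```python
from collections import Counter
from itertools import combinations
from math import sqrt


def bin_numeric(values, n_bins=3):
    """Map numeric values to equal-width interval labels 0..n_bins-1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    width = (hi - lo) / n_bins
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def cramers_v(xs, ys):
    """Cramér's V between two categorical sequences (assumed measure)."""
    n = len(xs)
    row, col, joint = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    k = min(len(row), len(col)) - 1
    if k == 0:
        return 0.0  # a constant variable carries no association
    chi2 = 0.0
    for r, nr in row.items():
        for c, nc in col.items():
            expected = nr * nc / n
            chi2 += (joint[(r, c)] - expected) ** 2 / expected
    return sqrt(chi2 / (n * k))


def correlation_network(table, threshold=0.5, n_bins=3):
    """Return {variable: set(neighbours)} for pairs with V >= threshold."""
    binned = {
        name: bin_numeric(vals, n_bins)
        if all(isinstance(v, (int, float)) for v in vals) else list(vals)
        for name, vals in table.items()
    }
    graph = {name: set() for name in binned}
    for a, b in combinations(binned, 2):  # every pair of variables
        if cramers_v(binned[a], binned[b]) >= threshold:
            graph[a].add(b)
            graph[b].add(a)
    return graph


def clusters(graph):
    """Connected components of the network, each as a sorted name list."""
    seen, out = set(), []
    for start in graph:
        if start in seen:
            continue
        stack, comp = [start], set()
        while stack:
            node = stack.pop()
            if node not in comp:
                comp.add(node)
                stack.extend(graph[node] - comp)
        seen |= comp
        out.append(sorted(comp))
    return out


# Made-up toy data, not taken from the Thalamoss database.
toy = {
    "age": [30, 35, 40, 60, 65, 70],
    "systolic": [110, 115, 120, 150, 155, 160],
    "smoker": ["no", "no", "no", "yes", "yes", "yes"],
    "site": ["A", "B", "A", "B", "A", "B"],
}
network = correlation_network(toy)
print(clusters(network))
```

Binning every variable lets one measure cover numerical, ordinal and nominal pairs alike, at the cost of discarding order information. The alternative mentioned in the abstract, restricting the analysis to numerical and ordinal variables, would instead allow a rank correlation to be used directly.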