A visual analysis approach for data imputation via multi-party tabular data correlation strategies

IF 2.7 | Region 3, Engineering Technology | Q2, COMPUTER SCIENCE, INFORMATION SYSTEMS | Frontiers of Information Technology & Electronic Engineering | Pub Date: 2023-12-29 | DOI: 10.1631/fitee.2300480
Haiyang Zhu, Dongmin Han, Jiacheng Pan, Yating Wei, Yingchaojie Feng, Luoxuan Weng, Ketian Mao, Yuankai Xing, Jianshu Lv, Qiucheng Wan, Wei Chen
Citations: 0

Abstract

Data imputation is an essential pre-processing task for data governance, aimed at filling in incomplete data. However, conventional data imputation methods can only partly alleviate data incompleteness using isolated tabular data, and they fail to achieve the best balance between accuracy and efficiency. In this paper, we present a novel visual analysis approach for data imputation. We develop a multi-party tabular data association strategy that uses intelligent algorithms to identify similar columns and establish column correlations across multiple tables. Then, we perform the initial imputation of incomplete data using correlated data entries from other tables. Additionally, we develop a visual analysis system to refine data imputation candidates. Our interactive system combines the multi-party data imputation approach with expert knowledge, allowing for a better understanding of the relational structure of the data. This significantly enhances the accuracy and efficiency of data imputation, thereby improving the quality of data governance and the intrinsic value of data assets. Experimental validation and user surveys demonstrate that this method supports users in verifying and judging the associated columns and similar rows using their domain knowledge.
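
The abstract does not say which algorithms identify similar columns or how correlated entries are selected, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' method. It assumes that column similarity can be approximated by the Jaccard overlap of column values and that rows in two tables can be linked through one shared key column. The function names (`match_columns`, `impute_from_donor`), the threshold, and the sample tables are all hypothetical.

```python
from typing import Any, Dict, List, Tuple

Row = Dict[str, Any]
Table = List[Row]


def column_values(table: Table, column: str) -> set:
    """Collect the distinct non-missing values of one column."""
    return {row[column] for row in table if row.get(column) is not None}


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity of two value sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def match_columns(left: Table, right: Table,
                  threshold: float = 0.5) -> Dict[str, Tuple[str, float]]:
    """Map each column of `left` to its most similar column of `right`.

    Assumption: similarity is value overlap; pairs below `threshold` are dropped.
    """
    left_cols = sorted({c for row in left for c in row})
    right_cols = sorted({c for row in right for c in row})
    matches: Dict[str, Tuple[str, float]] = {}
    for lc in left_cols:
        lv = column_values(left, lc)
        scored = [(rc, jaccard(lv, column_values(right, rc))) for rc in right_cols]
        best = max(scored, key=lambda pair: pair[1], default=None)
        if best is not None and best[1] >= threshold:
            matches[lc] = best
    return matches


def impute_from_donor(target: Table, donor: Table, key: str, donor_key: str,
                      column: str, donor_column: str) -> Table:
    """Return a copy of `target` with missing `column` values filled from
    donor rows that share the same key value. The input is not modified."""
    lookup = {
        row[donor_key]: row[donor_column]
        for row in donor
        if row.get(donor_key) is not None and row.get(donor_column) is not None
    }
    filled: Table = []
    for row in target:
        new_row = dict(row)
        if new_row.get(column) is None and new_row.get(key) in lookup:
            new_row[column] = lookup[new_row[key]]
        filled.append(new_row)
    return filled


# Hypothetical tables held by two parties.
hospital = [
    {"patient_id": "p1", "city": "Hangzhou"},
    {"patient_id": "p2", "city": None},
    {"patient_id": "p3", "city": "Ningbo"},
]
insurer = [
    {"member": "p1", "location": "Hangzhou"},
    {"member": "p2", "location": "Shaoxing"},
    {"member": "p3", "location": "Ningbo"},
]

matches = match_columns(hospital, insurer)
candidates = impute_from_donor(
    hospital, insurer,
    key="patient_id", donor_key=matches["patient_id"][0],
    column="city", donor_column=matches["city"][0],
)
print(matches)
print(candidates)
```

In this sketch the filled values are only candidates. In the approach the abstract describes, such candidates would then be reviewed and refined by a domain expert in the visual analysis system rather than accepted automatically.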

Source journal
Frontiers of Information Technology & Electronic Engineering
Categories: COMPUTER SCIENCE, INFORMATION SYSTEMS; COMPUTER SCIENCE, SOFTWARE ENGINEERING
CiteScore: 6.00
Self-citation rate: 10.00%
Articles published: 1372
About the journal: Frontiers of Information Technology & Electronic Engineering (ISSN 2095-9184, monthly), formerly known as Journal of Zhejiang University SCIENCE C (Computers & Electronics) (2010-2014), is an international peer-reviewed journal launched by the Chinese Academy of Engineering (CAE) and Zhejiang University, and co-published by Springer and Zhejiang University Press. FITEE aims to publish the latest implementations of applications, principles, and algorithms in the broad area of Electrical and Electronic Engineering, including but not limited to Computer Science, Information Sciences, Control, Automation, and Telecommunications. It accepts several article types, including research articles, review articles, science letters, perspectives, and new technical notes and methods.