An Automated Big Data Quality Anomaly Correction Framework Using Predictive Analysis

Journal: Data | IF: 2.2 | JCR: Q3 (Computer Science, Information Systems) | Pub Date: 2023-12-01 | DOI: 10.3390/data8120182

Widad Elouataoui, Saida El Mendili, Youssef Gahi


Citations: 0

Abstract



Big data has become a fundamental component in many domains, enabling organizations to extract valuable insights and make informed decisions. Using big data effectively, however, depends on ensuring its quality. Big data quality has therefore attracted growing attention from researchers and practitioners in recent years because of its significant impact on decision-making. Existing studies of data quality anomalies often have a limited scope, concentrating on specific aspects such as outliers or inconsistencies. Moreover, many approaches are context-specific and lack a generic solution applicable across domains. To the best of our knowledge, no existing framework automatically addresses quality anomalies comprehensively and generically while considering all aspects of data quality. To fill this gap, we propose a framework that automatically corrects big data quality anomalies using an intelligent predictive model. The framework addresses the main aspects of data quality through six key quality dimensions: Accuracy, Completeness, Conformity, Uniqueness, Consistency, and Readability. It is not tied to a specific field and is designed to apply across various areas, offering a generic approach to data quality anomalies. The framework was implemented on two datasets and achieved an accuracy of 98.22%. The results also show that the framework boosted data quality to a score of 99%, with an improvement rate of up to 14.76% in the quality score.
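To make the idea of scoring data along quality dimensions concrete, the following is a minimal Python sketch for two of the six dimensions named in the abstract, Completeness and Uniqueness. It is not the authors' method: the abstract does not define how each dimension is measured or how the scores are combined, so the function names (`completeness`, `uniqueness`, `quality_score`), the metric definitions, and the unweighted mean are all assumptions chosen for illustration.

```python
from typing import Any, Dict, List, Optional

Record = Dict[str, Optional[Any]]


def completeness(records: List[Record], fields: List[str]) -> float:
    """Share of cells that are present and non-empty (assumed definition)."""
    total = len(records) * len(fields)
    if total == 0:
        return 1.0
    filled = sum(
        1
        for record in records
        for field in fields
        if record.get(field) is not None and record.get(field) != ""
    )
    return filled / total


def uniqueness(records: List[Record], fields: List[str]) -> float:
    """Share of records that are distinct on the given fields (assumed definition)."""
    if not records:
        return 1.0
    distinct = {tuple(record.get(field) for field in fields) for record in records}
    return len(distinct) / len(records)


def quality_score(records: List[Record], fields: List[str]) -> float:
    """Unweighted mean of the two dimension scores (an assumed aggregation)."""
    return (completeness(records, fields) + uniqueness(records, fields)) / 2


# Hypothetical sample data: one duplicate row and two missing values.
sample = [
    {"id": 1, "city": "Rabat"},
    {"id": 2, "city": None},
    {"id": 1, "city": "Rabat"},
    {"id": 3, "city": ""},
]
columns = ["id", "city"]

print(completeness(sample, columns))
print(uniqueness(sample, columns))
print(quality_score(sample, columns))
```

Under these assumed definitions, an improvement rate like the one reported in the abstract would correspond to the difference between such a score computed before and after correction. The paper itself should be consulted for the actual metrics and weighting.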


Source journal