{"title":"Fixing rules for data cleaning based on conditional functional dependency","authors":"Rashed Salem, Asmaa Abdo","doi":"10.1016/j.fcij.2017.03.002","DOIUrl":null,"url":null,"abstract":"<div><p>Most existing databases suffer from data inconsistencies. Enhancing data quality efforts are necessary to resolve this issue. In this paper, two techniques are proposed for mining accurate conditional functional dependencies rules from such databases to be employed for data cleaning. The idea of the proposed techniques is to mine firstly maximal closed frequent patterns, then mine the dependable conditional functional dependencies rules with the help of lift measure. Moreover, data repairing algorithm is proposed for fixing inconsistent tuples found in the database exploiting the generated rules. An extensive experimental is conducted study to confirm the effectiveness of the proposed techniques compared with existing technique on both real-life and synthetic medical data sets.</p></div>","PeriodicalId":100561,"journal":{"name":"Future Computing and Informatics Journal","volume":"1 1","pages":"Pages 10-26"},"PeriodicalIF":0.0000,"publicationDate":"2016-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1016/j.fcij.2017.03.002","citationCount":"21","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Future Computing and Informatics Journal","FirstCategoryId":"1085","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S2314728817300041","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 21
Abstract
Most existing databases suffer from data inconsistencies, and efforts to enhance data quality are necessary to resolve this issue. In this paper, two techniques are proposed for mining accurate conditional functional dependency rules from such databases, to be employed for data cleaning. The idea of the proposed techniques is first to mine maximal closed frequent patterns, and then to mine dependable conditional functional dependency rules with the help of the lift measure. Moreover, a data repairing algorithm is proposed that exploits the generated rules to fix inconsistent tuples found in the database. An extensive experimental study is conducted to confirm the effectiveness of the proposed techniques compared with an existing technique on both real-life and synthetic medical data sets.
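To make the role of the lift measure and of rule-based repair more concrete, the following is a minimal sketch and not the authors' algorithm. It assumes a constant conditional functional dependency written as two attribute-value patterns (a left-hand side and a right-hand side), uses the standard association-rule definition of lift, and applies a naive repair that overwrites the right-hand-side attributes of violating tuples. The toy relation, the attribute names, and the helpers `matches`, `lift`, `violations`, and `repair` are all hypothetical; the abstract does not say how the paper thresholds lift or chooses repairs.

```python
# Toy relation; attribute names and values are invented for illustration.
rows = [
    {"country": "UK", "area_code": "131", "city": "Edinburgh"},
    {"country": "UK", "area_code": "131", "city": "Edinburgh"},
    {"country": "UK", "area_code": "131", "city": "Edinburgh"},
    {"country": "UK", "area_code": "131", "city": "London"},  # inconsistent tuple
    {"country": "UK", "area_code": "020", "city": "London"},
    {"country": "UK", "area_code": "020", "city": "London"},
]

# Hypothetical constant rule: (country=UK, area_code=131) -> (city=Edinburgh)
lhs = {"country": "UK", "area_code": "131"}
rhs = {"city": "Edinburgh"}


def matches(row, pattern):
    """True if the row agrees with every attribute-value pair in the pattern."""
    return all(row.get(attr) == value for attr, value in pattern.items())


def lift(rows, lhs, rhs):
    """Standard lift: P(lhs and rhs) / (P(lhs) * P(rhs)), computed from counts."""
    n = len(rows)
    n_lhs = sum(matches(r, lhs) for r in rows)
    n_rhs = sum(matches(r, rhs) for r in rows)
    n_both = sum(matches(r, lhs) and matches(r, rhs) for r in rows)
    if n_lhs == 0 or n_rhs == 0:
        return 0.0
    return (n_both * n) / (n_lhs * n_rhs)


def violations(rows, lhs, rhs):
    """Indices of tuples that match the left-hand side but not the right-hand side."""
    return [i for i, r in enumerate(rows) if matches(r, lhs) and not matches(r, rhs)]


def repair(rows, lhs, rhs):
    """Naive repair (an assumption): overwrite the rhs attributes of violating tuples."""
    return [
        {**r, **rhs} if matches(r, lhs) and not matches(r, rhs) else dict(r)
        for r in rows
    ]


rule_lift = lift(rows, lhs, rhs)
bad = violations(rows, lhs, rhs)
cleaned = repair(rows, lhs, rhs)
```

A lift above 1 indicates that the right-hand side occurs together with the left-hand side more often than it would if the two were independent, which is one reason lift can serve as a filter for dependable rules. In this sketch the input list is left unchanged and a repaired copy is returned.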