A new approach for data cleaning process
R. Krishnamoorthy, S. S. Kumar, Basavaraj Neelagund
International Conference on Recent Advances and Innovations in Engineering (ICRAIE-2014), published 2014-05-09
DOI: https://doi.org/10.1109/ICRAIE.2014.6909249
Citations: 4
Abstract
This paper presents a new approach called Effective Data Cleaning (EDC). The EDC technique aims to separate relevant from irrelevant instances in a large data set according to each instance's degree of missing values, and to reconstruct the missing values in relevant instances from their closest instances within the instance set. EDC consists of two methods: Identify Relevant Instance (IRI) and Reconstruct Missing Value (RMV). The IRI method classifies each instance in the instance set as relevant or irrelevant based on its degree of missing values. The RMV method reconstructs each missing value in a relevant instance from its closest instance, as determined by a distance metric. Experimental results show that the proposed EDC technique is simple and effective at identifying relevant and irrelevant instances, and at reconstructing the missing values in relevant instances from the closest instances, which have higher similarity.
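The two-step idea described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not specify the relevance threshold or the distance metric, so the `max_missing` cutoff of 0.5, the root-mean-square distance over attributes observed in both instances, and all function names below are assumptions made for illustration only.

```python
import math
from typing import List, Optional, Sequence, Tuple

# An instance is a list of numeric attributes; None marks a missing value.
Instance = List[Optional[float]]


def missing_degree(instance: Sequence[Optional[float]]) -> float:
    """Fraction of attributes that are missing in one instance."""
    if not instance:
        return 1.0
    return sum(v is None for v in instance) / len(instance)


def identify_relevant(
    instances: Sequence[Instance], max_missing: float = 0.5
) -> Tuple[List[Instance], List[Instance]]:
    """IRI-style split (assumed rule): an instance is relevant when its
    missing degree does not exceed max_missing."""
    relevant: List[Instance] = []
    irrelevant: List[Instance] = []
    for inst in instances:
        target = relevant if missing_degree(inst) <= max_missing else irrelevant
        target.append(list(inst))
    return relevant, irrelevant


def distance(a: Instance, b: Instance) -> float:
    """Assumed metric: root-mean-square difference over the attributes
    observed in both instances; infinity when they share none."""
    shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not shared:
        return math.inf
    return math.sqrt(sum((x - y) ** 2 for x, y in shared) / len(shared))


def reconstruct_missing(relevant: Sequence[Instance]) -> List[Instance]:
    """RMV-style fill: copy each missing value from the closest other
    instance that has that attribute observed."""
    filled: List[Instance] = []
    for i, inst in enumerate(relevant):
        new = list(inst)
        for j, value in enumerate(inst):
            if value is not None:
                continue
            donors = [o for k, o in enumerate(relevant) if k != i and o[j] is not None]
            if not donors:
                continue  # nothing to copy from; leave the value missing
            best = min(donors, key=lambda o: distance(inst, o))
            if distance(inst, best) < math.inf:
                new[j] = best[j]
        filled.append(new)
    return filled


if __name__ == "__main__":
    data = [
        [1.0, 2.0, 3.0],
        [1.1, None, 3.1],
        [9.0, 8.0, 7.0],
        [None, None, 5.0],  # two of three values missing
    ]
    relevant, irrelevant = identify_relevant(data, max_missing=0.5)
    print(reconstruct_missing(relevant))
```

In this toy data set, the last instance is set aside as irrelevant because two of its three attributes are missing, and the gap in the second instance is filled from the first instance, which is its closest neighbour under the assumed metric. A different threshold or distance function would change both outcomes.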