A machine learning adaptive approach to remove impurities over Bigdata
Akash Devgun
2014 International Conference on Electronics, Communication and Computational Engineering (ICECCE), published 2014-11-01
DOI: 10.1109/ICECCE.2014.7086616 (https://doi.org/10.1109/ICECCE.2014.7086616)
Citation count: 0
Abstract
Big data refers to a vast store of information collected from various locations and sources. It is defined here as a centralized repository with a standard structural specification. However, information drawn from various sources is not always appropriate for this structure and suffers from a number of associated impurities, including incompleteness, duplicate information, and a lack of association between dataset attributes. Representing this information in an organized, structured form requires an algorithmic approach that can identify these impurities and accept the validated data. In the present work, a two-stage model is defined under a machine learning approach to transform unstructured data into structured form. In the first stage, a fuzzy-based model is defined to analyze the user data. The analysis covers impurity type analysis and association analysis, and fuzzy rules are applied to identify the degree of impurity and the degree of associativity. Once the analysis is complete, the final stage is the transformation, in which the unstructured data is converted to structured data. An ontology-driven approach is used to define this mapping, which is performed under the domain constructs and the data constructs. The work is implemented in a Java environment. The results obtained from the system show reliable and robust information mapping, enabling effective information tracking over the dataset.
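The abstract does not give the membership functions, rule base, or ontology used, so the following is only a minimal Java sketch of how a two-stage flow of this kind might look. Every name and number in it is an assumption made for illustration: the class `ImpuritySketch`, the linear "high" membership function, the thresholds, the max-based OR rule, and the plain field-name lookup table that stands in for an ontology.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical two-stage sketch: fuzzy impurity scoring, then field mapping. */
final class ImpuritySketch {

    /** Fraction of required fields that are absent or blank (incompleteness). */
    static double missingRatio(Map<String, String> record, List<String> requiredFields) {
        if (requiredFields.isEmpty()) return 0.0;
        int missing = 0;
        for (String field : requiredFields) {
            String value = record.get(field);
            if (value == null || value.trim().isEmpty()) missing++;
        }
        return (double) missing / requiredFields.size();
    }

    /** Membership in the fuzzy set "high": 0 at or below lo, 1 at or above hi, linear in between. */
    static double high(double x, double lo, double hi) {
        if (x <= lo) return 0.0;
        if (x >= hi) return 1.0;
        return (x - lo) / (hi - lo);
    }

    /**
     * Stage 1: degree of impurity in [0, 1].
     * Assumed rule: IF incompleteness is high OR duplication is high THEN impurity is high,
     * with OR evaluated as max (a common fuzzy-logic convention).
     */
    static double impurityDegree(double missingRatio, double duplicateSimilarity) {
        double incomplete = high(missingRatio, 0.1, 0.5);        // assumed thresholds
        double duplicated = high(duplicateSimilarity, 0.6, 0.9); // assumed thresholds
        return Math.max(incomplete, duplicated);
    }

    /**
     * Stage 2: rename source fields to canonical ones using a lookup table
     * that stands in for the ontology; fields without a mapping are dropped.
     */
    static Map<String, String> toStructured(Map<String, String> record,
                                            Map<String, String> fieldMapping) {
        Map<String, String> structured = new LinkedHashMap<>();
        for (Map.Entry<String, String> entry : record.entrySet()) {
            String canonical = fieldMapping.get(entry.getKey());
            if (canonical != null) structured.put(canonical, entry.getValue());
        }
        return structured;
    }
}
```

In such a sketch, a caller would compute `impurityDegree` for each incoming record, pass on only records whose degree falls below some chosen cutoff, and then call `toStructured` on the accepted records. The cutoff, like the thresholds above, would need to be tuned to the data and is not specified by the source.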