Large dataset summarization with automatic parameter optimization and parallel processing for outlier detection
Zhaoyu Shou, Simin Li
2017 13th International Conference on Natural Computation, Fuzzy Systems and Knowledge Discovery (ICNC-FSKD), published 2017-07-29
DOI: 10.1109/FSKD.2017.8393136
Citations: 0
Abstract
As one of the most important research problems in data analytics and data mining, outlier detection in large datasets has attracted considerable research attention in recent years. In this paper, we investigate the problem of summarizing large datasets to support efficient local outlier detection. We propose efficient summarization algorithms that produce a highly compact summary of the original dataset; this summary can then be used to detect local outliers in future, similar datasets. We also propose a novel automatic parameter optimization method that determines the optimal setting of the key parameters used in the summarization algorithm, as well as parallel processing methods that significantly accelerate the summarization process. Experimental results demonstrate that the proposed algorithms are highly scalable to large datasets and effective in producing dataset summaries for local outlier detection.
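The abstract does not specify the summarization algorithm, the parameter-optimization method, or the parallelization scheme. Purely as an illustration of the general idea, the following hypothetical sketch assumes a uniform-grid summary (a count and a centroid per non-empty cell). It builds that summary from independently summarized chunks that are then merged, which is the property that makes such a step easy to parallelize. It then scores new points with a simplified LOF-style ratio computed against the weighted centroids. All function names and the `cell_width`, `k`, and `m` parameters are assumptions introduced here. This is not the authors' method; `cell_width` merely stands in for the kind of key parameter the paper tunes automatically.

```python
import math
from concurrent.futures import ThreadPoolExecutor
from functools import reduce

# Hypothetical types for this sketch (not from the paper).
Point = tuple[float, ...]
# Partial summary: grid-cell index -> (point count, per-dimension coordinate sums)
Summary = dict[tuple[int, ...], tuple[int, list[float]]]
Rep = tuple[Point, int]  # (cell centroid, number of original points it stands for)


def summarize_chunk(points: list[Point], cell_width: float) -> Summary:
    """Bucket points into a uniform grid, keeping only the count and coordinate sums per cell."""
    summary: Summary = {}
    for p in points:
        key = tuple(math.floor(x / cell_width) for x in p)
        count, sums = summary.get(key, (0, [0.0] * len(p)))
        summary[key] = (count + 1, [s + x for s, x in zip(sums, p)])
    return summary


def merge_summaries(a: Summary, b: Summary) -> Summary:
    """Combine two partial summaries; cell statistics are additive, so merge order does not matter."""
    merged = dict(a)
    for key, (count, sums) in b.items():
        if key in merged:
            old_count, old_sums = merged[key]
            merged[key] = (old_count + count, [x + y for x, y in zip(old_sums, sums)])
        else:
            merged[key] = (count, list(sums))
    return merged


def summarize(points: list[Point], cell_width: float, n_chunks: int = 4) -> list[Rep]:
    """Summarize chunks independently, merge the partial summaries, and emit weighted centroids.

    A thread pool keeps the sketch self-contained; for CPU-bound work in CPython a
    process pool or a cluster framework would be needed for a real speed-up.
    """
    size = max(1, math.ceil(len(points) / n_chunks))
    chunks = [points[i:i + size] for i in range(0, len(points), size)]
    with ThreadPoolExecutor(max_workers=n_chunks) as pool:
        partials = list(pool.map(lambda chunk: summarize_chunk(chunk, cell_width), chunks))
    merged = reduce(merge_summaries, partials, {})
    return [(tuple(s / count for s in sums), count) for count, sums in merged.values()]


def weighted_k_distance(x: Point, reps: list[Rep], k: int, floor: float, skip: int = 0) -> float:
    """Smallest radius around x that covers k summarized points, never below `floor`.

    Each centroid counts as `weight` points; `skip` excludes that many points located at x itself.
    """
    remaining = k + skip
    for dist, weight in sorted((math.dist(x, c), w) for c, w in reps):
        remaining -= weight
        if remaining <= 0:
            return max(dist, floor)
    raise ValueError("summary holds fewer than k + skip points")


def local_outlier_score(q: Point, reps: list[Rep], cell_width: float, k: int = 5, m: int = 3) -> float:
    """LOF-style ratio: q's k-distance over the mean k-distance of its m nearest centroids.

    Distances are floored at half a cell width, the resolution of the grid summary.
    Scores near 1 mean q is about as dense as its neighbourhood; much larger means sparser.
    """
    floor = cell_width / 2
    neighbours = sorted(reps, key=lambda r: math.dist(q, r[0]))[:m]
    own = weighted_k_distance(q, reps, k, floor)
    ref = sum(weighted_k_distance(c, reps, k, floor, skip=1) for c, _ in neighbours) / len(neighbours)
    return own / ref
```

Because each partial summary stores only additive statistics (counts and coordinate sums), chunks can be summarized on separate workers and merged in any order. The merged summary holds one entry per non-empty cell regardless of how many original points fall into it, which is what keeps it compact for large datasets.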