Outliers classification for mining evolutionary community using Support Vector Machine and Logistic Regression on Azure ML
Authors: Shahi Dost, Sajid Anwer, Faryal Saud, Maham Shabbir
Published in: 2017 International Conference on Communication, Computing and Digital Systems (C-CODE)
Publication date: 2017-03-01
DOI: 10.1109/C-CODE.2017.7918931 (https://doi.org/10.1109/C-CODE.2017.7918931)
Citations: 2
Abstract
Outliers classification for mining evolutionary community using Support Vector Machine and Logistic Regression on Azure ML
Evolutionary Community Outliers (ECO) are a type of outlier group in which the average evolutionary behavior of certain objects differs, by some measure, from that of the community they belong to. Detecting, classifying and removing ECO is an important and challenging task in data cleaning and preprocessing. ECO detection differs from older outlier detection techniques, which detect objects whose nature is fairly diverse compared with the other objects in the community. Several outlier detection and removal studies have been presented in the last few years, but they are limited because each applies a single process at a time. The proposed approach uses Support Vector Machine (SVM) and Logistic Regression (LR) together to classify, detect and remove outliers from an evolutionary community dataset, which is a new technique for outlier detection and removal. We used Azure Machine Learning (ML), a state-of-the-art data processing tool, to test the proposed technique on forest fire data from Southern Algarve, Portugal. The accuracy of outlier detection and removal ranges from 73 to 93 percent, depending on the data acquisition process. The proposed technique shows remarkable results in detecting diverse ECO on both real-time and temporal datasets.
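The abstract does not say how the SVM and LR outputs are combined, so the following is only a minimal sketch of one possible reading: train both classifiers on labelled examples and treat a point as an outlier only when both models agree. Everything in it is an assumption made for illustration, including the agreement rule, the function names (train_logistic, train_linear_svm, flag_outliers), the hyperparameters and the synthetic two-feature data. It uses plain Python rather than Azure ML and does not reproduce the authors' pipeline or the forest fire dataset.

```python
import math


def dot(w, x):
    """Inner product of a weight vector and a feature vector."""
    return sum(wj * xj for wj, xj in zip(w, x))


def train_logistic(X, y, lr=0.1, epochs=500):
    """Fit logistic regression by batch gradient descent; labels are 0 or 1."""
    w, b, n = [0.0] * len(X[0]), 0.0, len(X)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for xi, yi in zip(X, y):
            p = 1.0 / (1.0 + math.exp(-(dot(w, xi) + b)))  # P(outlier | xi)
            err = p - yi
            grad_w = [g + err * xj for g, xj in zip(grad_w, xi)]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def train_linear_svm(X, y, lr=0.01, epochs=200, lam=0.01):
    """Fit a linear SVM by subgradient descent on the L2-regularised hinge loss."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            t = 1.0 if yi == 1 else -1.0  # map labels {0, 1} to {-1, +1}
            if t * (dot(w, xi) + b) < 1:  # inside the margin or misclassified
                w = [wj - lr * (lam * wj - t * xj) for wj, xj in zip(w, xi)]
                b += lr * t
            else:  # correctly classified with margin: only shrink the weights
                w = [wj - lr * lam * wj for wj in w]
    return w, b


def flag_outliers(points, lr_model, svm_model):
    """Return 1 where BOTH models predict 'outlier' (assumed agreement rule), else 0."""
    (w_lr, b_lr), (w_svm, b_svm) = lr_model, svm_model
    flags = []
    for x in points:
        lr_says = dot(w_lr, x) + b_lr > 0  # same as P(outlier) > 0.5
        svm_says = dot(w_svm, x) + b_svm > 0
        flags.append(1 if lr_says and svm_says else 0)
    return flags


# Synthetic toy data (NOT the forest fire data): label 1 = outlier, 0 = normal.
X = [(0.0, 0.0), (0.5, 0.2), (-0.3, 0.4), (0.2, -0.5),
     (-0.4, -0.2), (0.3, 0.3), (-0.1, 0.1), (0.1, -0.2),
     (3.0, 3.2), (3.5, 2.8), (2.9, 3.4)]
y = [0] * 8 + [1] * 3

lr_model = train_logistic(X, y)
svm_model = train_linear_svm(X, y)

new_points = [(0.1, 0.1), (3.2, 3.0)]
flags = flag_outliers(new_points, lr_model, svm_model)
cleaned = [p for p, f in zip(new_points, flags) if f == 0]  # "removal" step
print(flags, cleaned)
```

Requiring agreement between the two models is a conservative choice that trades recall for fewer false removals. Flagging a point when either model fires would be the opposite trade-off, and the abstract does not indicate which, if either, the authors used.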