Pub Date : 2019-02-01 DOI: 10.1109/ICCIDS.2019.8862136
Anuja Prasad, L. Mary
This paper presents a comparative study of different features for vehicle classification. A real-time vehicle classification system based on computer vision is relatively cheap and easy to install. Because traffic in India is heterogeneous, road planning and traffic management are challenging, so an automated vehicle detection and classification system is useful for traffic surveys, planning, signal time optimization and surveillance. In this work, traffic video data is collected using a camera placed on top of a vehicle parked at the side of a road, at an angle of approximately 45°. Both audio and video are used for vehicle detection. The presence of a vehicle is detected from the frames corresponding to peaks in the short-time energy of the audio. Adaptive background subtraction is performed on the selected frames to separate the vehicle from the background. After background subtraction, morphological operations such as erosion, dilation and closing are applied to obtain the region of interest. At this stage, multiple frames containing the same vehicle may be detected. To reduce multiple occurrences of the same vehicle in the selected frames, the Speeded-Up Robust Features (SURF) matching algorithm is used. Features such as Histogram of Oriented Gradients (HOG), Local Binary Pattern (LBP), KAZE and Binary Robust Invariant Scalable Keypoints (BRISK) are extracted from the selected frames, and Support Vector Machine (SVM) models are developed. The vehicle classification accuracy of the various features is compared using a 20-minute traffic video. HOG gives the best result compared to KAZE, LBP and BRISK, with an accuracy of 85.50%.
Title: A Comparative Study of Different Features for Vehicle Classification. Venue: 2019 International Conference on Computational Intelligence in Data Science (ICCIDS).
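The audio-based detection step in the abstract above (selecting frames at peaks of the short-time energy) can be illustrated with a minimal sketch. The abstract gives no window length, hop size or threshold, so `frame_len`, `hop`, the half-of-maximum threshold and the synthetic signal are all assumptions made for illustration, not the authors' settings.

```python
import math

def short_time_energy(samples, frame_len, hop):
    """Sum of squared samples in each sliding analysis window."""
    return [
        sum(x * x for x in samples[start:start + frame_len])
        for start in range(0, len(samples) - frame_len + 1, hop)
    ]

def energy_peaks(energy, threshold):
    """Indices of local maxima whose energy exceeds the threshold."""
    return [
        i for i in range(1, len(energy) - 1)
        if energy[i] > threshold
        and energy[i] >= energy[i - 1]
        and energy[i] > energy[i + 1]
    ]

# Synthetic audio (assumed): quiet background plus two loud bursts
# standing in for passing vehicles.
sample_rate = 1000
signal = [0.01 * math.sin(2 * math.pi * 50 * n / sample_rate) for n in range(3000)]
for centre in (800, 2200):
    for n in range(centre - 100, centre + 100):
        signal[n] += math.sin(2 * math.pi * 120 * n / sample_rate)

hop = 100
energy = short_time_energy(signal, frame_len=200, hop=hop)
peaks = energy_peaks(energy, threshold=0.5 * max(energy))
# Start time (in seconds) of each peak window; a video frame near this
# time would then be passed on to background subtraction.
peak_times = [p * hop / sample_rate for p in peaks]
```

In a real recording the threshold would need tuning to the ambient noise level, and each peak time would be mapped to a video frame index using the camera's frame rate.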
Pub Date : 2019-02-01 DOI: 10.1109/ICCIDS.2019.8862119
A. M. V. Bharathy, N. Umapathi, S. Prabaharan
An intrusion detection system (IDS) monitors data, studies network activity and reports possible vulnerabilities and attacks in advance. One of the main limitations of present intrusion detection technology is the need to filter out false alarms so that the user is not confused by the data. This paper deals with the different types of IDS, their behaviour, response time and other important factors. It also sets out the advantages and disadvantages of six recent intrusion detection techniques and gives a clear picture of recent advances in the field of IDS, based on detection rate, accuracy, average running time and false alarm rate.
Title: An Elaborate Comprehensive Survey on Recent Developments in Behaviour Based Intrusion Detection Systems
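Three of the comparison factors named in the abstract above (detection rate, accuracy and false alarm rate) are usually computed from confusion-matrix counts. The sketch below uses the common textbook definitions; the abstract itself does not state which definitions the survey uses, and the counts are made up for illustration.

```python
def ids_metrics(tp, fp, tn, fn):
    """Detection rate, accuracy and false alarm rate from confusion-matrix counts.

    tp: attacks flagged as attacks      fn: attacks missed
    fp: normal traffic flagged          tn: normal traffic passed
    """
    return {
        "detection_rate": tp / (tp + fn),
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "false_alarm_rate": fp / (fp + tn),
    }

# Illustrative counts only (not taken from the survey).
metrics = ids_metrics(tp=90, fp=5, tn=95, fn=10)
```

With these counts the detection rate is 0.9, the accuracy is 0.925 and the false alarm rate is 0.05, which shows how a detector can look accurate overall while still raising alarms on 5% of normal traffic.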
Pub Date : 2019-02-01 DOI: 10.1109/iccids.2019.8862106
Title: ICCIDS 2019 Author Index
Pub Date : 2019-02-01 DOI: 10.1109/ICCIDS.2019.8862080
M. Uma, V. Sneha, G. Sneha, J. Bhuvana, B. Bharathi
Today, everyone has personal devices that connect to the internet, and users try to get the information they need through the internet. Most of this information is stored in databases. A user who wants to access a database but has limited or no knowledge of database languages faces a difficult situation. Hence, there is a need for a system that enables such users to access the information in a database. This paper aims to develop such a system using NLP: it takes a structured natural language question as input and produces an SQL query as output, so that the related information can be retrieved from a railway reservation database with ease. The steps involved are tokenization, lemmatization, part-of-speech tagging, parsing and mapping. The dataset used for the proposed system is a set of 2880 structured natural language queries on train fares and available seats. The system achieves 98.89 per cent accuracy. The paper gives an overview of the use of Natural Language Processing (NLP) and regular expressions to map a query in English to SQL.
Title: Formation of SQL from Natural Language Query using NLP
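The final mapping step described in the abstract above (regular expressions that turn a structured English question into SQL) might look roughly like the sketch below. The table name `trains`, its columns, and the question template are hypothetical; the abstract does not give the paper's schema or patterns, and the earlier NLP stages (tokenization, lemmatization, tagging, parsing) are skipped here.

```python
import re

# Hypothetical schema: trains(source, destination, fare, seats_available).
COLUMN_FOR_PHRASE = {"fare": "fare", "seats available": "seats_available"}

# Hypothetical question template covering the two query topics.
QUESTION = re.compile(
    r"what (?:is|are) the (fare|seats available) from (\w+) to (\w+)\??$",
    re.IGNORECASE,
)

def to_sql(question):
    """Map a structured question to (sql, parameters), or None if it does not fit."""
    match = QUESTION.match(question.strip())
    if match is None:
        return None
    phrase, source, destination = match.groups()
    column = COLUMN_FOR_PHRASE[phrase.lower()]
    # Station names are passed as parameters so user text never enters the SQL string.
    sql = f"SELECT {column} FROM trains WHERE source = ? AND destination = ?"
    return sql, (source, destination)

fare_query = to_sql("What is the fare from Chennai to Delhi?")
seat_query = to_sql("What are the seats available from Chennai to Delhi")
unmatched = to_sql("Book a ticket")
```

Only the column name, which comes from a fixed lookup table, is interpolated into the query text. A template-based mapper of this kind works only on questions that follow the expected structure, which matches the abstract's restriction to structured natural language input.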
Pub Date : 2019-02-01 DOI: 10.1109/ICCIDS.2019.8862117
Simran Barnwal, Wei-Jan Peng
Most received signal strength (RSS) based indoor localization algorithms require a priori knowledge of the locations of Access Points, the time-wise variation of the user's location, and data from multiple sensors. This paper proposes an approach that combines crowdsensing-based wireless indoor localization with Artificial Neural Networks to automatically predict a new user's location and to analyze the effect of device heterogeneity on RSS localization accuracy, using cell phone user data. The performance evaluation demonstrates that the trained MLP regression model obtains higher localization accuracy than the probabilistic localization algorithms, without an individual model for each device in the fingerprinting database. In contrast with existing systems proposed in the literature, the results show that the proposed approach efficiently handles a very large number of Access Points in indoor spaces 10 times larger.
Title: Crowdsensing-based WiFi Indoor Localization using Feed-forward Multilayer Perceptron Regressor
Pub Date : 2019-02-01 DOI: 10.1109/ICCIDS.2019.8862084
S. M. Jaisakthi, P. Mirunalini, D. Thenmozhi, Vatsala
Diseases are common in crops because of changing climatic and environmental conditions. They affect the growth and yield of the crops and are often difficult to control. To ensure good quality and high production, accurate disease diagnosis and timely control actions are necessary. Grape is a widely grown crop in India, and it may be affected by different types of diseases on the leaf, stem and fruit. Leaf diseases, which are the early symptoms, are caused by fungi, bacteria and viruses. There is therefore a need for an automatic system that can detect the type of disease so that appropriate action can be taken. We propose an automatic system for detecting diseases in grape vines using image processing and machine learning techniques. The system segments the leaf (the region of interest) from the background using the GrabCut segmentation method. From the segmented leaf, the diseased region is further segmented using two different methods: global thresholding and a semi-supervised technique. Features are extracted from the segmented diseased region, and it is classified as healthy, rot, esca or leaf blight using different machine learning techniques: Support Vector Machine (SVM), AdaBoost and Random Forest. Using SVM, we obtained a better testing accuracy, 93%, than with the other classifiers.
Title: Grape Leaf Disease Identification using Machine Learning Techniques
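The abstract above names global thresholding as one way to separate the diseased region from the segmented leaf, without saying how the threshold is chosen. As one possible illustration, the sketch below uses Otsu's method, which picks the grey level that maximizes the between-class variance. The choice of Otsu, the tiny grey-scale "leaf" and the assumption that lesions are darker than healthy tissue are all made here for illustration.

```python
def otsu_threshold(pixels, levels=256):
    """Return the level t that best splits pixels into <= t and > t (Otsu's criterion)."""
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    total_sum = sum(i * h for i, h in enumerate(hist))
    best_t, best_var = 0, -1.0
    w0 = 0    # number of pixels with value <= t
    sum0 = 0  # sum of pixel values <= t
    for t in range(levels):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue
        mean0 = sum0 / w0
        mean1 = (total_sum - sum0) / w1
        # Between-class variance, up to a constant factor of 1 / total**2.
        var_between = w0 * w1 * (mean0 - mean1) ** 2
        if var_between > best_var:
            best_t, best_var = t, var_between
    return best_t

# Toy 3x4 grey-scale leaf patch (assumed): bright = healthy, dark = lesion.
image = [
    [200, 205, 198, 60],
    [202, 55, 58, 62],
    [199, 57, 61, 201],
]
pixels = [p for row in image for p in row]
t = otsu_threshold(pixels)
lesion_mask = [[1 if p <= t else 0 for p in row] for row in image]
```

Whether lesions appear darker or brighter depends on the colour channel used, so in practice the comparison direction would be chosen after inspecting the images.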
Pub Date : 2019-02-01 DOI: 10.1109/ICCIDS.2019.8862102
G. Abinaya, Gyan Ranjan, P. Aswin Karthik
In this paper, we propose a novel framework that enables a machine learning model to constantly learn over a period of time and hence improve the performance with time and more data. We have compared the performance of different models which were trained only on the actual data against models trained with the data aided by the feedback collected by the automated framework.
Title: Continuous learning mechanism of NLU-ML models boosted by human feedback
Pub Date : 2019-02-01 DOI: 10.1109/ICCIDS.2019.8862033
N. Suchetha, Anupama Nikhil, P. Hrudya
The application of machine learning algorithms to text data is challenging in several ways, the greatest being the presence of a sparse, high-dimensional feature set. Feature selection methods are effective in reducing the dimensionality of the data and help improve the computational efficiency and the performance of the learned model. Recently, evolutionary computation (EC) methods have shown success in solving the feature selection problem. However, because they require a large number of evaluations, EC-based feature selection methods are computationally expensive on text data. This paper examines the different evaluation classifiers used for EC-based wrapper feature selection methods. A two-stage feature selection method is applied to Twitter data for sentiment classification. In the first stage, a filter feature selection method based on Information Gain (IG) is applied. In the second stage, four EC feature selection methods, Particle Swarm Optimization (PSO), Ant Colony Optimization (ACO), Cuckoo Search (CS) and Firefly Search, are compared, with different classifiers as subset evaluators. LibLinear, K-Nearest Neighbours (KNN) and Naive Bayes (NB) are the classifiers used for wrapper feature subset evaluation. The time required to evaluate the feature subset with each chosen classifier is also computed. Finally, the effect of this combined feature selection approach is evaluated using six different learners. Results demonstrate that LibLinear is computationally efficient and achieves the best performance.
Title: Comparing the Wrapper Feature Selection Evaluators on Twitter Sentiment Classification
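The first stage described in the abstract above, a filter based on Information Gain (IG), scores each term by how much knowing its presence reduces the entropy of the class labels. The sketch below computes IG for binary term-presence features on a made-up four-tweet corpus and keeps the two highest-scoring terms. The corpus, the whitespace tokenization and the cut-off of two terms are assumptions for illustration, and the second (EC wrapper) stage is not shown.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())

def information_gain(feature_present, labels):
    """IG of a binary term-presence feature with respect to the class labels."""
    total = len(labels)
    gain = entropy(labels)
    for value in (True, False):
        subset = [y for f, y in zip(feature_present, labels) if f == value]
        if subset:
            gain -= len(subset) / total * entropy(subset)
    return gain

# Made-up corpus for illustration only.
tweets = [
    ("great phone love it", "pos"),
    ("love the camera", "pos"),
    ("terrible battery hate it", "neg"),
    ("hate the screen", "neg"),
]
labels = [y for _, y in tweets]
vocabulary = sorted({w for text, _ in tweets for w in text.split()})
scores = {
    w: information_gain([w in text.split() for text, _ in tweets], labels)
    for w in vocabulary
}
# Filter stage: keep the top-scoring terms for the later wrapper search.
top_terms = sorted(scores, key=scores.get, reverse=True)[:2]
```

Here "love" and "hate" each separate the classes perfectly and score 1 bit, while "it" and "the" occur equally in both classes and score 0, so the filter discards them before the more expensive wrapper evaluation.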
Pub Date : 2019-02-01 DOI: 10.1109/iccids.2019.8862086
Title: ICCIDS 2019 Photos
Pub Date : 2019-02-01 DOI: 10.1109/ICCIDS.2019.8862112
Rishabh Rustogi, Ayush Prasad
Continuous expansion in science and technology has made data immensely available and attainable in every field. Understanding and analyzing this data is critical to the decision-making process. Although prevailing data engineering and mining techniques have achieved great success, the problem of swift classification of imbalanced data still exists in academia and industry. Skewness in data can potentially be resolved by upsampling or downsampling. A few techniques first remove the skewness and then perform classification; however, these methods suffer from hurdles such as poor precision or a slower learning rate. In this paper, a hybrid method is proposed that classifies binary imbalanced data using the Synthetic Minority Over-sampling Technique (SMOTE) followed by an Extreme Learning Machine (ELM). Along with a swift learning rate, our method is effective at predicting the desired class. We verified our model on five standard imbalanced datasets and obtained higher F-measure, G-mean and ROC scores for all of them.
Title: Swift Imbalance Data Classification using SMOTE and Extreme Learning Machine
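The oversampling step named in the abstract above, SMOTE, creates synthetic minority samples by interpolating between a minority point and one of its nearest minority neighbours. The sketch below is a simplified illustration of that idea, not the authors' implementation: the value of `k`, the fixed random seed and the toy points are assumptions, and the Extreme Learning Machine classifier that follows in the paper is omitted.

```python
import random

def smote(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points by interpolating between minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # The k nearest minority neighbours of the base point, excluding itself.
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from base to other
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(base, other)))
    return synthetic

# Toy binary problem (assumed): 4 minority points against 10 majority points.
minority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3)]
majority_size = 10
new_points = smote(minority, n_new=majority_size - len(minority))
balanced_minority = minority + new_points
```

Because every synthetic point lies on a segment between two existing minority points, the new samples stay inside the region the minority class already occupies, unlike simple duplication, which adds no new positions.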