Pub Date: 2022-07-01. DOI: 10.1177/15501329221105159
Majdi Maabreh, A. Maabreh, Basheer Qolomany, A. Al-Fuqaha
Despite the encouraging outcomes of machine learning and artificial intelligence applications, the safety of artificial intelligence–based systems is one of the most severe challenges that need further exploration. Data set poisoning is a severe problem that may lead to the corruption of machine learning models. The attacker injects data into the data set that are faulty or mislabeled by flipping the actual labels into the incorrect ones. The word “robustness” refers to a machine learning algorithm’s ability to cope with hostile situations. Here, instead of flipping the labels randomly, we use the clustering approach to choose the training samples for label changes to influence the classifiers’ performance and the distance-based anomaly detection capacity in quarantining the poisoned samples. According to our experiments on a benchmark data set, random label flipping may have a short-term negative impact on the classifier’s accuracy. Yet, an anomaly filter would discover on average 63% of them. On the contrary, the proposed clustering-based flipping might inject dormant poisoned samples until the number of poisoned samples is enough to influence the classifiers’ performance severely; on average, the same anomaly filter would discover 25% of them. We also highlight important lessons and observations during this experiment about the performance and robustness of popular multiclass learners against training data set–poisoning attacks that include: trade-offs, complexity, categories, poisoning resistance, and hyperparameter optimization.
Title: The robustness of popular multiclass machine learning models against poisoning attacks: Lessons and insights. Journal: International Journal of Distributed Sensor Networks. URL: https://doi.org/10.1177/15501329221105159
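The random label-flipping baseline and a distance-based quarantine step mentioned in the abstract above can be illustrated with a minimal sketch. The abstract does not specify the anomaly filter or the clustering-based sample selection, so the nearest-centroid check below is an assumed stand-in, and the function names `flip_labels_randomly` and `flag_suspects` are hypothetical.

```python
import math
import random


def flip_labels_randomly(labels, classes, fraction, seed=0):
    """Return (poisoned_labels, flipped_indices) with `fraction` of labels changed."""
    rng = random.Random(seed)
    poisoned = list(labels)
    n_flip = int(round(fraction * len(labels)))
    flipped = rng.sample(range(len(labels)), n_flip)
    for i in flipped:
        # Pick any class other than the true one.
        poisoned[i] = rng.choice([c for c in classes if c != labels[i]])
    return poisoned, sorted(flipped)


def centroid(points):
    """Component-wise mean of a list of equal-length points."""
    n = len(points)
    return [sum(p[d] for p in points) / n for d in range(len(points[0]))]


def flag_suspects(X, y):
    """Assumed filter: flag samples closer to another class's centroid than their own."""
    by_class = {}
    for x, label in zip(X, y):
        by_class.setdefault(label, []).append(x)
    cents = {c: centroid(pts) for c, pts in by_class.items()}
    suspects = []
    for i, (x, label) in enumerate(zip(X, y)):
        nearest = min(cents, key=lambda c: math.dist(x, cents[c]))
        if nearest != label:
            suspects.append(i)
    return suspects
```

A randomly flipped sample usually sits deep inside another class's region, which is why a filter of this kind tends to catch it; samples chosen near cluster boundaries would be harder to separate by distance alone.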
Pub Date: 2022-07-01. DOI: 10.1177/15501329221107822
Tao Jing, Hongyan Huang, Yue Wu, Qinghe Gao, Yan Huo, Jiayu Sun
With the number of Internet of Things devices continually increasing, the endogenous security of Internet of Things communication systems is increasingly critical. Physical layer authentication is a powerful means of resisting active attacks by exploiting the unique characteristics inherent in wireless signals and physical devices. Many existing physical layer authentication schemes assume that physical layer attributes obey certain statistical distributions that are unknown to receivers. To overcome this uncertainty, machine learning–based authentication approaches have been employed to implement threshold-free authentication. In this article, we utilize an expectation–conditional maximization algorithm to provide the physical layer attribute estimates required for the authentication phase and a logistic regression model to achieve threshold-free physical layer authentication. Moreover, a Frank–Wolfe algorithm is used to achieve fast convergence of the logistic regression parameters, and multiple attributes are adopted to increase the differentiation of transmitters. Simulation results demonstrate that the obtained attribute estimates are sufficient to provide a reliable source of data for authentication and that the proposed threshold-free multi-attribute physical layer authentication scheme can effectively improve authentication accuracy, with the false alarm rate P_f reduced to 0.0263% and the miss detection rate P_m reduced to 0.3466%.
Title: Threshold-free multi-attributes physical layer authentication based on expectation–conditional maximization channel estimation in Internet of Things. Journal: International Journal of Distributed Sensor Networks. URL: https://doi.org/10.1177/15501329221107822
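To make the pairing of logistic regression with a Frank–Wolfe update concrete, here is a minimal sketch. The abstract does not state the constraint set or step-size rule, so an L1-ball constraint of radius `radius` and the classic step size 2/(k+2) are assumptions, and `frank_wolfe_logreg` is a hypothetical name. Labels are taken to be +1 or -1.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def logistic_loss(w, X, y):
    """Mean logistic loss log(1 + exp(-t * w.x)) for labels t in {-1, +1}."""
    total = 0.0
    for x, t in zip(X, y):
        z = sum(wi * xi for wi, xi in zip(w, x))
        total += math.log1p(math.exp(-t * z))
    return total / len(X)


def gradient(w, X, y):
    """Gradient of the mean logistic loss with respect to w."""
    g = [0.0] * len(w)
    for x, t in zip(X, y):
        z = sum(wi * xi for wi, xi in zip(w, x))
        coeff = -t * sigmoid(-t * z)
        for j, xj in enumerate(x):
            g[j] += coeff * xj / len(X)
    return g


def frank_wolfe_logreg(X, y, radius=5.0, iters=200):
    """Frank-Wolfe over the assumed constraint ||w||_1 <= radius."""
    d = len(X[0])
    w = [0.0] * d
    for k in range(iters):
        g = gradient(w, X, y)
        # Linear minimization oracle on the L1 ball: a signed, scaled basis vector.
        j = max(range(d), key=lambda i: abs(g[i]))
        s = [0.0] * d
        s[j] = -radius * math.copysign(1.0, g[j])
        gamma = 2.0 / (k + 2.0)
        w = [(1.0 - gamma) * wi + gamma * si for wi, si in zip(w, s)]
    return w
```

Each iterate is a convex combination of points inside the ball, so no projection step is needed; that projection-free property is the usual reason for choosing Frank–Wolfe.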
Pub Date: 2022-07-01. DOI: 10.1177/15501329221110810
Teng Shao
One of the fundamental problems in sensor networks is to estimate and track the target states of interest that evolve in the sensing field. Distributed filtering is an effective tool for state estimation in which each sensor communicates only with its neighbors in the sensor network, without the requirement of a fusion center. However, in the majority of existing distributed filters, it is typically assumed that all sensors possess an unlimited field of view to observe the target states. This is quite restrictive since practical sensors have a limited sensing range. In this article, we consider distributed filtering based on the linear minimum mean square error criterion in sensor networks with limited sensing range. To achieve the optimal filter and consensus, two types of strategies based on the linear minimum mean square error criterion are proposed, namely, a linear minimum mean square error filter based on measurements and a linear minimum mean square error filter based on estimates, which differ in the neighbor information received by the sensor. In the measurement-based filter, the sensor node collects measurements from its neighbors, whereas in the estimate-based filter, the sensor node collects estimates from its neighbors. The stability and computational complexity of the linear minimum mean square error filter are analyzed. Numerical experimental results further verify the effectiveness of the proposed methods.
Title: Distributed filtering in sensor networks based on linear minimum mean square error criterion with limited sensing range. Journal: International Journal of Distributed Sensor Networks. URL: https://doi.org/10.1177/15501329221110810
Wireless sensor networks have been widely used in different fields, such as structural health monitoring and artificial intelligence technology. Routing planning, an important part of a wireless sensor network, can be formalized as an optimization problem. In this article, a reinforcement learning algorithm is proposed to solve the problem of optimal routing in wireless sensor networks, namely, an adaptive TD(λ) learning algorithm, referred to as ADTD(λ), under Markovian noise, which is a more practical assumption than i.i.d. (independent and identically distributed) noise in reinforcement learning. Moreover, we also present a non-asymptotic analysis of ADTD(λ) with both constant and diminishing step-sizes. Specifically, when the step-size is constant, a convergence rate of O(1/T) is achieved, where T is the number of iterations; when the step-size is diminishing, a convergence rate of Õ(1/T) is obtained. In addition, the performance of the algorithm is verified by simulation.
Title: A non-asymptotic analysis of adaptive TD(λ) learning in wireless sensor networks. Authors: Bing Li, Tao Li, Muhua Liu, Junlong Zhu, Mingchuan Zhang, Qingtao Wu. Pub Date: 2022-07-01. Journal: International Journal of Distributed Sensor Networks. URL: https://doi.org/10.1177/15501329221114546
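For readers unfamiliar with TD(λ), the following minimal sketch shows the standard tabular algorithm with accumulating eligibility traces and a constant step-size. It is not the adaptive ADTD(λ) variant analyzed in the article, whose update rule the abstract does not give; the function name `td_lambda` and the episode format `(state, reward, next_state, done)` are assumptions made for illustration.

```python
def td_lambda(episodes, n_states, alpha=0.1, gamma=0.9, lam=0.8):
    """Tabular TD(lambda) value estimation with accumulating traces."""
    V = [0.0] * n_states
    for episode in episodes:
        trace = [0.0] * n_states  # eligibility traces reset at each episode
        for state, reward, next_state, done in episode:
            bootstrap = 0.0 if done else gamma * V[next_state]
            delta = reward + bootstrap - V[state]  # TD error
            trace[state] += 1.0
            for i in range(n_states):
                V[i] += alpha * delta * trace[i]
                trace[i] *= gamma * lam  # decay every trace
    return V


# A two-step deterministic chain 0 -> 1 -> 2 (terminal), reward 1 on the last step.
chain = [(0, 0.0, 1, False), (1, 1.0, 2, True)]
values = td_lambda([chain] * 500, n_states=3)
```

On this chain the estimates approach V(1) = 1 and V(0) = 0.9, the discounted return from each state.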
Pub Date: 2022-07-01. DOI: 10.1177/15501329221114060
Juan Chen, V. Sugumaran, P. Qu
In order to reduce the number of vehicle collisions and the average travel time when vehicles pass through an unsignalized intersection with connected and automated vehicles, an improved Double Dueling Deep Q Network method with a Convolutional Neural Network and Long Short-Term Memory is presented in this article. This method designs a multi-step reward and penalty scheme to alleviate the sparse reward problem using positive and negative reward experience replay buffers. The proposed method is validated in a simulation environment with different traffic flows and market penetration rates under mixed traffic conditions of automated and human-driven vehicles. The results show that, compared with traditional signal control methods, the proposed method can effectively improve the convergence and stability of the algorithm, reduce the number of collisions, and reduce the average travel time under different traffic conditions.
Title: Connected and automated vehicle control at unsignalized intersection based on deep reinforcement learning in vehicle-to-infrastructure environment. Journal: International Journal of Distributed Sensor Networks. URL: https://doi.org/10.1177/15501329221114060
Node localization is one of the key technologies in the wireless sensor network research field and is crucial to the high-accuracy localization of mobile nodes, but the positioning error of traditional algorithms such as received signal strength indicator and angle of arrival is more than 4 m, which has almost no practical value. For example, the accuracy of a localization algorithm based on received signal strength indicator is reduced sharply when it is affected by signal reflection, multipath propagation, and other interference factors. To solve the problem, a three-dimensional localization algorithm for mobile nodes is proposed in this article based on received signal strength indicator–angle of arrival and least-squares support-vector regression, which fuses the ranging information of the received signal strength indicator algorithm and the angle of arrival algorithm and optimizes the estimated distance of unknown nodes. Next, the mobile node model and the least-squares support-vector regression modeling mechanism were built according to the hop count of the shortest distance between nodes. Finally, the unknown mobile nodes were localized based on least-squares support-vector regression modeling. The experimental results showed that, compared with localization algorithms without optimized ranging information or least-squares support-vector regression modeling, the proposed algorithm exhibited significantly improved stability, a mean localization error reduced by more than 50%, and increased localization accuracy.
Title: Three-dimensional localization algorithm of mobile nodes based on received signal strength indicator-angle of arrival and least-squares support-vector regression. Authors: Lieping Zhang, Huihao Peng, Jiajie He, Shenglan Zhang, Zuqiong Zhang. Pub Date: 2022-07-01. Journal: International Journal of Distributed Sensor Networks. URL: https://doi.org/10.1177/15501329221111961
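As a rough illustration of how a received-signal-strength range and an angle of arrival can be combined into a three-dimensional position, consider the sketch below. It assumes the common log-distance path-loss model with a reference power `p0_dbm` at distance `d0` and a path-loss exponent `path_loss_exp`; these parameter values, the function names, and the spherical-coordinate conversion are illustrative assumptions, not the article's fusion or regression procedure.

```python
import math


def rssi_to_distance(rssi_dbm, p0_dbm=-40.0, path_loss_exp=2.0, d0=1.0):
    """Invert the log-distance model: RSSI = P0 - 10 * n * log10(d / d0)."""
    return d0 * 10 ** ((p0_dbm - rssi_dbm) / (10.0 * path_loss_exp))


def polar_to_xyz(distance, azimuth, elevation):
    """Position relative to the anchor from range and arrival angles (radians)."""
    horizontal = distance * math.cos(elevation)
    return (
        horizontal * math.cos(azimuth),
        horizontal * math.sin(azimuth),
        distance * math.sin(elevation),
    )


# Example: a reading 20 dB below the 1 m reference with exponent 2 maps to 10 m.
d = rssi_to_distance(-60.0)
position = polar_to_xyz(d, azimuth=0.0, elevation=0.0)
```

Because the distance depends exponentially on the measured power, a few decibels of multipath fading shift the range estimate considerably, which is the sensitivity the abstract attributes to RSSI-only localization.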
Pub Date: 2022-06-01. DOI: 10.1177/15501329221102730
Yunfei Zha, Xinye Liu, Fangwu Ma, CC Liu
To improve the reliability of vehicle state parameter estimation, a vehicle state fusion estimation method based on dichotomy is proposed. An extended Kalman filter algorithm is designed based on a 3-degree-of-freedom vehicle dynamic model. Meanwhile, considering the influence of the dynamic model, the sensor noise, and the selection of its coefficients on the estimation results, a radial basis function neural network estimation algorithm is designed. To further improve the reliability of the estimation algorithm, a method of estimation algorithm fusion is proposed based on the idea of mutual compensation between model-driven and data-driven estimation algorithms. The weights of the estimation results of the different algorithms are assigned through the dichotomy. The redundancy and fusion of estimation algorithms can improve estimation performance. The effectiveness of the fusion method is verified by co-simulation with MATLAB/Simulink and CarSim and by a real vehicle test. The results show that the trend of the estimation results is consistent with the trend of the actual state parameters, and the estimation accuracy after algorithm fusion is significantly improved compared with a single extended Kalman filter or radial basis function network.
Title: Vehicle state estimation based on extended Kalman filter and radial basis function neural networks. Journal: International Journal of Distributed Sensor Networks. URL: https://doi.org/10.1177/15501329221102730
Pub Date: 2022-06-01. DOI: 10.1177/15501329221102371
L. Ang, K. Seng, M. Wachowicz
The advances and convergence in sensor technology, information and communication technology, and intelligent analytics have given rise to the Internet of Things, also known as the Internet of Everything or the Industrial Internet. The research and development works for the Internet of Things can be seen to have progressed in two main phases: (1) in the first phase, the earlier works focused on developing the building blocks and enabling technologies, such as sensors and RFID technologies, communications and wireless protocols, machine-to-machine interfaces, energy efficiency of nodes, and energy harvesting technologies, and (2) in the second phase, the later and more recent works focused on adding and embedding value to application-specific Internet of Things using technologies for smart environments and applications, such as intelligent analytics and machine learning, embedded vision and image processing, augmented reality, and autonomous systems. We associate the term embedded intelligence and analytics with the data-driven future for application-specific Internet of Things. In this article, we introduce and review recent developments of embedded intelligence for the Internet of Things and the various embedded intelligence computational frameworks, such as edge, fog, and cloud, for the application-specific Internet of Things, and we highlight the techniques, challenges, and opportunities for effective deployment of application-specific Internet of Things technology to address complex problems for various smart environments and applications.
Title: Embedded intelligence and the data-driven future of application-specific Internet of Things for smart environments. Journal: International Journal of Distributed Sensor Networks. URL: https://doi.org/10.1177/15501329221102371
Pub Date: 2022-06-01. DOI: 10.1177/15501329221106935
Ping Zhang, Yiqiao Jia, Youlin Shang
As a new and efficient ensemble learning algorithm, XGBoost has been widely applied for its multitudinous advantages, but its classification effect in the case of data imbalance is often not ideal. Aiming at this problem, an attempt was made to optimize the regularization term of XGBoost, and a classification algorithm based on mixed sampling and ensemble learning is proposed. The main idea is to combine SVM-SMOTE over-sampling and EasyEnsemble under-sampling technologies for data processing, and then obtain the final model based on XGBoost by training and ensemble. At the same time, the optimal parameters are automatically searched and adjusted through the Bayesian optimization algorithm to realize classification prediction. In the experimental stage, the G-mean and area under the curve (AUC) values are used as evaluation indicators to compare and analyze the classification performance of different sampling methods and algorithm models. The experimental results on the public data set also verify the feasibility and effectiveness of the proposed algorithm.
Title: Research and application of XGBoost in imbalanced data. Journal: International Journal of Distributed Sensor Networks, volume 18. URL: https://doi.org/10.1177/15501329221106935
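The G-mean indicator named in the abstract above is commonly defined as the geometric mean of sensitivity and specificity. A minimal sketch for the binary case follows, assuming that definition; the function name `g_mean` and the `positive` parameter are hypothetical.

```python
import math


def g_mean(y_true, y_pred, positive=1):
    """Geometric mean of sensitivity (recall on positives) and specificity."""
    tp = fn = tn = fp = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive:
            if pred == positive:
                tp += 1
            else:
                fn += 1
        else:
            if pred == positive:
                fp += 1
            else:
                tn += 1
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return math.sqrt(sensitivity * specificity)


# A classifier that always predicts the majority class scores zero.
always_negative = g_mean([1, 0, 0, 0], [0, 0, 0, 0])
```

Unlike plain accuracy, which would be 75% for the always-negative classifier in the example, the G-mean drops to zero when one class is never recognized, which makes it a natural metric for imbalanced data.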
Pub Date : 2022-06-01 DOI: 10.1177/15501329221105202
Tao Jing, Hengyu Yu, Xiaoxuan Wang, Qinghe Gao
The Internet of Things (IoT) has emerged as a wonder-solution to numerous problems in our everyday lives, in applications such as smart homes and intelligent transportation. As an extension of the IoT, the Internet of Vehicles (IoV) also requires increasingly high security and timeliness. This article proposes a vehicle-assisted batch verification (VABV) system for the IoV, in which some vehicles, called auxiliary authentication terminals (AATs), are selected to assist the roadside unit with Basic Safety Message (BSM) verification. To enhance timeliness and thereby system dependability, comprehensive AAT selection strategies are designed. To overcome the security weaknesses of the VABV system, a Sybil detection scheme based on the Extreme Learning Machine is developed. To evaluate the VABV system, the quantified Age of Information (AoI) is used as an integrated timeliness and security indicator. The proposed AoI indicator synthesizes the effects of BSM verification, re-verification after the failure of some AATs, Sybil attacks, and the Sybil detection scheme. As the simulation results illustrate, employing AoI as a performance evaluation indicator makes it possible to design an optimal AAT selection strategy better and more intuitively, based on changes in AoI. Likewise, the performance of the proposed Sybil detection scheme can be evaluated more intuitively and effectively under different IoV scenarios based on AoI.
{"title":"Joint timeliness and security provisioning for enhancement of dependability in Internet of Vehicle system","authors":"Tao Jing, Hengyu Yu, Xiaoxuan Wang, Qinghe Gao","doi":"10.1177/15501329221105202","DOIUrl":"https://doi.org/10.1177/15501329221105202","url":null,"abstract":"The Internet of Things has emerged as a wonder-solution to numerous problems in our everyday lives, such as smart homes and intelligent transportation. As an extension of the IoTs, the Internet of Vehicles (IoVs) also requires increasingly high security and timeliness. This article proposes a vehicle-assisted batch verification (VABV) system for IoV, in which some vehicles called auxiliary authentication terminal (AAT) are selected to assist the roadside unit for Basic Safety Message (BSM) verification. As a measure to enhance the timeliness performance for system dependability, comprehensive AAT selection strategies are designed. To overcome the security weaknesses of VABV system, a Sybil detection scheme based on Extreme Learning Machine is developed. For the evaluation of VABV system, the quantified Age of Information (AoI) is used as an integrated timeliness and security indicator. The proposed AoI indicator synthesizes the effects of BSM verification, re-verification for failure of some AATs, Sybil attack, and Sybil detection scheme. As illustrated by the simulation results, by employing AoI as a performance evaluation indicator, we can better and more intuitively design an AAT optimal selection strategy based on changes in AoI. 
Simultaneously, the performance of the proposed Sybil detection scheme can be evaluated more intuitively and effectively under different IoV scenarios based on AoI.","PeriodicalId":50327,"journal":{"name":"International Journal of Distributed Sensor Networks","volume":" ","pages":""},"PeriodicalIF":2.3,"publicationDate":"2022-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"41652376","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}