SCAN-GAN: Generative Adversarial Network Based Synthetic Data Generation Technique for Controller Area Network
Authors: Amit Chougule, Kartik Agrawal, Vinay Chamola
Journal: IEEE Internet of Things Magazine (Journal Article), published 2023-09-01
DOI: 10.1109/iotm.001.2300013 (https://doi.org/10.1109/iotm.001.2300013)
Citations: 0
Abstract
In recent years, significant research has gone into developing communication protocols for use within autonomous vehicles. Owing to its simplicity and trustworthiness, the Controller Area Network (CAN) bus has become popular and is widely employed for in-vehicle communication. However, research indicates that numerous network-level threats are possible because the CAN bus lacks defense mechanisms: messages are prone to attacks from third-party sources, which threaten the correctness of CAN bus messages. In the last few years, machine learning and deep learning algorithms have been used effectively to improve CAN security and to develop various misbehavior detection, intrusion prevention, and intrusion detection systems. However, training these algorithms requires a large amount of data. Very few CAN datasets are currently available, which has become a major barrier for researchers developing new CAN security algorithms. Moreover, this kind of data is tedious to collect, especially when specific features are needed. In this work, we propose SCAN-GAN (Synthetic CAN), a generative adversarial network (GAN) based technique that generates data from existing collected data, and we present a synthetic CAN dataset. We also compare the original and generated datasets on various parameters as well as with well-known classification algorithms, showing that various previous models deliver improved results on the generated dataset over the original dataset. The results demonstrate the efficiency of using GANs to produce data that is on par with real data. They also suggest that the GAN can adapt to varied datasets.
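For orientation, the adversarial training behind a GAN-based generator can be summarized by the standard minimax objective. The abstract does not state which GAN variant or loss SCAN-GAN uses, so the formulation below is the original GAN objective, offered as an assumption rather than as the authors' exact method. The symbols are introduced here for illustration only: $G$ is the generator, $D$ the discriminator, $x$ a record drawn from the collected CAN data, and $z$ a noise vector drawn from a prior $p_z$.

```latex
% Standard GAN minimax objective (assumed; the abstract does not specify the loss)
\min_{G} \max_{D} \; V(D, G)
  = \mathbb{E}_{x \sim p_{\mathrm{data}}(x)} \bigl[ \log D(x) \bigr]
  + \mathbb{E}_{z \sim p_{z}(z)} \bigl[ \log \bigl( 1 - D(G(z)) \bigr) \bigr]
```

Under this reading, the discriminator learns to tell collected CAN records from generated ones, while the generator learns to produce records the discriminator cannot distinguish from the collected data. Samples $G(z)$ from the trained generator would then form the synthetic dataset.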