Ziran Min, S. Gokhale, Shashank Shekhar, C. Mahmoudi, Zhuangwei Kang, Yogesh D. Barve, A. Gokhale
Title: A Classification Framework for IoT Network Traffic Data for Provisioning 5G Network Slices in Smart Computing Applications
Venue: 2023 IEEE International Conference on Smart Computing (SMARTCOMP)
Publication date: 2023-06-01
DOI: https://doi.org/10.1109/SMARTCOMP58114.2023.00034
Abstract
Existing massive deployments of IoT devices in support of smart computing applications across a range of domains must leverage critical features of 5G, such as network slicing, to receive differentiated and reliable services. However, the voluminous, dynamic, and heterogeneous nature of IoT traffic complicates network flow classification, network traffic analysis, and accurate quantification of network requirements, which makes provisioning 5G network slices across the application mix a challenging problem. To address these needs, we propose a novel network traffic classification approach built as a pipeline that combines Principal Component Analysis (PCA) with KMeans clustering and Hellinger distance. PCA is applied as the first step to efficiently reduce the dimensionality of the features while preserving as much of the original information as possible. This significantly reduces the runtime of KMeans, which is applied as the second step. KMeans, being an unsupervised approach, eliminates the need to label data, which can be cumbersome, error-prone, and time-consuming. In the third step, a Hellinger distance-based recursive KMeans algorithm is applied to merge similar clusters toward identifying the optimal number of clusters. This makes the final clustering results compact and intuitively interpretable within the context of the problem, while addressing limitations of the traditional KMeans algorithm, such as sensitivity to initialization and the need to specify the number of clusters manually. Evaluation of our approach on a real-world IoT dataset demonstrates that the pipeline can compactly represent the dataset as three clusters. The service properties of these clusters can be easily inferred and directly mapped to different types of slices in the 5G network.
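The third step, merging similar clusters by Hellinger distance, can be illustrated with a small sketch. The abstract does not say how per-cluster distributions are estimated, what merge threshold is used, or how the recursion is organised, so everything below is an assumption made for illustration and not the authors' algorithm: the equal-width histogram binning, the 0.3 threshold, the greedy closest-pair merge, and the names `histogram`, `hellinger`, and `merge_similar_clusters` are all hypothetical. The sketch works on a single 1-D feature (for example, a first principal component) and assumes the initial labels come from an earlier KMeans run. The toy data is made up.

```python
import math
from collections import defaultdict


def histogram(values, lo, hi, bins):
    """Normalised histogram of `values` over [lo, hi] with equal-width bins."""
    counts = [0] * bins
    width = (hi - lo) / bins or 1.0  # guard against a zero-width range
    for v in values:
        idx = min(int((v - lo) / width), bins - 1)  # clamp v == hi into last bin
        counts[idx] += 1
    return [c / len(values) for c in counts]


def hellinger(p, q):
    """Hellinger distance between two discrete distributions; lies in [0, 1]."""
    s = sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q))
    return math.sqrt(s / 2.0)


def merge_similar_clusters(values, labels, threshold=0.3, bins=10):
    """Greedily merge the closest pair of clusters (assumed strategy) until
    every remaining pair is at least `threshold` apart."""
    clusters = defaultdict(list)
    for v, lab in zip(values, labels):
        clusters[lab].append(v)
    groups = list(clusters.values())
    lo, hi = min(values), max(values)
    while len(groups) > 1:
        hists = [histogram(g, lo, hi, bins) for g in groups]
        d, i, j = min(
            (hellinger(hists[i], hists[j]), i, j)
            for i in range(len(groups))
            for j in range(i + 1, len(groups))
        )
        if d >= threshold:
            break
        groups[i] = groups[i] + groups[j]  # merge cluster j into cluster i
        del groups[j]
    return groups


# Toy data (invented): four initial clusters, two of which overlap heavily.
values = [0.0, 0.5, 1.2, 1.5, 0.3, 0.7, 1.1, 1.9, 5.0, 5.5, 5.2, 9.0, 9.5, 10.0]
labels = [0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 3, 3, 3]
groups = merge_similar_clusters(values, labels)
print(len(groups), [len(g) for g in groups])
```

Starting with a deliberately large number of clusters and merging downward is one way to avoid fixing the cluster count in advance. The threshold then becomes the tuning parameter and would need to be chosen against real traffic data.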