GML learning, a generic machine learning model for network measurements analysis
P. Casas, J. Vanerio, K. Fukuda
2017 13th International Conference on Network and Service Management (CNSM), November 2017
DOI: 10.23919/CNSM.2017.8255998
Citation count: 19
Abstract
The application of machine learning models to the analysis of network measurement problems has increased considerably in the last decade. However, there is still no clear best practice or silver-bullet approach for addressing these problems in a general context, and only ad hoc, tailored approaches have been evaluated so far. While deep-learning models have provided a major breakthrough in high-dimensional problems such as image processing, it is difficult to say today which model is best suited to analyzing the large volumes of high-dimensional data collected in operational networks. In this paper we present a potential solution to fill this gap, exploring the application of ensemble learning models to multiple network measurement problems. We introduce GML Learning, a generic Machine Learning model for the analysis of network measurements. The GML model is a generalization of the well-known stacking approach to ensemble learning and follows the concepts of the Super Learner model. The Super Learner performs asymptotically as well as the best of its input base (weak) learners, providing a very powerful approach for tackling multiple problems with the same technique. In addition, it defines an approach to minimize the likelihood of over-fitting during training, using a variant of cross-validation. We deploy the GML model on top of Big-DAMA, a big data analytics framework for network measurement applications. We test the proposed solution on five diverse network measurement problems, including detection of network attacks and anomalies, QoE modeling and prediction, and tracking of Internet path dynamics. Results confirm that the GML model outperforms each of the single baseline models in the stack, as well as traditional bagging and boosting ensemble learning approaches. The GML Learning model opens the door to a generalized best-practice technique for the analysis of network measurements.
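The stacking and Super Learner idea summarized in the abstract can be illustrated with a minimal, dependency-free sketch. This is not the authors' GML implementation and has nothing to do with Big-DAMA. The regression setting, the three toy base learners (`mean_learner`, `linear_learner`, `nn_learner`), and the coarse grid search over convex weights used as the meta-learner are all hypothetical choices made for illustration. The sketch shows the core mechanism: each base learner produces out-of-fold (cross-validated) predictions, a meta-learner picks the weighted combination that minimizes the cross-validated error, and the base learners are then refit on all the data.

```python
# Hypothetical illustration of cross-validated stacking (Super Learner style).
# Not the GML code: learners, data and the grid-search meta-learner are assumptions.
import random
from itertools import product
from typing import Callable, List, Sequence, Tuple

Predictor = Callable[[float], float]
Learner = Callable[[Sequence[float], Sequence[float]], Predictor]


def mean_learner(xs: Sequence[float], ys: Sequence[float]) -> Predictor:
    """Baseline learner: always predicts the training mean."""
    m = sum(ys) / len(ys)
    return lambda x: m


def linear_learner(xs: Sequence[float], ys: Sequence[float]) -> Predictor:
    """Ordinary least-squares fit of y = a + b*x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx if sxx else 0.0
    a = my - b * mx
    return lambda x: a + b * x


def nn_learner(xs: Sequence[float], ys: Sequence[float]) -> Predictor:
    """1-nearest-neighbour regression."""
    pairs = list(zip(xs, ys))
    return lambda x: min(pairs, key=lambda p: abs(p[0] - x))[1]


def kfold_indices(n: int, k: int, seed: int = 0) -> List[List[int]]:
    """Shuffle 0..n-1 and split the indices into k disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def out_of_fold_predictions(learners, xs, ys, k=5) -> List[List[float]]:
    """z[i][j]: prediction of learner j on sample i, from a model trained without sample i."""
    n = len(xs)
    z = [[0.0] * len(learners) for _ in range(n)]
    for fold in kfold_indices(n, k):
        held_out = set(fold)
        train = [i for i in range(n) if i not in held_out]
        for j, learn in enumerate(learners):
            f = learn([xs[i] for i in train], [ys[i] for i in train])
            for i in fold:
                z[i][j] = f(xs[i])
    return z


def simplex_grid(m: int, steps: int = 10):
    """Non-negative weight vectors in multiples of 1/steps that sum to 1."""
    for combo in product(range(steps + 1), repeat=m):
        if sum(combo) == steps:
            yield [c / steps for c in combo]


def fit_super_learner(learners: List[Learner], xs, ys,
                      k: int = 5) -> Tuple[Predictor, List[float]]:
    """Pick convex weights minimising the cross-validated squared error,
    then refit every base learner on the full data set."""
    z = out_of_fold_predictions(learners, xs, ys, k)

    def cv_risk(w: List[float]) -> float:
        return sum((sum(wj * zj for wj, zj in zip(w, row)) - y) ** 2
                   for row, y in zip(z, ys)) / len(ys)

    weights = min(simplex_grid(len(learners)), key=cv_risk)
    fitted = [learn(xs, ys) for learn in learners]

    def predict(x: float) -> float:
        return sum(w * f(x) for w, f in zip(weights, fitted))

    return predict, weights


if __name__ == "__main__":
    # Synthetic data (assumption): a noisy linear relation y = 2x + 1 + noise.
    rng = random.Random(42)
    xs = [rng.uniform(0, 10) for _ in range(200)]
    ys = [2.0 * x + 1.0 + rng.gauss(0, 1) for x in xs]
    predict, weights = fit_super_learner([mean_learner, linear_learner, nn_learner], xs, ys)
    print("weights (mean, linear, 1-NN):", weights)
    print("prediction at x=5:", round(predict(5.0), 2))
```

The grid search only keeps the example self-contained; practical Super Learner implementations commonly fit the combination weights with a proper optimizer (for example, non-negative least squares) instead of a fixed grid. On data like the synthetic example, the cross-validated weights should concentrate on the base learner that generalizes best, which is the behavior the abstract attributes to the Super Learner.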