Designing graph neural networks training data with limited samples and small network sizes
Junior Momo Ziazet, Charles Boudreau, Oscar Delgado, Brigitte Jaumard
ITU journal : ICT discoveries, published 2023-09-12. DOI: 10.52953/afyw5455 (https://doi.org/10.52953/afyw5455)
Abstract
Machine learning is a data-driven domain: a model's performance depends on the availability of large volumes of training data. However, by improving data quality, effective machine learning models can be trained with little data. This paper demonstrates this possibility by proposing a methodology to generate high-quality data in the networking domain. We designed a dataset for training a given Graph Neural Network (GNN) that not only contains a small number of samples, but whose samples are also limited to small network graphs (10-node networks). Our evaluations indicate that the dataset generated by the proposed pipeline can train a GNN model that scales well to larger networks of 50 to 300 nodes. The trained model compares favorably to the baseline, achieving a mean absolute percentage error of 5-6%, while its training dataset is significantly smaller, at 90 samples in total (vs. thousands of samples for the baseline).
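For readers unfamiliar with the reported metric, the following is a minimal sketch of how a mean absolute percentage error (MAPE) is commonly computed. The abstract does not state which quantity the GNN predicts or how the authors implemented the metric, so the function name `mape` and the sample values below are hypothetical and purely illustrative.

```python
def mape(y_true, y_pred):
    """Return the mean absolute percentage error, in percent.

    Standard definition: 100 * mean(|(true - pred) / true|).
    Assumes every true value is non-zero.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("at least one sample is required")
    total = sum(abs((t - p) / t) for t, p in zip(y_true, y_pred))
    return 100.0 * total / len(y_true)


# Hypothetical ground-truth and predicted values (not taken from the paper).
true_vals = [0.20, 0.50, 1.00, 2.00]
pred_vals = [0.21, 0.47, 1.05, 1.90]

print(f"MAPE = {mape(true_vals, pred_vals):.2f}%")
```

With these illustrative numbers the per-sample relative errors are 5%, 6%, 5%, and 5%, so the average falls in the same 5-6% range the abstract reports for the trained model.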