Computing Resource Estimation by using Machine Learning Techniques for ALICE O2 Logging System
Juthaporn Vipatpakpaiboon, V. C. Barroso, K. Akkarajitsakul
The 12th International Conference on Advances in Information Technology, 2021. DOI: 10.1145/3468784.3468786
Abstract
Resource estimation is a technique for estimating the computing resources of a system from historical data in order to make the system more efficient, and many researchers apply machine learning to estimate computing resources and solve such problems. The European Organization for Nuclear Research (CERN) is currently developing a new logging system for the A Large Ion Collider Experiment (ALICE) detector based on the Elastic Logstash Kibana (ELK) software stack. Beat, a data shipper installed on the First Level Processor (FLP) nodes, receives the log data and transfers it to Logstash, a data preprocessing pipeline, which ingests the data and sends it to Elasticsearch, a search and analytics engine. The difficulty of this work lies in handling a large cluster in which the number of nodes, and likewise the number of services per machine, may increase or decrease in the future. To make the system more reliable and adaptable to change, a regression model can be used to estimate and plan the resources needed for Logstash. In this paper, we use Metricbeat to obtain the historical computing metrics of the machines from Logstash. To find an appropriate regression model, we applied different machine learning algorithms, including random forest regression, multiple linear regression, and multi-layer perceptron. The efficiency of these models is measured and compared using the coefficient of determination, mean absolute error (MAE), and mean squared error (MSE). The experimental results show that our random forest regression model outperforms the others, in both tuned and untuned configurations, for estimating CPU, memory, and disk space. However, in terms of training time, the multiple linear regression model is faster due to its lower number of parameters and lower model complexity.
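The model comparison described in the abstract can be reproduced in outline with scikit-learn. The sketch below is not the paper's code: the metric features and the CPU-usage target are synthetic stand-ins for the Metricbeat data, and the hyperparameters are illustrative defaults. It only shows the three regressor families being fit on the same data and scored with the same measures (coefficient of determination, MAE, MSE).

```python
# Illustrative sketch (not from the paper): fit the three regressor families
# named in the abstract on synthetic metric data and compare R^2, MAE, MSE.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.neural_network import MLPRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score, mean_absolute_error, mean_squared_error

rng = np.random.default_rng(42)

# Hypothetical stand-in for historical Metricbeat samples, e.g. log rate,
# number of services, and node count per machine (features are assumptions).
X = rng.uniform(0.0, 1.0, size=(2000, 3))
# Hypothetical target: CPU usage driven by the features plus noise.
y = 0.6 * X[:, 0] + 0.3 * X[:, 1] ** 2 + 0.1 * X[:, 2] + rng.normal(0, 0.02, 2000)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

models = {
    "random_forest": RandomForestRegressor(n_estimators=200, random_state=0),
    "multiple_linear": LinearRegression(),
    "mlp": MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=2000, random_state=0),
}

for name, model in models.items():
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    print(
        f"{name:16s} "
        f"R2={r2_score(y_test, pred):.3f} "
        f"MAE={mean_absolute_error(y_test, pred):.4f} "
        f"MSE={mean_squared_error(y_test, pred):.4f}"
    )
```

In this setup, the "tuned" versus "untuned" comparison the abstract mentions would correspond to running the same evaluation with and without a hyperparameter search (for example, over the number of trees and tree depth for the random forest), though the paper's actual tuning procedure is not specified in the abstract.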