Jiwon Bang, Siwoon Son, Hajin Kim, Yang-Sae Moon, Mi-Jung Choi
Title: Design and implementation of a load shedding engine for solving starvation problems in Apache Kafka
Published in: NOMS 2018 - 2018 IEEE/IFIP Network Operations and Management Symposium
Publication date: 2018-04-23
DOI: https://doi.org/10.1109/NOMS.2018.8406306
Citations: 11
Abstract
Real-time data stream processing technologies such as Apache Storm and Apache Spark are being actively studied as a way to handle high-volume data streams that are generated rapidly in real time. Because most real-time processing systems are difficult to use on their own, they are commonly paired with a messaging system that handles the input and output of data streams. Apache Kafka is a representative distributed messaging system that specializes in delivering large amounts of real-time log data. However, if data is produced in Kafka faster than it is consumed, a data starvation problem may arise. Solving this problem requires a load shedding technique that limits the incoming data and maintains system performance while the system is under load. In this paper, we confirm the starvation problem that can occur in Kafka, design and implement a load shedding engine to address it, and propose a solution to the starvation problem in Kafka based on performance experiments.
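The abstract does not describe how the engine decides which data to drop, so the following is only a minimal sketch of one common load shedding strategy, random dropping, and not the authors' implementation. It assumes that the production and consumption rates (in messages per second) are measured elsewhere and passed in; the class name `RandomLoadShedder` and its methods are hypothetical and do not belong to Kafka or any Kafka client library.

```python
import random
from dataclasses import dataclass, field


@dataclass
class RandomLoadShedder:
    """Hypothetical helper: admits each message with a probability chosen
    so that the admitted rate does not exceed the consumption rate."""

    rng: random.Random = field(default_factory=random.Random)

    @staticmethod
    def keep_probability(produce_rate: float, consume_rate: float) -> float:
        """Fraction of incoming messages to admit, clamped to [0, 1]."""
        if produce_rate <= 0:
            # Nothing is being produced, so nothing needs to be shed.
            return 1.0
        return min(1.0, max(0.0, consume_rate / produce_rate))

    def admit(self, produce_rate: float, consume_rate: float) -> bool:
        """Return True if the message should be forwarded, False if dropped."""
        return self.rng.random() < self.keep_probability(produce_rate, consume_rate)
```

For example, with `shedder = RandomLoadShedder()`, calling `shedder.admit(1000.0, 250.0)` for every incoming message forwards roughly one message in four, while any call where consumption keeps up with production always returns `True`. Random dropping is the simplest policy; a real engine might instead prioritize messages by topic, age, or content.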