{"title":"Analysis of Garbage Collection Patterns to Extend Microbenchmarks for Big Data Workloads","authors":"Samyak S. Sarnayak, Aditi Ahuja, Pranav Kesavarapu, Aayush Naik, Santhosh Kumar Vasudevan, Subramaniam Kalambur","doi":"10.1145/3491204.3527473","DOIUrl":null,"url":null,"abstract":"Java uses automatic memory allocation where the user does not have to explicitly free used memory. This is done by the garbage collector. Garbage Collection (GC) can take up a significant amount of time, especially in Big Data applications running large workloads where garbage collection can take up to 50 percent of the application's run time. Although benchmarks have been designed to trace garbage collection events, these are not specifically suited for Big Data workloads, due to their unique memory usage patterns. We have developed a free and open source pipeline to extract and analyze object-level details from any Java program including benchmarks and Big Data applications such as Hadoop. The data contains information such as lifetime, class and allocation site of every object allocated by the program. Through the analysis of this data, we propose a small set of benchmarks designed to emulate some of the patterns observed in Big Data applications. 
These benchmarks also allow us to experiment and compare some Java programming patterns.","PeriodicalId":129216,"journal":{"name":"Companion of the 2022 ACM/SPEC International Conference on Performance Engineering","volume":"1 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-07-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Companion of the 2022 ACM/SPEC International Conference on Performance Engineering","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3491204.3527473","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 0
Abstract
Java manages memory automatically: the programmer does not have to explicitly free memory that is no longer in use. Reclaiming that memory is the job of the garbage collector. Garbage collection (GC) can take up a significant amount of time, especially in Big Data applications running large workloads, where it can account for up to 50 percent of the application's run time. Although benchmarks have been designed to trace garbage collection events, they are not well suited to Big Data workloads, which have distinctive memory usage patterns. We have developed a free and open-source pipeline to extract and analyze object-level details from any Java program, including benchmarks and Big Data applications such as Hadoop. The data includes information such as the lifetime, class, and allocation site of every object allocated by the program. Based on an analysis of this data, we propose a small set of benchmarks designed to emulate some of the patterns observed in Big Data applications. These benchmarks also allow us to experiment with and compare some Java programming patterns.
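As a rough illustration of the kind of analysis that per-object data (lifetime, class, allocation site) makes possible, the sketch below groups object records by allocation site and computes the mean lifetime at each site. The `ObjectRecord` class, its field names, the millisecond unit, and the sample values are all hypothetical; the abstract does not describe the pipeline's actual output format, so this is not the authors' implementation.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

class LifetimeSummary {

    // Hypothetical shape of one per-object record; not the pipeline's real format.
    static final class ObjectRecord {
        final String className;
        final String allocationSite;
        final long lifetimeMs; // assumed unit: milliseconds

        ObjectRecord(String className, String allocationSite, long lifetimeMs) {
            this.className = className;
            this.allocationSite = allocationSite;
            this.lifetimeMs = lifetimeMs;
        }
    }

    // Groups records by allocation site and averages the lifetimes per site.
    // A TreeMap keeps the sites in sorted order for stable printing.
    static Map<String, Double> meanLifetimeBySite(List<ObjectRecord> records) {
        return records.stream().collect(Collectors.groupingBy(
                (ObjectRecord r) -> r.allocationSite,
                TreeMap::new,
                Collectors.averagingLong((ObjectRecord r) -> r.lifetimeMs)));
    }

    public static void main(String[] args) {
        // Made-up sample data: two short-lived buffers and one long-lived string.
        List<ObjectRecord> sample = Arrays.asList(
                new ObjectRecord("byte[]", "Reader.read:42", 10L),
                new ObjectRecord("byte[]", "Reader.read:42", 20L),
                new ObjectRecord("java.lang.String", "Mapper.map:17", 5000L));

        meanLifetimeBySite(sample).forEach((site, mean) ->
                System.out.println(site + " -> mean lifetime " + mean + " ms"));
    }
}
```

A summary like this separates allocation sites that produce mostly short-lived objects from those that produce long-lived ones, which is the sort of pattern a microbenchmark could then try to reproduce.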