Improving Throughput of BigData Applications
Janardhana Reddy Naredula
2019 26th International Conference on High Performance Computing, Data and Analytics Workshop (HiPCW), published 2019-12-01
DOI: 10.1109/HiPCW.2019.00014 (https://doi.org/10.1109/HiPCW.2019.00014)
Citations: 0
Abstract
This paper describes performance problems in big data applications such as Redis, Kafka, Memcached, Cassandra, and Elasticsearch, and presents solutions that improve their throughput. Most of the solutions rely on techniques such as bypassing the Linux kernel, minimizing system calls, and using multi-core machines efficiently through asynchronous programming, a one-thread-per-core design, and DPDK. Modern machines are very different from those of just 10 years ago: they have many cores and deep memory hierarchies (from L1 caches to NUMA), which reward certain programming practices and penalize others. Unscalable programming practices, such as taking locks, can devastate performance on many cores. Shared memory and lock-free synchronization primitives are used to solve some of these problems. The paper concludes with a test prototype of Redis with an efficient network path that achieved a 37x performance improvement over the baseline.
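One of the techniques named in the abstract, minimizing system calls, can be illustrated with a small sketch. This is not the paper's implementation. The function names (`write_all`, `send_unbatched`, `send_batched`) and the 4096-byte batch threshold are assumptions made for illustration only. The sketch counts how many `write()` calls are issued when each message is sent on its own versus when messages are coalesced into larger buffers first.

```python
import os


def write_all(fd, data):
    """Write all of `data` to `fd`; return the number of write() calls issued."""
    calls = 0
    offset = 0
    while offset < len(data):
        # os.write may write fewer bytes than requested, so loop until done.
        offset += os.write(fd, data[offset:])
        calls += 1
    return calls


def send_unbatched(fd, messages):
    """Issue at least one write() system call per message."""
    calls = 0
    for msg in messages:
        calls += write_all(fd, msg)
    return calls


def send_batched(fd, messages, batch_bytes=4096):
    """Coalesce messages and flush once the pending bytes reach `batch_bytes`.

    `batch_bytes` is a hypothetical tuning knob, not a value from the paper.
    """
    calls = 0
    pending = []
    pending_size = 0
    for msg in messages:
        pending.append(msg)
        pending_size += len(msg)
        if pending_size >= batch_bytes:
            calls += write_all(fd, b"".join(pending))
            pending = []
            pending_size = 0
    if pending:
        # Flush whatever is left after the last full batch.
        calls += write_all(fd, b"".join(pending))
    return calls


if __name__ == "__main__":
    fd = os.open(os.devnull, os.O_WRONLY)
    try:
        msgs = [b"SET k v\r\n"] * 1000
        print("unbatched write() calls:", send_unbatched(fd, msgs))
        print("batched write() calls:  ", send_batched(fd, msgs))
    finally:
        os.close(fd)
```

Each system call carries a fixed cost for crossing the user/kernel boundary, so issuing fewer, larger writes spreads that cost over more payload bytes. The trade-off is added latency for messages that wait in the buffer, which is why the threshold would need tuning for a real workload.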