Chen Yang, Qi Guo, Xiaofeng Meng, Rihui Xin, Chunkai Wang
Proceedings of the 2017 Symposium on Cloud Computing. Published 2017-09-24. DOI: 10.1145/3127479.3132685 (https://doi.org/10.1145/3127479.3132685)
Revisiting performance in big data systems: an resource decoupling approach
Big data systems for large-scale data processing are now in widespread use. To improve their performance, both academia and industry have put considerable effort into analyzing performance bottlenecks. Most big data systems, such as Hadoop and Spark, support distributed computing across clusters. As a result, a running system uses the CPU, memory, disk, and network in parallel. If one resource limits performance more than the others, the system is bottlenecked on that resource. For a system designer, tuning the bottleneck resource is an effective way to improve performance. The key question is therefore how to determine which resource is the bottleneck. The natural approach is to quantify the impact of each of the four major resources and identify the one with the greatest impact factor as the bottleneck resource.
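The selection step described in the abstract can be sketched in Python. This is a minimal illustration, not the authors' method: the abstract does not say how the impact factor is computed, so the sketch assumes a hypothetical definition, namely the fraction of baseline runtime saved when a single resource is made non-limiting. The function names `impact_factors` and `bottleneck` and the sample runtimes are invented for illustration.

```python
from typing import Dict

# The four major resources named in the abstract.
RESOURCES = ("cpu", "memory", "disk", "network")


def impact_factors(baseline_s: float, relaxed_s: Dict[str, float]) -> Dict[str, float]:
    """Return a per-resource impact factor.

    ASSUMPTION (not from the paper): the impact factor of a resource is the
    fraction of the baseline runtime that is saved when only that resource
    is made non-limiting. relaxed_s[r] is the runtime measured in that case.
    """
    if baseline_s <= 0:
        raise ValueError("baseline runtime must be positive")
    return {r: (baseline_s - relaxed_s[r]) / baseline_s for r in RESOURCES}


def bottleneck(factors: Dict[str, float]) -> str:
    """Pick the resource with the greatest impact factor."""
    return max(factors, key=factors.get)


if __name__ == "__main__":
    # Made-up runtimes in seconds, for illustration only.
    factors = impact_factors(
        100.0, {"cpu": 90.0, "memory": 95.0, "disk": 60.0, "network": 80.0}
    )
    print(bottleneck(factors))
```

With these made-up numbers, relaxing the disk saves the largest share of runtime, so the sketch reports the disk as the bottleneck. Any other definition of impact factor can be substituted in `impact_factors` without changing the selection step.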