{"title":"norm:大规模分布式流处理系统中高效的基于窗口的计算","authors":"Kasper Grud Skat Madsen, Yongluan Zhou, Li Su","doi":"10.1145/2933267.2933315","DOIUrl":null,"url":null,"abstract":"Modern distributed stream processing systems (DSPS), such as Storm, typically provide a flexible programming model, where computation is specified as complicated UDFs and data is opaque to the system. While such a programming framework provides very high flexibility to the developers, it does not provide much semantic information to the system and hence it is hard to perform optimizations that has already been proved very effective in conventional stream systems. Examples include sharing computation among overlapping windows, co-partitioning operators to save communication overhead and efficient state migration during load balancing. In lieu of these challenges, we propose a new framework, which is designed to expose sufficient semantic information of the applications to enable the aforementioned effective optimizations, while on the other hand, maintaining the flexibility of Storm's original programming framework. Furthermore, we present new optimization algorithms to minimize the communication cost and state migration overhead for dynamic load balancing. We implement our framework on top of Storm and run an extensive experimental study to verify its effectiveness.","PeriodicalId":277061,"journal":{"name":"Proceedings of the 10th ACM International Conference on Distributed and Event-based Systems","volume":"27 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2016-06-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"9","resultStr":"{\"title\":\"Enorm: efficient window-based computation in large-scale distributed stream processing systems\",\"authors\":\"Kasper Grud Skat Madsen, Yongluan Zhou, Li Su\",\"doi\":\"10.1145/2933267.2933315\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Modern distributed stream processing systems (DSPS), such as Storm, typically provide a flexible programming model, where computation is specified as complicated UDFs and data is opaque to the system. While such a programming framework provides very high flexibility to the developers, it does not provide much semantic information to the system and hence it is hard to perform optimizations that has already been proved very effective in conventional stream systems. Examples include sharing computation among overlapping windows, co-partitioning operators to save communication overhead and efficient state migration during load balancing. In lieu of these challenges, we propose a new framework, which is designed to expose sufficient semantic information of the applications to enable the aforementioned effective optimizations, while on the other hand, maintaining the flexibility of Storm's original programming framework. Furthermore, we present new optimization algorithms to minimize the communication cost and state migration overhead for dynamic load balancing. We implement our framework on top of Storm and run an extensive experimental study to verify its effectiveness.\",\"PeriodicalId\":277061,\"journal\":{\"name\":\"Proceedings of the 10th ACM International Conference on Distributed and Event-based Systems\",\"volume\":\"27 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2016-06-13\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"9\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the 10th ACM International Conference on Distributed and Event-based Systems\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/2933267.2933315\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 10th ACM International Conference on Distributed and Event-based Systems","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/2933267.2933315","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Enorm: efficient window-based computation in large-scale distributed stream processing systems
Modern distributed stream processing systems (DSPS), such as Storm, typically provide a flexible programming model, where computation is specified as complicated UDFs and data is opaque to the system. While such a programming framework provides very high flexibility to the developers, it does not provide much semantic information to the system and hence it is hard to perform optimizations that has already been proved very effective in conventional stream systems. Examples include sharing computation among overlapping windows, co-partitioning operators to save communication overhead and efficient state migration during load balancing. In lieu of these challenges, we propose a new framework, which is designed to expose sufficient semantic information of the applications to enable the aforementioned effective optimizations, while on the other hand, maintaining the flexibility of Storm's original programming framework. Furthermore, we present new optimization algorithms to minimize the communication cost and state migration overhead for dynamic load balancing. We implement our framework on top of Storm and run an extensive experimental study to verify its effectiveness.