Modeling Distributed Stream Processing Systems Under Heavy Workload
Muhammad Mudassar Qureshi, Hanhua Chen, Hai Jin
2019 International Conference on Cyberworlds (CW), October 2019. DOI: 10.1109/CW.2019.00024
Big data applications play a significant role in diverse fields. Distributed Stream Processing Engines (DSPEs) are widely used to support real-time applications efficiently. Partitioning algorithms split data streams across multiple nodes so that they can be processed in parallel for better performance. When stateful streaming applications are processed with such partitioning algorithms, aggregation cost becomes an important factor, because the aggregation step that produces the final result has a significant effect on performance. However, the impact of aggregation cost on stream processing has not been discussed comprehensively in the existing literature. We use performance modeling to identify the importance of aggregation cost when the workload is high. We then implement the performance model on a multi-node cluster to predict the same behavior as in the single-resource performance model. We demonstrate, and our experimental results confirm, that when the workload is high and stateful and stateless applications are running on the same DSPE, a stateful streaming application needs more resources than a stateless one. Further experimental results show that performance modeling may help predict the maximum workload that a DSPE can process, and that increasing the parallelism level is not guaranteed to improve the performance of streaming applications.
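To make the claims about aggregation cost and parallelism concrete, here is a toy capacity model in Python. It is not the performance model from the paper: the class `ToyDSPE`, its parameters, the assumption that every parallel instance holds partial state for every key, and all numbers are hypothetical. The sketch treats the DSPE as a single shared CPU resource. Each parallel instance adds a fixed overhead, and a stateful application additionally spends CPU merging partial aggregates, so the sustainable input rate is the smaller of what the instances can process and what the remaining CPU allows.

```python
from dataclasses import dataclass


@dataclass
class ToyDSPE:
    """Hypothetical single-resource capacity model of a DSPE (illustration only)."""
    cores: float                  # total CPU capacity, core-seconds per second
    cost_per_tuple: float         # CPU seconds one operator instance spends per tuple
    overhead_per_instance: float  # CPU seconds per second used by each parallel instance
    keys: int = 0                 # distinct keys with partial state; 0 means stateless
    merge_cost: float = 0.0       # CPU seconds to merge one partial aggregate
    window: float = 1.0           # seconds between aggregation rounds

    def aggregation_load(self, parallelism: int) -> float:
        """CPU seconds per second spent merging partial results."""
        # Assumption: under key-splitting partitioning every instance may hold a
        # partial result for every key, so each window yields
        # parallelism * keys partial aggregates to merge.
        return parallelism * self.keys * self.merge_cost / self.window

    def max_workload(self, parallelism: int) -> float:
        """Largest input rate (tuples/s) this configuration can sustain."""
        spare = (self.cores
                 - parallelism * self.overhead_per_instance
                 - self.aggregation_load(parallelism))
        if spare <= 0:
            return 0.0  # overheads alone saturate the shared resource
        cpu_bound = spare / self.cost_per_tuple             # shared CPU limit
        instance_bound = parallelism / self.cost_per_tuple  # at most 1 core per instance
        return min(cpu_bound, instance_bound)


# Hypothetical parameters: 8 cores, 10k tuples/s per core, 5% overhead per instance.
stateless = ToyDSPE(cores=8, cost_per_tuple=1e-4, overhead_per_instance=0.05)
stateful = ToyDSPE(cores=8, cost_per_tuple=1e-4, overhead_per_instance=0.05,
                   keys=1_000, merge_cost=1e-5)

for p in (1, 2, 4, 8, 16, 32):
    print(f"p={p:2d}  stateless={stateless.max_workload(p):8.0f}  "
          f"stateful={stateful.max_workload(p):8.0f}  tuples/s")
```

With these assumed parameters, both variants reach their highest predicted maximum workload at a parallelism of 8 among the values tried. Beyond that, per-instance overhead (and, for the stateful variant, the growing number of partial aggregates to merge) consumes the shared CPU, so the predicted ceiling falls as parallelism rises. Once the CPU is the bottleneck, the stateful variant's ceiling is lower than the stateless one, which loosely mirrors, under these toy assumptions, the observation that stateful applications need more resources under heavy load.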