{"title":"Automating Incremental and Asynchronous Evaluation for Recursive Aggregate Data Processing","authors":"Qiange Wang, Yanfeng Zhang, Hao Wang, Liang Geng, Rubao Lee, Xiaodong Zhang, Ge Yu","doi":"10.1145/3318464.3389712","DOIUrl":null,"url":null,"abstract":"In database and large-scale data analytics, recursive aggregate processing plays an important role, which is generally implemented under a framework of incremental computing and executed synchronously and/or asynchronously. We identify three barriers in existing recursive aggregate data processing. First, the processing scope is largely limited to monotonic programs. Second, checking on conditions for monotonicity and correctness for async processing is sophisticated and manually done. Third, execution engines may be suboptimal due to separation of sync and async execution. In this paper, we lay an analytical foundation for conditions to check if a recursive aggregate program that is monotonic or even non-monotonic can be executed incrementally and asynchronously with its correct result. We design and implement a condition verification tool that can automatically check if a given program satisfies the conditions. We further propose a unified sync-async engine to execute these programs for high performance. To integrate all these effective methods together, we have developed a distributed Datalog system, called PowerLog. Our evaluation shows that PowerLog can outperform three representative Datalog systems on both monotonic and non-monotonic recursive programs.","PeriodicalId":436122,"journal":{"name":"Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data","volume":"14 41 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2020-06-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"16","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3318464.3389712","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 16
Abstract
In database and large-scale data analytics, recursive aggregate processing plays an important role, which is generally implemented under a framework of incremental computing and executed synchronously and/or asynchronously. We identify three barriers in existing recursive aggregate data processing. First, the processing scope is largely limited to monotonic programs. Second, checking on conditions for monotonicity and correctness for async processing is sophisticated and manually done. Third, execution engines may be suboptimal due to separation of sync and async execution. In this paper, we lay an analytical foundation for conditions to check if a recursive aggregate program that is monotonic or even non-monotonic can be executed incrementally and asynchronously with its correct result. We design and implement a condition verification tool that can automatically check if a given program satisfies the conditions. We further propose a unified sync-async engine to execute these programs for high performance. To integrate all these effective methods together, we have developed a distributed Datalog system, called PowerLog. Our evaluation shows that PowerLog can outperform three representative Datalog systems on both monotonic and non-monotonic recursive programs.