Sarah Chlyah, Nils Gesbert, Pierre Genevès, Nabil Layaïda
{"title":"Efficient iterative programs with distributed data collections","authors":"Sarah Chlyah, Nils Gesbert, Pierre Genevès, Nabil Layaïda","doi":"10.1016/j.jlamp.2025.101047","DOIUrl":null,"url":null,"abstract":"<div><div>Big data programming frameworks have become increasingly important for the development of applications for which performance and scalability are critical. In those complex frameworks, optimizing code by hand is hard and time-consuming, making automated optimization particularly necessary. In order to automate optimization, a prerequisite is to find suitable abstractions to represent programs; for instance, algebras based on monads or monoids to represent distributed data collections. Currently, however, such algebras do not represent recursive programs in a way which allows for analyzing or rewriting them. In this paper, we extend a monoid algebra with a fixpoint operator for representing recursion as a first class citizen and show how it enables new optimizations. Experiments with the Spark platform illustrate performance gains brought by these systematic optimizations.</div></div>","PeriodicalId":48797,"journal":{"name":"Journal of Logical and Algebraic Methods in Programming","volume":"144 ","pages":"Article 101047"},"PeriodicalIF":0.7000,"publicationDate":"2025-02-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of Logical and Algebraic Methods in Programming","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S2352220825000136","RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"COMPUTER SCIENCE, THEORY & METHODS","Score":null,"Total":0}
引用次数: 0
Abstract
Big data programming frameworks have become increasingly important for the development of applications for which performance and scalability are critical. In those complex frameworks, optimizing code by hand is hard and time-consuming, making automated optimization particularly necessary. In order to automate optimization, a prerequisite is to find suitable abstractions to represent programs; for instance, algebras based on monads or monoids to represent distributed data collections. Currently, however, such algebras do not represent recursive programs in a way which allows for analyzing or rewriting them. In this paper, we extend a monoid algebra with a fixpoint operator for representing recursion as a first class citizen and show how it enables new optimizations. Experiments with the Spark platform illustrate performance gains brought by these systematic optimizations.
期刊介绍:
The Journal of Logical and Algebraic Methods in Programming is an international journal whose aim is to publish high quality, original research papers, survey and review articles, tutorial expositions, and historical studies in the areas of logical and algebraic methods and techniques for guaranteeing correctness and performability of programs and in general of computing systems. All aspects will be covered, especially theory and foundations, implementation issues, and applications involving novel ideas.