M. Perscheid, H. Plattner, Daniel Ritter, R. Schlosser, Ralf Teusner
The Hasso Plattner Institute (HPI), academically structured as the independent Faculty of Digital Engineering at the University of Potsdam, unites computer science research and teaching with the advantages of a privately financed institute and a tuition-free study program. Founder and namesake of the institute is the SAP co-founder Hasso Plattner, who also heads the Enterprise Platform and Integration Concepts (EPIC) research center which focuses on the technical aspects of business software with a vision to provide the fastest way to get insights out of enterprise data. Founded in 2006, the EPIC combines three research groups comprising autonomous data management, enterprise software engineering, and data-driven decision support.
{"title":"Enterprise Platform and Integration Concepts Research at HPI","authors":"M. Perscheid, H. Plattner, Daniel Ritter, R. Schlosser, Ralf Teusner","doi":"10.1145/3582302.3582322","DOIUrl":"https://doi.org/10.1145/3582302.3582322","url":null,"abstract":"The Hasso Plattner Institute (HPI), academically structured as the independent Faculty of Digital Engineering at the University of Potsdam, unites computer science research and teaching with the advantages of a privately financed institute and a tuition-free study program. Founder and namesake of the institute is the SAP co-founder Hasso Plattner, who also heads the Enterprise Platform and Integration Concepts (EPIC) research center which focuses on the technical aspects of business software with a vision to provide the fastest way to get insights out of enterprise data. Founded in 2006, the EPIC combines three research groups comprising autonomous data management, enterprise software engineering, and data-driven decision support.","PeriodicalId":346332,"journal":{"name":"ACM SIGMOD Record","volume":"25 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-01-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129297675","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Welcome to this installment of ACM SIGMOD Record's series of interviews with distinguished members of the database community. I'm Marianne Winslett, and today we are on Zoom with Chenggang Wu, co-founder and CTO of Aqueduct. Chenggang received the 2022 ACM SIGMOD Jim Gray Dissertation Award for his thesis entitled The Design of Any-scale Serverless Infrastructure with Rich Consistency Guarantees. His PhD is from UC Berkeley. So, Chenggang, welcome!
{"title":"Chenggang Wu Speaks Out on his ACM SIGMOD Jim Gray Dissertation Award, Rejection, Believing in Your Work, and More","authors":"Chenggang Wu","doi":"10.1145/3582302.3582318","DOIUrl":"https://doi.org/10.1145/3582302.3582318","url":null,"abstract":"Welcome to this installment of ACM SIGMOD Record's series of interviews with distinguished members of the database community. I'm Marianne Winslett, and today we are on Zoom with Chenggang Wu, co-founder and CTO of Aqueduct. Chenggang received the 2022 ACM SIGMOD Jim Gray Dissertation Award for his thesis entitled The Design of Any-scale Serverless Infrastructure with Rich Consistency Guarantees. His PhD is from UC Berkeley. So, Chenggang, welcome!","PeriodicalId":346332,"journal":{"name":"ACM SIGMOD Record","volume":"16 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-01-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121636289","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Motivation: Data science is increasingly collaborative. On the one hand, results need to be distributed, e.g., as interactive visualizations. On the other, collaboration in the data development process improves quality and timeliness. This can take many forms: partitioning a problem and working on aspects in parallel, exploring different solutions or reviewing someone else's work.
{"title":"Collaborative Data Science using Scalable Homoiconicity","authors":"H. Pirk","doi":"10.1145/3582302.3582316","DOIUrl":"https://doi.org/10.1145/3582302.3582316","url":null,"abstract":"Motivation: Data science is increasingly collaborative. On the one hand, results need to be distributed, e.g., as interactive visualizations. On the other, collaboration in the data development process improves quality and timeliness. This can take many forms: partitioning a problem and working on aspects in parallel, exploring different solutions or reviewing someone else's work.","PeriodicalId":346332,"journal":{"name":"ACM SIGMOD Record","volume":"36 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-01-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130267924","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The explorative and iterative nature of developing and operating ML applications leads to a variety of artifacts, such as datasets, features, models, hyperparameters, metrics, software, configurations, and logs. In order to enable comparability, reproducibility, and traceability of these artifacts across the ML lifecycle steps and iterations, systems and tools have been developed to support their collection, storage, and management. It is often not obvious what precise functional scope such systems offer so that the comparison and the estimation of synergy effects between candidates are quite challenging. In this paper, we aim to give an overview of systems and platforms which support the management of ML lifecycle artifacts. Based on a systematic literature review, we derive assessment criteria and apply them to a representative selection of more than 60 systems and platforms.
{"title":"Management of Machine Learning Lifecycle Artifacts","authors":"Marius Schlegel, K. Sattler","doi":"10.1145/3582302.3582306","DOIUrl":"https://doi.org/10.1145/3582302.3582306","url":null,"abstract":"The explorative and iterative nature of developing and operating ML applications leads to a variety of artifacts, such as datasets, features, models, hyperparameters, metrics, software, configurations, and logs. In order to enable comparability, reproducibility, and traceability of these artifacts across the ML lifecycle steps and iterations, systems and tools have been developed to support their collection, storage, and management. It is often not obvious what precise functional scope such systems offer so that the comparison and the estimation of synergy effects between candidates are quite challenging. In this paper, we aim to give an overview of systems and platforms which support the management of ML lifecycle artifacts. Based on a systematic literature review, we derive assessment criteria and apply them to a representative selection of more than 60 systems and platforms.","PeriodicalId":346332,"journal":{"name":"ACM SIGMOD Record","volume":"46 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-01-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123753362","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Reasoning-based query planning has been explored in many contexts, including relational data integration, the Semantic Web, and query reformulation. But infrastructure to build reasoning-based optimization in the relational context has been slow to develop. We overview PDQ 2.0, a platform supporting a number of reasoning-enhanced querying tasks. We focus on a major goal of PDQ 2.0: obtaining a more modular and flexible architecture for reasoning-based query optimization.
{"title":"PDQ 2.0","authors":"M. Benedikt, Fergus Cooper, Stefano Germano, Gabor Gyorkei, Efthymia Tsamoura, Brandon Moore, Camilo Ortiz","doi":"10.1145/3582302.3582308","DOIUrl":"https://doi.org/10.1145/3582302.3582308","url":null,"abstract":"Reasoning-based query planning has been explored in many contexts, including relational data integration, the Semantic Web, and query reformulation. But infrastructure to build reasoning-based optimization in the relational context has been slow to develop. We overview PDQ 2.0, a platform supporting a number of reasoning-enhanced querying tasks. We focus on a major goal of PDQ 2.0: obtaining a more modular and flexible architecture for reasoning-based query optimization.","PeriodicalId":346332,"journal":{"name":"ACM SIGMOD Record","volume":"323 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-01-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122709335","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Rapidly growing social networks and other graph data have created a high demand for graph technologies in the market. As a result, a plethora of graph databases, systems, and solutions have emerged. On the other hand, graphs have long been a well-studied area in the database research community. Despite the numerous surveys on various graph research topics, there is a lack of surveys on graph technologies from an industry perspective. The purpose of this paper is to provide the research community with an industrial perspective on the graph database landscape, so that graph researchers can better understand industry trends and the challenges that the industry is facing, and work on solutions to help address these problems.
{"title":"The World of Graph Databases from An Industry Perspective","authors":"Yuanyuan Tian","doi":"10.1145/3582302.3582320","DOIUrl":"https://doi.org/10.1145/3582302.3582320","url":null,"abstract":"Rapidly growing social networks and other graph data have created a high demand for graph technologies in the market. As a result, a plethora of graph databases, systems, and solutions have emerged. On the other hand, graphs have long been a well-studied area in the database research community. Despite the numerous surveys on various graph research topics, there is a lack of surveys on graph technologies from an industry perspective. The purpose of this paper is to provide the research community with an industrial perspective on the graph database landscape, so that graph researchers can better understand industry trends and the challenges that the industry is facing, and work on solutions to help address these problems.","PeriodicalId":346332,"journal":{"name":"ACM SIGMOD Record","volume":"121 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-11-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115885408","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
My high school grades were top except for one subject: composition. Free text was (and still is) my absolute nightmare. After high school I only had to do technical writing, which is much easier: it boils down to math. Fact, supporting evidence, implication, which leads to another fact, repeat. So, when Tamer asked me to write a piece about mid-career challenges, I was excited at first, and then I was terrified. I wrote five outlines and veto'ed them all. "I am not good at this," I wanted to say, "ask somebody else!" But, then I remembered - this happens every time I get into unknown territory.
{"title":"The Formidable Mid-Career Crisis","authors":"A. Ailamaki","doi":"10.1145/3572751.3572761","DOIUrl":"https://doi.org/10.1145/3572751.3572761","url":null,"abstract":"My high school grades were top except for one subject: composition. Free text was (and still is) my absolute nightmare. After high school I only had to do technical writing, which is much easier: it boils down to math. Fact, supporting evidence, implication, which leads to another fact, repeat. So, when Tamer asked me to write a piece about mid-career challenges, I was excited at first, and then I was terrified. I wrote five outlines and veto'ed them all. \"I am not good at this,\" I wanted to say, \"ask somebody else!\" But, then I remembered - this happens every time I get into unknown territory.","PeriodicalId":346332,"journal":{"name":"ACM SIGMOD Record","volume":"14 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-11-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115044726","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
As part of the International Conference on Very Large Data Bases (VLDB) 2021 / Proceedings of the VLDB Endowment Volume 14, a new Research Track category named Scalable Data Science (SDS) was launched [2, 6]. The goal of SDS is to attract cutting-edge and impactful real-world work in the scalable data science arena to enhance the impact and visibility of the VLDB community on data science practice, spur new technical connections, and inspire new follow-on research. The inaugural year proved to be successful, with numerous interesting papers from a wide cross section of both industry and academia, spanning several data science topics, and originating from several countries around the world. In this report, we reflect on the inaugural year of SDS with some statistics on both submissions and accepted papers, SDS invited talks, and our observations, lessons, and tips as inaugural Associate Editors for SDS. We hope this article is helpful to future authors, reviewers, and organizers of SDS, as well as other interested members of the wider database / data management community and beyond.
{"title":"VLDB Scalable Data Science Category","authors":"Arun C. S. Kumar","doi":"10.1145/3572751.3572769","DOIUrl":"https://doi.org/10.1145/3572751.3572769","url":null,"abstract":"As part of the International Conference on Very Large Data Bases (VLDB) 2021 / Proceedings of the VLDB Endowment Volume 14, a new Research Track category named Scalable Data Science (SDS) was launched [2, 6]. The goal of SDS is to attract cutting-edge and impactful real-world work in the scalable data science arena to enhance the impact and visibility of the VLDB community on data science practice, spur new technical connections, and inspire new follow-on research. The inaugural year proved to be successful, with numerous interesting papers from a wide cross section of both industry and academia, spanning several data science topics, and originating from several countries around the world. In this report, we reflect on the inaugural year of SDS with some statistics on both submissions and accepted papers, SDS invited talks, and our observations, lessons, and tips as inaugural Associate Editors for SDS. We hope this article is helpful to future authors, reviewers, and organizers of SDS, as well as other interested members of the wider database / data management community and beyond.","PeriodicalId":346332,"journal":{"name":"ACM SIGMOD Record","volume":"336 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-11-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133015941","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Query optimization is a critical technology that is common across all modern data processing systems. However, it is traditionally implemented in silos and is deeply embedded in different systems. Furthermore, over the years, query optimizers have become less understood and rarely touched pieces of code that are brittle to changes and very expensive to maintain, thus slowing down the pace of innovation. In this paper, we argue that it is time to think of the query optimizer as a service in modern cloud architectures. Such a design can help build a common set of well-maintained optimizations that are externalized from the query engines and that could be learned and improved using the large workloads present in modern clouds. We present Oasis, a reference architecture for query optimizer as a service, and describe our success in deploying the early version of it in Cosmos. Finally, we discuss the risks and responsibilities involved with Oasis to ensure it is a win-win for everyone.
{"title":"Query Optimizer as a Service","authors":"Alekh Jindal, Jyoti Leeka","doi":"10.1145/3572751.3572767","DOIUrl":"https://doi.org/10.1145/3572751.3572767","url":null,"abstract":"Query optimization is a critical technology that is common across all modern data processing systems. However, it is traditionally implemented in silos and is deeply embedded in different systems. Furthermore, over the years, query optimizers have become less understood and rarely touched pieces of code that are brittle to changes and very expensive to maintain, thus slowing down the pace of innovation. In this paper, we argue that it is time to think of the query optimizer as a service in modern cloud architectures. Such a design can help build a common set of well-maintained optimizations that are externalized from the query engines and that could be learned and improved using the large workloads present in modern clouds. We present Oasis, a reference architecture for query optimizer as a service, and describe our success in deploying the early version of it in Cosmos. Finally, we discuss the risks and responsibilities involved with Oasis to ensure it is a win-win for everyone.","PeriodicalId":346332,"journal":{"name":"ACM SIGMOD Record","volume":"19 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-11-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127690523","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
M. Arenas, L. A. Croquevielle, Rajesh Jayaram, Cristian Riveros
Counting the answers to a query is a fundamental problem in databases, with several applications in the evaluation, optimization, and visualization of queries. Unfortunately, counting query answers is a #P-hard problem in most cases, so it is unlikely to be solvable in polynomial time. Recently, new results on approximate counting have been developed, specifically by showing that some problems in automata theory admit fully polynomial-time randomized approximation schemes. These results have several implications for the problem of counting the answers to a query; in particular, for graph and conjunctive queries. In this work, we present the main ideas of these approximation results, by using labeled DAGs instead of automata to simplify the presentation. In addition, we review how to apply these results to count query answers in different areas of databases.
{"title":"Counting the Answers to a Query","authors":"M. Arenas, L. A. Croquevielle, Rajesh Jayaram, Cristian Riveros","doi":"10.1145/3572751.3572753","DOIUrl":"https://doi.org/10.1145/3572751.3572753","url":null,"abstract":"Counting the answers to a query is a fundamental problem in databases, with several applications in the evaluation, optimization, and visualization of queries. Unfortunately, counting query answers is a #P-hard problem in most cases, so it is unlikely to be solvable in polynomial time. Recently, new results on approximate counting have been developed, specifically by showing that some problems in automata theory admit fully polynomial-time randomized approximation schemes. These results have several implications for the problem of counting the answers to a query; in particular, for graph and conjunctive queries. In this work, we present the main ideas of these approximation results, by using labeled DAGs instead of automata to simplify the presentation. In addition, we review how to apply these results to count query answers in different areas of databases.","PeriodicalId":346332,"journal":{"name":"ACM SIGMOD Record","volume":"48 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-11-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114589032","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}