Optimizing data movement has always been one of the key ways to get a data processing system to perform efficiently. Appearing under different guises as computers evolved over the years, the issue is today as relevant as ever. With the advent of the cloud, data movement has become the bottleneck to address in any data processing system. In the cloud, compute and storage are typically disaggregated, with a network in between. In addition, cloud systems are scale-out, i.e., performance is obtained by parallelizing across machines, which also involves network communication. And while it is possible to use machines with large amounts of memory, the pricing models and the virtualized nature of the cloud tend to favor clusters of smaller computing nodes. Nowadays, the problem of optimizing data movement has become the problem of using the network as efficiently as possible.
{"title":"Technical perspective: DFI: The Data Flow Interface for High-Speed Networks","authors":"G. Alonso","doi":"10.1145/3542700.3542704","DOIUrl":"https://doi.org/10.1145/3542700.3542704","url":null,"abstract":"Optimizing data movement has always been one of the key ways to get a data processing system to perform efficiently. Appearing under different disguises as computers evolved over the years, the issue is today as relevant as ever. With the advent of the cloud, data movement has become the bottleneck to address in any data processing system. In the cloud, compute and storage are typically disaggregated, with a network in between. In addition, cloud systems are scale-out, i.e., performance is obtained by parallelizing across machines, which also involves network communication. And while it is possible to use machines with large amounts of memory, the pricing models and the virtualized nature of the cloud tends to favor clusters of smaller computing nodes. Nowadays, the problem of optimizing data movement has become the problem of using the network as efficiently as possible.","PeriodicalId":346332,"journal":{"name":"ACM SIGMOD Record","volume":"22 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-05-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114243822","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
With the emergence of (geographically) distributed data management in cloud infrastructures, key-value systems were promoted as so-called NoSQL systems. In order to achieve maximum availability and performance, these KV stores sacrificed the "holy grail" of database consistency and relied on relaxed consistency models, such as eventual consistency.
{"title":"Technical Perspective","authors":"A. Kemper","doi":"10.1145/3542700.3542706","DOIUrl":"https://doi.org/10.1145/3542700.3542706","url":null,"abstract":"With the emergence of (geographically) distributed data management in cloud infrastructures the key value systems were promoted as so-called NoSQL systems. In order to achieve maximum availability and performance these KV stores sacrificed the \"holy grail\" of database consistency and relied on relaxed consistency models, such as eventual consistency.","PeriodicalId":346332,"journal":{"name":"ACM SIGMOD Record","volume":"39 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-05-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116486298","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A. Pavan, N. V. Vinodchandran, Arnab Bhattacharyya, Kuldeep S. Meel
Constraint satisfaction problems (CSPs) and data stream models are two powerful abstractions to capture a wide variety of problems arising in different domains of computer science. Developments in the two communities have mostly occurred independently and with little interaction between them. In this work, we seek to investigate whether bridging the seeming communication gap between the two communities may pave the way to richer fundamental insights. To this end, we focus on two foundational problems: model counting for CSPs and computation of zeroth frequency moments (F0) for data streams.
{"title":"Model Counting Meets Distinct Elements in a Data Stream","authors":"A. Pavan, N. V. Vinodchandran, Arnab Bhattacharyya, Kuldeep S. Meel","doi":"10.1145/3542700.3542721","DOIUrl":"https://doi.org/10.1145/3542700.3542721","url":null,"abstract":"Constraint satisfaction problems (CSPs) and data stream models are two powerful abstractions to capture a wide variety of problems arising in different domains of computer science. Developments in the two communities have mostly occurred independently and with little interaction between them. In this work, we seek to investigate whether bridging the seeming communication gap between the two communities may pave the way to richer fundamental insights. To this end, we focus on two foundational problems: model counting for CSPs and computation of zeroth frequency moments (F0) for data streams.","PeriodicalId":346332,"journal":{"name":"ACM SIGMOD Record","volume":"14 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-05-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124503940","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
S. Amer-Yahia, G. Koutrika, Martin Braschler, Diego Calvanese, D. Lanti, Hendrik Lücke-Tieke, A. Mosca, Tarcisio Mendes de Farias, D. Papadopoulos, Yogendra Patil, Guillem Rull, Ellery Smith, Dimitrios Skoutas, S. Subramanian, Kurt Stockinger
A full-fledged data exploration system must combine different access modalities with a powerful concept of guiding the user in the exploration process, by being reactive and anticipative both for data discovery and for data linking. Such systems are a real opportunity for our community to cater to users with different domain and data science expertise. We introduce INODE - an end-to-end data exploration system - that leverages, on the one hand, Machine Learning and, on the other hand, semantics for the purpose of Data Management (DM). Our vision is to develop a classic unified, comprehensive platform that provides extensive access to open datasets, and we demonstrate it in three significant use cases in the fields of Cancer Biomarker Research, Research and Innovation Policy Making, and Astrophysics. INODE offers sustainable services in (a) data modeling and linking, (b) integrated query processing using natural language, (c) guidance, and (d) data exploration through visualization, thus facilitating the user in discovering new insights. We demonstrate that our system is uniquely accessible to a wide range of users from larger scientific communities to the public. Finally, we briefly illustrate how this work paves the way for new research opportunities in DM.
{"title":"INODE","authors":"S. Amer-Yahia, G. Koutrika, Martin Braschler, Diego Calvanese, D. Lanti, Hendrik Lücke-Tieke, A. Mosca, Tarcisio Mendes de Farias, D. Papadopoulos, Yogendra Patil, Guillem Rull, Ellery Smith, Dimitrios Skoutas, S. Subramanian, Kurt Stockinger","doi":"10.1145/3516431.3516436","DOIUrl":"https://doi.org/10.1145/3516431.3516436","url":null,"abstract":"A full-fledged data exploration system must combine different access modalities with a powerful concept of guiding the user in the exploration process, by being reactive and anticipative both for data discovery and for data linking. Such systems are a real opportunity for our community to cater to users with different domain and data science expertise. We introduce INODE - an end-to-end data exploration system - that leverages, on the one hand, Machine Learning and, on the other hand, semantics for the purpose of Data Management (DM). Our vision is to develop a classic unified, comprehensive platform that provides extensive access to open datasets, and we demonstrate it in three significant use cases in the fields of Cancer Biomarker Research, Research and Innovation Policy Making, and Astrophysics. INODE offers sustainable services in (a) data modeling and linking, (b) integrated query processing using natural language, (c) guidance, and (d) data exploration through visualization, thus facilitating the user in discovering new insights. We demonstrate that our system is uniquely accessible to a wide range of users from larger scientific communities to the public. Finally, we briefly illustrate how this work paves the way for new research opportunities in DM.","PeriodicalId":346332,"journal":{"name":"ACM SIGMOD Record","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-01-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121383294","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
MOTIVATION. The advent of inexpensive, high-quality cameras has led to a rapid increase in the volume of generated video data [19, 16]. It is now feasible to automatically analyze these video datasets at scale due to two developments over the last decade. First, researchers have designed complex, computationally-intensive deep learning (DL) models that capture the contents of a given set of video frames (e.g., objects present in a particular frame [11]) [15]. Second, the computational capabilities of hardware accelerators for evaluating these DL models have increased over the last decade (e.g., TPUs) [8]. We anticipate that automated analysis of videos will reduce the labor cost of analyzing video
{"title":"Accelerating Video Analytics","authors":"Joy Arulraj","doi":"10.1145/3516431.3516442","DOIUrl":"https://doi.org/10.1145/3516431.3516442","url":null,"abstract":"MOTIVATION. The advent of inexpensive, high-quality cameras has led to a rapid increase in the volume of generated video data [19, 16]. It is now feasible to automatically analyze these video datasets at scale due to two developments over the last decade. First, researchers have designed complex, computationally-intensive deep learning (DL) models that capture the contents of a given set of video frames (e.g., objects present in a particular frame [11]) [15]. Second, the computational capabilities of hardware accelerators for evaluating these DL models have increased over the last decade (e.g., TPUs) [8]. We anticipate that automated analysis of videos will reduce the labor cost of analyzing video","PeriodicalId":346332,"journal":{"name":"ACM SIGMOD Record","volume":"44 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-01-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123223192","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A. Bonifati, Michael J. Mior, Felix Naumann, Nele Sina Noack
ACM SIGMOD, VLDB and other database organizations have committed to fostering an inclusive and diverse community, as have many other scientific organizations. Recently, different measures have been taken to advance these goals, especially for underrepresented groups. One possible measure is double-blind reviewing, which aims to hide gender, ethnicity, and other properties of the authors. We report the preliminary results of a gender diversity analysis of publications of the database community across several peer-reviewed venues, and also compare women's authorship percentages in single-blind and double-blind venues over the years. We also compare the results obtained for data management with those of other relevant areas in Computer Science.
{"title":"How Inclusive are We?","authors":"A. Bonifati, Michael J. Mior, Felix Naumann, Nele Sina Noack","doi":"10.1145/3516431.3516438","DOIUrl":"https://doi.org/10.1145/3516431.3516438","url":null,"abstract":"ACM SIGMOD, VLDB and other database organizations have committed to fostering an inclusive and diverse community, as do many other scientific organizations. Recently, different measures have been taken to advance these goals, especially for underrepresented groups. One possible measure is double-blind reviewing, which aims to hide gender, ethnicity, and other properties of the authors. We report the preliminary results of a gender diversity analysis of publications of the database community across several peer-reviewed venues, and also compare women's authorship percentages in both single-blind and double-blind venues along the years. We also obtained a cross comparison of the obtained results in data management with other relevant areas in Computer Science.","PeriodicalId":346332,"journal":{"name":"ACM SIGMOD Record","volume":"65 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-01-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127594627","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Welcome to ACM SIGMOD Record's series of interviews with distinguished members of the database community. I am Marianne Winslett, and today I have here with me Juliana Freire, who is a professor at New York University. Juliana is an ACM Fellow, and she has a Google Faculty Research Award, an IBM Faculty Award, and an NSF Career Award. She is also the chair of SIGMOD, and her term of office ends in just a few days. Juliana's Ph.D. is from Stony Brook. So, Juliana, welcome!
{"title":"Juliana Freire Speaks Out on Reproducibility and Hard Changes","authors":"M. Winslett, V. Braganholo","doi":"10.1145/3516431.3516444","DOIUrl":"https://doi.org/10.1145/3516431.3516444","url":null,"abstract":"Welcome to ACM SIGMOD Record's series of interviews with distinguished members of the database community. I am Marianne Winslett, and today I have here with me Juliana Freire, who is a professor at New York University. Juliana is an ACM Fellow, and she has a Google Faculty Research Award, an IBM Faculty Award, and an NSF Career Award. She is also the chair of SIGMOD, and her term of office ends in just a few days. Juliana's Ph.D. is from Stony Brook. So, Juliana, welcome!","PeriodicalId":346332,"journal":{"name":"ACM SIGMOD Record","volume":"152 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-01-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132605176","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
It probably seems like yesterday that you were starting at your first post-PhD position, but with this latest promotion, whether it is tenure or promotion to a senior level at your company, you can no longer call yourself "junior". You are now stepping into the shoes of a senior researcher. Congratulations! This is a tremendous accomplishment, and you should celebrate. The road was long and often uphill. You finally made it.
{"title":"Congratulations! You Have Become a Senior Researcher. Now What?","authors":"M. Balazinska","doi":"10.1145/3516431.3516440","DOIUrl":"https://doi.org/10.1145/3516431.3516440","url":null,"abstract":"It probably seems like yesterday that you were starting at your first post-PhD position, but with this latest promotion, whether it is tenure or promotion to a senior level at your company, you can no longer call yourself \"junior\". You are now stepping into the shoes of a senior researcher. Congratulations! This is a tremendous accomplishment, and you should celebrate. The road was long and often uphill. You finally made it.","PeriodicalId":346332,"journal":{"name":"ACM SIGMOD Record","volume":"29 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-01-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132729739","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Philippe Bonnet, Xin Dong, Felix Naumann, Pinar Tözün
The 47th International Conference on Very Large Databases (VLDB'21) was held on August 16-20, 2021 as a hybrid conference. It attracted 180 in-person attendees in Copenhagen and 840 remote attendees. In this paper, we describe our key decisions as general chairs and program committee chairs and share the lessons we learned.
{"title":"VLDB 2021","authors":"Philippe Bonnet, Xin Dong, Felix Naumann, Pinar Tözün","doi":"10.1145/3516431.3516447","DOIUrl":"https://doi.org/10.1145/3516431.3516447","url":null,"abstract":"The 47th International Conference on Very Large Databases (VLDB'21) was held on August 16-20, 2021 as a hybrid conference. It attracted 180 in-person attendees in Copenhagen and 840 remote attendees. In this paper, we describe our key decisions as general chairs and program committee chairs and share the lessons we learned.","PeriodicalId":346332,"journal":{"name":"ACM SIGMOD Record","volume":"147 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-01-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122133341","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The research area of data summarization seeks to find small data structures that can be updated flexibly, and answer certain queries on the input accurately. Summaries are widely used across the area of data management, and are studied from both theoretical and practical perspectives. They are the subject of ongoing research to improve their performance and broaden their applicability. In this column, recent developments in data summarization are surveyed, with the intent of inspiring further advances.
{"title":"Current Trends in Data Summaries","authors":"Graham Cormode, AI Meta","doi":"10.1145/3516431.3516433","DOIUrl":"https://doi.org/10.1145/3516431.3516433","url":null,"abstract":"The research area of data summarization seeks to find small data structures that can be updated flexibly, and answer certain queries on the input accurately. Summaries are widely used across the area of data management, and are studied from both theoretical and practical perspectives. They are the subject of ongoing research to improve their performance and broaden their applicability. In this column, recent developments in data summarization are surveyed, with the intent of inspiring further advances.","PeriodicalId":346332,"journal":{"name":"ACM SIGMOD Record","volume":"47 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-01-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133198624","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}