Evangelia Tsoukanara, Georgia Koloniari, E. Pitoura
Graphs offer a generic abstraction for modeling entities and the interactions and relationships between them. Since most real-world graphs evolve over time, there is a need for models that explore this evolution. We introduce the GraphTempo model, which allows aggregation at both the attribute and the time dimension. We also propose an exploration strategy for navigating through the evolution of the graph based on identifying time intervals of significant growth, shrinkage, or stability. This exploration strategy would be useful, for example, for identifying time periods of multiple collaborations between specific groups in a cooperation network, or of declining contacts between specific groups in a disease propagation network. We evaluate the performance and effectiveness of our strategy using two real graphs.
{"title":"GraphTempo: An aggregation framework for evolving graphs","authors":"Evangelia Tsoukanara, Georgia Koloniari, E. Pitoura","doi":"10.48786/edbt.2023.18","DOIUrl":"https://doi.org/10.48786/edbt.2023.18","PeriodicalId":88813,"journal":{"name":"Advances in database technology : proceedings. International Conference on Extending Database Technology","volume":"78 1","pages":"221-233"},"publicationDate":"2023-01-01","publicationTypes":"Journal Article","isOpenAccess":false,"platform":"Semanticscholar","paperid":"83906379"}
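The GraphTempo abstract above speaks of aggregating an evolving graph along the time dimension and the attribute dimension. The abstract does not define the operators, so the following is only a minimal sketch under stated assumptions: time aggregation is taken to be the union of edges over an interval, and attribute aggregation is taken to be a count of edges between attribute groups. The toy data and the names `snapshots`, `gender`, `aggregate_time`, `aggregate_attribute`, and `new_edges` are all hypothetical and not taken from the paper.

```python
from collections import Counter

# Hypothetical toy data: time point -> set of undirected collaboration edges.
snapshots = {
    2020: {("ann", "bob"), ("bob", "cat")},
    2021: {("ann", "bob"), ("ann", "dan")},
    2022: {("ann", "cat")},
}

# Hypothetical static node attribute used for grouping.
gender = {"ann": "f", "bob": "m", "cat": "f", "dan": "m"}


def aggregate_time(snapshots, start, end):
    """Assumed time aggregation: union of all edges seen in [start, end]."""
    edges = set()
    for time_point, snapshot_edges in snapshots.items():
        if start <= time_point <= end:
            edges |= snapshot_edges
    return edges


def aggregate_attribute(edges, attribute):
    """Assumed attribute aggregation: count edges between attribute groups."""
    counts = Counter()
    for u, v in edges:
        # Sort the pair so that (f, m) and (m, f) land in the same group.
        group_pair = tuple(sorted((attribute[u], attribute[v])))
        counts[group_pair] += 1
    return counts


def new_edges(snapshots, old_time, new_time):
    """Edges present at new_time but absent at old_time (a simple growth signal)."""
    return snapshots[new_time] - snapshots[old_time]


if __name__ == "__main__":
    interval_edges = aggregate_time(snapshots, 2020, 2022)
    print(aggregate_attribute(interval_edges, gender))
    print(new_edges(snapshots, 2020, 2021))
```

Comparing the size of `new_edges` across consecutive time points is one simple way to flag intervals of growth; what counts as "significant" would need a threshold that this sketch leaves open.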
Road networks are widely used as a fundamental structure in urban transportation studies. In recent years, with more research leveraging deep learning to solve conventional transportation problems, how to obtain robust road network representations (i.e., embeddings) applicable to a wide range of applications has become a fundamental need. Existing studies mainly adopt graph embedding methods. Such methods, however, primarily learn the topological correlations of road networks but ignore the spatial structure (i.e., spatial correlations), which is also important in applications such as querying similar trajectories. Besides, most studies learn task-specific embeddings in a supervised manner, so the embeddings are sub-optimal when used for new tasks. It is inefficient to store or learn dedicated embeddings for every different task in a large transportation system. To tackle these issues, we propose a model named SARN to learn generic and task-agnostic road network embeddings based on self-supervised contrastive learning. We present (i) a spatial similarity matrix to help learn the spatial correlations of the roads, (ii) a sampling strategy based on the spatial structure of a road network to form self-supervised training samples, and (iii) a two-level loss function to guide SARN to learn embeddings based on both local and global contrasts of similar and dissimilar road segments. Experimental results on three downstream tasks over real-world road networks show that SARN outperforms state-of-the-art self-supervised models consistently and achieves comparable (or even better) performance to supervised models.
{"title":"Spatial Structure-Aware Road Network Embedding via Graph Contrastive Learning","authors":"Yanchuan Chang, E. Tanin, Xin Cao, Jianzhong Qi","doi":"10.48786/edbt.2023.12","DOIUrl":"https://doi.org/10.48786/edbt.2023.12","PeriodicalId":88813,"journal":{"name":"Advances in database technology : proceedings. International Conference on Extending Database Technology","volume":"46 1","pages":"144-156"},"publicationDate":"2023-01-01","publicationTypes":"Journal Article","isOpenAccess":false,"platform":"Semanticscholar","paperid":"84432765"}
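The SARN abstract above relies on self-supervised contrastive learning over similar and dissimilar road segments. The paper's two-level loss is not specified in the abstract, so the sketch below shows only a generic InfoNCE-style contrastive loss, a standard formulation swapped in for illustration. The function names, the temperature value, and the toy vectors are assumptions, and embeddings are assumed to be non-zero.

```python
import math


def cosine(a, b):
    """Cosine similarity between two non-zero vectors given as lists of floats."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def info_nce(anchor, positive, negatives, temperature=0.1):
    """Generic InfoNCE loss: small when the anchor is closer to the positive
    than to the negatives. This is not SARN's two-level loss."""
    positive_score = math.exp(cosine(anchor, positive) / temperature)
    negative_score = sum(
        math.exp(cosine(anchor, negative) / temperature) for negative in negatives
    )
    return -math.log(positive_score / (positive_score + negative_score))


if __name__ == "__main__":
    segment = [1.0, 0.0]          # hypothetical embedding of a road segment
    nearby = [1.0, 0.1]           # hypothetical spatially similar segment
    far_away = [0.0, 1.0]         # hypothetical dissimilar segment
    print(info_nce(segment, nearby, [far_away]))   # well-aligned pair
    print(info_nce(segment, far_away, [nearby]))   # misaligned pair
```

In a setup like the one the abstract describes, the choice of positives and negatives would come from the spatial sampling strategy rather than from hand-picked vectors as here.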
The dawn of multi-model data has brought many challenges to most aspects of data management. In addition, no standards exist focusing on how the models should be combined and managed. This paper focuses on the problems related to multi-model querying. We introduce MM-quecat, a tool that enables one to query multi-model data regardless of the underlying multi-model database or polystore. Using category theory, we provide a unified abstract representation of multi-model data, which can be viewed as a graph and, thus, queried using a SPARQL-based query language. Moreover, the support for cross-model redundancy enables the choice of the optimal multi-model query strategy.
{"title":"MM-quecat: A Tool for Unified Querying of Multi-Model Data","authors":"P. Koupil, Daniel Crha, I. Holubová","doi":"10.48786/edbt.2023.76","DOIUrl":"https://doi.org/10.48786/edbt.2023.76","PeriodicalId":88813,"journal":{"name":"Advances in database technology : proceedings. International Conference on Extending Database Technology","volume":"42 1","pages":"831-834"},"publicationDate":"2023-01-01","publicationTypes":"Journal Article","isOpenAccess":false,"platform":"Semanticscholar","paperid":"81527880"}
Database systems are no longer used only for the storage of plain structured data and basic analyses. An increasing role is also played by the integration of ML models, e.g., neural networks with specialized frameworks, and their use for classification or prediction. However, using such models on data stored in a database system might require downloading the data and performing the computations outside. In this paper, we evaluate approaches for integrating the ML inference step as a special query operator, the ModelJoin. We explore several options for this integration on different abstraction levels: relational representation of the models as well as SQL queries for inference, the use of UDFs, the use of APIs to existing ML runtimes, and a native implementation of the ModelJoin as a query operator supporting both CPU and GPU execution. Our evaluation results show that integrating ML runtimes over APIs performs similarly to a native operator while being generic enough to support arbitrary model types. The solution of relational representation and SQL queries is the most portable and works well for smaller inputs without any changes needed in the database engine.
{"title":"Exploration of Approaches for In-Database ML","authors":"Steffen Kläbe, Stefan Hagedorn, K. Sattler","doi":"10.48786/edbt.2023.25","DOIUrl":"https://doi.org/10.48786/edbt.2023.25","PeriodicalId":88813,"journal":{"name":"Advances in database technology : proceedings. International Conference on Extending Database Technology","volume":"17 1","pages":"311-323"},"publicationDate":"2023-01-01","publicationTypes":"Journal Article","isOpenAccess":false,"platform":"Semanticscholar","paperid":"85281129"}
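One of the options named in the in-database ML abstract above is a relational representation of the model combined with plain SQL queries for inference. As a rough illustration of that idea only, the sketch below stores a linear model as rows in a table and scores samples with a join and an aggregate, using Python's built-in `sqlite3` module. The schema (`model_weights`, `samples`), the choice of a linear model, and the helper names are assumptions and do not reproduce the paper's ModelJoin operator.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database holding a toy linear model and two samples."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE model_weights (feature TEXT PRIMARY KEY, weight REAL);
        CREATE TABLE samples (sample_id INTEGER, feature TEXT, value REAL);
        """
    )
    # The model is stored relationally: one row per feature, plus a bias row.
    conn.executemany(
        "INSERT INTO model_weights VALUES (?, ?)",
        [("bias", 0.5), ("x1", 2.0), ("x2", -1.0)],
    )
    # Samples are stored in a sparse (sample, feature, value) layout.
    conn.executemany(
        "INSERT INTO samples VALUES (?, ?, ?)",
        [(1, "x1", 1.0), (1, "x2", 3.0), (2, "x1", 4.0), (2, "x2", 0.5)],
    )
    return conn


# Inference as a join between data and model, followed by a weighted sum.
INFERENCE_SQL = """
SELECT s.sample_id,
       SUM(s.value * w.weight)
         + (SELECT weight FROM model_weights WHERE feature = 'bias') AS score
FROM samples AS s
JOIN model_weights AS w ON w.feature = s.feature
GROUP BY s.sample_id
ORDER BY s.sample_id
"""


def predict(conn):
    """Run inference entirely inside the database and return (sample_id, score) rows."""
    return conn.execute(INFERENCE_SQL).fetchall()


if __name__ == "__main__":
    print(predict(build_demo_db()))
```

This layout needs no engine changes, which matches the portability point in the abstract; a neural network would need one such join and aggregation per layer plus an activation function, which this sketch does not attempt.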
E. Y. Lai, Zainab Zolaktaf, Mostafa Milani, Omar AlOmeir, Jianhao Cao, R. Pottinger
Users interact with databases by writing sequences of SQL queries that are often stored in query workloads. Current SQL query recommendation approaches make little use of query workloads. Our work presents a novel workload-aware approach to query recommendation. We use deep learning prediction models trained on query pairs extracted from large-scale query workloads to build our approach. Our algorithms suggest contextual (query fragments) and structural (query templates) information to aid users in formulating their next query. We evaluate our algorithms on two real-world datasets: the Sloan Digital Sky Survey (SDSS) and SQLShare. We perform a thorough analysis of the workloads and then empirically show that our workload-aware, deep-learning approach vastly outperforms known collaborative filtering approaches.
{"title":"Workload-Aware Query Recommendation Using Deep Learning","authors":"E. Y. Lai, Zainab Zolaktaf, Mostafa Milani, Omar AlOmeir, Jianhao Cao, R. Pottinger","doi":"10.48786/edbt.2023.05","DOIUrl":"https://doi.org/10.48786/edbt.2023.05","PeriodicalId":88813,"journal":{"name":"Advances in database technology : proceedings. International Conference on Extending Database Technology","volume":"23 1","pages":"53-65"},"publicationDate":"2023-01-01","publicationTypes":"Journal Article","isOpenAccess":false,"platform":"Semanticscholar","paperid":"84189099"}
Anomaly detection is an important problem in data analytics with applications in many domains. In recent years, there has been an increasing interest in anomaly detection tasks applied to time series. In this tutorial, we take a holistic view on anomaly detection in time series, starting from the core definitions and taxonomies related to time series and anomaly types, to an extensive description of the anomaly detection methods proposed by different communities in the literature. Then, we discuss shortcomings in traditional evaluation measures. Finally, we present new solutions to assess the quality of anomaly detection approaches and new benchmarks capturing diverse domains and applications.
{"title":"New Trends in Time Series Anomaly Detection","authors":"Paul Boniol, John Paparizzos, Themis Palpanas","doi":"10.48786/edbt.2023.80","DOIUrl":"https://doi.org/10.48786/edbt.2023.80","PeriodicalId":88813,"journal":{"name":"Advances in database technology : proceedings. International Conference on Extending Database Technology","volume":"13 1","pages":"847-850"},"publicationDate":"2023-01-01","publicationTypes":"Journal Article","isOpenAccess":false,"platform":"Semanticscholar","paperid":"84734981"}
Ehab Abdelhamid, Nikos Tsikoudis, M. Duller, Marc B. Sugiyama, Nicholas E. Marino, F. Waas
Extract, Transform, and Load (ETL) pipelines are widely used to ingest data into Enterprise Data Warehouse (EDW) systems. These pipelines can be very complex and often tightly coupled to a given EDW, making it challenging to upgrade from a legacy EDW to a Cloud Data Warehouse (CDW). This paper presents a novel solution for a transparent and fully-automated porting of legacy ETL pipelines to CDW environments.
{"title":"Adaptive Real-time Virtualization of Legacy ETL Pipelines in Cloud Data Warehouses","authors":"Ehab Abdelhamid, Nikos Tsikoudis, M. Duller, Marc B. Sugiyama, Nicholas E. Marino, F. Waas","doi":"10.48786/edbt.2023.64","DOIUrl":"https://doi.org/10.48786/edbt.2023.64","PeriodicalId":88813,"journal":{"name":"Advances in database technology : proceedings. International Conference on Extending Database Technology","volume":"33 1","pages":"765-772"},"publicationDate":"2023-01-01","publicationTypes":"Journal Article","isOpenAccess":false,"platform":"Semanticscholar","paperid":"89588031"}
Bhimesh Kandibedala, A. Pyayt, Nick Piraino, Chris Caballero, M. Gubanov
We describe a Web-scale interactive Knowledge Graph (KG), populated with trustworthy information from the latest published medical findings on COVID-19. Existing socially maintained KGs, such as YAGO or DBPedia, and more specialized medical ontologies, such as NCBI, Virus-, and COVID-19-related ones, become stale very quickly, lack the latest COVID-19 medical findings, and, most importantly, lack any scalable mechanism to keep them up to date. Here we describe COVIDKG.ORG, an online, interactive, trustworthy COVID-19 Web-scale Knowledge Graph and several advanced search engines. Its content is extracted and updated from the latest medical research. Because of that, it does not suffer from the bias or misinformation that often dominates public information sources.
{"title":"COVIDKG.ORG - a Web-scale COVID-19 Interactive, Trustworthy Knowledge Graph, Constructed and Interrogated for Bias using Deep-Learning","authors":"Bhimesh Kandibedala, A. Pyayt, Nick Piraino, Chris Caballero, M. Gubanov","doi":"10.48786/edbt.2023.63","DOIUrl":"https://doi.org/10.48786/edbt.2023.63","PeriodicalId":88813,"journal":{"name":"Advances in database technology : proceedings. International Conference on Extending Database Technology","volume":"188 1","pages":"757-764"},"publicationDate":"2023-01-01","publicationTypes":"Journal Article","isOpenAccess":false,"platform":"Semanticscholar","paperid":"73944081"}
Siqiang Luo, Zichen Zhu, Xiaokui Xiao, Y. Yang, Chunbo Li, B. Kao
Vertex-centric (VC) graph systems are at the core of large-scale distributed graph processing. For such systems, a common usage pattern is the concurrent processing of multiple tasks (multi-processing for short), which aims to execute a large number of unit tasks in parallel. In this paper, we point out that multi-processing has not been sufficiently studied or evaluated in previous work; hence, we fill this critical gap with three major contributions. First, we examine the tradeoff between two important measures in VC-systems: the number of communication rounds and message congestion. We show that this tradeoff is crucial to system performance; yet, existing approaches fail to achieve an optimal tradeoff, leading to poor performance. Second, based on extensive experimental evaluations on mainstream VC systems (e.g., Giraph, Pregel+, GraphD) and benchmark multi-processing tasks (e.g., Batch Personalized PageRanks, Multiple Source Shortest Paths), we present several important insights on the correlation between system performance and configurations, which are valuable to practitioners in optimizing system performance. Third, based on the insights drawn from our experimental evaluations, we present a cost-based tuning framework that optimizes the performance of a representative VC-system. This demonstrates the usefulness of the insights.
{"title":"Multi-Task Processing in Vertex-Centric Graph Systems: Evaluations and Insights","authors":"Siqiang Luo, Zichen Zhu, Xiaokui Xiao, Y. Yang, Chunbo Li, B. Kao","doi":"10.48786/edbt.2023.20","DOIUrl":"https://doi.org/10.48786/edbt.2023.20","PeriodicalId":88813,"journal":{"name":"Advances in database technology : proceedings. International Conference on Extending Database Technology","volume":"4 1","pages":"247-259"},"publicationDate":"2023-01-01","publicationTypes":"Journal Article","isOpenAccess":false,"platform":"Semanticscholar","paperid":"79759570"}
Almost every organization today is promoting data-driven decision making, leveraging advances in data science. According to various surveys, data scientists spend up to 80% of their time cleaning and transforming data. Although data management systems have been carefully optimized for such tasks over several decades, they are seldom leveraged by data scientists, who prefer to use libraries such as Pandas, sacrificing performance and scalability in favor of familiarity and ease of use. As a result, data scientists are not able to fully leverage the hardware capabilities of commodity workstations and either end up working on a small sample of their data locally or migrate to more heavyweight frameworks in a cluster environment. In this paper, we present PyFroid, a framework that leverages lightweight relational databases to improve the performance and scalability of Pandas, allowing data scientists to operate on much larger datasets on a commodity workstation. PyFroid has zero learning curve as it maintains all the Pandas APIs and is fully compatible with the tools that data scientists use (e.g., Python notebooks). We experimentally demonstrate that, compared to Pandas, PyFroid is able to analyze up to 20X more data on the same machine, provide comparable or better performance for small datasets as well as near-memory data sizes, and consume far fewer resources.
{"title":"PyFroid: Scaling Data Analysis on a Commodity Workstation","authors":"Venkatesh Emani, A. Floratou, C. Curino","doi":"10.48786/edbt.2024.06","DOIUrl":"https://doi.org/10.48786/edbt.2024.06","PeriodicalId":88813,"journal":{"name":"Advances in database technology : proceedings. International Conference on Extending Database Technology","volume":"1 1","pages":"61-67"},"publicationDate":"2023-01-01","publicationTypes":"Journal Article","isOpenAccess":false,"platform":"Semanticscholar","paperid":"89326433"}