S. Langenecker, Christoph Sturm, Christian Schalles, Carsten Binnig
Table corpora such as VizNet or TURL, which contain annotated semantic types per column, are important for building machine learning models for automatic semantic type detection. However, there is a large discrepancy between these corpora and real-world data lakes: data lakes contain a large fraction of numerical data, which is hardly represented in existing corpora. Hence, in this paper, we introduce a new corpus with a much higher proportion of numerical columns than existing corpora. To reflect the distribution in real-world data lakes, our corpus SportsTables contains on average approx. 86% numerical columns, posing new challenges to existing semantic type detection models, which have so far mainly targeted non-numerical columns. To demonstrate this effect, we present in this extended version of [18] the results of an extensive study of four different state-of-the-art approaches for semantic type detection on our new corpus. Overall, the results show significant performance differences between predicting semantic types for textual and for numerical data.
{"title":"SportsTables: A new Corpus for Semantic Type Detection","authors":"S. Langenecker, Christoph Sturm, Christian Schalles, Carsten Binnig","doi":"10.18420/BTW2023-68","DOIUrl":"https://doi.org/10.18420/BTW2023-68","url":null,"abstract":"Table corpora such as VizNet or TURL which contain annotated semantic types per column are important to build machine learning models for the task of automatic semantic type detection. However, there is a huge discrepancy between corpora and real-world data lakes since they contain a huge fraction of numerical data which are not present in existing corpora. Hence, in this paper, we introduce a new corpus that contains a much higher proportion of numerical columns than existing corpora. To reflect the distribution in real-world data lakes, our corpus SportsTables has on average approx. 86% numerical columns, posing new challenges to existing semantic type detection models which have mainly targeted non-numerical columns so far. To demonstrate this effect, we show in this extended version paper of [18] the results of an extensive study using four different state-of-the-art approaches for semantic type detection on our new corpus. Overall, the results demonstrate significant performance differences in predicting semantic types for textual and numerical data.","PeriodicalId":421643,"journal":{"name":"Datenbanksysteme für Business, Technologie und Web","volume":"18 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-10-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130562111","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Today’s systems are capable of storing large amounts of data in main memory, and in-memory DBMSs in particular benefit from this development. However, processing data from main memory still has to go through the CPU. This creates a bottleneck that limits the achievable performance of the DBMS. Processing-In-Memory (PIM) is a paradigm to overcome this problem, but it was not available in commercial systems for a long time. With UPMEM, a commercial product is finally available that provides PIM technology in hardware. In this work, we focus on accelerating the table scan, a fundamental database query operation. We present and investigate an approach to optimize this operation using PIM. We evaluate the PIM scan in terms of parallelism and execution time in benchmarks with different table sizes and compare it to a traditional CPU-based table scan. The result is a PIM table scan that significantly outperforms the CPU-based scan.
{"title":"Accelerating Large Table Scan using Processing-In-Memory Technology","authors":"Alexander Baumstark, M. Jibril, K. Sattler","doi":"10.18420/BTW2023-51","DOIUrl":"https://doi.org/10.18420/BTW2023-51","url":null,"abstract":"Today’s systems are capable of storing large amounts of data in main memory. Particularly, in-memory DBMSs benefit from this development. However, the processing of data from the main memory necessarily has to run via the CPU. This creates a bottleneck, which affects the possible performance of the DBMS. Processing-In-Memory (PIM) is a paradigm to overcome this problem, which was not available in commercial systems for a long time. With the availability of UPMEM, a commercial product is finally available that provides PIM technology in hardware. In this work, we focus on the acceleration of the table scan, a fundamental database query operation. We show and investigate an approach that can be used to optimize this operation by using PIM. We evaluate the PIM scan in terms of parallelism and execution time in benchmarks with different table sizes and compare it to a traditional CPU-based table scan. The result is a PIM table scan that outperforms the CPU-based scan significantly.","PeriodicalId":421643,"journal":{"name":"Datenbanksysteme für Business, Technologie und Web","volume":"452 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-10-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125780903","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Sabine Bartsch, Changxu Duan, Sherry Tan, Elena Volkanovska, W. Stille
The discourse on climate change has become a centerpiece of public debate, creating a pressing need to analyze the multitude of messages produced by the participants in this communication process. In addition to text, information on this topic is conveyed multimodally, through images, videos, tables, and other data objects that are embedded within documents and accompany the text. This paper presents the process of building a multimodal pilot corpus for the InsightsNet Climate Change Corpus (ICCC) and using natural language processing (NLP) tools to enrich corpus (meta)data, thus creating a dataset that lends itself to exploring the interplay between the various modalities that constitute the discourse on climate change. We demonstrate how the pilot corpus can be queried for relevant information in two types of databases, and how the proposed data model promotes a more comprehensive sentiment analysis approach.
{"title":"The InsightsNet Climate Change Corpus (ICCC)","authors":"Sabine Bartsch, Changxu Duan, Sherry Tan, Elena Volkanovska, W. Stille","doi":"10.18420/BTW2023-59","DOIUrl":"https://doi.org/10.18420/BTW2023-59","url":null,"abstract":"The discourse on climate change has become a centerpiece of public debate, thereby creating a pressing need to analyze the multitude of messages created by the participants in this communication process. In addition to text, information on this topic is conveyed multimodally, through images, videos, tables and other data objects that are embedded within documents and accompany the text. This paper presents the process of building a multimodal pilot corpus to the InsightsNet Climate Change Corpus (ICCC) and using natural language processing (NLP) tools to enrich corpus (meta)data, thus creating a dataset that lends itself to the exploration of the interplay between the various modalities that constitute the discourse on climate change. We demonstrate how the pilot corpus can be queried for relevant information in two types of databases, and how the proposed data model promotes a more comprehensive sentiment analysis approach.","PeriodicalId":421643,"journal":{"name":"Datenbanksysteme für Business, Technologie und Web","volume":"14 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-09-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130358233","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Dennis Aumiller, Jing Fan, Michael Gertz
With recent advancements in the area of Natural Language Processing, the focus is slowly shifting from a purely English-centric view towards more language-specific solutions, including German. Text summarization systems, which transform long input documents into compressed and more digestible summaries, are especially practical for businesses analyzing their growing amounts of textual data. In this work, we assess the particular landscape of German abstractive text summarization and investigate why practically useful solutions for abstractive text summarization are still absent in industry. Our focus is two-fold, analyzing a) training resources and b) publicly available summarization systems. We show that popular existing datasets exhibit crucial flaws in their assumptions about the original sources, which frequently leads to detrimental effects on system generalization and evaluation biases. We confirm that for the most popular training dataset, MLSUM, over 50% of the training set is unsuitable for abstractive summarization purposes. Furthermore, available systems frequently fail to compare against simple baselines and ignore more effective and efficient extractive summarization approaches. We attribute poor evaluation quality to a variety of factors, which are investigated in more detail in this work: a lack of high-quality (and diverse) gold data for training, understudied (and untreated) positional biases in some of the existing datasets, and the lack of easily accessible and streamlined pre-processing strategies or analysis tools. We provide a comprehensive assessment of available models on the cleaned datasets and find that this can lead to a reduction of more than 20 ROUGE-1 points during evaluation. The code for dataset filtering and reproducing the results can be found online at https://github.com/dennlinger/summaries
{"title":"On the State of German (Abstractive) Text Summarization","authors":"Dennis Aumiller, Jing Fan, Michael Gertz","doi":"10.48550/arXiv.2301.07095","DOIUrl":"https://doi.org/10.48550/arXiv.2301.07095","url":null,"abstract":"With recent advancements in the area of Natural Language Processing, the focus is slowly shifting from a purely English-centric view towards more language-specific solutions, including German. Especially practical for businesses to analyze their growing amount of textual data are text summarization systems, which transform long input documents into compressed and more digestible summary texts. In this work, we assess the particular landscape of German abstractive text summarization and investigate the reasons why practically useful solutions for abstractive text summarization are still absent in industry. Our focus is two-fold, analyzing a) training resources, and b) publicly available summarization systems. We are able to show that popular existing datasets exhibit crucial flaws in their assumptions about the original sources, which frequently leads to detrimental effects on system generalization and evaluation biases. We confirm that for the most popular training dataset, MLSUM, over 50% of the training set is unsuitable for abstractive summarization purposes. Furthermore, available systems frequently fail to compare to simple baselines, and ignore more effective and efficient extractive summarization approaches. We attribute poor evaluation quality to a variety of different factors, which are investigated in more detail in this work: A lack of qualitative (and diverse) gold data considered for training, understudied (and untreated) positional biases in some of the existing datasets, and the lack of easily accessible and streamlined pre-processing strategies or analysis tools. We provide a comprehensive assessment of available models on the cleaned datasets, and find that this can lead to a reduction of more than 20 ROUGE-1 points during evaluation. The code for dataset filtering and reproducing results can be found online at https://github.com/dennlinger/summaries","PeriodicalId":421643,"journal":{"name":"Datenbanksysteme für Business, Technologie und Web","volume":"38 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-01-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132983195","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
F. Schuhknecht, Simon Jörz
Blockchain systems essentially consist of two levels: The network level is responsible for distributing an ordered stream of transactions to all nodes of the network in exactly the same way, even in the presence of a certain number of malicious parties (Byzantine fault tolerance). On the node level, each node then receives this ordered stream of transactions and executes it within some sort of transaction processing system, typically to alter some kind of state. This clear separation into two levels, as well as drastically different application requirements, has led to the materialization of the network level in the form of so-called blockchain frameworks. While providing all the "blockchain features", these frameworks leave the node-level backend flexible or even entirely up to the application to implement, depending on its specific needs. In this paper, we present how to integrate a highly versatile transaction processing system, namely a relational DBMS, into such a blockchain framework. As the framework, we use the popular Tendermint Core, now part of the Ignite/Cosmos ecosystem, which can run both public and permissioned networks, and combine it with relational DBMSs as the backend. This results in a "relational blockchain" that is able to run deterministic SQL on a fully replicated relational database. Apart from presenting the integration and its pitfalls, we carefully evaluate the performance implications of such combinations, in particular the throughput and latency overhead caused by the blockchain layer on top of the DBMS. As a result, we give recommendations on how to run such a combination of systems efficiently in practice.
{"title":"The Easiest Way of Turning your Relational Database into a Blockchain - and the Cost of Doing So","authors":"F. Schuhknecht, Simon Jörz","doi":"10.48550/arXiv.2210.04484","DOIUrl":"https://doi.org/10.48550/arXiv.2210.04484","url":null,"abstract":"Blockchain systems essentially consist of two levels: The network level has the responsibility of distributing an ordered stream of transactions to all nodes of the network in exactly the same way, even in the presence of a certain amount of malicious parties (byzantine fault tolerance). On the node level, each node then receives this ordered stream of transactions and executes it within some sort of transaction processing system, typically to alter some kind of state. This clear separation into two levels as well as drastically different application requirements have led to the materialization of the network level in form of so-called blockchain frameworks. While providing all the\"blockchain features\", these frameworks leave the node level backend flexible or even left to be implemented depending on the specific needs of the application. In the following paper, we present how to integrate a highly versatile transaction processing system, namely a relational DBMS, into such a blockchain framework. As framework, we use the popular Tendermint Core, now part of the Ignite/Cosmos eco-system, which can run both public and permissioned networks and combine it with relational DBMSs as the backend. This results in a\"relational blockchain\", which is able to run deterministic SQL on a fully replicated relational database. Apart from presenting the integration and its pitfalls, we will carefully evaluate the performance implications of such combinations, in particular, the throughput and latency overhead caused by the blockchain layer on top of the DBMS. As a result, we give recommendations on how to run such a systems combination efficiently in practice.","PeriodicalId":421643,"journal":{"name":"Datenbanksysteme für Business, Technologie und Web","volume":"12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-10-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130412710","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
W. Mauerer, Ralf Ramsauer, Edson Ramiro Lucas Filho, Stefanie Scherzinger
When multiple tenants compete for resources, database performance tends to suffer. Yet there are scenarios where guaranteed sub-millisecond latencies are crucial, such as in real-time data processing, IoT devices, or when operating in safety-critical environments. In this paper, we study how to make query latencies deterministic in the face of noise (whether caused by other tenants or unrelated operating system tasks). We perform controlled experiments with an in-memory database engine in a multi-tenant setting, where we successively eradicate noisy interference from within the system software stack, to the point where the engine runs close to bare-metal on the underlying hardware. We show that we can achieve query latencies comparable to the database engine running as the sole tenant, but without noticeably impacting the workload of competing tenants. We discuss these results in the context of ongoing efforts to build custom operating systems for database workloads, and point out that for certain use cases, the margin for improvement is rather narrow. In fact, for scenarios like ours, existing operating systems might just be good enough, provided that they are expertly configured. We then critically discuss these findings in the light of a broader family of database systems (e.g., including disk-based), and how to extend the approach of this paper accordingly.
{"title":"Silentium! Run-Analyse-Eradicate the Noise out of the DB/OS Stack","authors":"W. Mauerer, Ralf Ramsauer, Edson Ramiro Lucas Filho, Stefanie Scherzinger","doi":"10.18420/btw2021-21","DOIUrl":"https://doi.org/10.18420/btw2021-21","url":null,"abstract":"When multiple tenants compete for resources, database performance tends to suffer. Yet there are scenarios where guaranteed sub-millisecond latencies are crucial, such as in real-time data processing, IoT devices, or when operating in safety-critical environments. In this paper, we study how to make query latencies deterministic in the face of noise (whether caused by other tenants or unrelated operating system tasks). We perform controlled experiments with an in-memory database engine in a multi-tenant setting, where we successively eradicate noisy interference from within the system software stack, to the point where the engine runs close to bare-metal on the underlying hardware. We show that we can achieve query latencies comparable to the database engine running as the sole tenant, but without noticeably impacting the workload of competing tenants. We discuss these results in the context of ongoing efforts to build custom operating systems for database workloads, and point out that for certain use cases, the margin for improvement is rather narrow. In fact, for scenarios like ours, existing operating systems might just be good enough, provided that they are expertly configured. We then critically discuss these findings in the light of a broader family of database systems (e.g., including disk-based), and how to extend the approach of this paper accordingly.","PeriodicalId":421643,"journal":{"name":"Datenbanksysteme für Business, Technologie und Web","volume":"19 5","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-02-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"120837431","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Recent trends in business and technology (e.g., machine learning, social network analysis) benefit from storing and processing growing amounts of graph-structured data in databases and data science platforms. FPGAs as graph processing accelerators with a customizable memory hierarchy promise to solve the performance problems caused by the inherently irregular memory access patterns of graph workloads on traditional hardware (e.g., CPUs). However, developing such hardware accelerators is still time-consuming and difficult, and benchmarking is not standardized, which hinders understanding the impact of memory access pattern changes and the systematic engineering of graph processing accelerators. In this work, we propose a simulation environment for the analysis of graph processing accelerators based on simulating their memory access patterns. Further, we evaluate our approach on two state-of-the-art FPGA graph processing accelerators and demonstrate reproducibility and comparability, as well as a shortened development process, by example. Not implementing the cycle-accurate internal data flow on accelerator hardware such as FPGAs significantly reduces the implementation time, increases the transparency of benchmark parameters, and allows comparing graph processing approaches.
{"title":"Exploring Memory Access Patterns for Graph Processing Accelerators","authors":"Jonas Dann, Daniel Ritter, H. Fröning","doi":"10.18420/btw2021-05","DOIUrl":"https://doi.org/10.18420/btw2021-05","url":null,"abstract":"Recent trends in business and technology (e.g., machine learning, social network analysis) benefit from storing and processing growing amounts of graph-structured data in databases and data science platforms. FPGAs as accelerators for graph processing with a customizable memory hierarchy promise solving performance problems caused by inherent irregular memory access patterns on traditional hardware (e.g., CPU). However, developing such hardware accelerators is yet time-consuming and difficult and benchmarking is non-standardized, hindering comprehension of the impact of memory access pattern changes and systematic engineering of graph processing accelerators. \u0000In this work, we propose a simulation environment for the analysis of graph processing accelerators based on simulating their memory access patterns. Further, we evaluate our approach on two state-of-the-art FPGA graph processing accelerators and show reproducibility, comparablity, as well as the shortened development process by an example. Not implementing the cycle-accurate internal data flow on accelerator hardware like FPGAs significantly reduces the implementation time, increases the benchmark parameter transparency, and allows comparison of graph processing approaches.","PeriodicalId":421643,"journal":{"name":"Datenbanksysteme für Business, Technologie und Web","volume":"43 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-10-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123300978","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
For the second time, following BTW 2017 in Stuttgart [Wa17], the Data Science Challenge is being held as part of the BTW conference series. Participants of the challenge had the opportunity to develop their own approach to cloud-based data analysis and to compete directly against the other participants.
{"title":"Die Data Science Challenge auf der BTW 2019 in Rostock","authors":"Hannes Grunert, Holger Meyer","doi":"10.18420/btw2019-ws-30","DOIUrl":"https://doi.org/10.18420/btw2019-ws-30","url":null,"abstract":"Zum zweiten Mal — nach der BTW 2017 in Stuttgart [Wa17] — findet auf der BTWKonferenzreihe die Data Science Challenge statt. Die Teilnehmer der Challenge hatten die Möglichkeit, ihren eigenen Ansatz zur cloud-basierten Datenanalyse zu entwickeln und damit im direkten Vergleich gegen andere Teilnehmer anzutreten.","PeriodicalId":421643,"journal":{"name":"Datenbanksysteme für Business, Technologie und Web","volume":"308 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-03-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129750194","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
M. Möller, Nicolas Berton, Meike Klettke, Stefanie Scherzinger, U. Störl
We present jHound, a tool for profiling large collections of JSON data, and apply it to thousands of data sets holding open government data. jHound reports key characteristics of JSON documents, such as their nesting depth. As we show, jHound can help detect structural outliers, and most importantly, badly encoded documents: jHound can pinpoint certain cases of documents that use string-typed values where other native JSON datatypes would have been a better match. Moreover, we can detect certain cases of maladaptively structured JSON documents, which obviously do not comply with good data modeling practices. By interactively exploring particular example documents, we hope to inspire discussions in the community about what makes a good JSON encoding.
{"title":"jHound: Large-Scale Profiling of Open JSON Data","authors":"M. Möller, Nicolas Berton, Meike Klettke, Stefanie Scherzinger, U. Störl","doi":"10.18420/btw2019-44","DOIUrl":"https://doi.org/10.18420/btw2019-44","url":null,"abstract":"We present jHound, a tool for profiling large collections of JSON data, and apply it to thousands of data sets holding open government data. jHound reports key characteristics of JSON documents, such as their nesting depth. As we show, jHound can help detect structural outliers, and most importantly, badly encoded documents: jHound can pinpoint certain cases of documents that use string-typed values where other native JSON datatypes would have been a better match. Moreover, we can detect certain cases of maladaptively structured JSON documents, which obviously do not comply with good data modeling practices. By interactively exploring particular example documents, we hope to inspire discussions in the community about what makes a good JSON encoding.","PeriodicalId":421643,"journal":{"name":"Datenbanksysteme für Business, Technologie und Web","volume":"10 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-03-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123463824","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
D. Butterstein, Torsten Grust
We demonstrate how a surgical change to PostgreSQL’s evaluation strategy for SQL expressions can have a noticeable impact on overall query runtime performance. (This is an abridged version of [BG16], originally demonstrated at VLDB 2016.) The evaluation of scalar or Boolean expressions often takes a backseat in discussions of query processing, although table scans, filters, aggregates, and projections ...
{"title":"Invest Once, Save a Million Times - LLVM-based Expression Compilation in PostgreSQL","authors":"D. Butterstein, Torsten Grust","doi":"10.15496/publikation-25721","DOIUrl":"https://doi.org/10.15496/publikation-25721","url":null,"abstract":"We demonstrate how a surgical change to PostgreSQL’s evaluation strategy for SQL expressions can have a noticeable impact on overall query runtime performance. (This is an abridged version of [BG16], originally demonstrated at VLDB 2016.) The evaluation of scalar or Boolean expressions often takes the backseat in a discussion of query processing although table scans, filters, aggregates, projections","PeriodicalId":421643,"journal":{"name":"Datenbanksysteme für Business, Technologie und Web","volume":"15 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130533198","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}