Pub Date: 1900-01-01, DOI: 10.1007/978-3-642-86096-6_6
Stella Gatziu Grivas, Klaus R. Dittrich
Eine Ereignissprache für das aktive, objektorientierte Datenbanksystem SAMOS [An event language for the active, object-oriented database system SAMOS]. Stella Gatziu Grivas, Klaus R. Dittrich. Datenbanksysteme für Business, Technologie und Web. DOI: 10.1007/978-3-642-86096-6_6
The analysis of large graphs has received considerable attention recently, but current solutions are typically hard to use. In this demonstration paper, we report on an effort to improve the usability of the open-source system Gradoop for processing and analyzing large graphs. We achieve this by integrating Gradoop into the popular open-source platform KNIME, so that graph analysis workflows can be created visually without writing code. We outline the integration approach and discuss what will be demonstrated.
Big graph analysis by visually created workflows. M. Rostami, E. Peukert, M. Wilke, E. Rahm. Datenbanksysteme für Business, Technologie und Web. DOI: 10.18420/btw2019-45
Suspended particulate matter (SPM) is a significant problem in current environmental research and affects the everyday life of many people. Our goal for the BTW 2019 Data Science Challenge (DSC) is to use available sensor data on SPM to assess the benefits and drawbacks of driving bans. Our application builds on data from 57 sensors in Dresden and 338 sensors in Stuttgart. Each sensor records particle concentration, temperature, and humidity. Stuttgart is a particularly interesting case because a driving ban for outdated diesel engines on inner-city roads was introduced there in January 2019. This allows us to compare the effectiveness of driving bans not only over time but also between the two cities. Although this report analyzes only two cities as examples, we see high potential for applying our tools to other cities and scenarios, and we consider this generality an important factor in knowledge transfer. The applications are not limited to SPM analyses; they can be extended, for example, to weather and climate research.
Assessing the Impact of Driving Bans with Data Analysis. Lucas Woltmann, Claudio Hartmann, Wolfgang Lehner. Datenbanksysteme für Business, Technologie und Web. DOI: 10.18420/btw2019-ws-31
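The comparison over time and between cities described in the driving-ban abstract can be expressed, under one common assumption, as a difference-in-differences estimate: the change in SPM in Stuttgart (where the ban applies) minus the change in Dresden over the same period. The report does not state that it uses this estimator; the sketch below is only an illustration, and the function name `diff_in_diff` and its inputs (lists of SPM readings, for example daily means, before and after January 2019) are hypothetical.

```python
from statistics import mean

def diff_in_diff(treated_before, treated_after, control_before, control_after):
    """Difference-in-differences estimate of a policy effect.

    Hypothetical inputs: lists of SPM readings (for example, daily mean
    concentrations) for the city with the driving ban (treated) and a city
    without it (control), before and after the ban took effect.
    """
    # Change observed in the city that introduced the ban.
    treated_change = mean(treated_after) - mean(treated_before)
    # Change in the comparison city over the same period; it approximates
    # what would have happened without the ban (season, general trends).
    control_change = mean(control_after) - mean(control_before)
    # A negative result suggests SPM fell more where the ban applies.
    return treated_change - control_change
```

The estimate is only meaningful if both cities would have followed parallel trends without the ban; in practice, confounders such as weather (the sensors also record temperature and humidity) would need to be accounted for.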
Johannes Fett, Christian Schwarz, Urs Kober, Dirk Habich, Wolfgang Lehner
Improving GPU Matrix Multiplication by Leveraging Bit Level Granularity and Compression. Johannes Fett, Christian Schwarz, Urs Kober, Dirk Habich, Wolfgang Lehner. Datenbanksysteme für Business, Technologie und Web. DOI: 10.18420/BTW2023-49
Mahdi Esmailoghli, S. Redyuk, R. Martinez, Ziawasch Abedjan, T. Rabl, V. Markl
In recent years, the high emission of fine-grained particles into the atmosphere and its negative impact on people’s health and well-being have prompted researchers and government agencies to look for the causes of air pollution in different neighbourhoods [7]. Serious measures have been taken to keep air pollution levels in check, such as fine-grained particle concentration thresholds or driving bans for diesel vehicles in several European cities [8]. Many current approaches to predictive modeling of air pollution focus on estimating the concentration of fine particulate matter in a particular area in the near future [2]. However, identifying the causes of high fine-particulate emissions and finding their potential sources can give decision makers valuable information for designing countermeasures. Detecting the sources of air pollution and treating them is a big step toward better air quality [3]. The problem we observe is that the historical air quality sensor records used to forecast fine particulate matter concentrations are not sufficient to infer the factors that are likely to cause air pollution. Intuitively, we can assume that traffic, factories and production facilities, agriculture, etc. might negatively affect air quality. To test these assumptions, we need to incorporate external data sources into the main dataset of air quality sensor readings (Section 2). For this project, we aim at designing a proto-
Explanation of Air Pollution Using External Data Sources. Mahdi Esmailoghli, S. Redyuk, R. Martinez, Ziawasch Abedjan, T. Rabl, V. Markl. Datenbanksysteme für Business, Technologie und Web. DOI: 10.18420/btw2019-ws-32
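Incorporating external data into the sensor dataset, as the air-pollution abstract describes, requires aligning both sources on a shared key before any suspected factor can be tested. The sketch below assumes, hypothetically, that sensor readings and an external source (such as traffic counts) have already been aggregated per area and hour into dictionaries keyed by `(area_id, hour)`; `join_on_key` and `pearson` are illustrative names, not part of the authors' prototype. A correlation only indicates association, so it can support or weaken an assumption such as "traffic affects air quality" but does not by itself establish a cause.

```python
from math import sqrt

def join_on_key(readings, external):
    """Inner-join two dicts keyed by (area_id, hour); return aligned value lists."""
    # Keep only keys present in both sources, in a deterministic order.
    keys = sorted(readings.keys() & external.keys())
    return [readings[k] for k in keys], [external[k] for k in keys]

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long numeric lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical example: fine-particle readings joined with traffic counts.
# Area "c" has no traffic data, so the inner join drops it.
pm = {("a", 8): 40.0, ("a", 9): 55.0, ("b", 8): 20.0, ("c", 8): 12.0}
traffic = {("a", 8): 900, ("a", 9): 1400, ("b", 8): 300}
xs, ys = join_on_key(pm, traffic)
r = pearson(xs, ys)
```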
Effectively modelling and querying experience items such as movies, books, or games in databases is challenging because these items are better described by the user experience they evoke or by their perceived properties than by factual attributes. However, such information is often subjective, disputed, or unclear. Social judgments such as comments, reviews, discussions, or ratings have therefore become a ubiquitous component of most Web applications dealing with such items, especially in e-commerce. However, they usually do not play a major role in the query process and are typically just shown to the user. In this paper, we discuss how to use unstructured user reviews to build a structured semantic representation of database items such that these perceptual attributes are (at least implicitly) represented and usable for navigational queries. In particular, we argue that a central challenge when extracting perceptual attributes from social judgments is respecting the subjectivity of the opinions expressed. We claim that no representation consisting of only a single tuple will be sufficient. Instead, such systems should aim to discover shared perspectives, represent dominant perceptions and opinions, and exploit those perspectives for query processing.
Perceptual Relational Attributes: Navigating and Discovering Shared Perspectives from User-Generated Reviews. C. Lofi, Manuel Valle Torre, M. Ye. Datenbanksysteme für Business, Technologie und Web. DOI: 10.18420/btw2019-11
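To illustrate why a single tuple falls short for perceptual attributes: averaging opposing opinions (say, half the reviewers find a film dark and half find it light-hearted) yields a bland middle value that matches neither view. The sketch below instead stores several perspectives per item, each with the share of reviewers that holds it, and lets a navigational query match any sufficiently shared perspective. The classes `Perspective` and `Item`, the function `matches`, the score scale, and both thresholds are hypothetical; how perspectives would be extracted from reviews is left open here, since that is the subject of the paper.

```python
from __future__ import annotations

from dataclasses import dataclass, field

@dataclass
class Perspective:
    support: float  # share of reviewers holding this view, in [0, 1]
    # Perceived attribute name -> strength of that perception, in [0, 1].
    attributes: dict[str, float] = field(default_factory=dict)

@dataclass
class Item:
    name: str
    perspectives: list[Perspective]

def matches(item: Item, attribute: str, min_score: float = 0.7, min_support: float = 0.3) -> bool:
    """True if some sufficiently shared perspective perceives `attribute` strongly."""
    return any(
        p.support >= min_support and p.attributes.get(attribute, 0.0) >= min_score
        for p in item.perspectives
    )
```

Because each perspective keeps its own support, a minority view (below `min_support`) does not drive query results, while two opposing majority views can both remain answerable.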
Comma-separated values (CSV) is a widely used format for data exchange. Due to the format’s prevalence, virtually all industrial-strength database systems and stream processing frameworks support importing CSV input. However, loading CSV input close to the speed of I/O hardware is challenging. Modern I/O devices such as InfiniBand NICs and NVMe SSDs can sustain transfer rates of 100 Gbit/s and higher. At the same time, CSV parsing performance is limited by the complex control flow that the format’s semi-structured, text-based layout incurs. In this paper, we propose to speed up CSV loading using GPUs. We devise a new parsing approach that streamlines the control flow while correctly handling context-sensitive CSV features such as quotes. By offloading I/O and parsing to the GPU, our approach enables databases to load CSV files at high throughput from main memory over NVLink 2.0, as well as directly from the network via RDMA. In our evaluation, we show that GPUs parse real-world datasets at up to 60 GB/s, thereby saturating high-bandwidth I/O devices.
Fast CSV Loading Using GPUs and RDMA for In-Memory Data Processing. Alexander Kumaigorodski, Clemens Lutz, V. Markl. Datenbanksysteme für Business, Technologie und Web. DOI: 10.18420/btw2021-01
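Quotes make CSV parsing context-sensitive: a comma inside a quoted field is data, not a delimiter. One well-known way to decide this without a sequential parser state machine is to compute the running parity of quote characters with a prefix XOR; a delimiter separates fields only where that parity is even. The sequential Python sketch below illustrates the idea for a single record. It is not the authors' GPU parser, which the abstract does not detail, and the function names are hypothetical; on a GPU, the prefix XOR would be computed as a parallel scan.

```python
from __future__ import annotations

from itertools import accumulate
from operator import xor

def field_boundaries(line: str, delim: str = ",", quote: str = '"') -> list[int]:
    """Return the indices of delimiters that lie outside quoted regions."""
    # 1 where the character is a quote, 0 elsewhere.
    is_quote = [1 if c == quote else 0 for c in line]
    # Inclusive prefix XOR: parity[i] == 1 means an odd number of quotes has
    # been seen up to position i, i.e. position i is inside a quoted field.
    # An escaped quote ("") toggles parity twice, so it cancels out.
    parity = list(accumulate(is_quote, xor))
    # A delimiter is a real field separator only where the parity is even.
    return [i for i, c in enumerate(line) if c == delim and parity[i] == 0]

def split_fields(line: str, delim: str = ",", quote: str = '"') -> list[str]:
    """Split one record (assumed to contain no line breaks) into raw fields."""
    cuts = field_boundaries(line, delim, quote)
    starts = [0] + [i + 1 for i in cuts]
    ends = cuts + [len(line)]
    # Fields keep their surrounding and doubled quotes; unescaping is a separate step.
    return [line[s:e] for s, e in zip(starts, ends)]
```

For example, `split_fields('a,"b,c",d')` yields three fields, because the comma between `b` and `c` sits at odd quote parity.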
Pub Date: 1900-01-01, DOI: 10.1007/978-3-642-60730-1_18
T. Zurek
Parallel Temporal Joins. T. Zurek. Datenbanksysteme für Business, Technologie und Web. DOI: 10.1007/978-3-642-60730-1_18
Pub Date: 1900-01-01, DOI: 10.1007/978-3-642-72617-0_44
Hans-Jürgen Auth
Anforderungen an DB-Systeme aus Sicht der MESSAGE HANDLING-Welt [Requirements for DB systems from the perspective of the message-handling world]. Hans-Jürgen Auth. Datenbanksysteme für Business, Technologie und Web. DOI: 10.1007/978-3-642-72617-0_44
Clerical tasks are created when a duplicate detection algorithm finds some similarity between records, but not enough to allow an automatic merge. Data stewards review clerical tasks and make the final match or non-match decision. In this paper, we evaluate different machine learning algorithms regarding their accuracy in predicting the correct action for a clerical task, and we execute that action automatically if the prediction has sufficient confidence. This approach reduces the data stewards' workload by orders of magnitude.
Machine Learning Applied to the Clerical Task Management Problem in Master Data Management Systems. M. Oberhofer, L. Bremer, Mariya Chkalova. Datenbanksysteme für Business, Technologie und Web. DOI: 10.18420/btw2019-25
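The decision rule in the clerical-task abstract (execute the predicted action only when the prediction is confident enough, otherwise leave the task to a data steward) can be sketched as a simple routing step. The `Prediction` class, the `route` function, and the 0.95 threshold are hypothetical; the classifier that produces the labels and confidences would be one of the machine learning algorithms the paper evaluates and is not shown here.

```python
from dataclasses import dataclass

@dataclass
class Prediction:
    task_id: str
    label: str         # predicted action: "match" or "non-match"
    confidence: float  # classifier probability for `label`, in [0, 1]

def route(predictions, threshold=0.95):
    """Split predictions into auto-executed actions and tasks left for data stewards."""
    auto, manual = [], []
    for p in predictions:
        if p.confidence >= threshold:
            # Confident enough: execute the predicted match/non-match decision.
            auto.append((p.task_id, p.label))
        else:
            # Not confident: keep the clerical task in the steward's queue.
            manual.append(p.task_id)
    return auto, manual
```

Raising the threshold trades fewer wrong automatic decisions for more manual work; the share of tasks that end up in `manual` is what such a system tries to shrink.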