
Datenbanksysteme für Business, Technologie und Web: Latest Articles

Eine Ereignissprache für das aktive, objektorientierte Datenbanksystem SAMOS (An Event Language for the Active, Object-Oriented Database System SAMOS)
Pub Date : 1900-01-01 DOI: 10.1007/978-3-642-86096-6_6
Stella Gatziu Grivas, Klaus R. Dittrich
Citations: 13
Big graph analysis by visually created workflows
Pub Date : 1900-01-01 DOI: 10.18420/btw2019-45
M. Rostami, E. Peukert, M. Wilke, E. Rahm
The analysis of large graphs has received considerable attention recently, but current solutions are typically hard to use. In this demonstration paper, we report on an effort to improve the usability of the open-source system Gradoop for processing and analyzing large graphs. We achieve this by integrating Gradoop into the popular open-source software KNIME, so that graph analysis workflows can be created visually without writing code. We outline the integration approach and discuss what will be demonstrated.
Citations: 2
Assessing the Impact of Driving Bans with Data Analysis
Pub Date : 1900-01-01 DOI: 10.18420/btw2019-ws-31
Lucas Woltmann, Claudio Hartmann, Wolfgang Lehner
Suspended particulate matter (SPM) is a significant problem in current environmental research, with an impact on the everyday life of many people. Our goal for the BTW 2019 Data Science Challenge (DSC) is to leverage available sensor data on SPM to assess the benefits and disadvantages of driving bans. Our application builds on data from 57 sensors in Dresden and 338 sensors in Stuttgart. Each sensor tracks particle concentration, temperature, and humidity. Stuttgart is a particularly interesting case because of the driving ban for outdated diesel engines on inner-city roads introduced in January 2019. This allows us to compare the effectiveness of driving bans not only over time but also between two cities. While this report analyzes only two cities as examples, we see high potential for applying our tools to other cities and scenarios. We think this universality of our approach is an important factor in knowledge transfer. The applications are not limited to SPM analyses but can be extended, for example, to weather and climate research.
Citations: 1
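Comparing the ban's effect "over time" and "between two cities" can be framed as a difference-in-differences estimate: the change in mean SPM in the city with the ban minus the change in a comparison city over the same period. The minimal sketch below is our illustration under that assumption, not the authors' method. The function name `diff_in_diff` is hypothetical, Stuttgart is assumed to be the treated city and Dresden the comparison city, and the readings are invented toy values, not measurements from the report.

```python
from statistics import mean


def diff_in_diff(treated_before, treated_after, control_before, control_after):
    """Difference-in-differences of mean readings (hypothetical helper).

    Returns (treated change) - (control change). A negative result means
    the concentration fell more, or rose less, in the treated city than in
    the comparison city over the same period.
    """
    treated_change = mean(treated_after) - mean(treated_before)
    control_change = mean(control_after) - mean(control_before)
    return treated_change - control_change


# Toy values only (not data from the report): SPM readings before/after the ban.
stuttgart_before, stuttgart_after = [30.0, 34.0], [26.0, 28.0]
dresden_before, dresden_after = [20.0, 22.0], [20.0, 20.0]
print(diff_in_diff(stuttgart_before, stuttgart_after, dresden_before, dresden_after))  # -4.0
```

Such an estimate only isolates the ban's effect if both cities would otherwise have followed similar trends; temperature and humidity, which the sensors also record, are obvious confounders to control for.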
Improving GPU Matrix Multiplication by Leveraging Bit Level Granularity and Compression
Pub Date : 1900-01-01 DOI: 10.18420/BTW2023-49
Johannes Fett, Christian Schwarz, Urs Kober, Dirk Habich, Wolfgang Lehner
Citations: 0
Explanation of Air Pollution Using External Data Sources
Pub Date : 1900-01-01 DOI: 10.18420/btw2019-ws-32
Mahdi Esmailoghli, S. Redyuk, R. Martinez, Ziawasch Abedjan, T. Rabl, V. Markl
In recent years, the high emission of fine-grained particles into the atmosphere and its negative impact on people's health and well-being have prompted researchers and government agencies to look for the causes of air pollution in different neighbourhoods [7]. Serious measures have been taken to curb air pollution levels, such as the introduction of fine-particle concentration thresholds or driving bans for diesel vehicles in several European cities [8]. Many current approaches to predictive modeling of air pollution focus on estimating the concentration of fine particulate matter in a particular area in the near future [2]. However, identifying the causes of high fine-particulate emissions, as well as their potential sources, can give decision makers valuable information for designing countermeasures. Detecting the sources of air pollution and treating them is a big step toward better air quality [3]. The problem we observe is that the historical records from air quality sensors used to forecast fine particulate matter concentrations are not sufficient to infer the factors that are likely to cause air pollution. Intuitively, we can assume that traffic, factories and production facilities, agriculture, etc. might negatively affect air quality. To test these assumptions, we need to incorporate external data sources into the main dataset of air quality sensor readings (Section 2). For this project, we aim at designing a proto-
Citations: 2
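A first step toward incorporating external data sources is aligning them with the sensor readings and screening for association. The sketch below assumes both sources have already been aggregated to hourly values keyed by an hour index (a hypothetical layout; the abstract does not specify one), inner-joins them on that key, and computes the Pearson correlation. The helper names `join_on_hour` and `pearson` and the traffic-count example are hypothetical, and a correlation only flags candidate factors; it does not establish a cause.

```python
from math import sqrt


def join_on_hour(sensor, external):
    """Inner-join two {hour: value} mappings on the hours both contain."""
    hours = sorted(sensor.keys() & external.keys())
    return [(sensor[h], external[h]) for h in hours]


def pearson(pairs):
    """Pearson correlation of (x, y) pairs; None if it is undefined."""
    n = len(pairs)
    if n < 2:
        return None
    xs, ys = zip(*pairs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in pairs)
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:  # a constant series has no defined correlation
        return None
    return cov / sqrt(vx * vy)


# Toy hourly values (hypothetical): hour 3 has no external reading and is dropped.
pm = {0: 10.0, 1: 20.0, 2: 30.0, 3: 99.0}
traffic = {0: 100.0, 1: 200.0, 2: 300.0}
pairs = join_on_hour(pm, traffic)
print(pairs)           # [(10.0, 100.0), (20.0, 200.0), (30.0, 300.0)]
print(pearson(pairs))  # 1.0
```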
Perceptual Relational Attributes: Navigating and Discovering Shared Perspectives from User-Generated Reviews
Pub Date : 1900-01-01 DOI: 10.18420/btw2019-11
C. Lofi, Manuel Valle Torre, M. Ye
Effectively modelling and querying experience items such as movies, books, or games in databases is challenging because these items are better described by the user experience they produce or by their perceived properties than by factual attributes. However, such information is often subjective, disputed, or unclear. Thus, social judgments such as comments, reviews, discussions, or ratings have become a ubiquitous component of most Web applications dealing with such items, especially in the e-commerce domain. However, they usually do not play a major role in the query process and are typically just shown to the user. In this paper, we discuss how to use unstructured user reviews to build a structured semantic representation of database items such that these perceptual attributes are (at least implicitly) represented and usable for navigational queries. In particular, we argue that a central challenge when extracting perceptual attributes from social judgments is respecting the subjectivity of expressed opinions. We claim that no representation consisting of only a single tuple will be sufficient. Instead, such systems should aim at discovering shared perspectives that represent dominant perceptions and opinions, and at exploiting those perspectives for query processing.
Citations: 0
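The claim that a single tuple is not sufficient can be illustrated with a toy case: when an item's reviews disagree, averaging a perceptual score yields a value that no reviewer actually holds. The sketch below is our illustration, not the authors' technique. It assumes each review has already been mapped to a 1-5 score for one perceptual attribute (a hypothetical preprocessing step) and uses plain one-dimensional 2-means clustering to surface two perspectives; the function name `two_perspectives` is hypothetical.

```python
def two_perspectives(scores, iterations=20):
    """Split non-empty 1-D opinion scores into two groups via 2-means (Lloyd's loop).

    Returns [(centroid, members), (centroid, members)], where each centroid
    is the mean of its members, lower-initialized group first.
    """
    centroids = [min(scores), max(scores)]  # start from the two extremes
    for _ in range(iterations):
        groups = ([], [])
        for s in scores:
            # Assign each score to the nearest centroid (ties go to the first).
            idx = 0 if abs(s - centroids[0]) <= abs(s - centroids[1]) else 1
            groups[idx].append(s)
        # Recompute centroids; an empty group keeps its old centroid.
        new = [sum(g) / len(g) if g else c for g, c in zip(groups, centroids)]
        if new == centroids:
            break
        centroids = new
    return list(zip(centroids, groups))


scores = [1, 1, 2, 4, 5, 5]       # toy scores for one perceptual attribute
print(sum(scores) / len(scores))  # 3.0: a "consensus" value no review expresses
for centroid, members in two_perspectives(scores):
    print(round(centroid, 2), members)
```

Keeping both groups, rather than the single mean, is one simple way to retain the disagreement that a one-tuple representation would erase.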
Fast CSV Loading Using GPUs and RDMA for In-Memory Data Processing
Pub Date : 1900-01-01 DOI: 10.18420/btw2021-01
Alexander Kumaigorodski, Clemens Lutz, V. Markl
Comma-separated values (CSV) is a widely used format for data exchange. Due to the format's prevalence, virtually all industrial-strength database systems and stream processing frameworks support importing CSV input. However, loading CSV input at close to the speed of I/O hardware is challenging. Modern I/O devices such as InfiniBand NICs and NVMe SSDs are capable of sustaining transfer rates of 100 Gbit/s and higher. At the same time, CSV parsing performance is limited by the complex control flow that its semi-structured, text-based layout incurs. In this paper, we propose to speed up loading CSV input using GPUs. We devise a new parsing approach that streamlines the control flow while correctly handling context-sensitive CSV features such as quotes. By offloading I/O and parsing to the GPU, our approach enables databases to load CSVs at high throughput from main memory with NVLink 2.0, as well as directly from the network with RDMA. In our evaluation, we show that GPUs parse real-world datasets at up to 60 GB/s, thereby saturating high-bandwidth I/O devices.
Citations: 3
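Quotes make CSV context-sensitive: a comma is a field delimiter only if it lies outside a quoted field. One common way to resolve this without a branch-heavy sequential state machine is to compute the quote state with a prefix scan. The Python sketch below mimics that data-parallel formulation on the CPU; it illustrates the general technique, not the authors' GPU parser. It assumes RFC 4180-style quoting (doubled quotes as escapes), balanced quotes, and that the input begins outside a quoted field; the function name `field_delimiters` is hypothetical.

```python
from itertools import accumulate
from operator import xor


def field_delimiters(text, delim=",", quote='"'):
    """Return positions of delimiters that lie outside quoted regions.

    Each step maps to a data-parallel primitive: a per-character quote flag
    (map), an inclusive prefix-XOR over those flags (scan) that is 1 while
    inside quotes, and a per-character delimiter test (filter).
    """
    quote_flags = [1 if ch == quote else 0 for ch in text]
    inside = list(accumulate(quote_flags, xor))
    return [i for i, ch in enumerate(text) if ch == delim and not inside[i]]


line = 'a,"b,c",d'
cuts = field_delimiters(line)  # [1, 7]: the comma inside quotes is skipped
bounds = [-1] + cuts + [len(line)]
fields = [line[s + 1:e] for s, e in zip(bounds, bounds[1:])]
print(fields)  # ['a', '"b,c"', 'd']
```

A doubled quote (`""`) flips the state twice with no character in between, so escaped quotes need no special case here; unquoting the field contents would be a separate step.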
Parallel Temporal Joins
Pub Date : 1900-01-01 DOI: 10.1007/978-3-642-60730-1_18
T. Zurek
Citations: 5
Anforderungen an DB-Systeme aus Sicht der MESSAGE HANDLING-Welt (Requirements for DB Systems from the Perspective of the Message Handling World)
Pub Date : 1900-01-01 DOI: 10.1007/978-3-642-72617-0_44
Hans-Jürgen Auth
Citations: 0
Machine Learning Applied to the Clerical Task Management Problem in Master Data Management Systems
Pub Date : 1900-01-01 DOI: 10.18420/btw2019-25
M. Oberhofer, L. Bremer, Mariya Chkalova
Clerical tasks are created when a duplicate detection algorithm finds some similarity between records, but not enough to allow an auto-merge operation. Data stewards review clerical tasks and make the final match or non-match decision. In this paper, we evaluate different machine learning algorithms with respect to their accuracy in predicting the correct action for a clerical task, and we execute that action automatically if the prediction has sufficient confidence. This approach reduces the data stewards' workload by orders of magnitude.
Citations: 2
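The decision rule described above, executing a predicted action automatically only when the prediction is confident enough, can be sketched as a simple threshold filter. This is a minimal illustration, not the authors' system: the `Prediction` record, the `route` function, the action labels, and the 0.95 threshold are all hypothetical, and any classifier that outputs a probability for its predicted action could feed it.

```python
from dataclasses import dataclass


@dataclass
class Prediction:
    task_id: str
    action: str        # predicted decision, e.g. "match" or "non-match"
    confidence: float  # classifier probability for `action`, in [0, 1]


def route(predictions, threshold=0.95):
    """Split predictions into auto-executed actions and tasks left for data stewards."""
    automatic, manual = [], []
    for p in predictions:
        (automatic if p.confidence >= threshold else manual).append(p)
    return automatic, manual


preds = [
    Prediction("t1", "match", 0.99),
    Prediction("t2", "non-match", 0.60),
    Prediction("t3", "non-match", 0.97),
]
auto, manual = route(preds)
print([p.task_id for p in auto])    # ['t1', 't3']
print([p.task_id for p in manual])  # ['t2']
```

Raising the threshold sends more tasks to data stewards but lowers the risk of wrong automatic merges; in practice the value would be chosen from the classifier's accuracy on held-out tasks.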