Datenbank-Spektrum: Zeitschrift für Datenbanktechnologie: Organ der Fachgruppe Datenbanken der Gesellschaft für Informatik e.V. (latest articles)

An Extension of DNAContainer with a Small Memory Footprint

Datenbank-Spektrum: Zeitschrift für Datenbanktechnologie: Organ der Fachgruppe Datenbanken der Gesellschaft für Informatik e.V.

Pub Date : 2023-10-30 DOI: 10.1007/s13222-023-00460-3

Alex El-Shaikh, Bernhard Seeger

Abstract Over the past decade, DNA has emerged as a new storage medium with intriguing data volume and durability capabilities. Despite its advantages, DNA storage also has crucial limitations, such as intricate data access interfaces and restricted random accessibility. To overcome these limitations, DNAContainer has been introduced with a novel storage interface for DNA that spans a very large virtual address space on objects and allows random access to DNA at scale. In this paper, we substantially improve the first version of DNAContainer, focusing on the update capabilities of its data structures and optimizing its memory footprint. In addition, we extend the previous set of experiments on DNAContainer with new ones whose results reveal the impact of essential parameters on the performance and memory footprint.
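
The abstract describes DNAContainer only at the interface level. As a rough illustration of the general idea, and explicitly not the paper's design, the following Python sketch assumes a flat virtual address space split into fixed-size blocks: a random read only needs the blocks that overlap the requested byte range, and each block can be written as nucleotides with a naive 2-bit-per-base code. `BLOCK_SIZE`, `blocks_for_range`, `bytes_to_bases`, and `bases_to_bytes` are hypothetical names chosen for this sketch.

```python
from typing import List

# Hypothetical block size in bytes (an assumption for illustration only).
BLOCK_SIZE = 32

# Naive 2-bit-per-nucleotide mapping; it ignores biochemical constraints
# (e.g., long runs of one base, GC balance) that practical encoders handle.
BASES = "ACGT"


def blocks_for_range(offset: int, length: int, block_size: int = BLOCK_SIZE) -> List[int]:
    """Return the ids of all fixed-size blocks that overlap [offset, offset + length)."""
    if offset < 0 or length <= 0:
        raise ValueError("offset must be >= 0 and length > 0")
    first = offset // block_size
    last = (offset + length - 1) // block_size
    return list(range(first, last + 1))


def bytes_to_bases(data: bytes) -> str:
    """Encode each byte as four nucleotides, most significant bit pair first."""
    out = []
    for byte in data:
        for shift in (6, 4, 2, 0):
            out.append(BASES[(byte >> shift) & 0b11])
    return "".join(out)


def bases_to_bytes(strand: str) -> bytes:
    """Invert bytes_to_bases; the strand length must be a multiple of 4."""
    if len(strand) % 4 != 0:
        raise ValueError("strand length must be a multiple of 4")
    out = bytearray()
    for i in range(0, len(strand), 4):
        value = 0
        for base in strand[i:i + 4]:
            value = (value << 2) | BASES.index(base)
        out.append(value)
    return bytes(out)


if __name__ == "__main__":
    # Reading 10 bytes at offset 60 touches blocks 1 and 2 only.
    print(blocks_for_range(60, 10))  # [1, 2]
    print(bytes_to_bases(b"A"))      # CAAC
```

Real DNA storage systems also add error correction and respect sequence constraints; both are omitted here to keep the address-to-block idea visible.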

Citations: 0

SportsTables: A New Corpus for Semantic Type Detection (Extended Version)

Datenbank-Spektrum: Zeitschrift für Datenbanktechnologie: Organ der Fachgruppe Datenbanken der Gesellschaft für Informatik e.V.

Pub Date : 2023-10-16 DOI: 10.1007/s13222-023-00457-y

Sven Langenecker, Christoph Sturm, Christian Schalles, Carsten Binnig

Abstract Table corpora such as VizNet or TURL, which contain annotated semantic types per column, are important for building machine learning models for the task of automatic semantic type detection. However, there is a large discrepancy between these corpora and real-world data lakes, since data lakes contain a large fraction of numerical data that is not represented in existing corpora. Hence, in this paper, we introduce a new corpus that contains a much higher proportion of numerical columns than existing corpora. To reflect the distribution in real-world data lakes, our corpus SportsTables has on average approximately 86% numerical columns, posing new challenges to existing semantic type detection models, which have so far mainly targeted non-numerical columns. To demonstrate this effect, this extended version of [18] presents the results of an extensive study that applies four different state-of-the-art approaches for semantic type detection to our new corpus. Overall, the results show significant performance differences between predicting semantic types for textual and for numerical data.
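
To make the "share of numerical columns" concrete, here is a minimal sketch of how such a figure could be measured. It assumes tables are given as mappings from column name to raw cell strings, treats a column as numerical when every non-empty cell parses as a number (a simplifying heuristic, not the corpus authors' labeling procedure), and reads "on average" as the mean of per-table shares, which is also an assumption. The example table and all function names are hypothetical.

```python
from typing import Dict, List

Table = Dict[str, List[str]]  # hypothetical: column name -> raw cell strings


def is_numeric(cell: str) -> bool:
    """Heuristic: a cell is numeric if float() can parse it (this also accepts 'nan' and 'inf')."""
    try:
        float(cell)
    except ValueError:
        return False
    return True


def numeric_column_share(table: Table) -> float:
    """Fraction of columns whose non-empty cells all parse as numbers."""
    if not table:
        return 0.0
    numeric = 0
    for cells in table.values():
        non_empty = [c.strip() for c in cells if c.strip()]
        if non_empty and all(is_numeric(c) for c in non_empty):
            numeric += 1
    return numeric / len(table)


def average_numeric_share(tables: List[Table]) -> float:
    """Mean of the per-table shares across a corpus (one possible reading of 'on average')."""
    if not tables:
        return 0.0
    return sum(numeric_column_share(t) for t in tables) / len(tables)


if __name__ == "__main__":
    box_score = {
        "player": ["Müller", "Kroos"],
        "goals": ["2", "0"],
        "minutes": ["90", "85"],
        "pass_accuracy": ["0.81", "0.93"],
    }
    print(numeric_column_share(box_score))  # 0.75
```

This only detects whether a column is numerical; deciding its semantic type (for example, goals versus minutes played) is the harder problem the study evaluates.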

Citations: 0

Dissertationen (Dissertations)

Datenbank-Spektrum: Zeitschrift für Datenbanktechnologie: Organ der Fachgruppe Datenbanken der Gesellschaft für Informatik e.V.

Pub Date : 2023-10-13 DOI: 10.1007/s13222-023-00458-x

Citations: 0

Accelerating Large Table Scan Using Processing-In-Memory Technology

Datenbank-Spektrum: Zeitschrift für Datenbanktechnologie: Organ der Fachgruppe Datenbanken der Gesellschaft für Informatik e.V.

Pub Date : 2023-10-10 DOI: 10.1007/s13222-023-00456-z

Alexander Baumstark, Muhammad Attahir Jibril, Kai-Uwe Sattler

Abstract Today’s systems are capable of storing large amounts of data in main memory, and in-memory DBMSs in particular benefit from this development. However, data held in main memory still has to be processed by the CPU, which creates a bottleneck that limits the achievable performance of the DBMS. Processing-In-Memory (PIM) is a paradigm for overcoming this problem, but for a long time it was not available in commercial systems. With UPMEM, a commercial product that provides PIM technology in hardware is finally available. In this work, we focus on accelerating the table scan, a fundamental database query operation. We present and investigate an approach that uses PIM to optimize this operation. We evaluate the PIM scan in terms of parallelism and execution time in benchmarks with different table sizes and compare it to a traditional CPU-based table scan. The result is a PIM table scan that significantly outperforms the CPU-based scan.
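
In a generic partitioned PIM scan (not necessarily the paper's exact approach), each in-memory processing unit filters its own slice of the table, and only the qualifying row ids are merged on the host. The sketch below simulates that structure on the CPU with Python threads; it does not use the UPMEM SDK, and the even horizontal partitioning, the `n_units` parameter, and all function names are assumptions for illustration. Because of Python's global interpreter lock it shows the data flow, not a speedup.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence

# Hypothetical stand-in for a PIM unit: each unit owns a contiguous
# horizontal slice of the column and filters it locally.


def split_into_partitions(n_rows: int, n_units: int) -> List[range]:
    """Split row ids 0..n_rows-1 into n_units contiguous, nearly equal ranges."""
    if n_units < 1:
        raise ValueError("n_units must be >= 1")
    size, rest = divmod(n_rows, n_units)
    ranges: List[range] = []
    start = 0
    for unit in range(n_units):
        end = start + size + (1 if unit < rest else 0)
        ranges.append(range(start, end))
        start = end
    return ranges


def scan_partition(column: Sequence[int], rows: range,
                   predicate: Callable[[int], bool]) -> List[int]:
    """Local filter on one unit: return the global row ids whose value qualifies."""
    return [i for i in rows if predicate(column[i])]


def partitioned_scan(column: Sequence[int], predicate: Callable[[int], bool],
                     n_units: int = 4) -> List[int]:
    """Run one local scan per unit in parallel, then merge the partial results on the host."""
    partitions = split_into_partitions(len(column), n_units)
    with ThreadPoolExecutor(max_workers=n_units) as pool:
        partials = list(pool.map(lambda rows: scan_partition(column, rows, predicate),
                                 partitions))
    # Partitions are contiguous and pool.map preserves input order,
    # so concatenation yields row ids in ascending order.
    return [row for part in partials for row in part]


if __name__ == "__main__":
    prices = [5, 12, 7, 30, 1, 18, 9, 40, 3]
    print(partitioned_scan(prices, lambda v: v < 10))  # [0, 2, 4, 6, 8]
```

Returning row ids rather than full rows keeps the host-side merge small, which is the part of the design that a PIM scan is meant to exploit.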

Citations: 0

Geo Engine: Workflow-driven Geospatial Portals for Data Science

Datenbank-Spektrum: Zeitschrift für Datenbanktechnologie: Organ der Fachgruppe Datenbanken der Gesellschaft für Informatik e.V.

Pub Date : 2023-09-18 DOI: 10.1007/s13222-023-00453-2

Christian Beilschmidt, Johannes Drönner, Michael Mattig, Bernhard Seeger

Geo data portals play a key role in the distribution and exploitation of domain-specific geo data. While such portals are highly specialized, they share a number of common requirements that range from data access and processing to UI components. Geo Engine provides all the components needed to build such portals. We demonstrate this with a real data portal that we built for the dragonfly community and with a data science application. We also present its general architecture and outline future improvements.

Citations: 0

The InsightsNet Climate Change Corpus (ICCC)

Datenbank-Spektrum: Zeitschrift für Datenbanktechnologie: Organ der Fachgruppe Datenbanken der Gesellschaft für Informatik e.V.

Pub Date : 2023-09-11 DOI: 10.1007/s13222-023-00454-1

Elena Volkanovska, Sherry Tan, Changxu Duan, Sabine Bartsch, Wolfgang Stille

Abstract The discourse on climate change has become a centerpiece of public debate, thereby creating a pressing need to analyze the multitude of messages created by the participants in this communication process. In addition to text, information on this topic is conveyed multimodally, through images, videos, tables, and other data objects that are embedded within documents and accompany the text. This paper presents the process of building a multimodal pilot corpus for the InsightsNet Climate Change Corpus (ICCC) and of using natural language processing (NLP) tools to enrich the corpus (meta)data, thus creating a dataset that lends itself to exploring the interplay between the various modalities that constitute the discourse on climate change. We demonstrate how the pilot corpus can be queried for relevant information in two types of databases, and how the proposed data model supports a more comprehensive approach to sentiment analysis.

Citations: 0

Datenbankherstellerrecht und Datenbankforschung (Database Maker's Right and Database Research)

Datenbank-Spektrum: Zeitschrift für Datenbanktechnologie: Organ der Fachgruppe Datenbanken der Gesellschaft für Informatik e.V.

Pub Date : 2023-07-12 DOI: 10.1007/s13222-023-00446-1

Michael Beurskens, Stefanie Scherzinger

Citations: 0

Datenbank-Community vernetzt sich in Dresden (The Database Community Connects in Dresden)

Datenbank-Spektrum: Zeitschrift für Datenbanktechnologie: Organ der Fachgruppe Datenbanken der Gesellschaft für Informatik e.V.

Pub Date : 2023-07-01 DOI: 10.1007/s13222-023-00447-0

Wolfgang Lehner

Citations: 0

Steuerrechtliche Herausforderungen datengetriebener Geschäftsmodelle am Beispiel des Connected-Car-Geschäftsmodells (Tax-Law Challenges of Data-Driven Business Models, Illustrated by the Connected-Car Business Model)

Datenbank-Spektrum: Zeitschrift für Datenbanktechnologie: Organ der Fachgruppe Datenbanken der Gesellschaft für Informatik e.V.

Pub Date : 2023-06-30 DOI: 10.1007/s13222-023-00449-y

Carsten Gröger

Citations: 0

A Systematic Approach to Consuming Data in Complex Data Management Landscapes Using Data Consumption Patterns

Datenbank-Spektrum: Zeitschrift für Datenbanktechnologie: Organ der Fachgruppe Datenbanken der Gesellschaft für Informatik e.V.

Pub Date : 2023-06-30 DOI: 10.1007/s13222-023-00450-5

Corinna Giebler, Eva Hoos

Citations: 0
