Title: A Formal Design Framework for Practical Property Graph Schema Languages
Authors: Nimo Beeren, G. Fletcher
DOI: https://doi.org/10.48786/edbt.2023.40
Venue: Advances in Database Technology: Proceedings of the International Conference on Extending Database Technology (EDBT), 2023, pp. 478–484

Abstract: Graph databases are increasingly receiving attention from industry and academia, due in part to their flexibility: a schema is often not required. However, schemas can significantly benefit query optimization, data integrity, and documentation. No formal framework currently captures the design space of state-of-the-art schema solutions. We present a formal design framework for property graph schema languages based on first-order logic rules, which balances expressivity and practicality. We show how this framework can be adapted to integrate a core set of constraints common in conceptual data modeling methods. To demonstrate practical feasibility, the model is implemented using graph queries for modern graph database systems, which we evaluate through a controlled experiment. We find that validation time scales linearly with the size of the data, even with straightforward, unoptimized implementations.
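To make the idea of rule-based schema validation concrete, here is a minimal sketch in Python. The data model, rule shape, and violation messages are illustrative assumptions; the paper itself implements validation as graph queries inside graph database systems, not in application code.

```python
# Minimal sketch of rule-based property graph validation (hypothetical data
# model; the paper's own implementation runs as graph queries in the DBMS).

from dataclasses import dataclass

@dataclass
class Node:
    labels: set
    properties: dict

# A rule in the spirit of a first-order logic constraint:
# "every node labeled `label` must have property `prop` of type `prop_type`".
@dataclass
class PropertyRule:
    label: str
    prop: str
    prop_type: type

def validate(nodes, rules):
    """Return violation messages; an empty list means the graph conforms."""
    violations = []
    for i, node in enumerate(nodes):
        for rule in rules:
            if rule.label in node.labels:
                value = node.properties.get(rule.prop)
                if not isinstance(value, rule.prop_type):
                    violations.append(
                        f"node {i}: label {rule.label} requires "
                        f"{rule.prop}: {rule.prop_type.__name__}")
    return violations

nodes = [
    Node({"Person"}, {"name": "Ada", "age": 36}),
    Node({"Person"}, {"name": "Bob"}),          # missing 'age'
]
rules = [PropertyRule("Person", "name", str),
         PropertyRule("Person", "age", int)]

print(validate(nodes, rules))  # one violation, for node 1
```

Note that the loop touches each node a constant number of times per rule, which is consistent with the linear scaling in data size that the abstract reports.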
Title: Data Narration for the People: Challenges and Opportunities
Authors: S. Amer-Yahia, Patrick Marcel, Verónika Peralta
DOI: https://doi.org/10.48786/edbt.2023.82
Venue: Advances in Database Technology: Proceedings of the International Conference on Extending Database Technology (EDBT), 2023, pp. 855–858

Abstract: Data narration is the process of telling stories with insights extracted from data. It is an instance of data science [4] in which the pipeline focuses on collecting and exploring data, answering questions, structuring answers, and finally presenting them to stakeholders [16, 17]. This tutorial reviews the challenges and opportunities of fully and semi-automating these steps. In doing so, it draws on the extensive literature in data narration, data exploration, and data visualization. In particular, we point out key theoretical and practical contributions in each domain, such as next-step recommendation and policy learning for data exploration, insight interestingness and evaluation frameworks, and the crafting of data stories for the people who will exploit them. We also identify topics still worth investigating, such as including different stakeholders' profiles when designing data pipelines, with the goal of providing data narration for all.
Title: RDF-Analytics: Interactive Analytics over RDF Knowledge Graphs
Authors: Maria-Evangelia Papadaki, Yannis Tzitzikas
DOI: https://doi.org/10.48786/edbt.2023.70
Venue: Advances in Database Technology: Proceedings of the International Conference on Extending Database Technology (EDBT), 2023, pp. 807–810

Abstract: Formulating structured queries over knowledge graphs is a challenging task that presupposes familiarity with the syntax of the query language and the contents of the knowledge graph. To alleviate this difficulty, in this paper we introduce RDF-ANALYTICS, a novel system that enables plain users to formulate analytic queries over complex (i.e., not necessarily star-schema-based) RDF knowledge graphs. To provide an intuitive interface, we leverage users' familiarity with Faceted Search (FS) systems: we extend FS with actions that enable users to formulate analytic queries as well. Distinctive characteristics of the approach are the ability to include arbitrarily long paths in the analytic query (accompanied by count information), the interactive formulation of HAVING restrictions, the support of both faceted search (i.e., locating the desired resources in a faceted-search manner) and analytic queries, and the ability to formulate nested analytic queries. Finally, we present the results of a preliminary task-based evaluation with users, which are very promising.
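The abstract mentions analytic queries with HAVING restrictions over RDF. For illustration, the sketch below shows the kind of SPARQL 1.1 aggregation such an interface could generate, together with an equivalent evaluation over toy in-memory triples. Both the query text and the data are assumptions for intuition, not output of the actual system.

```python
# Illustrative group-by/HAVING analytic query over toy RDF-style triples.
from collections import defaultdict

# Toy triples: (subject, predicate, object).
triples = [
    ("m1", "locatedIn", "Athens"), ("m2", "locatedIn", "Athens"),
    ("m3", "locatedIn", "Rome"),
]

# SPARQL 1.1 that a faceted analytic interface could generate (hypothetical):
sparql = """
SELECT ?city (COUNT(?museum) AS ?n)
WHERE { ?museum :locatedIn ?city }
GROUP BY ?city
HAVING (COUNT(?museum) > 1)
"""

# Evaluate the same aggregation directly over the toy data.
counts = defaultdict(int)
for s, p, o in triples:
    if p == "locatedIn":
        counts[o] += 1
result = {city: n for city, n in counts.items() if n > 1}  # the HAVING step
print(result)  # {'Athens': 2}
```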
Title: Data Provenance for SHACL
Authors: Thomas Delva, Maxim Jakubowski
DOI: https://doi.org/10.48786/edbt.2023.23
Venue: Advances in Database Technology: Proceedings of the International Conference on Extending Database Technology (EDBT), 2023, pp. 285–297

Abstract: In constraint languages for RDF graphs, such as ShEx and SHACL, constraints on nodes and their properties are known as "shapes". Using SHACL, we propose in this paper the notion of the neighborhood of a node v satisfying a given shape in a graph G. This neighborhood is a subgraph of G and provides the data provenance of v for the given shape. We establish a correctness property for the obtained provenance mechanism by proving that neighborhoods adhere to the Sufficiency requirement articulated for provenance semantics for database queries. As an additional benefit, neighborhoods allow a novel use of shapes: the extraction of a subgraph from an RDF graph, the so-called shape fragment. We compare shape fragments with SPARQL queries. We discuss implementation strategies for computing neighborhoods and present initial experiments demonstrating that our ideas are feasible.
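A toy sketch of the neighborhood idea: the subgraph of G that witnesses why a node satisfies a shape. The shape model used here (a required set of outgoing properties) is a drastic simplification of SHACL, chosen only to convey the intuition.

```python
# Toy "neighborhood" computation: which triples of the graph witness that a
# node satisfies a shape? (Shape = required set of outgoing properties; this
# is a simplification of SHACL for illustration only.)

def neighborhood(graph, v, required_props):
    """Return the triples witnessing that v satisfies the shape,
    or None if v does not satisfy it (a required property is missing)."""
    witness = [(s, p, o) for (s, p, o) in graph
               if s == v and p in required_props]
    found = {p for (_, p, _) in witness}
    if not required_props <= found:
        return None
    return witness

G = [
    ("alice", "name", "Alice"),
    ("alice", "knows", "bob"),
    ("bob", "name", "Bob"),
]

# Shape: a node must have both a name and a knows edge.
print(neighborhood(G, "alice", {"name", "knows"}))
# -> the two alice triples; bob's triple lies outside the neighborhood
print(neighborhood(G, "bob", {"name", "knows"}))  # None: bob has no 'knows'
```

The returned triples form the provenance subgraph; collecting them over all nodes satisfying the shape is, in spirit, what the paper calls a shape fragment.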
Title: Stitcher: Learned Workload Synthesis from Historical Performance Footprints
Authors: Chengcheng Wan, Yiwen Zhu, Joyce Cahoon, Wenjing Wang, K. Lin, Sean Liu, Raymond Truong, Neetu Singh, Alexandra Ciortea, Konstantinos Karanasos, Subru Krishnan
DOI: https://doi.org/10.48786/edbt.2023.33
Venue: Advances in Database Technology: Proceedings of the International Conference on Extending Database Technology (EDBT), 2023, pp. 417–423

Abstract: Database benchmarking and workload replay have been widely used to drive system design, evaluate workload performance, determine product evolution, and guide cloud migration. However, both suffer from key limitations: the former fails to capture the variety and complexity of production workloads; the latter requires access to user data, queries, and machine specifications, rendering it inapplicable in the face of user privacy concerns. Here we introduce our vision of learned workload synthesis to overcome these issues: given the performance profile of a customer workload (e.g., CPU/memory counters), synthesize a new workload that yields the same performance profile when executed on a range of hardware/software configurations. We present Stitcher as a first step towards realizing this vision, which synthesizes workloads by combining pieces from standard benchmarks. We believe that our vision will spark new research avenues in database workload replay.
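The core idea of combining benchmark pieces to match a target performance profile can be sketched as a small search problem. The greedy matcher, the two-dimensional (CPU, memory) profiles, and the benchmark names below are illustrative assumptions, not Stitcher's actual algorithm or data.

```python
# Sketch of learned workload synthesis: greedily pick benchmark pieces whose
# averaged performance profile approximates a target profile.
# (Hypothetical piece names and profiles; not Stitcher's real algorithm.)

import math

# Performance profiles as (cpu, memory) counter averages, normalized to [0, 1].
benchmark_pieces = {
    "tpcc_neworder": (0.8, 0.3),
    "tpch_q1":       (0.6, 0.7),
    "ycsb_readonly": (0.2, 0.1),
}

def synthesize(target, pieces, k=2):
    """Greedily choose k pieces (repeats allowed) whose mean profile
    is closest to the target profile."""
    chosen = []
    for _ in range(k):
        def mix_distance(name):
            profs = [pieces[n] for n in chosen] + [pieces[name]]
            avg = tuple(sum(dim) / len(profs) for dim in zip(*profs))
            return math.dist(avg, target)
        chosen.append(min(pieces, key=mix_distance))
    profs = [pieces[n] for n in chosen]
    return chosen, tuple(sum(dim) / len(profs) for dim in zip(*profs))

names, profile = synthesize((0.7, 0.5), benchmark_pieces)
print(names, profile)  # the mix of two pieces lands on the target profile
```

A real system would of course match richer profiles (full counter time series) and validate the synthesized mix by executing it across hardware configurations.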
Title: Understanding crowd energy consumption behaviors
Authors: X. Liu, Xu Cheng, Yanyan Yang, Huan Huo, Yongping Liu, P. S. Nielsen
DOI: https://doi.org/10.48786/edbt.2023.68
Venue: Advances in Database Technology: Proceedings of the International Conference on Extending Database Technology (EDBT), 2023, pp. 799–802

Abstract: Understanding crowd behavior is crucial for energy demand-side management. In this paper, we employ the fluid-dynamics concept of potential flow to model the energy demand shift patterns of a crowd in both the temporal and spatial dimensions. To facilitate the use of the proposed method, we implement a visual analysis platform that allows users to interactively explore and interpret the shift patterns. The effectiveness of the proposed method will be evaluated through hands-on experience with a real case study during the conference demonstration.
Title: Pushing Edge Computing one Step Further: Resilient and Privacy-Preserving Processing on Personal Devices
Authors: Ludovic Javet, N. Anciaux, Luc Bouganim, Léo Lamoureux, P. Pucheral
DOI: https://doi.org/10.48786/edbt.2023.77
Venue: Advances in Database Technology: Proceedings of the International Conference on Extending Database Technology (EDBT), 2023, pp. 835–838

Abstract: Can we push edge computing one step further? This demonstration paper proposes an answer to this question by leveraging the generalization of Trusted Execution Environments at the very edge of the network to enable resilient and privacy-preserving computation on personal devices. Based on preliminary published results, we show that this can drastically change the way distributed processing over personal data is conceived and achieved. The platform presented here demonstrates the pertinence of the approach through execution scenarios integrating heterogeneous secure personal devices.
Title: REQUIRED: A Tool to Relax Queries through Relaxed Functional Dependencies
Authors: Loredana Caruccio, Stefano Cirillo, V. Deufemia, G. Polese, R. Stanzione
DOI: https://doi.org/10.48786/edbt.2023.74
Venue: Advances in Database Technology: Proceedings of the International Conference on Extending Database Technology (EDBT), 2023, pp. 823–826

Abstract: Query relaxation aims to relax query constraints in order to derive approximate results when the answer set is small. In this demo paper, we present REQUIRED, an automated, portable, and scalable query relaxation tool that leverages metadata learned from an input dataset. The intuition is to use relationships underlying attribute values to derive a new query whose approximate results still meet the user's expectations. In particular, REQUIRED exploits relaxed functional dependencies to modify the original query in two ways: (i) relaxing some query conditions by replacing equality constraints with ranges and/or collections of admissible values, and (ii) rewriting the original query by replacing some or all of the attributes involved in its conditions with attributes related to them. Our demonstration scenarios show that REQUIRED effectively relaxes queries according to the chosen strategy.
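Strategy (i) above, widening an equality constraint into a range, can be sketched in a few lines. The widening rule used here (plus or minus one sample standard deviation of the attribute) is an illustrative assumption; the tool itself derives admissible ranges from relaxed functional dependencies, not from a fixed statistical rule.

```python
# Sketch of equality-to-range query relaxation: when an exact-match query
# returns too few rows, widen the equality constraint into a data-driven
# range. (The +/- one-stdev rule is illustrative, not REQUIRED's strategy.)

import statistics

rows = [
    {"model": "A", "price": 100}, {"model": "B", "price": 110},
    {"model": "C", "price": 180}, {"model": "D", "price": 120},
]

def query(rows, pred):
    return [r for r in rows if pred(r)]

def relax_equality(rows, attr, value):
    """Replace `attr == value` with a range around `value`."""
    spread = statistics.stdev(r[attr] for r in rows)
    lo, hi = value - spread, value + spread
    return lambda r: lo <= r[attr] <= hi

strict = query(rows, lambda r: r["price"] == 105)
print(len(strict))  # 0: the exact query has an empty answer set

relaxed = query(rows, relax_equality(rows, "price", 105))
print([r["model"] for r in relaxed])  # ['A', 'B', 'D']
```

The relaxed query keeps rows close to the requested price while still excluding the clear outlier, which matches the abstract's goal of approximate results that still meet the user's expectations.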
Title: Efficient Multi-Model Management
Authors: Nils Strassenburg, Dominic Kupfer, J. Kowal, T. Rabl
DOI: https://doi.org/10.48786/edbt.2023.37
Venue: Advances in Database Technology: Proceedings of the International Conference on Extending Database Technology (EDBT), 2023, pp. 457–463

Abstract: Deep learning models are deployed in an increasing number of industrial domains, such as retail and automotive applications. An instance of a model typically performs one specific task, which is why larger software systems use multiple models in parallel. Given that all models in production software must be managed, this leads to the problem of managing sets of related models, i.e., multi-model management. Existing approaches perform poorly on this task because they are optimized for saving single large models, not for simultaneously saving a set of related models. In this paper, we explore the space of multi-model management by presenting three optimized approaches: (1) a baseline approach that saves full model representations and minimizes the amount of saved metadata; (2) an update approach that reduces storage consumption compared to the baseline by saving parameter updates instead of full models; and (3) a provenance approach that saves model provenance data instead of model parameters. We evaluate the approaches for the multi-model management use cases of managing car battery cell models and image classification models. Our results show that the baseline outperforms existing approaches on save and recover times by more than an order of magnitude, and that the more sophisticated approaches reduce storage consumption by up to 99%.
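The "update approach" (2) can be sketched as delta storage: keep one full base model and store later versions as the set of parameters that changed. The flat dict-of-floats model format below is an illustrative simplification of real tensor checkpoints.

```python
# Sketch of delta-based model storage: save a full base model once, then
# store related versions as parameter updates against it.
# (Flat dict-of-floats "models" are a simplification for illustration.)

def delta(base, new):
    """Parameters that changed between two model versions."""
    return {k: v for k, v in new.items() if base.get(k) != v}

def recover(base, d):
    """Rebuild the full model from the base and a saved delta."""
    model = dict(base)
    model.update(d)
    return model

base = {"w0": 0.10, "w1": -0.30, "w2": 0.70}
finetuned = {"w0": 0.10, "w1": -0.25, "w2": 0.70}  # only w1 changed

d = delta(base, finetuned)
print(d)  # {'w1': -0.25}: 1 of 3 parameters stored instead of all 3
assert recover(base, d) == finetuned
```

When related models share most parameters, as in the fine-tuning setting, the delta is a small fraction of the full model, which is the intuition behind the large storage savings the abstract reports.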