
Proceedings of the VLDB Endowment: Latest Articles

A Learned Query Rewrite System
Zone 3, Computer Science | Q1 Computer Science | Pub Date: 2023-08-01 | DOI: 10.14778/3611540.3611633
Xuanhe Zhou, Guoliang Li, Jianming Wu, Jiesi Liu, Zhaoyan Sun, Xinning Zhang
Query rewriting is a challenging task that transforms a SQL query to improve its performance while maintaining its result set. However, it is difficult to rewrite SQL queries, which often involve complex logical structures, and there are numerous candidate rewrite strategies for such queries, making it an NP-hard problem. Existing databases or query optimization engines adopt heuristics to rewrite queries, but these approaches may not be able to judiciously and adaptively apply the rewrite rules and may cause significant performance regression in some cases (e.g., correlated subqueries may not be eliminated). To address these limitations, we introduce LearnedRewrite, a query rewrite system that combines traditional and learned algorithms (i.e., Monte Carlo tree search + hybrid estimator) to rewrite queries. We have implemented the system in Calcite, and experimental results demonstrate LearnedRewrite achieves superior performance on three real datasets.
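The requirement that a rewrite must keep the result set unchanged can be illustrated with a small sketch. This is not LearnedRewrite and does not involve Monte Carlo tree search or an estimator; it only runs a correlated subquery and a hand-written decorrelated version against an in-memory SQLite database and compares the rows. The `dept` and `emp` tables, their contents, and the helper names are assumptions made up for this illustration.

```python
import sqlite3

def build_demo_db():
    """Create a tiny in-memory database (hypothetical schema and data)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE dept(id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE emp(id INTEGER PRIMARY KEY, dept_id INTEGER, salary INTEGER);
        INSERT INTO dept VALUES (1, 'db'), (2, 'ml'), (3, 'empty');
        INSERT INTO emp VALUES (1, 1, 100), (2, 1, 300), (3, 2, 200), (4, 2, 200);
    """)
    return conn

def run(conn, sql):
    """Execute a query and return its rows sorted, so comparison ignores row order."""
    return sorted(conn.execute(sql).fetchall())

# Original form: the subquery is correlated with the outer row.
ORIGINAL = """
    SELECT e.id FROM emp e
    WHERE e.salary > (SELECT AVG(e2.salary) FROM emp e2
                      WHERE e2.dept_id = e.dept_id)
"""

# Rewritten form: the per-department average is computed once and joined.
REWRITTEN = """
    SELECT e.id FROM emp e
    JOIN (SELECT dept_id, AVG(salary) AS avg_sal
          FROM emp GROUP BY dept_id) a
      ON a.dept_id = e.dept_id
    WHERE e.salary > a.avg_sal
"""

def same_result(conn, q1, q2):
    """Check that two queries return the same rows on this particular database."""
    return run(conn, q1) == run(conn, q2)
```

Agreement on one database is only a sanity check, not a proof of equivalence; a rewrite system has to guarantee equivalence for every database instance.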
Citations: 1
Towards Auto-Generated Data Systems
Zone 3, Computer Science | Q1 Computer Science | Pub Date: 2023-08-01 | DOI: 10.14778/3611540.3611635
Alvin Cheung, Maaz Bin Safeer Ahmad, Brandon Haynes, Chanwut Kittivorawong, Shadaj Laddad, Xiaoxuan Liu, Chenglong Wang, Cong Yan
After decades of progress, database management systems (DBMSs) are now the backbones of many data applications that we interact with on a daily basis. Yet, with the emergence of new data types and hardware, building and optimizing new data systems remain as difficult as in the heyday of relational databases. In this paper, we summarize our work towards automating the building and optimization of data systems. Drawing from our own experience, we further argue that any automation technique must address three aspects: user specification, code generation, and result validation. We conclude by discussing a case study on video data processing, along with opportunities for future research towards designing data systems that are automatically generated.
Citations: 0
Erica: Query Refinement for Diversity Constraint Satisfaction
Zone 3, Computer Science | Q1 Computer Science | Pub Date: 2023-08-01 | DOI: 10.14778/3611540.3611623
Jinyang Li, Alon Silberstein, Yuval Moskovitch, Julia Stoyanovich, H. V. Jagadish
Relational queries are commonly used to support decision making in critical domains like hiring and college admissions. For example, a college admissions officer may need to select a subset of the applicants for in-person interviews, who individually meet the qualification requirements (e.g., have a sufficiently high GPA) and are collectively demographically diverse (e.g., include a sufficient number of candidates of each gender and of each race). However, traditional relational queries only support selection conditions checked against each input tuple, and they do not support diversity conditions checked against multiple, possibly overlapping, groups of output tuples. To address this shortcoming, we present Erica, an interactive system that proposes minimal modifications for selection queries to have them satisfy constraints on the cardinalities of multiple groups in the result. We demonstrate the effectiveness of Erica using several real-life datasets and diversity requirements.
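The idea of minimally modifying a selection query until group cardinality constraints hold can be sketched for the simplest case. This is a brute-force illustration, not Erica's algorithm: it assumes a single predicate of the form `gpa >= t`, lower-bound constraints only, one group per row (the abstract allows overlapping groups), and relaxation in the downward direction only. The function name and data layout are hypothetical.

```python
from collections import Counter

def refine_threshold(rows, threshold, min_counts):
    """Return the threshold closest to `threshold` (relaxing downward only) such
    that selecting rows with gpa >= t yields at least min_counts[g] rows for
    every group g. `rows` is a list of (gpa, group) pairs.
    Returns None if no threshold satisfies the constraints."""
    def satisfied(t):
        counts = Counter(group for gpa, group in rows if gpa >= t)
        return all(counts[g] >= k for g, k in min_counts.items())

    if satisfied(threshold):
        return threshold
    # Only thresholds equal to a data value can change the selected set.
    candidates = sorted({gpa for gpa, _ in rows if gpa < threshold}, reverse=True)
    for t in candidates:
        if satisfied(t):
            return t  # first hit in descending order is the smallest relaxation
    return None
```

Because lowering the threshold can only add rows, the first satisfying candidate in descending order is the smallest change. For example, with rows `[(3.9, "A"), (3.8, "A"), (3.6, "B"), (3.4, "B"), (3.0, "A")]`, an original threshold of 3.7, and a requirement of two rows per group, the sketch relaxes the threshold to 3.4.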
Citations: 0
XDB in Action: Decentralized Cross-Database Query Processing for Black-Box DBMSes
Zone 3, Computer Science | Q1 Computer Science | Pub Date: 2023-08-01 | DOI: 10.14778/3611540.3611625
Haralampos Gavriilidis, Leonhard Rose, Joel Ziegler, Kaustubh Beedkar, Jorge-Arnulfo Quiané-Ruiz, Volker Markl
Data are naturally produced at different locations and hence stored on different DBMSes. To maximize the value of the collected data, today's users combine data from different sources. Research in data integration has proposed the Mediator-Wrapper (MW) architecture to enable ad-hoc query processing over multiple sources. The MW approach is desirable for users, as they do not need to deal with heterogeneous data sources. However, from a query processing perspective, the MW approach is inefficient: First, one needs to provision the mediating execution engine with resources. Second, during query processing, data gets "centralized" within the mediating engine, which causes redundant data movement. Recently, we proposed in-situ cross-database query processing, a paradigm for federated query processing without a mediating engine. Our approach optimizes runtime performance and reduces data movement by leveraging existing systems, eliminating the need for an additional federated query engine. In this demonstration, we showcase XDB, our prototype for in-situ cross-database query processing. We demonstrate several aspects of XDB, i.e., the cross-database environment, our optimization techniques, and its decentralized execution phase.
Citations: 0
TPCx-AI - An Industry Standard Benchmark for Artificial Intelligence and Machine Learning Systems
Zone 3, Computer Science | Q1 Computer Science | Pub Date: 2023-08-01 | DOI: 10.14778/3611540.3611554
Christoph Brücke, Philipp Härtling, Rodrigo D Escobar Palacios, Hamesh Patel, Tilmann Rabl
Artificial intelligence (AI) and machine learning (ML) techniques have existed for years, but new hardware trends and advances in model training and inference have radically improved their performance. With an ever-increasing number of algorithms, systems, and hardware solutions, it is challenging to identify good deployments even for experts. Researchers and industry experts have observed this challenge and have created several benchmark suites for AI and ML applications and systems. While they are helpful in comparing several aspects of AI applications, none of the existing benchmarks measures end-to-end performance of ML deployments. Many have been rigorously developed in collaboration between academia and industry, but no existing benchmark is standardized. In this paper, we introduce the TPC Express Benchmark for Artificial Intelligence (TPCx-AI), the first industry standard benchmark for end-to-end machine learning deployments. TPCx-AI is the first AI benchmark that represents the pipelines typically found in common ML and AI workloads. TPCx-AI provides a full software kit, which includes a data generator, a driver, and two full workload implementations, one based on Python libraries and one based on Apache Spark. We describe the complete benchmark and show benchmark results for various scale factors. TPCx-AI's core contributions are a novel unified data set covering structured and unstructured data; a fully scalable data generator that can generate realistic data from GB up to PB scale; and a diverse and representative workload using different data types and algorithms, covering a wide range of aspects of real ML workloads such as data integration, data processing, training, and inference.
Citations: 1
Demo of QueryBooster: Supporting Middleware-Based SQL Query Rewriting as a Service
Zone 3, Computer Science | Q1 Computer Science | Pub Date: 2023-08-01 | DOI: 10.14778/3611540.3611615
Qiushi Bai, Sadeem Alsudais, Chen Li
Query rewriting is an important technique to optimize SQL performance in databases. With the prevalent use of business intelligence systems and object-relational mapping frameworks, existing rewriting capabilities inside databases are insufficient to optimize machine-generated queries. In this paper, we propose a novel system called "QueryBooster" to support SQL query rewriting as a cloud service. It provides a powerful and easy-to-use Web interface for users to formulate rewriting rules via a language or express rewriting intentions by providing example query pairs. It allows multiple users to share rewriting knowledge and automatically suggests shared rewriting rules for users. It requires no modifications or plugin installations to applications or databases. In this demonstration, we use real-world applications and datasets to show how users of QueryBooster rewrite their application queries and share rewriting knowledge.
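A middleware rewriter that applies user-supplied rules to queries before they reach the database can be sketched as follows. This is not QueryBooster's rule language or implementation; regular expressions stand in for it, and the single rule shown (dropping a cast to TEXT in front of ILIKE) is a hypothetical example that is only safe when the column is already a text type, which the sketch assumes rather than checks.

```python
import re

# Hypothetical rule list: (compiled pattern, replacement) pairs.
# The one rule here removes CAST(<column> AS TEXT) when it is directly followed
# by ILIKE. Safety (the column already being text) is assumed, not verified.
RULES = [
    (re.compile(r"CAST\(\s*(\w+)\s+AS\s+TEXT\s*\)(?=\s+ILIKE\b)", re.IGNORECASE),
     r"\1"),
]

def rewrite(sql, rules=RULES):
    """Apply each rule in order and return the rewritten SQL string."""
    for pattern, replacement in rules:
        sql = pattern.sub(replacement, sql)
    return sql
```

Pattern matching on raw SQL text is fragile; a production rewriter would more plausibly match on a parsed query tree, which this sketch leaves out for brevity.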
Citations: 0
Data and AI Model Markets: Opportunities for Data and Model Sharing, Discovery, and Integration
Zone 3, Computer Science | Q1 Computer Science | Pub Date: 2023-08-01 | DOI: 10.14778/3611540.3611573
Jian Pei, Raul Castro Fernandez, Xiaohui Yu
The markets for data and AI models are rapidly emerging and increasingly significant in the realm and the practices of data science and artificial intelligence. These markets are being studied from diverse perspectives, such as e-commerce, economics, machine learning, and data management. In light of these developments, there is a pressing need to present a comprehensive and forward-looking survey on the subject to the database and data management community. In this tutorial, we aim to provide a comprehensive and interdisciplinary introduction to data and AI model markets. Unlike a few recent surveys and tutorials that concentrate only on the economics aspect, we take a novel perspective and examine data and AI model markets as grand opportunities to address the long-standing problem of data and model sharing, discovery, and integration. We motivate the importance of data and model markets using practical examples, present the current industry landscape of such markets, and explore the modules and options of such markets from multiple dimensions, including assets in the markets (e.g., data versus models), platforms, and participants. Furthermore, we summarize the latest advancements and examine the future directions of data and AI model markets as mechanisms for enabling and facilitating sharing, discovery, and integration.
Citations: 0
Showcasing Data Management Challenges for Future IoT Applications with NebulaStream
Zone 3, Computer Science | Q1 Computer Science | Pub Date: 2023-08-01 | DOI: 10.14778/3611540.3611588
Aljoscha Lepping, Hoang Mi Pham, Laura Mons, Balint Rueb, Philipp M. Grulich, Ankit Chaudhary, Steffen Zeuch, Volker Markl
Data management systems will face several new challenges in supporting IoT applications during the coming years. These challenges arise from managing large numbers of heterogeneous IoT devices and require combining elastic cloud and fog resources in unified fog-cloud environments. In this demonstration, we introduce a smart city simulation called IoTropolis and use it to create interactive eHealth and Smart Grid application scenarios. We use these scenarios to showcase three key challenges of unified fog-cloud environments. Furthermore, we demonstrate how NebulaStream, our recently proposed data management system for the IoT, addresses these challenges. Visitors to our demonstration can configure and interact with the scenarios to manage electricity usage in IoTropolis or to distribute patients across different hospitals. Thereby, visitors can actively engage with the challenges showcased by IoTropolis and utilize NebulaStream to address them. As a result, our demonstration enables visitors to experience data management for future IoT applications.
Citations: 0
Cornet: Learning Spreadsheet Formatting Rules by Example
Zone 3, Computer Science | Q1 Computer Science | Pub Date: 2023-08-01 | DOI: 10.14778/3611540.3611620
Mukul Singh, José Cambronero Sanchez, Sumit Gulwani, Vu Le, Carina Negreanu, Gust Verbruggen
Data management and analysis tasks are often carried out using spreadsheet software. A popular feature in most spreadsheet platforms is the ability to define data-dependent formatting rules. These rules can express actions such as "color red all entries in a column that are negative" or "bold all rows not containing error or failure". Unfortunately, users who want to exercise this functionality need to manually write these conditional formatting (CF) rules. We introduce Cornet, a system that automatically learns such conditional formatting rules from user examples. Cornet takes inspiration from inductive program synthesis and combines symbolic rule enumeration, based on semi-supervised clustering and iterative decision tree learning, with a neural ranker to produce accurate conditional formatting rules. In this demonstration, we show Cornet in action as a simple add-in to Microsoft's Excel. After the user provides one or two formatted cells as examples, Cornet generates formatting rule suggestions for the user to apply to the spreadsheet.
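Learning a conditional formatting rule from a few example cells can be sketched as enumerating candidate predicates and keeping those consistent with the examples. This toy enumerator is not Cornet: it omits the semi-supervised clustering, decision tree learning, and neural ranker named in the abstract, and it assumes numeric cells, rules of the form `value OP constant`, and a few explicitly unformatted cells as negative examples. All names are hypothetical.

```python
import operator

# Candidate comparison operators for rules of the form `value OP constant`.
OPS = {"<": operator.lt, ">": operator.gt, "==": operator.eq}

def consistent_rules(column, formatted, unformatted=()):
    """Enumerate rules that hold for every formatted example and for none of
    the explicitly unformatted examples. Constants come from the column's own
    values plus 0."""
    constants = sorted(set(column) | {0})
    rules = []
    for name, op in OPS.items():
        for c in constants:
            covers_all = all(op(v, c) for v in formatted)
            covers_none = not any(op(v, c) for v in unformatted)
            if covers_all and covers_none:
                rules.append((name, c))
    return rules

def apply_rule(column, rule):
    """Return the cells of `column` that the rule would format."""
    name, c = rule
    return [v for v in column if OPS[name](v, c)]
```

With the column `[-5, 3, -2, 7, 0]`, formatted examples `-5` and `-2`, and unformatted examples `0` and `3`, the only surviving rule is `value < 0`. With fewer examples several rules usually survive, which is where a ranking step becomes necessary.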
Citations: 0
QO-Insight: Inspecting Steered Query Optimizers
Zone 3, Computer Science; Q1 Computer Science; Pub Date: 2023-08-01; DOI: 10.14778/3611540.3611586
Christoph Anneser, Mario Petruccelli, Nesime Tatbul, David Cohen, Zhenggang Xu, Prithviraj Pandian, Nikolay Laptev, Ryan Marcus, Alfons Kemper
Steered query optimizers address the planning mistakes of traditional query optimizers by providing them with hints on a per-query basis, thereby guiding them in the right direction. This paper introduces QO-Insight, a visual tool designed for exploring query execution traces of such steered query optimizers. Although steered query optimizers are typically perceived as black boxes, QO-Insight empowers database administrators and experts to gain qualitative insights and enhance their performance through visual inspection and analysis.
Citations: 1