ACM SIGMOD Record: Latest Articles

Efficient Data Sharing across Trust Domains

ACM SIGMOD Record

Pub Date : 2023-08-10 DOI: 10.1145/3615952.3615962

Natacha Crooks

Cross-Trust-Domain Processing. Data is now a commodity. We know how to compute and store it efficiently and reliably at scale. We have, however, paid less attention to the notion of trust. Yet, data owners today are no longer the entities storing or processing their data (medical records are stored on the cloud, data is shared across banks, etc.). In fact, distributed systems today consist of many different parties, whether it is cloud providers, jurisdictions, organisations or humans. Modern data processing and storage always straddles trust domains.

Citations: 0

The Shapley Value in Database Management

ACM SIGMOD Record

Pub Date : 2023-08-10 DOI: 10.1145/3615952.3615954

L. Bertossi, B. Kimelfeld, Ester Livshits, Mikaël Monet

Attribution scores can be applied in data management to quantify the contribution of individual items to conclusions from the data, as part of the explanation of what led to these conclusions. In Artificial Intelligence, Machine Learning, and Data Management, some of the common scores are deployments of the Shapley value, a formula for profit sharing in cooperative game theory. Since its invention in the 1950s, the Shapley value has been used for contribution measurement in many fields, from economics to law, with its latest researched applications in modern machine learning. Recent studies investigated the application of the Shapley value to database management. This article gives an overview of recent results on the computational complexity of the Shapley value for measuring the contribution of tuples to query answers and to the extent of inconsistency with respect to integrity constraints. More specifically, the article highlights lower and upper bounds on the complexity of calculating the Shapley value, either exactly or approximately, as well as solutions for realizing the calculation in practice.

Citations: 2
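
As a rough illustration of the attribution idea in the abstract above, the following sketch computes exact Shapley values of database facts for a Boolean query by enumerating subsets. The toy facts, the `query` function, and the helper name `shapley` are assumptions made for this example and do not come from the article; the enumeration is exponential and is only meant for tiny inputs.

```python
from fractions import Fraction
from itertools import combinations
from math import factorial


def shapley(players, value):
    """Exact Shapley value of every player for the set function `value`."""
    n = len(players)
    result = {}
    for p in players:
        others = [q for q in players if q != p]
        total = Fraction(0)
        for k in range(n):
            # Weight of a coalition of size k that p joins.
            weight = Fraction(factorial(k) * factorial(n - 1 - k), factorial(n))
            for subset in combinations(others, k):
                s = frozenset(subset)
                total += weight * (value(s | {p}) - value(s))
        result[p] = total
    return result


# Hypothetical toy database: three edge facts, all treated as players.
FACTS = [("a", "b"), ("b", "c"), ("a", "c")]


def query(facts):
    """Boolean query: is c reachable from a in at most two hops? 1 if yes."""
    direct = ("a", "c") in facts
    two_hop = ("a", "b") in facts and ("b", "c") in facts
    return 1 if direct or two_hop else 0


scores = shapley(FACTS, query)
for fact, score in scores.items():
    print(fact, score)
```

The scores sum to the query value on the full database, which is the efficiency property of the Shapley value.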

R2T: Instance-optimal Truncation for Differentially Private Query Evaluation with Foreign Keys

ACM SIGMOD Record

Pub Date : 2023-06-08 DOI: 10.1145/3604437.3604462

Wei Dong, Juanru Fang, K. Yi, Yuchao Tao, Ashwin Machanavajjhala

Answering SPJA queries under differential privacy (DP), including graph pattern counting under node-DP as an important special case, has received considerable attention in recent years. The dual challenge of foreign-key constraints and self-joins is particularly tricky to deal with, and no existing DP mechanisms can correctly handle both. For the special case of graph pattern counting under node-DP, the existing mechanisms are correct (i.e., satisfy DP), but they do not offer nontrivial utility guarantees or are very complicated and costly. In this paper, we propose the first DP mechanism for answering arbitrary SPJA queries in a database with foreign-key constraints. Meanwhile, it achieves a fairly strong notion of optimality, which can be considered as a small and natural relaxation of instance optimality. Finally, our mechanism is simple enough that it can be easily implemented on top of any RDBMS and an LP solver. Experimental results show that it offers order-of-magnitude improvements in terms of utility over existing techniques, even those specifically designed for graph pattern counting.

Citations: 1

Technical Perspective on 'R2T: Instance-optimal Truncation for Differentially Private Query Evaluation with Foreign Keys'

ACM SIGMOD Record

Pub Date : 2023-06-07 DOI: 10.1145/3604437.3604461

Graham Cormode

Increased use of data to inform decision making has brought with it a rising awareness of the importance of privacy, and the need for appropriate mitigations to be put in place to protect the interests of individuals whose data is being processed. From the demographic statistics that are produced by national censuses, to the complex predictive models built by "big tech" companies, data is the fuel that powers these applications. A majority of such uses rely on data that is derived from the properties and actions of individual people. This data is therefore considered sensitive, and in need of protections to prevent inappropriate use or disclosure. Some protections come from enforcing policies, access control, and contractual agreements. But in addition, we also seek technical interventions: definitions and algorithms that can be applied by computer systems in order to protect the private information while still enabling the intended use.

Citations: 0

Convergence of Datalog over (Pre-) Semirings

ACM SIGMOD Record

Pub Date : 2023-06-07 DOI: 10.1145/3604437.3604454

Mahmoud Abo Khamis, H. Ngo, R. Pichler, Dan Suciu, Y. Wang

Recursive queries have been traditionally studied in the framework of datalog, a language that restricts recursion to monotone queries over sets, which is guaranteed to converge in polynomial time in the size of the input. But modern big data systems require recursive computations beyond the Boolean space. In this paper we study the convergence of datalog when it is interpreted over an arbitrary semiring. We consider an ordered semiring, define the semantics of a datalog program as a least fixpoint in this semiring, and study the number of steps required to reach that fixpoint, if ever. We identify algebraic properties of the semiring that correspond to certain convergence properties of datalog programs. Finally, we describe a class of ordered semirings on which one can generalize the semi-naïve evaluation algorithm to compute their minimal fixpoints.

Citations: 13
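
To make the fixpoint semantics described in the abstract above concrete, here is a minimal sketch of naive evaluation of a one-rule recursive program over the tropical semiring (min as addition, + as multiplication), which yields shortest-path lengths. The rule, the function name `naive_fixpoint`, and the example graph are assumptions chosen for illustration; this is plain naive iteration, not the generalized semi-naïve algorithm of the paper.

```python
import math

INF = math.inf  # semiring zero of the tropical semiring (identity for min)


def naive_fixpoint(edges, nodes, max_steps=1000):
    """Iterate T(x,y) = E(x,y) (+) SUM_z T(x,z) (*) E(z,y), with (+)=min, (*)=+.

    Returns the fixpoint and the number of rule applications that changed T.
    """
    E = {(x, y): INF for x in nodes for y in nodes}
    for (x, y), w in edges.items():
        E[(x, y)] = min(E[(x, y)], w)
    T = {pair: INF for pair in E}  # start from the semiring zero everywhere
    for step in range(1, max_steps + 1):
        new_T = {}
        for x in nodes:
            for y in nodes:
                best = E[(x, y)]
                for z in nodes:
                    best = min(best, T[(x, z)] + E[(z, y)])
                new_T[(x, y)] = best
        if new_T == T:
            return T, step - 1
        T = new_T
    # With negative cycles the iteration never stabilizes.
    raise RuntimeError("no fixpoint reached within max_steps")


# Hypothetical weighted graph.
nodes = ["a", "b", "c"]
edges = {("a", "b"): 1, ("b", "c"): 2, ("a", "c"): 5}
T, steps = naive_fixpoint(edges, nodes)
print(T[("a", "c")], steps)
```

The `max_steps` guard reflects the "if ever" in the abstract: over some semirings and inputs the iteration does not reach a fixpoint in finitely many steps.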

Technical Perspective: Revisiting Runtime Dynamic Optimization for Join Queries in Big Data Management Systems

ACM SIGMOD Record

Pub Date : 2023-06-07 DOI: 10.1145/3604437.3604459

Andreas Kipf

Query optimization is the process of finding an efficient query execution plan for a given SQL query. The runtime difference between a good and a bad plan can be tremendous. For example, in the case of TPC-H query 5, a query with 5 joins, the difference between the best and the worst plan is more than 10,000×. Therefore, it is vital to avoid bad plans. The dominant factor that distinguishes a good plan from a bad one is the join order and whether that order avoids large intermediate results.

Citations: 0
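
The effect of join order on intermediate result size, mentioned in the abstract above, can be seen on a toy example. The relations `R`, `S`, `T` and the `join` helper below are hypothetical and only meant as a sketch; real optimizers work from cardinality estimates rather than executing both orders.

```python
def join(left, right, key):
    """Hash join of two lists of dicts on the shared column `key`."""
    index = {}
    for row in right:
        index.setdefault(row[key], []).append(row)
    return [{**l, **r} for l in left for r in index.get(l[key], [])]


# Hypothetical toy relations.
R = [{"a": i, "b": 0} for i in range(100)]  # 100 rows, all with b = 0
S = [{"b": 0, "c": j} for j in range(100)]  # 100 rows, all with b = 0
T = [{"c": 7, "d": "x"}]                    # a single, very selective row

# Order 1: (R join S) join T builds a large intermediate result first.
bad_intermediate = join(R, S, "b")
bad_result = join(bad_intermediate, T, "c")

# Order 2: R join (S join T) applies the selective join first.
good_intermediate = join(S, T, "c")
good_result = join(R, good_intermediate, "b")

print(len(bad_intermediate), len(good_intermediate), len(good_result))
```

Both orders return the same rows, but the first materializes a far larger intermediate result than the second.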

Sortledton: a Universal Graph Data Structure

ACM SIGMOD Record

Pub Date : 2023-06-07 DOI: 10.1145/3604437.3604442

Per Fuchs, D. Margan, Jana Giceva

Despite the wide adoption of graph processing across many different application domains, there is no underlying data structure that can serve a variety of graph workloads (analytics, traversals, and pattern matching) on dynamic graphs with single edge updates.

Citations: 0

Ad Hoc Transactions: What They Are and Why We Should Care

ACM SIGMOD Record

Pub Date : 2023-06-07 DOI: 10.1145/3604437.3604440

Chuzhe Tang, Zhaoguo Wang, Xiaodong Zhang, Qianmian Yu, B. Zang, Hai-bing Guan, Haibo Chen

Many transactions in web applications are constructed ad hoc in the application code. For example, developers might explicitly use locking primitives or validation procedures to coordinate critical code fragments. We refer to database operations coordinated by application code as ad hoc transactions. Until now, little is known about them. This paper presents the first comprehensive study on ad hoc transactions. By studying 91 ad hoc transactions among 8 popular open-source web applications, we find that (i) every studied application uses ad hoc transactions (up to 16 per application), 71 of which play critical roles; (ii) compared with database transactions, concurrency control of ad hoc transactions is much more flexible; (iii) ad hoc transactions are error-prone: 53 of them have correctness issues, and 33 of them are confirmed by developers; and (iv) ad hoc transactions have the potential to improve performance in contentious workloads by utilizing application semantics such as access patterns. Finally, implications of ad hoc transactions to the database research community are discussed.

Citations: 0
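
As a sketch of what the abstract above calls an ad hoc transaction, the following example coordinates a read-check-write sequence with a lock held by the application rather than by the database. The `AdHocStore` class, its in-memory `rows` dictionary standing in for a database, and the `purchase` function are all hypothetical and are not taken from any of the studied applications.

```python
import threading
from collections import defaultdict


class AdHocStore:
    """Hypothetical key-value 'database' plus application-managed per-key locks."""

    def __init__(self):
        self.rows = {}
        self._locks = defaultdict(threading.Lock)
        self._locks_guard = threading.Lock()

    def lock_for(self, key):
        # Guard the lock table itself so two threads get the same lock object.
        with self._locks_guard:
            return self._locks[key]


def purchase(store, item, qty):
    """Ad hoc transaction: read, check, and write under an application lock."""
    with store.lock_for(item):
        stock = store.rows[item]        # database read
        if stock < qty:
            return False                # validation fails, nothing is written
        store.rows[item] = stock - qty  # database write
        return True


store = AdHocStore()
store.rows["widget"] = 5
results = []
threads = [
    threading.Thread(target=lambda: results.append(purchase(store, "widget", 1)))
    for _ in range(8)
]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(results.count(True), store.rows["widget"])
```

Because the lock lives in application code, forgetting to take it on one code path silently breaks isolation, which is one way such transactions can become error-prone.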

Threshold Queries

ACM SIGMOD Record

Pub Date : 2023-06-07 DOI: 10.1145/3604437.3604452

A. Bonifati, Stefania Dumbrava, G. Fletcher, J. Hidders, Matthias Hofer, W. Martens, Filip Murlak, Joshua Shinavier, S. Staworko, Dominik Tomaszuk

Threshold queries are an important class of queries that only require computing or counting answers up to a specified threshold value. To the best of our knowledge, threshold queries have been largely disregarded in the research literature, which is surprising considering how common they are in practice. We explore how such queries appear in practice and present a method that can be used to significantly improve the asymptotic bounds of their state-of-the-art evaluation algorithms. Our experimental evaluation of these methods shows order-of-magnitude performance improvements.

Citations: 0
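
To illustrate the query class described in the abstract above (not the evaluation method the paper proposes), the sketch below counts answers only up to a threshold by consuming a lazy answer stream and stopping early. The toy edge list and the names `count_up_to` and `two_hop_paths` are assumptions made for this example.

```python
from itertools import islice


def count_up_to(answers, threshold):
    """Count answers lazily, stopping once `threshold` answers have been seen."""
    return sum(1 for _ in islice(answers, threshold))


def two_hop_paths(edges, start):
    """Lazily enumerate 2-hop paths from `start` (hypothetical toy query)."""
    for (u, v) in edges:
        if u == start:
            for (v2, w) in edges:
                if v2 == v:
                    yield (u, v, w)


# Hypothetical edge relation.
edges = [("a", "b"), ("b", "c"), ("b", "d"), ("b", "e"), ("a", "f"), ("f", "g")]

# "Are there at least 3 two-hop paths from a?" needs at most 3 answers.
at_least_three = count_up_to(two_hop_paths(edges, "a"), 3) == 3
# With a threshold above the true count, the full count is returned.
capped_count = count_up_to(two_hop_paths(edges, "a"), 10)
print(at_least_three, capped_count)
```

Early termination of this kind only saves the tail of the enumeration; it does not by itself change the asymptotic cost of producing the first answers.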

Technical Perspective: Conjunctive Queries with Comparisons

ACM SIGMOD Record

Pub Date : 2023-06-07 DOI: 10.1145/3604437.3604449

Stijn Vansummeren

Query processing, the art of efficiently executing a relational query on a given database, is a foundational and core area in data management research. Established at the dawn of relational database systems in the 1970s, relational query processing remains a highly relevant and vibrant research topic today, as recent work shows that, apart from its application in traditional database scenarios, it is also highly effective in optimizing machine learning workloads [1].

Citations: 0
