ACM SIGMOD Record: Latest Articles

Technical Perspective: (Pre-) Semirings Come to the Recursion Party

ACM SIGMOD Record

Pub Date : 2023-06-07 DOI: 10.1145/3604437.3604453

A. Rudra

(This article is an imagined conversation with the students in my undergraduate algorithms class at the University at Buffalo.)

Citations: 0

Technical Perspective: When is it safe to run a transactional workload under Read Committed?

ACM SIGMOD Record

Pub Date : 2023-06-07 DOI: 10.1145/3604437.3604445

A. Fekete

A data management platform provides many capabilities to assist the data owner, application coder, or end-user. For example, it should support an expressive query language, schema definition, and sophisticated access control. Another way many platforms add value is through a transaction mechanism, which allows the application programmer to indicate that a stretch of code, including multiple accesses to data, represents a single real-world activity, so all these steps should happen as if they were a single step, despite really being interleaved with other programs or perhaps cancelled after partial execution. If the platform perfectly hides the interleaving of different activities, the execution is called serializable, and this is a great aid to protecting data quality. Any integrity constraint over the data (whether explicitly declared in the schema or not) that is preserved by each transaction running alone is also valid at the end of any serializable execution of several transactions.
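
The final claim of this abstract can be illustrated with a small simulation. The following Python sketch is not taken from the article: the on-call scenario and the names `read_on_call`, `write_leave`, `run_serial`, and `run_interleaved` are assumptions made purely for illustration. Each transaction preserves the constraint "at least one doctor is on call" when run alone, so every serial order preserves it too, while one non-serializable interleaving breaks it.

```python
# Hypothetical scenario: the integrity constraint is that at least one
# doctor stays on call. Each transaction reads the table, then writes.

def constraint_holds(db):
    """Integrity constraint: at least one doctor is on call."""
    return sum(db.values()) >= 1

def read_on_call(db):
    """Read step: count how many doctors are currently on call."""
    return sum(db.values())

def write_leave(db, doctor, seen_on_call):
    """Write step: go off call only if the earlier read saw two or more on call."""
    if seen_on_call >= 2:
        db[doctor] = 0

def run_serial(order):
    """Run each transaction to completion before the next one starts."""
    db = {"alice": 1, "bob": 1}
    for doctor in order:
        seen = read_on_call(db)
        write_leave(db, doctor, seen)
    return db

def run_interleaved():
    """Both transactions read before either writes (not serializable)."""
    db = {"alice": 1, "bob": 1}
    seen_by_alice = read_on_call(db)
    seen_by_bob = read_on_call(db)
    write_leave(db, "alice", seen_by_alice)
    write_leave(db, "bob", seen_by_bob)
    return db

if __name__ == "__main__":
    print(constraint_holds(run_serial(["alice", "bob"])))
    print(constraint_holds(run_serial(["bob", "alice"])))
    print(constraint_holds(run_interleaved()))
```

In this sketch both serial orders leave one doctor on call, whereas the interleaved run lets both doctors leave because each decision was based on a read that the other transaction later invalidated.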

Citations: 0

An Optimal Algorithm for Partial Order Multiway Search

ACM SIGMOD Record

Pub Date : 2023-06-07 DOI: 10.1145/3604437.3604456

Shangqi Lu, W. Martens, Matthias Niewerth, Yufei Tao

Partial order multiway search (POMS) is an important problem that finds use in crowdsourcing, distributed file systems, software testing, etc. In this problem, a game is played between an algorithm A and an oracle, based on a directed acyclic graph G known to both parties. First, the oracle picks a vertex t in G called the target; then, A aims to figure out which vertex is t by probing reachability. In each probe, A selects a set Q of vertices in G whose size is bounded by a pre-agreed value k, and the oracle then reveals, for each vertex $q \in Q$, whether q can reach the target in G. The objective of A is to minimize the number of probes. This article presents an algorithm to solve POMS in $O(\log_{1+k} n + \frac{d}{k} \log_{1+d} n)$ probes, where n is the number of vertices in G, and d is the largest out-degree of the vertices in G. The probing complexity is asymptotically optimal.
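
The rules of the probing game can be sketched in a few lines of Python. This is only a naive baseline written for illustration, not the article's algorithm, and it makes no attempt to reach the stated probe bound. The names `Oracle`, `first_hit`, and `naive_search` are hypothetical, and the sketch assumes that every vertex reaches itself and that the graph is given as a dictionary with one key per vertex.

```python
from typing import Dict, List, Optional

Graph = Dict[str, List[str]]  # vertex -> list of out-neighbors (assumed encoding)

def reaches(graph: Graph, src: str, dst: str) -> bool:
    """Depth-first reachability; every vertex is taken to reach itself."""
    stack, seen = [src], {src}
    while stack:
        v = stack.pop()
        if v == dst:
            return True
        for w in graph[v]:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return False

class Oracle:
    """Holds the hidden target and answers probes of at most k vertices."""
    def __init__(self, graph: Graph, target: str, k: int):
        self.graph, self.target, self.k = graph, target, k
        self.probes = 0

    def probe(self, query: List[str]) -> Dict[str, bool]:
        if not 1 <= len(query) <= self.k:
            raise ValueError("a probe must contain between 1 and k vertices")
        self.probes += 1
        return {q: reaches(self.graph, q, self.target) for q in query}

def first_hit(oracle: Oracle, candidates: List[str], k: int) -> Optional[str]:
    """Probe candidates in batches of k; return one that reaches the target."""
    for i in range(0, len(candidates), k):
        for q, yes in oracle.probe(candidates[i:i + k]).items():
            if yes:
                return q
    return None

def naive_search(graph: Graph, oracle: Oracle, k: int) -> str:
    """Walk down from a source, always stepping to a child that reaches the target."""
    has_incoming = {w for outs in graph.values() for w in outs}
    sources = [v for v in graph if v not in has_incoming]
    current = first_hit(oracle, sources, k)  # some source must reach the target
    while True:
        nxt = first_hit(oracle, graph[current], k)
        if nxt is None:  # no child reaches the target, so current is the target
            return current
        current = nxt

if __name__ == "__main__":
    g = {"a": ["b", "c"], "b": ["d"], "c": ["d", "e"], "d": [], "e": []}
    oracle = Oracle(g, target="e", k=2)
    print(naive_search(g, oracle, k=2), oracle.probes)
```

The baseline is correct because a vertex that reaches the target but is not the target must have an out-neighbor that also reaches it. In the worst case it spends one batch sequence per level of the graph, which is what a more careful strategy would need to avoid.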

Citations: 0

Technical Perspective: Sortledton: a Universal Graph Data Structure

ACM SIGMOD Record

Pub Date : 2023-06-07 DOI: 10.1145/3604437.3604441

A. Bonifati

Graph processing is becoming ubiquitous due to the proliferation of interconnected data in several domains, including life sciences, social networks, cybersecurity, finance, and logistics, to name a few. In parallel with the growth of the underlying graph data sources, a plethora of graph workloads have appeared, ranging from graph analytics to graph traversals and graph pattern matching. Graph systems executing both complex and simple graph workloads need to leverage adequate data structures for efficiently processing heterogeneous graph data. While the underlying graph data structures have been extensively studied for the static case, they are less understood for the dynamic case, with the data undergoing several updates per second. Moreover, the existing solutions suffer from a lack of generality, as they focus on one specific requirement and workload type at a time. Designing a universal data structure that adapts to several kinds of graph workloads in a dynamic setting and achieves significant efficiency on all of them is far from trivial.

Citations: 0

Technical Perspective for Sherman: A Write-Optimized Distributed B+Tree Index on Disaggregated Memory

ACM SIGMOD Record

Pub Date : 2023-06-07 DOI: 10.1145/3604437.3604447

Tim Kraska

Separation of compute and storage has become the de facto standard for cloud database systems. First proposed in 2007 for database systems [2], it is now widely adopted by all major cloud providers such as Amazon Redshift, Google BigQuery, and Snowflake. Separation of compute and storage adds enormous value for the customer. Users can scale storage independently of compute, which enables them to pay only for what they really use. Consider a scenario in which data grows linearly over time, but most queries only access the last month of data, which remains relatively stable. Without the separation of compute and storage, the user would gradually be forced to significantly increase the database cluster capacity. In contrast, modern cloud database systems allow scaling the storage separately from compute; the compute cluster stays the same over time, whereas the data is stored on cheap cloud storage services, like Amazon S3.

Citations: 1

Technical Perspective: Query Answers - Fewer is Faster

ACM SIGMOD Record

Pub Date : 2023-06-07 DOI: 10.1145/3604437.3604451

L. Libkin

We often write queries using LIMIT k, indicating that only k answers are to be returned. This feature is present in most query languages, for different data models: SQL, SPARQL, Cypher, etc. For example, in a repository of about 250M SPARQL queries, about 15M queries are of this form. Not surprisingly, the database research community has studied such queries extensively. The dominant setting is this: there is an ordering on tuples that can be returned by a query. Then the answer is limited to the first k tuples in this ordering.
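
The dominant setting described above, an ordering on answers followed by a cut at k, can be mimicked in plain Python. The sketch below is illustrative only: the `rows` data and the helper name `limit_k` are assumptions, and the ordering plays the role of something like `ORDER BY score DESC LIMIT 2`.

```python
import heapq

def limit_k(answers, k, key):
    """Return the first k answers under the ordering induced by `key`."""
    # heapq.nsmallest avoids fully sorting all answers when k is small.
    return heapq.nsmallest(k, answers, key=key)

# Hypothetical query answers: (name, score) tuples.
rows = [("carol", 72), ("alice", 91), ("dave", 85), ("bob", 64)]

# Order by descending score, then keep the first two answers.
top2 = limit_k(rows, 2, key=lambda row: -row[1])

if __name__ == "__main__":
    print(top2)
```

The result matches `sorted(rows, key=...)[:k]`; the heap-based variant is simply one way to avoid ordering answers that will never be returned.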

Citations: 0

Conjunctive Queries with Comparisons

ACM SIGMOD Record

Pub Date : 2023-06-07 DOI: 10.1145/3604437.3604450

Qichen Wang, K. Yi

Conjunctive queries with predicates in the form of comparisons that span multiple relations have regained interest recently, due to their relevance in OLAP queries, spatiotemporal databases, and machine learning over relational data. The standard technique, predicate pushdown, has limited efficacy on such comparisons. A technique by Willard can be used to process short comparisons that are adjacent in the join tree in time linear in the input size plus output size. In this paper, we describe a new algorithm for evaluating conjunctive queries with both short and long comparisons, and identify an acyclic condition under which linear time can be achieved. We have also implemented the new algorithm on top of Spark, and our experimental results demonstrate order-of-magnitude speedups over SparkSQL on a variety of graph patterns and analytical queries.
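
To show what a comparison spanning two relations looks like, here is a small Python sketch for the query Q(b, x, y) :- R(b, x), S(b, y), x < y. It is neither Willard's technique nor the paper's algorithm; it is a simplified sort-and-search evaluation, and the relation layouts and the function name `join_with_comparison` are assumptions made for illustration.

```python
from bisect import bisect_right
from collections import defaultdict

def join_with_comparison(R, S):
    """Evaluate Q(b, x, y) :- R(b, x), S(b, y), x < y."""
    # Group S by the join key and sort each group's y-values once.
    ys_by_key = defaultdict(list)
    for b, y in S:
        ys_by_key[b].append(y)
    for ys in ys_by_key.values():
        ys.sort()

    out = []
    for b, x in R:
        ys = ys_by_key.get(b, [])
        start = bisect_right(ys, x)  # first position whose y is strictly greater than x
        out.extend((b, x, y) for y in ys[start:])
    return out

def nested_loop(R, S):
    """Reference evaluation: join first, then filter on the comparison."""
    return [(b, x, y) for b, x in R for b2, y in S if b == b2 and x < y]

if __name__ == "__main__":
    R = [(1, 5), (1, 9), (2, 3)]
    S = [(1, 7), (1, 10), (2, 1), (3, 4)]
    print(join_with_comparison(R, S))
```

After sorting, each tuple of R costs one binary search plus the answers it actually produces, whereas the nested-loop reference inspects every pair of tuples regardless of whether the comparison holds.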

Citations: 1

Efficiently Making Cross-Engine Transactions Consistent

ACM SIGMOD Record

Pub Date : 2023-06-07 DOI: 10.1145/3604437.3604444

Jianqiu Zhang, Kaisong Huang, Tianzheng Wang, King Lv

Database systems are becoming increasingly multi-engine. In particular, a main-memory engine may coexist with a traditional storage-centric engine in a system to support various applications. It is desirable to allow applications to access data in both engines using cross-engine transactions. But existing systems are either only designed for single-engine accesses, or impose many restrictions by limiting cross-engine transactions to certain isolation levels and operations. The result is inadequate cross-engine support in terms of correctness, performance and programmability.

Citations: 0

Technical Perspective: Accurate Summary-based Cardinality Estimation Through the Lens of Cardinality Estimation Graphs

ACM SIGMOD Record

Pub Date : 2023-06-07 DOI: 10.1145/3604437.3604457

Dan Suciu

Query engines are really good at choosing an efficient query plan. Users don't need to worry about how they write their query, since the optimizer makes all the right choices for executing the query, while taking into account all aspects of data, such as its size, the characteristics of the storage device, the distribution pattern, the availability of indexes, and so on. The query optimizer always makes the best choice, no matter how complex the query is or how contrived its formulation. Or at least, this is what we expect today from a modern query optimizer. Unfortunately, reality is not as nice.

Citations: 1

Technical Perspective: Optimal Algorithms for Multiway Search on Partial Orders

ACM SIGMOD Record

Pub Date : 2023-06-07 DOI: 10.1145/3604437.3604455

Rajesh Jayaram

Given a list of comparable items $A = \{a_1, \ldots, a_n\}$ sorted so that $a_1 < a_2 < \cdots < a_n$, a canonical problem is locating a target item q within A if it exists. The canonical algorithm for this problem, of course, is binary search, which locates q using at most O(log n) comparisons between q and elements of A. Binary search is an indispensable tool for totally ordered datasets. However, many naturally occurring datasets are only partially ordered (posets), meaning that not all pairs of elements are comparable. Every such poset can be expressed as a directed acyclic graph (DAG), with edges (x,y) representing the relation x < y.
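
Both halves of this abstract can be made concrete with a short Python sketch. The first function is a textbook binary search that also counts its comparison rounds; the second builds the DAG of a small poset. The choice of divisibility as the partial order and the names `binary_search` and `divisibility_dag` are assumptions made for illustration.

```python
def binary_search(items, q):
    """Return (index of q or None, number of comparison rounds used)."""
    lo, hi, rounds = 0, len(items) - 1, 0
    while lo <= hi:
        mid = (lo + hi) // 2
        rounds += 1
        if items[mid] == q:
            return mid, rounds
        if items[mid] < q:
            lo = mid + 1
        else:
            hi = mid - 1
    return None, rounds

def divisibility_dag(elements):
    """Edges (x, y) meaning x < y, where x < y stands for 'x properly divides y'."""
    return [(x, y) for x in elements for y in elements if x != y and y % x == 0]

if __name__ == "__main__":
    items = list(range(0, 1000, 3))      # 334 sorted items
    print(binary_search(items, 300))     # found after a logarithmic number of rounds
    print(binary_search(items, 301))     # absent item
    print(divisibility_dag([2, 3, 4, 6, 12]))
```

In the divisibility example, 2 and 3 are incomparable because neither divides the other, so no edge connects them in either direction. A plain binary search has no single sorted sequence to halve in such a dataset.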

Citations: 0
