J. Roemer, K. Genç, Michael D. Bond
Dynamic program analysis can predict data races knowable from an observed execution, but existing predictive analyses either miss races or cannot analyze full program executions. This paper presents Vindicator, a novel, sound (no false races) predictive approach that finds more data races than existing predictive approaches. Vindicator achieves high coverage by using a new, efficient analysis that finds all possible predictable races but may detect false races. Vindicator ensures soundness using a novel algorithm that checks each potential race to determine whether it is a true predictable race. An evaluation using large Java programs shows that Vindicator finds hard-to-detect predictable races that existing sound predictive analyses miss, at a comparable performance cost.
{"title":"High-coverage, unbounded sound predictive race detection","authors":"J. Roemer, K. Genç, Michael D. Bond","doi":"10.1145/3296979.3192385","DOIUrl":"https://doi.org/10.1145/3296979.3192385","url":null,"abstract":"Dynamic program analysis can predict data races knowable from an observed execution, but existing predictive analyses either miss races or cannot analyze full program executions. This paper presents Vindicator, a novel, sound (no false races) predictive approach that finds more data races than existing predictive approaches. Vindicator achieves high coverage by using a new, efficient analysis that finds all possible predictable races but may detect false races. Vindicator ensures soundness using a novel algorithm that checks each potential race to determine whether it is a true predictable race. An evaluation using large Java programs shows that Vindicator finds hard-to-detect predictable races that existing sound predictive analyses miss, at a comparable performance cost.","PeriodicalId":50923,"journal":{"name":"ACM Sigplan Notices","volume":"3 1","pages":"374 - 389"},"PeriodicalIF":0.0,"publicationDate":"2018-06-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"83083647","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
V. Gogte, S. Diestelhorst, William Wang, S. Narayanasamy, Peter M. Chen, T. Wenisch
Nascent persistent memory (PM) technologies promise the performance of DRAM with the durability of disk, but how best to integrate them into programming systems remains an open question. Recent work extends language memory models with a persistency model prescribing semantics for updates to PM. These semantics enable programmers to design data structures in PM that are accessed like memory and yet are recoverable upon crash or failure. Alas, we find the semantics and performance of existing approaches unsatisfying. Existing approaches require high-overhead mechanisms, are restricted to certain synchronization constructs, provide incomplete semantics, and/or may recover to state that cannot arise in fault-free execution. We propose persistency semantics that guarantee failure atomicity of synchronization-free regions (SFRs) - program regions delimited by synchronization operations. Our approach provides clear semantics for the PM state recovery code may observe and extends C++11's "sequential consistency for data-race-free" guarantee to post-failure recovery code. We investigate two designs for failure-atomic SFRs that vary in performance and the degree to which commit of persistent state may lag execution. We demonstrate both approaches in LLVM v3.6.0 and compare to a state-of-the-art baseline to show performance improvement up to 87.5% (65.5% avg).
{"title":"Persistency for synchronization-free regions","authors":"V. Gogte, S. Diestelhorst, William Wang, S. Narayanasamy, Peter M. Chen, T. Wenisch","doi":"10.1145/3296979.3192367","DOIUrl":"https://doi.org/10.1145/3296979.3192367","url":null,"abstract":"Nascent persistent memory (PM) technologies promise the performance of DRAM with the durability of disk, but how best to integrate them into programming systems remains an open question. Recent work extends language memory models with a persistency model prescribing semantics for updates to PM. These semantics enable programmers to design data structures in PM that are accessed like memory and yet are recoverable upon crash or failure. Alas, we find the semantics and performance of existing approaches unsatisfying. Existing approaches require high-overhead mechanisms, are restricted to certain synchronization constructs, provide incomplete semantics, and/or may recover to state that cannot arise in fault-free execution. We propose persistency semantics that guarantee failure atomicity of synchronization-free regions (SFRs) - program regions delimited by synchronization operations. Our approach provides clear semantics for the PM state recovery code may observe and extends C++11's \"sequential consistency for data-race-free\" guarantee to post-failure recovery code. We investigate two designs for failure-atomic SFRs that vary in performance and the degree to which commit of persistent state may lag execution. We demonstrate both approaches in LLVM v3.6.0 and compare to a state-of-the-art baseline to show performance improvement up to 87.5% (65.5% avg).","PeriodicalId":50923,"journal":{"name":"ACM Sigplan Notices","volume":"73 1","pages":"46 - 61"},"PeriodicalIF":0.0,"publicationDate":"2018-06-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"83626690","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Orhan Kislal, Jagadish B. Kotra, Xulong Tang, M. Kandemir, Myoungsoo Jung
Going beyond a certain number of cores in modern architectures requires an on-chip network more scalable than conventional buses. However, employing an on-chip network in a manycore system (to improve scalability) makes the latencies of the data accesses issued by a core non-uniform. This non-uniformity can play a significant role in shaping the overall application performance. This work presents a novel compiler strategy which involves exposing architecture information to the compiler to enable an optimized computation-to-core mapping. Specifically, we propose a compiler-guided scheme that takes into account the relative positions of (and distances between) cores, last-level caches (LLCs) and memory controllers (MCs) in a manycore system, and generates a mapping of computations to cores with the goal of minimizing the on-chip network traffic. The experimental data collected using a set of 21 multi-threaded applications reveal that, on average, our approach reduces the on-chip network latency in a 6×6 manycore system by 38.4% in the case of private LLCs, and 43.8% in the case of shared LLCs. These improvements translate into execution time improvements of 10.9% and 12.7% for the private-LLC and shared-LLC systems, respectively.
{"title":"Enhancing computation-to-core assignment with physical location information","authors":"Orhan Kislal, Jagadish B. Kotra, Xulong Tang, M. Kandemir, Myoungsoo Jung","doi":"10.1145/3296979.3192386","DOIUrl":"https://doi.org/10.1145/3296979.3192386","url":null,"abstract":"Going beyond a certain number of cores in modern architectures requires an on-chip network more scalable than conventional buses. However, employing an on-chip network in a manycore system (to improve scalability) makes the latencies of the data accesses issued by a core non-uniform. This non-uniformity can play a significant role in shaping the overall application performance. This work presents a novel compiler strategy which involves exposing architecture information to the compiler to enable an optimized computation-to-core mapping. Specifically, we propose a compiler-guided scheme that takes into account the relative positions of (and distances between) cores, last-level caches (LLCs) and memory controllers (MCs) in a manycore system, and generates a mapping of computations to cores with the goal of minimizing the on-chip network traffic. The experimental data collected using a set of 21 multi-threaded applications reveal that, on an average, our approach reduces the on-chip network latency in a 6×6 manycore system by 38.4% in the case of private LLCs, and 43.8% in the case of shared LLCs. These improvements translate to the corresponding execution time improvements of 10.9% and 12.7% for the private LLC and shared LLC based systems, respectively.","PeriodicalId":50923,"journal":{"name":"ACM Sigplan Notices","volume":"33 17","pages":"312 - 327"},"PeriodicalIF":0.0,"publicationDate":"2018-06-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1145/3296979.3192386","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"72366245","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Roshan Dathathri, G. Gill, Loc Hoang, Hoang-Vu Dang, Alex Brooks, Nikoli Dryden, M. Snir, K. Pingali
This paper introduces a new approach to building distributed-memory graph analytics systems that exploits heterogeneity in processor types (CPU and GPU), partitioning policies, and programming models. The key to this approach is Gluon, a communication-optimizing substrate. Programmers write applications in a shared-memory programming system of their choice and interface these applications with Gluon using a lightweight API. Gluon enables these programs to run on heterogeneous clusters and optimizes communication in a novel way by exploiting structural and temporal invariants of graph partitioning policies. To demonstrate Gluon’s ability to support different programming models, we interfaced Gluon with the Galois and Ligra shared-memory graph analytics systems to produce distributed-memory versions of these systems named D-Galois and D-Ligra, respectively. To demonstrate Gluon’s ability to support heterogeneous processors, we interfaced Gluon with IrGL, a state-of-the-art single-GPU system for graph analytics, to produce D-IrGL, the first multi-GPU distributed-memory graph analytics system. Our experiments were done on CPU clusters with up to 256 hosts and roughly 70,000 threads and on multi-GPU clusters with up to 64 GPUs. The communication optimizations in Gluon improve end-to-end application execution time by ∼2.6× on average. D-Galois and D-IrGL scale well and are faster than Gemini, the state-of-the-art distributed CPU graph analytics system, by factors of ∼3.9× and ∼4.9×, respectively, on average.
{"title":"Gluon: a communication-optimizing substrate for distributed heterogeneous graph analytics","authors":"Roshan Dathathri, G. Gill, Loc Hoang, Hoang-Vu Dang, Alex Brooks, Nikoli Dryden, M. Snir, K. Pingali","doi":"10.1145/3296979.3192404","DOIUrl":"https://doi.org/10.1145/3296979.3192404","url":null,"abstract":"This paper introduces a new approach to building distributed-memory graph analytics systems that exploits heterogeneity in processor types (CPU and GPU), partitioning policies, and programming models. The key to this approach is Gluon, a communication-optimizing substrate. Programmers write applications in a shared-memory programming system of their choice and interface these applications with Gluon using a lightweight API. Gluon enables these programs to run on heterogeneous clusters and optimizes communication in a novel way by exploiting structural and temporal invariants of graph partitioning policies. To demonstrate Gluon’s ability to support different programming models, we interfaced Gluon with the Galois and Ligra shared-memory graph analytics systems to produce distributed-memory versions of these systems named D-Galois and D-Ligra, respectively. To demonstrate Gluon’s ability to support heterogeneous processors, we interfaced Gluon with IrGL, a state-of-the-art single-GPU system for graph analytics, to produce D-IrGL, the first multi-GPU distributed-memory graph analytics system. Our experiments were done on CPU clusters with up to 256 hosts and roughly 70,000 threads and on multi-GPU clusters with up to 64 GPUs. The communication optimizations in Gluon improve end-to-end application execution time by ∼2.6× on the average. D-Galois and D-IrGL scale well and are faster than Gemini, the state-of-the-art distributed CPU graph analytics system, by factors of ∼3.9× and ∼4.9×, respectively, on the average.","PeriodicalId":50923,"journal":{"name":"ACM Sigplan Notices","volume":"17 1","pages":"752 - 768"},"PeriodicalIF":0.0,"publicationDate":"2018-06-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"80131698","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Bozhen Liu, Jeff Huang
We present D4, a fast concurrency analysis framework that detects concurrency bugs (e.g., data races and deadlocks) interactively in the programming phase. As developers add, modify, and remove statements, the code changes are sent to D4 to detect concurrency bugs in real time, which in turn provides the developer with immediate feedback about the new bugs. The cornerstone of D4 is a novel system design and two novel parallel differential algorithms that embrace both change and parallelization for fundamental static analyses of concurrent programs. Both algorithms react to program changes by memoizing the analysis results and only recomputing the impact of a change in parallel. Our evaluation on an extensive collection of large real-world applications shows that D4 efficiently pinpoints concurrency bugs within 100ms on average after a code change, several orders of magnitude faster than both exhaustive analysis and state-of-the-art incremental techniques.
{"title":"D4: fast concurrency debugging with parallel differential analysis","authors":"Bozhen Liu, Jeff Huang","doi":"10.1145/3296979.3192390","DOIUrl":"https://doi.org/10.1145/3296979.3192390","url":null,"abstract":"We present D4, a fast concurrency analysis framework that detects concurrency bugs (e.g., data races and deadlocks) interactively in the programming phase. As developers add, modify, and remove statements, the code changes are sent to D4 to detect concurrency bugs in real time, which in turn provides immediate feedback to the developer of the new bugs. The cornerstone of D4 includes a novel system design and two novel parallel differential algorithms that embrace both change and parallelization for fundamental static analyses of concurrent programs. Both algorithms react to program changes by memoizing the analysis results and only recomputing the impact of a change in parallel. Our evaluation on an extensive collection of large real-world applications shows that D4 efficiently pinpoints concurrency bugs within 100ms on average after a code change, several orders of magnitude faster than both the exhaustive analysis and the state-of-the-art incremental techniques.","PeriodicalId":50923,"journal":{"name":"ACM Sigplan Notices","volume":"17 1","pages":"359 - 373"},"PeriodicalIF":0.0,"publicationDate":"2018-06-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"88237481","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Marcelo Taube, Giuliano Losa, K. McMillan, O. Padon, M. Sagiv, Sharon Shoham, James R. Wilcox, Doug Woos
Proof automation can substantially increase productivity in formal verification of complex systems. However, the unpredictability of automated provers in handling quantified formulas presents a major hurdle to the usability of these tools. We propose to solve this problem not by improving the provers, but by using a modular proof methodology that allows us to produce decidable verification conditions. Decidability greatly improves the predictability of proof automation, resulting in a more practical verification approach. We apply this methodology to develop verified implementations of distributed protocols, demonstrating its effectiveness.
{"title":"Modularity for decidability of deductive verification with applications to distributed systems","authors":"Marcelo Taube, Giuliano Losa, K. McMillan, O. Padon, M. Sagiv, Sharon Shoham, James R. Wilcox, Doug Woos","doi":"10.1145/3296979.3192414","DOIUrl":"https://doi.org/10.1145/3296979.3192414","url":null,"abstract":"Proof automation can substantially increase productivity in formal verification of complex systems. However, unpredictablility of automated provers in handling quantified formulas presents a major hurdle to usability of these tools. We propose to solve this problem not by improving the provers, but by using a modular proof methodology that allows us to produce decidable verification conditions. Decidability greatly improves predictability of proof automation, resulting in a more practical verification approach. We apply this methodology to develop verified implementations of distributed protocols, demonstrating its effectiveness.","PeriodicalId":50923,"journal":{"name":"ACM Sigplan Notices","volume":"18 1","pages":"662 - 677"},"PeriodicalIF":0.0,"publicationDate":"2018-06-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"73218350","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Qingkai Shi, Xiao Xiao, Rongxin Wu, Jinguo Zhou, Gang Fan, Charles Zhang
When dealing with millions of lines of code, we still cannot have our cake and eat it too: sparse value-flow analysis is powerful in checking source-sink problems, but existing work cannot escape the “pointer trap” – a precise points-to analysis limits its scalability and an imprecise one seriously undermines its precision. We present Pinpoint, a holistic approach that decomposes the cost of high-precision points-to analysis by precisely discovering local data dependence and delaying the expensive inter-procedural analysis through memoization. Such memoization enables the on-demand slicing of only the necessary inter-procedural data dependence and path feasibility queries, which are then solved by a costly SMT solver. Experiments show that Pinpoint can check programs such as MySQL (around 2 million lines of code) within 1.5 hours. The overall false positive rate is also very low (14.3%-23.6%). Pinpoint has discovered over forty real bugs in mature and extensively checked open source systems. The implementation of Pinpoint and all experimental results are freely available.
{"title":"Pinpoint: fast and precise sparse value flow analysis for million lines of code","authors":"Qingkai Shi, Xiao Xiao, Rongxin Wu, Jinguo Zhou, Gang Fan, Charles Zhang","doi":"10.1145/3296979.3192418","DOIUrl":"https://doi.org/10.1145/3296979.3192418","url":null,"abstract":"When dealing with millions of lines of code, we still cannot have the cake and eat it: sparse value-flow analysis is powerful in checking source-sink problems, but existing work cannot escape from the “pointer trap” – a precise points-to analysis limits its scalability and an imprecise one seriously undermines its precision. We present Pinpoint, a holistic approach that decomposes the cost of high-precision points-to analysis by precisely discovering local data dependence and delaying the expensive inter-procedural analysis through memorization. Such memorization enables the on-demand slicing of only the necessary inter-procedural data dependence and path feasibility queries, which are then solved by a costly SMT solver. Experiments show that Pinpoint can check programs such as MySQL (around 2 million lines of code) within 1.5 hours. The overall false positive rate is also very low (14.3% - 23.6%). Pinpoint has discovered over forty real bugs in mature and extensively checked open source systems. And the implementation of Pinpoint and all experimental results are freely available.","PeriodicalId":50923,"journal":{"name":"ACM Sigplan Notices","volume":"71 1","pages":"693 - 706"},"PeriodicalIF":0.0,"publicationDate":"2018-06-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"78836575","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Woosuk Lee, K. Heo, R. Alur, M. Naik
A key challenge in program synthesis concerns how to efficiently search for the desired program in the space of possible programs. We propose a general approach to accelerate search-based program synthesis by biasing the search towards likely programs. Our approach targets a standard formulation, syntax-guided synthesis (SyGuS), by extending the grammar of possible programs with a probabilistic model dictating the likelihood of each program. We develop a weighted search algorithm to efficiently enumerate programs in order of their likelihood. We also propose a method based on transfer learning that makes it possible to effectively learn a powerful model, called a probabilistic higher-order grammar, from known solutions in a domain. We have implemented our approach in a tool called Euphony and evaluate it on SyGuS benchmark problems from a variety of domains. We show that Euphony can learn good models using easily obtainable solutions, and achieves significant performance gains over existing general-purpose as well as domain-specific synthesizers.
{"title":"Accelerating search-based program synthesis using learned probabilistic models","authors":"Woosuk Lee, K. Heo, R. Alur, M. Naik","doi":"10.1145/3296979.3192410","DOIUrl":"https://doi.org/10.1145/3296979.3192410","url":null,"abstract":"A key challenge in program synthesis concerns how to efficiently search for the desired program in the space of possible programs. We propose a general approach to accelerate search-based program synthesis by biasing the search towards likely programs. Our approach targets a standard formulation, syntax-guided synthesis (SyGuS), by extending the grammar of possible programs with a probabilistic model dictating the likelihood of each program. We develop a weighted search algorithm to efficiently enumerate programs in order of their likelihood. We also propose a method based on transfer learning that enables to effectively learn a powerful model, called probabilistic higher-order grammar, from known solutions in a domain. We have implemented our approach in a tool called Euphony and evaluate it on SyGuS benchmark problems from a variety of domains. We show that Euphony can learn good models using easily obtainable solutions, and achieves significant performance gains over existing general-purpose as well as domain-specific synthesizers.","PeriodicalId":50923,"journal":{"name":"ACM Sigplan Notices","volume":"11 1","pages":"436 - 449"},"PeriodicalIF":0.0,"publicationDate":"2018-06-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"88611128","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Mae Milano, A. Myers
Programming concurrent, distributed systems is hard—especially when these systems mutate shared, persistent state replicated at geographic scale. To enable high availability and scalability, a new class of weakly consistent data stores has become popular. However, some data needs strong consistency. To manipulate both weakly and strongly consistent data in a single transaction, we introduce a new abstraction: mixed-consistency transactions, embodied in a new embedded language, MixT. Programmers explicitly associate consistency models with remote storage sites; each atomic, isolated transaction can access a mixture of data with different consistency models. Compile-time information-flow checking, applied to consistency models, ensures that these models are mixed safely and enables the compiler to automatically partition transactions. New run-time mechanisms ensure that consistency models can also be mixed safely, even when the data used by a transaction resides on separate, mutually unaware stores. Performance measurements show that despite their stronger guarantees, mixed-consistency transactions retain much of the speed of weak consistency, significantly outperforming traditional serializable transactions.
{"title":"MixT: a language for mixing consistency in geodistributed transactions","authors":"Mae Milano, A. Myers","doi":"10.1145/3296979.3192375","DOIUrl":"https://doi.org/10.1145/3296979.3192375","url":null,"abstract":"Programming concurrent, distributed systems is hard—especially when these systems mutate shared, persistent state replicated at geographic scale. To enable high availability and scalability, a new class of weakly consistent data stores has become popular. However, some data needs strong consistency. To manipulate both weakly and strongly consistent data in a single transaction, we introduce a new abstraction: mixed-consistency transactions, embodied in a new embedded language, MixT. Programmers explicitly associate consistency models with remote storage sites; each atomic, isolated transaction can access a mixture of data with different consistency models. Compile-time information-flow checking, applied to consistency models, ensures that these models are mixed safely and enables the compiler to automatically partition transactions. New run-time mechanisms ensure that consistency models can also be mixed safely, even when the data used by a transaction resides on separate, mutually unaware stores. Performance measurements show that despite their stronger guarantees, mixed-consistency transactions retain much of the speed of weak consistency, significantly outperforming traditional serializable transactions.","PeriodicalId":50923,"journal":{"name":"ACM Sigplan Notices","volume":"20 1","pages":"226 - 241"},"PeriodicalIF":0.0,"publicationDate":"2018-06-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"78471870","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Dong Chen, Fangzhou Liu, C. Ding, Sreepathi Pai
Locality analysis is important since accessing memory is much slower than computing. Compile-time locality analysis can provide detailed program-level feedback for compilers or runtime systems faster than trace-based locality analysis. In this paper, we describe a new approach to locality analysis based on static parallel sampling. A compiler analyzes loop-based code and generates sampler code which is run to measure locality. Our approach can predict precise cache-line-granularity miss ratio curves for complex loops with non-linear array references and even branches. The precision and overhead of static sampling are evaluated using PolyBench and a bit-reversal loop. Our results show that by randomly sampling 2% of loop iterations, a compiler can construct miss ratio curves almost identical to those from trace-based analysis. Sampling 0.5% and 1% of iterations achieves good precision and efficiency, taking on average 0.6% and 1% of the time of tracing, respectively. Our analysis can also be parallelized. The analysis may assist program optimization techniques such as tiling, program co-location, and cache hint selection, and can help analyze write locality and parallel locality.
{"title":"Locality analysis through static parallel sampling","authors":"Dong Chen, Fangzhou Liu, C. Ding, Sreepathi Pai","doi":"10.1145/3296979.3192402","DOIUrl":"https://doi.org/10.1145/3296979.3192402","url":null,"abstract":"Locality analysis is important since accessing memory is much slower than computing. Compile-time locality analysis can provide detailed program-level feedback for compilers or runtime systems faster than trace-based locality analysis. In this paper, we describe a new approach to locality analysis based on static parallel sampling. A compiler analyzes loop-based code and generates sampler code which is run to measure locality. Our approach can predict precise cache line granularity miss ratio curves for complex loops with non-linear array references and even branches. The precision and overhead of static sampling are evaluated using PolyBench and a bit-reversal loop. Our result shows that by randomly sampling 2% of loop iterations, a compiler can construct almost exact miss ratio curves as trace based analysis. Sampling 0.5% and 1% iterations can achieve good precision and efficiency with an average 0.6% to 1% the time of tracing respectively. Our analysis can also be parallelized. The analysis may assist program optimization techniques such as tiling, program co-location, cache hint selection and help to analyze write locality and parallel locality.","PeriodicalId":50923,"journal":{"name":"ACM Sigplan Notices","volume":"97 1","pages":"557 - 570"},"PeriodicalIF":0.0,"publicationDate":"2018-06-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"77785118","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}