Concurrent Data Structures Made Easy (Extended Version)

arXiv - CS - Programming Languages. Pub Date: 2024-08-25. DOI: arxiv-2408.13779

Callista Le, Kiran Gopinathan, Koon Wen Lee, Seth Gilbert, Ilya Sergey


Citations: 0



Abstract

Designing an efficient thread-safe concurrent data structure is a balancing act between implementation complexity and performance. Lock-based concurrent data structures, which are relatively easy to derive from their sequential counterparts and to prove thread-safe, suffer from poor throughput under even light multi-threaded workloads. At the same time, lock-free concurrent structures allow for high throughput, but are notoriously difficult to get right and require careful reasoning to formally establish their correctness.

We explore a solution to this conundrum based on batch parallelism, an approach for designing concurrent data structures via a simple insight: efficiently processing a batch of a priori known operations in parallel is easier than optimising performance for a stream of arbitrary asynchronous requests. Alas, batch-parallel structures have not seen wide practical adoption due to (i) the inconvenience of having to structure multi-threaded programs to explicitly group operations and (ii) the lack of a systematic methodology to implement batch-parallel structures as simply as lock-based ones.

We present OBatcher, an OCaml library that streamlines the design, implementation, and usage of batch-parallel structures. It solves the first challenge (how to use) by suggesting a new lightweight implicit batching design that is built on top of generic asynchronous programming mechanisms. The second challenge (how to implement) is addressed by identifying a family of strategies for converting common sequential structures into efficient batch-parallel ones. We showcase OBatcher with a diverse set of benchmarks. Our evaluation of all the implementations on large asynchronous workloads shows that (a) they consistently outperform the corresponding coarse-grained lock-based implementations and that (b) their throughput scales reasonably with the number of processors.
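The implicit batching idea in the abstract can be illustrated with a minimal sketch. It is written in Python rather than OCaml and does not reproduce OBatcher's API: the names `ImplicitBatcher`, `BatchedCounter`, and `run_demo` are hypothetical, and the design (the first caller to arrive becomes a temporary batch leader) is only one plausible way to group operations without the client doing so explicitly. The batch handler here runs sequentially; in a batch-parallel structure that handler would itself process the batch in parallel.

```python
import threading
from concurrent.futures import Future


class ImplicitBatcher:
    """Hypothetical helper: groups individual requests into batches."""

    def __init__(self, run_batch):
        self._run_batch = run_batch    # called with a list of (op, future) pairs
        self._pending = []             # requests waiting for the next batch
        self._lock = threading.Lock()  # guards _pending and _running only
        self._running = False          # True while some thread is the leader
        self.batch_sizes = []          # recorded purely for illustration

    def submit(self, op):
        """Looks like an ordinary blocking call to the client."""
        fut = Future()
        with self._lock:
            self._pending.append((op, fut))
            leader = not self._running
            if leader:
                self._running = True
        if leader:
            self._drain()
        return fut.result()

    def _drain(self):
        # The leader keeps taking whatever has accumulated until nothing is left.
        while True:
            with self._lock:
                batch, self._pending = self._pending, []
                if not batch:
                    self._running = False
                    return
            self.batch_sizes.append(len(batch))
            self._run_batch(batch)


class BatchedCounter:
    """A sequential counter exposed to many threads through the batcher."""

    def __init__(self):
        self._value = 0
        self._batcher = ImplicitBatcher(self._run_batch)

    def _run_batch(self, batch):
        # Only one batch runs at a time, so the counter itself needs no lock.
        total = self._value
        results = []
        for amount, _ in batch:
            total += amount
            results.append(total)
        self._value = total
        for (_, fut), result in zip(batch, results):
            fut.set_result(result)

    def add(self, amount):
        return self._batcher.submit(amount)

    @property
    def value(self):
        return self._value

    @property
    def batch_sizes(self):
        return self._batcher.batch_sizes


def run_demo(num_threads=4, ops_per_thread=500):
    counter = BatchedCounter()
    results = []
    results_lock = threading.Lock()

    def worker():
        local = [counter.add(1) for _ in range(ops_per_thread)]
        with results_lock:
            results.extend(local)

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter.value, sorted(results), counter.batch_sizes
```

Clients call `counter.add(1)` as if the counter were an ordinary thread-safe object; the grouping into batches happens behind that call. The lock protects only the queue of pending requests, not the data structure, which is the contrast with a coarse-grained lock-based design that the abstract draws.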
