Callback: Efficient synchronization without invalidation with a directory just for spin-waiting

2015 ACM/IEEE 42nd Annual International Symposium on Computer Architecture (ISCA). Pub Date: 2015-06-13. DOI: 10.1145/2749469.2750405

Alberto Ros, S. Kaxiras

Pages: 427-438

Citations: 25

Abstract

Cache coherence protocols based on self-invalidation allow a simpler design than traditional invalidation-based protocols by relying on data-race-free (DRF) semantics and applying self-invalidation at racy synchronization points exposed to the hardware. Their simplicity lies in the absence of invalidation traffic, which eliminates the need to track readers in a directory and reduces the number of transient protocol states. With the addition of self-downgrade, these protocols can become effectively directory-free. While this works well for race-free data, the lack of explicit invalidations unfortunately compromises the effectiveness of any synchronization that relies on races. This includes any form of spin-waiting, which is employed in signaling, locking, and barrier primitives. In this work we propose the callback mechanism, a new solution for spin-waiting in these protocols that is simpler and more efficient than explicit invalidation. Callbacks are set by reads involved in spin-waiting and are satisfied by writes (which can even precede these reads). To implement callbacks we use a small directory-cache structure (just a few entries) that is intended to service only these “spin-waiting” races. This directory structure is self-contained and is not backed up in any way: entries are created on demand and can be evicted without preserving their information. Our evaluation shows a significant improvement both over explicit invalidation and over exponential back-off, the state-of-the-art mechanism that self-invalidation protocols use to avoid spinning in the shared cache.
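The abstract describes callbacks only at a high level. The Python model below is a minimal sketch of the idea under our own assumptions, not the paper's hardware design. The names (`CallbackDirectory`, `callback_read`, `pending`, `waiting`) are hypothetical; the directory uses LRU replacement; a newly allocated entry conservatively treats every core as possibly having missed a write; and eviction simply answers any blocked readers with the current value. It illustrates the three properties the abstract names: reads set callbacks, writes satisfy them (even a write that precedes the read), and entries can be dropped without saving their state.

```python
from collections import OrderedDict
from dataclasses import dataclass, field


@dataclass
class Entry:
    # Cores blocked in a callback read, waiting for the next write.
    waiting: set = field(default_factory=set)
    # Cores whose next callback read returns at once, because a write
    # may have happened since their last callback read.
    pending: set = field(default_factory=set)


class CallbackDirectory:
    """Hypothetical small, unbacked callback directory (LRU, few entries)."""

    def __init__(self, num_cores, capacity):
        self.num_cores = num_cores
        self.capacity = capacity
        self.entries = OrderedDict()  # addr -> Entry, least recently used first
        self.wakeups = []  # (core, addr, value) replies not yet delivered

    def _lookup(self, addr, memory):
        if addr in self.entries:
            self.entries.move_to_end(addr)
            return self.entries[addr]
        if len(self.entries) >= self.capacity:
            victim_addr, victim = self.entries.popitem(last=False)
            # Nothing is written back: blocked readers just get the current
            # value so they re-check instead of waiting forever.
            for core in sorted(victim.waiting):
                self.wakeups.append((core, victim_addr, memory[victim_addr]))
        # A new entry knows nothing, so assume every core may have missed a
        # write: each core's first callback read returns immediately.
        entry = Entry(pending=set(range(self.num_cores)))
        self.entries[addr] = entry
        return entry

    def callback_read(self, core, addr, memory):
        """Return the value now, or None if the core blocks until a write."""
        entry = self._lookup(addr, memory)
        if core in entry.pending:
            entry.pending.discard(core)
            return memory[addr]
        entry.waiting.add(core)
        return None

    def write(self, addr, value, memory):
        memory[addr] = value
        entry = self.entries.get(addr)
        if entry is None:
            return  # a future entry starts with every core pending anyway
        for waiter in sorted(entry.waiting):
            self.wakeups.append((waiter, addr, value))
        # Cores that were not waiting are notified on their next callback
        # read: this is how a write can satisfy a read that comes after it.
        entry.pending = set(range(self.num_cores)) - entry.waiting
        entry.waiting = set()

    def drain_wakeups(self):
        out, self.wakeups = self.wakeups, []
        return out
```

In this model a spinning core would loop: issue `callback_read`; if it gets `None`, stall until a wakeup for that address arrives, then re-test the flag or lock value. The conservative "pending for every core" default is what lets entries be evicted freely: losing an entry costs at most one extra read per core, never a missed wakeup. Real hardware would more likely hold `pending` and `waiting` as per-core bit vectors; Python sets stand in for them here.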
