SELCC: Coherent Caching over Compute-Limited Disaggregated Memory
Ruihong Wang, Jianguo Wang, Walid G. Aref
arXiv - CS - Emerging Technologies, published 2024-09-03
DOI: arxiv-2409.02088 (https://doi.org/arxiv-2409.02088)
Citations: 0
Abstract
Disaggregating memory from compute offers the opportunity to better utilize
stranded memory in data centers. It is important to cache data in the compute
nodes and maintain cache coherence across multiple compute nodes to save on
round-trip communication cost between the disaggregated memory and the compute
nodes. However, the limited computing power on the disaggregated memory servers
makes it challenging to maintain cache coherence among multiple compute-side
caches over disaggregated shared memory. This paper introduces SELCC, a
Shared-Exclusive Latch Cache Coherence protocol that maintains cache coherence
without imposing any computational burden on the remote memory side. SELCC
builds on a one-sided shared-exclusive latch protocol by introducing lazy latch
release and invalidation messages among the compute nodes so that it can
guarantee both data access atomicity and cache coherence. SELCC minimizes
communication round-trips by embedding the current cache copy holder IDs into
RDMA latch words and prioritizes local concurrency control over global
concurrency control. We instantiate the SELCC protocol in a compute-side
cache, forming an abstraction layer over disaggregated memory. This abstraction
layer provides main-memory-like APIs to upper-level applications, thus
enabling existing data structures and algorithms to function over disaggregated
memory with minimal code changes. To demonstrate the usability of SELCC, we
implement a B-tree and three transaction concurrency control algorithms over
SELCC's APIs. Micro-benchmark results show that the SELCC protocol
outperforms RPC-based cache-coherence protocols.
Additionally, YCSB and TPC-C benchmarks indicate that applications built on
SELCC perform comparably to or better than competing systems over
disaggregated memory.
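
To make the idea of embedding cache copy holder IDs into a latch word more concrete, the following is a minimal single-process sketch, not the paper's implementation. The 64-bit layout (high 8 bits for the exclusive holder, low 56 bits as a bitmap of shared holders), the `RemoteLatch` class that stands in for a one-sided RDMA compare-and-swap, and all function names are assumptions made for illustration. The point it shows is that a failed latch attempt already returns the IDs of the nodes that would need an invalidation message, so no extra round trip is needed to find them.

```python
from dataclasses import dataclass

# Assumed layout of the 64-bit latch word (illustrative, not from the paper):
#   high 8 bits : exclusive holder's node ID + 1 (0 means no exclusive holder)
#   low 56 bits : bitmap of shared holders (bit i set means node i holds a copy)
READER_BITS = 56
READER_MASK = (1 << READER_BITS) - 1
WRITER_SHIFT = READER_BITS


def writer_of(word):
    """Return the exclusive holder's node ID, or None if there is none."""
    field = word >> WRITER_SHIFT
    return field - 1 if field else None


def readers_of(word):
    """Return the node IDs whose bits are set in the shared-holder bitmap."""
    bitmap = word & READER_MASK
    return [n for n in range(READER_BITS) if (bitmap >> n) & 1]


@dataclass
class RemoteLatch:
    """Hypothetical stand-in for one latch word stored on a memory node."""
    word: int = 0

    def cas(self, expected, desired):
        """Mimic an atomic compare-and-swap; always returns the old value."""
        old = self.word
        if old == expected:
            self.word = desired
        return old


def try_shared(latch, node_id):
    """Try to latch in shared mode. Returns (ok, holders_to_invalidate)."""
    old = latch.word  # a real system would issue a one-sided read here
    while True:
        writer = writer_of(old)
        if writer is not None:
            return False, [writer]
        seen = latch.cas(old, old | (1 << node_id))
        if seen == old:
            return True, []
        old = seen  # another node changed the word; retry with the new value


def try_exclusive(latch, node_id):
    """Try to latch in exclusive mode. Returns (ok, holders_to_invalidate)."""
    seen = latch.cas(0, (node_id + 1) << WRITER_SHIFT)
    if seen == 0:
        return True, []
    writer = writer_of(seen)
    return False, [writer] if writer is not None else readers_of(seen)


def release_shared(latch, node_id):
    """Clear this node's bit, e.g. after it receives an invalidation message."""
    old = latch.word
    while True:
        seen = latch.cas(old, old & ~(1 << node_id))
        if seen == old:
            return
        old = seen
```

Under this sketch, lazy latch release corresponds to a compute node calling `release_shared` only when another node asks it to invalidate its copy, rather than right after each local access. Upgrading from shared to exclusive mode and delivering the invalidation messages themselves are left out.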