Title: Implementation of atomic primitives on distributed shared memory multiprocessors
Authors: Maged M. Michael, M. Scott
Published in: Proceedings of 1995 1st IEEE Symposium on High Performance Computer Architecture, 1995-01-22
DOI: https://doi.org/10.1109/HPCA.1995.386540
Citations: 29
Abstract
In this paper we consider several hardware implementations of the general-purpose atomic primitives fetch_and_Φ, compare_and_swap, load_linked, and store_conditional on large-scale shared-memory multiprocessors. These primitives have proven popular on small-scale bus-based machines, but have yet to become widely available on large-scale, distributed shared memory machines. We propose several alternative hardware implementations of these primitives, and then analyze the performance of these implementations for various data sharing patterns. Our results indicate that good overall performance can be obtained by implementing compare_and_swap in the cache controllers, and by providing an additional instruction to load an exclusive copy of a cache line.
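For readers unfamiliar with the primitives named in the abstract, the following is a minimal software sketch of their semantics, not the hardware implementations the paper evaluates. It assumes a hypothetical `Cell` class in which a lock stands in for the atomicity that hardware would provide, and it shows one common way to build fetch_and_Φ as a retry loop over compare_and_swap. The names `Cell`, `fetch_and_phi`, and `run_counter` are illustrative and do not come from the paper.

```python
import threading


class Cell:
    """Software model of one shared memory word (hypothetical helper)."""

    def __init__(self, value=0):
        self._value = value
        # The lock stands in for the atomicity that hardware would provide.
        self._lock = threading.Lock()

    def load(self):
        with self._lock:
            return self._value

    def compare_and_swap(self, expected, new):
        """Store `new` only if the cell still holds `expected`; report success."""
        with self._lock:
            if self._value == expected:
                self._value = new
                return True
            return False


def fetch_and_phi(cell, phi):
    """Replace the current value v with phi(v) and return the old v.

    Built as a retry loop: if another thread changed the cell between
    the load and the compare_and_swap, the swap fails and we try again.
    """
    while True:
        old = cell.load()
        if cell.compare_and_swap(old, phi(old)):
            return old


def run_counter(n_threads=4, increments=1000):
    """Have several threads increment one shared cell; return the final value."""
    cell = Cell(0)

    def worker():
        for _ in range(increments):
            fetch_and_phi(cell, lambda v: v + 1)

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return cell.load()
```

With Φ chosen as "add one", `fetch_and_phi` behaves like fetch_and_add, so `run_counter(4, 1000)` should leave the cell at 4000 regardless of how the threads interleave. Under contention the retry loop may repeat several times per update.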