gpu的回合制时空相干性

IF 2.9 3区计算机科学 Q4 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE ACM Transactions on Architecture and Code Optimization Pub Date : 2023-07-19 DOI:https://dl.acm.org/doi/10.1145/3593054

Sooraj Puthoor, Mikko H. Lipasti

{"title":"gpu的回合制时空相干性","authors":"Sooraj Puthoor, Mikko H. Lipasti","doi":"https://dl.acm.org/doi/10.1145/3593054","DOIUrl":null,"url":null,"abstract":"This article introduces turn-based spatiotemporal coherence. Spatiotemporal coherence is a novel coherence implementation that assigns write permission to epochs (or turns) as opposed to a processor core. This paradigm shift in the assignment of write permissions satisfies all conditions of a coherence protocol with virtually no coherence overhead. We discuss the implementation of this coherence mechanism on a baseline GPU. The evaluation shows that spatiotemporal coherence achieves a speedup of 7.13% for workloads with read data reuse across kernels compared to the baseline software-managed GPU coherence implementation while also providing write atomicity and avoiding the need for software inserted acquire-release operations.1","PeriodicalId":50920,"journal":{"name":"ACM Transactions on Architecture and Code Optimization","volume":"1 1","pages":""},"PeriodicalIF":2.9000,"publicationDate":"2023-07-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Turn-based Spatiotemporal Coherence for GPUs\",\"authors\":\"Sooraj Puthoor, Mikko H. Lipasti\",\"doi\":\"https://dl.acm.org/doi/10.1145/3593054\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"This article introduces turn-based spatiotemporal coherence. Spatiotemporal coherence is a novel coherence implementation that assigns write permission to epochs (or turns) as opposed to a processor core. This paradigm shift in the assignment of write permissions satisfies all conditions of a coherence protocol with virtually no coherence overhead. We discuss the implementation of this coherence mechanism on a baseline GPU. The evaluation shows that spatiotemporal coherence achieves a speedup of 7.13% for workloads with read data reuse across kernels compared to the baseline software-managed GPU coherence implementation while also providing write atomicity and avoiding the need for software inserted acquire-release operations.1\",\"PeriodicalId\":50920,\"journal\":{\"name\":\"ACM Transactions on Architecture and Code Optimization\",\"volume\":\"1 1\",\"pages\":\"\"},\"PeriodicalIF\":2.9000,\"publicationDate\":\"2023-07-19\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"ACM Transactions on Architecture and Code Optimization\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://doi.org/https://dl.acm.org/doi/10.1145/3593054\",\"RegionNum\":3,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q4\",\"JCRName\":\"COMPUTER SCIENCE, HARDWARE & ARCHITECTURE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"ACM Transactions on Architecture and Code Optimization","FirstCategoryId":"94","ListUrlMain":"https://doi.org/https://dl.acm.org/doi/10.1145/3593054","RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q4","JCRName":"COMPUTER SCIENCE, HARDWARE & ARCHITECTURE","Score":null,"Total":0}

引用次数: 0

摘要

本文介绍了基于回合的时空相干性。时空相干是一种新的相干实现，它将写权限分配给epoch(或turn)，而不是处理器核心。这种写权限分配的范式转换满足一致性协议的所有条件，几乎没有一致性开销。我们讨论了这种一致性机制在基线GPU上的实现。评估表明，与基线软件管理的GPU一致性实现相比，时空一致性在跨内核读取数据重用的工作负载上实现了7.13%的加速，同时还提供了写原子性并避免了对软件插入的获取-释放操作的需要

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文

微信好友朋友圈 QQ好友复制链接

本刊更多论文

Turn-based Spatiotemporal Coherence for GPUs

This article introduces turn-based spatiotemporal coherence. Spatiotemporal coherence is a novel coherence implementation that assigns write permission to epochs (or turns) as opposed to a processor core. This paradigm shift in the assignment of write permissions satisfies all conditions of a coherence protocol with virtually no coherence overhead. We discuss the implementation of this coherence mechanism on a baseline GPU. The evaluation shows that spatiotemporal coherence achieves a speedup of 7.13% for workloads with read data reuse across kernels compared to the baseline software-managed GPU coherence implementation while also providing write atomicity and avoiding the need for software inserted acquire-release operations.¹

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

ACM Transactions on Architecture and Code Optimization 工程技术-计算机：理论方法

CiteScore

3.60

自引率

6.20%

发文量

审稿时长

6-12 weeks

期刊介绍： ACM Transactions on Architecture and Code Optimization (TACO) focuses on hardware, software, and system research spanning the fields of computer architecture and code optimization. Articles that appear in TACO will either present new techniques and concepts or report on experiences and experiments with actual systems. Insights useful to architects, hardware or software developers, designers, builders, and users will be emphasized.

期刊最新文献

Hermes: Efficient Serving of LLM Applications with Probabilistic Demand Modeling RACER: Avoiding End-to-End Slowdowns in Accelerated Chip Multi-Processors DLAS: A Conceptual Model for Across-Stack Deep Learning Acceleration A Survey of General-purpose Polyhedral Compilers Sectored DRAM: A Practical Energy-Efficient and High-Performance Fine-Grained DRAM Architecture