PARS: A Pattern-Aware Spatial Data Prefetcher Supporting Multiple Region Sizes

IF 2.7 | Zone 3, Computer Science | Q2 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE | IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems | Pub Date: 2024-11-06 | DOI: 10.1109/TCAD.2024.3442981
Yiquan Lin;Wenhai Lin;Jiexiong Xu;Yiquan Chen;Zhen Jin;Jingchang Qin;Jiahao He;Shishun Cai;Yuzhong Zhang;Zonghui Wang;Wenzhi Chen
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 43, no. 11, pp. 3638-3649, 2024-11-06. https://ieeexplore.ieee.org/document/10745807/
Citations: 0

Abstract

Hardware data prefetching is a well-studied technique for bridging the processor-memory performance gap. Bit-pattern-based prefetchers are among the most promising spatial data prefetchers and achieve substantial performance gains. In a bit-pattern-based prefetcher, the region size is a crucial parameter: it denotes the amount of memory that a single pattern can record or a single prediction can prefetch. However, existing bit-pattern-based prefetchers support only one fixed region size. Our experiments show that a fixed region size cannot meet the requirements of numerous applications and leads to suboptimal performance and high hardware overhead. In this article, we propose PARS, a pattern-aware spatial data prefetcher supporting multiple region sizes. The key idea of PARS is to support multiple region sizes, which lets it improve application performance while also reducing hardware overhead. Moreover, PARS dynamically switches to an appropriate region size for each pattern through an adaptive region-size (RS) switching mechanism. We evaluated PARS on numerous workloads. The results show that PARS provides an average performance improvement of 40.6% over a baseline with no data prefetchers. In the single-core system, it outperforms two state-of-the-art prefetchers, Bingo by 2.1% (up to 24.4%) and Pythia by 3.9% (up to 111.2%). In the four-core system, PARS outperforms Bingo by 5.0% (up to 66.0%) and Pythia by 5.4% (up to 177.9%).
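To make the role of the region size concrete, the following is a minimal sketch of how a bit pattern could record which cache lines of a region were touched, and how the same access stream yields different patterns under two region sizes. It is an illustration only, not the PARS design: the 64-byte line size, the function names `footprints` and `pattern_bits`, and the sample addresses are all assumptions that do not come from the abstract.

```python
from collections import defaultdict

LINE_SIZE = 64  # bytes per cache line (assumed; not stated in the abstract)


def footprints(addresses, region_size):
    """Map each region base address to a bit pattern of touched cache lines.

    Bit i of a pattern is set when line i of that region was accessed.
    Hypothetical helper for illustration, not the authors' mechanism.
    """
    if region_size % LINE_SIZE != 0:
        raise ValueError("region_size must be a multiple of LINE_SIZE")
    patterns = defaultdict(int)
    for addr in addresses:
        base = addr - (addr % region_size)      # start of the enclosing region
        line_index = (addr - base) // LINE_SIZE  # line offset inside the region
        patterns[base] |= 1 << line_index
    return dict(patterns)


def pattern_bits(region_size):
    """Number of bits needed to store one pattern for a given region size."""
    return region_size // LINE_SIZE


if __name__ == "__main__":
    # Hypothetical access stream (byte addresses).
    stream = [0x0000, 0x0040, 0x00C0, 0x0840]
    for size in (2048, 4096):
        recorded = {hex(b): bin(p) for b, p in footprints(stream, size).items()}
        print(size, pattern_bits(size), recorded)
```

With 2 KB regions the stream is split across two 32-bit patterns, while a 4 KB region captures all four accesses in one 64-bit pattern. This is the trade-off the abstract points to: a larger region covers more memory per pattern but costs more bits per stored pattern.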
Source journal
CiteScore: 5.60
Self-citation rate: 13.80%
Publication volume: 500
Review time: 7 months
About the journal: The purpose of this Transactions is to publish papers of interest to individuals in the area of computer-aided design of integrated circuits and systems composed of analog, digital, mixed-signal, optical, or microwave components. The aids include methods, models, algorithms, and man-machine interfaces for system-level, physical, and logical design, including planning, synthesis, partitioning, modeling, simulation, layout, verification, testing, hardware-software co-design, and documentation of integrated circuit and system designs of all complexities. Design tools and techniques for evaluating and designing integrated circuits and systems for metrics such as performance, power, reliability, testability, and security are a focus.
Latest articles in this journal
Table of Contents
NOVELLA: Nonvolatile Last-Level Cache Bypass for Optimizing Off-Chip Memory Energy
FreePrune: An Automatic Pruning Framework Across Various Granularities Based on Training-Free Evaluation
CaBaFL: Asynchronous Federated Learning via Hierarchical Cache and Feature Balance
MaskedHLS: Domain-Specific High-Level Synthesis of Masked Cryptographic Designs