Memory Dependence Speculation for Simultaneous Multi-Threading Processors

Pub Date: 2024-01-17 DOI: 10.1142/s0129626424500014
Jonathan Flores, Wei-Ming Lin
Citations: 0

Abstract

Simultaneous Multi-Threading (SMT) processors improve on the traditional out-of-order superscalar architecture by allowing instructions from several independent threads to execute out of order concurrently. Maintaining the correctness of values read from and written to memory is a major bottleneck for processor performance, because a load must either stall until all prior store addresses are known or risk reading invalid data. Prior research in this area has focused mainly on superscalar architectures, so it is natural to extend memory dependence speculation techniques to an SMT architecture. In this paper, we allow loads in every thread to execute as soon as their addresses are resolved, without checking for conflicts with prior memory addresses. Each store then checks all later loads to see whether any of them read too early, as indicated by an address match; if so, the processor state is recovered and the load is re-issued. Because the pipeline is never stalled for loads, this aggressive technique offers the greatest potential gain in instructions per clock cycle (IPC) compared with predictive techniques. Our simulations show that overall IPC gains of up to 12% and 10% are possible for 4-threaded and 8-threaded workloads, respectively. Conversely, maximum overall IPC losses of at least 2.3% and 2% were also observed for 4-threaded and 8-threaded workloads, respectively.
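The check-and-recover scheme described in the abstract can be illustrated with a small software model. This is only a sketch under stated assumptions, not the authors' simulator: the names `Load`, `SpeculativeLSQ`, `execute_load`, and `resolve_store` are hypothetical, dependences are assumed to be checked only within a thread (since the threads are independent), and recovery is assumed to squash the oldest violating load together with every younger load of the same thread.

```python
from dataclasses import dataclass


@dataclass
class Load:
    thread: int  # hardware thread that issued the load
    seq: int     # program-order sequence number within that thread
    addr: int    # resolved memory address


class SpeculativeLSQ:
    """Toy model: loads run without waiting for older store addresses."""

    def __init__(self):
        self.executed_loads = []  # loads that have already read memory
        self.replays = 0          # number of recoveries triggered

    def execute_load(self, thread, seq, addr):
        # Speculate: issue as soon as the address is known, with no
        # check against older stores whose addresses are unresolved.
        load = Load(thread, seq, addr)
        self.executed_loads.append(load)
        return load

    def resolve_store(self, thread, seq, addr):
        """Check younger loads of the same thread against this store.

        Returns the sequence numbers of the loads that must be
        re-issued (empty list if no load read too early).
        """
        violators = [
            ld for ld in self.executed_loads
            if ld.thread == thread and ld.seq > seq and ld.addr == addr
        ]
        if not violators:
            return []

        # Assumed recovery policy: squash from the oldest violating
        # load onward, leaving other threads untouched.
        oldest = min(ld.seq for ld in violators)
        squashed = [
            ld for ld in self.executed_loads
            if ld.thread == thread and ld.seq >= oldest
        ]
        self.executed_loads = [
            ld for ld in self.executed_loads
            if not (ld.thread == thread and ld.seq >= oldest)
        ]
        self.replays += 1
        return sorted(ld.seq for ld in squashed)
```

In this model, a store whose address matches no younger load costs nothing, which is the case the technique bets on. A match forces a replay, and frequent replays are one plausible reason an aggressive scheme can lose IPC on some workloads, as the reported losses suggest.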