MultiPoint: Enabling scalable pre-silicon performance evaluation for multi-task workloads

Chenji Han, Xinyu Li, Feng Xue, Weitong Wang, Yuxuan Wu, Wenxiang Wang, Fuxin Zhang
Journal: BenchCouncil Transactions on Benchmarks, Standards and Evaluations, 4(3), Article 100189
DOI: 10.1016/j.tbench.2025.100189
Publication date: 2024-09-01
URL: https://www.sciencedirect.com/science/article/pii/S277248592500002X

Abstract

As the number of cores integrated in a single processor grows and cloud computing develops rapidly, performance evaluation of multi-core systems is increasingly crucial. It is typically conducted by executing multi-task workloads, exemplified by SPEC CPU Rate, to measure metrics such as system throughput. Several sampling-based methods have been developed for pre-silicon performance evaluation of these workloads. However, these methods capture multi-task checkpoints directly, which incurs significant storage and time overheads and limits scalability. Enabling more scalable performance evaluation therefore remains a critical problem.
In this work, we propose MultiPoint to enable scalable pre-silicon performance evaluation for multi-task workloads. In the multi-task workloads of interest, each task executes independently without inter-task communication. MultiPoint therefore constructs the required multi-task checkpoints by recovering multiple single-task checkpoints on different cores, and guarantees their smooth execution through address remapping and shuffling. We implemented MultiPoint on the Emulator Accelerator and assessed its evaluation accuracy against the corresponding post-silicon Loongson 3A6000 processor. Using SPEC CPU 2017 as the benchmark, MultiPoint achieved estimation errors of 6.20%, 5.45%, and 6.99% for Rate 2, Rate 4, and Rate 8, respectively. This accuracy is comparable to that of direct multi-task checkpointing, while storage overhead is 86.0% lower and time overhead is 93.7% lower.
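The abstract does not spell out how single-task checkpoints are combined, so the following is only an illustrative sketch of the general idea, not the authors' implementation. It assumes a checkpoint can be modeled as a map from physical page numbers to page contents, and it reads "shuffling" as drawing target frames from a permuted pool; the names `SingleTaskCheckpoint` and `compose_multitask` are hypothetical.

```python
import random
from dataclasses import dataclass


@dataclass
class SingleTaskCheckpoint:
    """Hypothetical single-task checkpoint: physical page number -> contents."""
    name: str
    pages: dict[int, bytes]


def compose_multitask(checkpoints, total_frames, seed=0):
    """Place every task's pages into one shared physical memory image.

    Returns (memory, page_maps): memory maps new frame -> contents, and
    page_maps[i] maps task i's original page number -> new frame.
    """
    needed = sum(len(c.pages) for c in checkpoints)
    if needed > total_frames:
        raise ValueError("not enough physical frames for all tasks")

    # Assumed reading of "shuffling": take frames from a permuted pool so
    # tasks never collide and do not land on regularly strided frames.
    free_frames = list(range(total_frames))
    random.Random(seed).shuffle(free_frames)

    memory = {}
    page_maps = []
    for ckpt in checkpoints:
        mapping = {}
        for old_page in sorted(ckpt.pages):
            new_frame = free_frames.pop()  # each frame is handed out once
            mapping[old_page] = new_frame  # the address remapping entry
            memory[new_frame] = ckpt.pages[old_page]
        page_maps.append(mapping)
    return memory, page_maps


# Rate-4 style example: four copies of one single-task checkpoint.
base = SingleTaskCheckpoint("task", {0: b"code", 1: b"data", 7: b"stack"})
memory, page_maps = compose_multitask([base] * 4, total_frames=64)
```

Under these assumptions, only one single-task checkpoint needs to be stored per sampled region, and Rate 2, Rate 4, and Rate 8 images are derived from it on demand, which is one plausible source of the storage savings the abstract reports.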