Data analytics workloads: Characterization and similarity analysis

Reena Panda, L. John
Published in: 2014 IEEE 33rd International Performance Computing and Communications Conference (IPCCC)
Publication date: 2014-12-01
DOI: https://doi.org/10.1109/PCCC.2014.7017065
Citations: 20

Abstract

The performance of modern computer systems depends heavily on the workloads that run on them. Computer designers and researchers therefore need a representative set of workloads, covering the different classes of real-world applications, for processor design-space evaluation studies. Although many benchmark suites are available, a few common ones such as SPEC CPU2006 are widely used, either because they are easy to set up or because of simulation time constraints. However, because popular benchmarks such as SPEC CPU2006 do not capture the characteristics of the wide variety of emerging real-world applications, using them as the basis for performance evaluation may lead to suboptimal designs or misleading results. In this paper, we characterize the behavior of data analytics workloads, an important class of emerging applications, and perform a systematic similarity analysis against the popular SPEC CPU2006 and SPECjbb2013 benchmark suites. To characterize the workloads, we use hardware performance counter measurements and a variety of extracted microarchitecture-independent workload characteristics. We then use statistical data analysis techniques, namely principal component analysis and clustering, to analyze the similarity and dissimilarity among these classes of applications. We demonstrate the inherent differences between the characteristics of the different application classes and show how to arrive at meaningful subsets of benchmarks, which will help make targeted early-stage hardware performance evaluation faster and more accurate.
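The similarity analysis named in the abstract (principal component analysis followed by clustering) can be illustrated with a minimal sketch. This is not the authors' implementation. The workload names, the four features, and every numeric value below are hypothetical and chosen only for illustration. The sketch assumes z-score normalization, power iteration for the principal components, and single-linkage agglomerative clustering, none of which the abstract specifies. It uses only the Python standard library.

```python
import math

# Hypothetical feature vectors (e.g. IPC and three miss-rate metrics).
# These numbers are made up for illustration; they are NOT measurements.
FEATURES = {
    "cpu_like_1": [1.8, 1.0, 4.0, 0.5],
    "cpu_like_2": [1.7, 1.2, 4.5, 0.7],
    "cpu_like_3": [1.9, 0.8, 3.8, 0.4],
    "analytics_like_1": [0.7, 12.0, 9.0, 6.0],
    "analytics_like_2": [0.8, 11.0, 8.5, 5.5],
    "analytics_like_3": [0.6, 13.0, 9.5, 6.5],
}

def zscore(rows):
    """Standardize each column to zero mean and unit variance."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    stds = [math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / n) or 1.0
            for j in range(d)]
    return [[(r[j] - means[j]) / stds[j] for j in range(d)] for r in rows]

def covariance(rows):
    """Covariance matrix of already-centered rows."""
    n, d = len(rows), len(rows[0])
    return [[sum(r[i] * r[j] for r in rows) / n for j in range(d)]
            for i in range(d)]

def top_components(cov, k, iters=500):
    """Approximate the k leading eigenvectors by power iteration with deflation."""
    d = len(cov)
    cov = [row[:] for row in cov]
    comps = []
    for _ in range(k):
        v = [float(i + 1) for i in range(d)]  # arbitrary non-zero start vector
        for _ in range(iters):
            w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:
                break
            v = [x / norm for x in w]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
        lam = sum(v[i] * cov[i][j] * v[j] for i in range(d) for j in range(d))
        comps.append(v)
        # Deflate: remove the component just found from the matrix.
        cov = [[cov[i][j] - lam * v[i] * v[j] for j in range(d)]
               for i in range(d)]
    return comps

def euclid(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def agglomerate(points, n_clusters):
    """Single-linkage agglomerative clustering; returns lists of point indices."""
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                dist = min(euclid(points[i], points[j])
                           for i in clusters[a] for j in clusters[b])
                if best is None or dist < best[0]:
                    best = (dist, a, b)
        _, a, b = best
        clusters[a] += clusters[b]
        del clusters[b]
    return clusters

def cluster_workloads(features, n_components=2, n_clusters=2):
    """Normalize, project onto the leading components, then cluster."""
    names = list(features)
    z = zscore([features[name] for name in names])
    comps = top_components(covariance(z), n_components)
    scores = [[sum(x * c for x, c in zip(row, comp)) for comp in comps]
              for row in z]
    return [sorted(names[i] for i in group)
            for group in agglomerate(scores, n_clusters)]

if __name__ == "__main__":
    for group in cluster_workloads(FEATURES):
        print(group)
```

Workloads that land in the same cluster behave similarly with respect to the chosen features, so one representative per cluster is a natural candidate for a reduced benchmark subset. The choice of features, number of components, linkage, and cluster count all affect the outcome and would need to follow the paper's actual methodology.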