Data analytics workloads: Characterization and similarity analysis

2014 IEEE 33rd International Performance Computing and Communications Conference (IPCCC). Pub Date: 2014-12-01. DOI: 10.1109/PCCC.2014.7017065

Reena Panda, L. John


Citations: 20

Abstract

The performance of modern computer systems depends heavily on the workloads that run on them. Computer designers and researchers therefore need a representative set of workloads, covering the different classes of real-world applications, for processor design-space evaluation studies. Although many benchmark suites are available, a few common ones, such as SPEC CPU2006, are widely used by researchers because of ease of setup or simulation-time constraints. However, because popular benchmarks such as SPEC CPU2006 do not capture the characteristics of the wide variety of emerging real-world applications, using them as the basis for performance evaluation may lead to suboptimal designs or misleading results. In this paper, we characterize the behavior of data analytics workloads, an important class of emerging applications, and perform a systematic similarity analysis against the popular SPEC CPU2006 and SPECjbb2013 benchmark suites. To characterize the workloads, we use hardware performance counter based measurements and a variety of extracted microarchitecture-independent workload characteristics. We then use statistical data analysis techniques, namely principal component analysis and clustering, to analyze the similarity and dissimilarity among these classes of applications. We demonstrate the inherent differences between the characteristics of the different classes of applications and show how to arrive at meaningful subsets of benchmarks, which will help in faster and more accurate targeted early hardware system performance evaluation.
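The PCA-plus-clustering step mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the feature matrix below is synthetic, the workload names are hypothetical placeholders, and the 90% variance threshold, average linkage, and "closest to centroid" representative rule are assumptions chosen only to make the idea concrete.

```python
import numpy as np


def standardize(X):
    """Scale each feature (column) to zero mean and unit variance."""
    mu = X.mean(axis=0)
    sigma = X.std(axis=0)
    sigma[sigma == 0] = 1.0  # constant features stay at zero instead of dividing by 0
    return (X - mu) / sigma


def pca_scores(X, var_threshold=0.9):
    """Project workloads onto the fewest leading PCs reaching var_threshold."""
    Z = standardize(X)
    _, S, Vt = np.linalg.svd(Z, full_matrices=False)
    explained = S**2 / np.sum(S**2)  # variance fraction per component
    k = int(np.searchsorted(np.cumsum(explained), var_threshold)) + 1
    k = min(k, len(S))
    return Z @ Vt[:k].T, explained[:k]


def agglomerative(points, n_clusters):
    """Naive average-linkage agglomerative clustering (assumed linkage)."""
    clusters = [[i] for i in range(len(points))]
    dist = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=2)
    while len(clusters) > n_clusters:
        best = (np.inf, 0, 1)
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = dist[np.ix_(clusters[a], clusters[b])].mean()
                if d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]  # merge the closest pair
        del clusters[b]
    return clusters


def representatives(points, clusters):
    """Pick, per cluster, the member closest to the cluster centroid."""
    reps = []
    for members in clusters:
        centroid = points[members].mean(axis=0)
        d = np.linalg.norm(points[members] - centroid, axis=1)
        reps.append(members[int(np.argmin(d))])
    return reps


# Synthetic stand-in data: 8 hypothetical workloads x 6 made-up characteristics.
rng = np.random.default_rng(0)
names = [f"suite_a_{i}" for i in range(4)] + [f"suite_b_{i}" for i in range(4)]
X = np.vstack([rng.normal(0.0, 1.0, (4, 6)), rng.normal(10.0, 1.0, (4, 6))])

scores, explained = pca_scores(X)
clusters = agglomerative(scores, n_clusters=2)
subset = [names[i] for i in representatives(scores, clusters)]
print("components kept:", scores.shape[1])
print("clusters:", [[names[i] for i in c] for c in clusters])
print("representative subset:", subset)
```

In a real study, each row of `X` would hold measured characteristics for one benchmark (for example, counter-derived rates), and the representative of each cluster would form the reduced benchmark subset.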


