Characterizing Data Analytics Workloads on Intel Xeon Phi

2015 IEEE International Symposium on Workload Characterization. Pub Date: 2015-10-04. DOI: 10.1109/IISWC.2015.20

Biwei Xie, Xu Liu, Jianfeng Zhan, Zhen Jia, Yuqing Zhu, Lei Wang, Lixin Zhang


Citations: 6

Abstract

With the growing computation demands of data analytics, heterogeneous architectures have become popular for their support of high parallelism. Intel Xeon Phi, a many-core coprocessor originally designed for high-performance computing applications, is promising for data analytics workloads. However, to the best of our knowledge, no prior work has systematically characterized the performance of data analytics workloads on Xeon Phi. It is difficult to design a benchmark suite that represents the behavior of data analytics workloads on Xeon Phi. The main challenge lies in fully exploiting Xeon Phi's features, such as long SIMD instructions, simultaneous multithreading, and a complex memory hierarchy. To address this issue, we develop BigDataBench-Phi, which consists of seven representative data analytics workloads. All of these benchmarks are optimized for Xeon Phi and are able to characterize Xeon Phi's support for data analytics workloads. Compared with a 24-core Xeon E5-2620 machine, BigDataBench-Phi achieves reasonable speedups for most of its benchmarks, ranging from 1.5X to 23.4X. Our experiments show that workloads operating on high-dimensional matrices can benefit significantly from instruction- and thread-level parallelism on Xeon Phi.
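The speedup figures quoted in the abstract are ratios of run times on the two machines. As a minimal sketch of how such a ratio is conventionally computed, assuming speedup means baseline time divided by accelerated time, the snippet below uses a hypothetical `speedup` helper and made-up timings; none of the names or numbers come from the paper.

```python
def speedup(baseline_seconds: float, accelerated_seconds: float) -> float:
    """Return how many times faster the accelerated run is than the baseline.

    Assumes the conventional definition: baseline time / accelerated time.
    """
    if baseline_seconds <= 0 or accelerated_seconds <= 0:
        raise ValueError("run times must be positive")
    return baseline_seconds / accelerated_seconds


# Hypothetical wall-clock times in seconds: (baseline CPU, coprocessor).
# These are illustrative placeholders, NOT measurements from the paper.
hypothetical_runs = {
    "workload_a": (60.0, 24.0),
    "workload_b": (100.0, 8.0),
}

for name, (cpu_time, phi_time) in hypothetical_runs.items():
    # A value above 1.0 means the coprocessor run finished sooner.
    print(f"{name}: {speedup(cpu_time, phi_time):.1f}x")
```

A ratio below 1.0 would indicate a slowdown, which is why a reported range is usually qualified with "for most benchmarks" rather than for all of them.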


