2009 IEEE International Symposium on Workload Characterization (IISWC): latest papers


Logicalization of communication traces from parallel execution

2009 IEEE International Symposium on Workload Characterization (IISWC)

Pub Date : 2009-10-04 DOI: 10.1109/IISWC.2009.5306796

Qiang Xu, J. Subhlok, Rong Zheng, S. Voss

Communication traces are integral to performance modeling and analysis of parallel programs. However, execution on a large number of nodes results in a large trace volume that is cumbersome and expensive to analyze. This paper presents an automatic framework to convert all process traces corresponding to the parallel execution of an SPMD MPI program into a single logical trace. First, the application communication matrix is generated from process traces. Next, topology identification is performed based on the underlying communication structure and independent of the way ranks (or numbers) are assigned to processes. Finally, message exchanges between physical processes are converted into logical message exchanges that represent similar message exchanges across all processes, resulting in a trace volume reduction approximately equal to the number of processes executing the application. This logicalization framework has been implemented and the results report on its performance and effectiveness.
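The three steps in the abstract (communication matrix, topology identification, logical trace) can be illustrated with a minimal sketch. It is not the authors' framework: the trace format, the function names, and the 1D ring topology are assumptions made for illustration, and the sketch assumes ranks already follow the ring order, which sidesteps the rank-independent topology identification the paper describes.

```python
def communication_matrix(traces, num_procs):
    """Sum bytes sent from each source rank to each destination rank.

    traces: hypothetical format {rank: [(dest_rank, nbytes), ...]}.
    """
    matrix = [[0] * num_procs for _ in range(num_procs)]
    for src, sends in traces.items():
        for dest, nbytes in sends:
            matrix[src][dest] += nbytes
    return matrix


def logicalize_ring(traces, num_procs):
    """Rewrite each physical send as (relative ring offset, nbytes).

    Assumes a 1D ring whose order matches the rank numbering.
    """
    return {
        src: [((dest - src) % num_procs, nbytes) for dest, nbytes in sends]
        for src, sends in traces.items()
    }


def single_logical_trace(traces, num_procs):
    """Return one logical trace if all processes behave alike, else None."""
    logical = logicalize_ring(traces, num_procs)
    distinct = {tuple(events) for events in logical.values()}
    if len(distinct) != 1:
        return None  # processes differ; no single logical trace exists
    return list(distinct.pop())


# Four processes, each sending 1024 bytes right and 512 bytes left.
example = {r: [((r + 1) % 4, 1024), ((r - 1) % 4, 512)] for r in range(4)}
matrix = communication_matrix(example, 4)
logical = single_logical_trace(example, 4)
```

In this toy case four per-process traces collapse into one logical trace, which mirrors the reduction factor the abstract reports (roughly the number of processes).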


Citations: 14

Performance characterization and cache-aware core scheduling in a virtualized multi-core server under 10GbE

2009 IEEE International Symposium on Workload Characterization (IISWC)

Pub Date : 2009-10-04 DOI: 10.1109/IISWC.2009.5306784

Danhua Guo, Guangdeng Liao, L. Bhuyan

Virtual Machine (VM) technology is experiencing a resurgent interest as the ubiquitous multi-core processors have become the de facto configuration on modern web servers. Multicore servers potentially provide sufficient physical resources to realize VM's benefits including performance isolation, manageability and scalability. However, the network performance of virtualized multi-core servers falls short of expectation. It is therefore important to understand the overhead implications. In this paper, we evaluate the network performance of a virtualized multi-core server using a TCP streaming microbenchmark (Iperf) and SPECweb2005. We first motivate our research by presenting the performance gap between native and virtualized environment. We then break down the overhead from an architectural viewpoint and show that the cache topology greatly influences the performance. We also profile the Virtual Machine Monitor (VMM) at a function level to illustrate that functions in the current version of the Xen scheduler are the major contributors to the poor utilization of cache topology. Consequently, we implement a static onloading scheme to separate interrupt handling from application processes and execute them on cores with cache affinity. Based on the observed benefits, we modify the Xen scheduler to migrate virtual CPUs dynamically to exploit the cache topology. Our results show that the VM performance improves by an average of 12% for Iperf and 15% for SPECweb2005.



Citations: 19

Image feature extraction for mobile processors

2009 IEEE International Symposium on Workload Characterization (IISWC)

Pub Date : 2009-10-04 DOI: 10.1109/IISWC.2009.5306789

M. Murphy, K. Keutzer, Hong Wang

High-quality cameras are a standard feature of mobile platforms, but the computational capabilities of mobile processors limit the applications capable of exploiting them. Emerging mobile application domains, for example Mobile Augmented Reality (MAR), rely heavily on techniques from computer vision, requiring sophisticated analyses of images followed by higher-level processing. An important class of image analyses is the detection of sparse localized interest points. The Scale Invariant Feature Transform (SIFT), the most popular such analysis, is computationally representative of many other feature extractors. Using a novel code-generation framework, we demonstrate that a small set of optimizations produce high-performance SIFT implementations for three very different architectures: a laptop CPU (Core 2 Duo), a low-power CPU (Intel Atom), and a low-power GPU (GMA X3100). We improve the runtime of SIFT by more than 5X on our low-power architectures, enabling a low-power mobile device to extract SIFT features up to 63% as fast as the laptop CPU.
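As background for the "sparse localized interest points" mentioned above: SIFT's detector stage looks for extrema of a difference-of-Gaussians (DoG) response across scales. The sketch below shows that filtering step on a 1D signal in plain Python. It is an illustration only, not the paper's code-generation framework or its optimized kernels; the function names, the 3-sigma kernel radius, and the border clamping are assumptions.

```python
import math


def gaussian_kernel(sigma):
    """Normalized 1D Gaussian kernel truncated at about 3 sigma (assumed)."""
    radius = max(1, int(math.ceil(3 * sigma)))
    weights = [math.exp(-(x * x) / (2 * sigma * sigma))
               for x in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def blur(signal, sigma):
    """Convolve the signal with a Gaussian, clamping indices at the borders."""
    kernel = gaussian_kernel(sigma)
    radius = len(kernel) // 2
    n = len(signal)
    out = []
    for i in range(n):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = min(max(i + k - radius, 0), n - 1)
            acc += w * signal[j]
        out.append(acc)
    return out


def difference_of_gaussians(signal, sigma, k=math.sqrt(2)):
    """DoG response: blur at scale k*sigma minus blur at scale sigma."""
    fine = blur(signal, sigma)
    coarse = blur(signal, k * sigma)
    return [c - f for f, c in zip(fine, coarse)]


# A single bright sample in the middle of a dark signal.
signal = [0.0] * 21
signal[10] = 1.0
response = difference_of_gaussians(signal, sigma=1.0)
# The strongest (most negative) response sits at the bright sample.
peak = min(range(len(response)), key=lambda i: response[i])
```

A full detector repeats this in 2D over several scales and octaves and keeps points that are extrema among their spatial and scale neighbors, which is part of why the workload is heavy for mobile processors.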


Citations: 15

Rodinia: A benchmark suite for heterogeneous computing

2009 IEEE International Symposium on Workload Characterization (IISWC)

Pub Date : 2009-10-04 DOI: 10.1109/IISWC.2009.5306797

Shuai Che, Michael Boyer, Jiayuan Meng, D. Tarjan, J. Sheaffer, Sang-Ha Lee, K. Skadron

This paper presents and characterizes Rodinia, a benchmark suite for heterogeneous computing. To help architects study emerging platforms such as GPUs (Graphics Processing Units), Rodinia includes applications and kernels which target multi-core CPU and GPU platforms. The choice of applications is inspired by Berkeley's dwarf taxonomy. Our characterization shows that the Rodinia benchmarks cover a wide range of parallel communication patterns, synchronization techniques and power consumption, and has led to some important architectural insight, such as the growing importance of memory-bandwidth limitations and the consequent importance of data layout.


Citations: 2693

SD-VBS: The San Diego Vision Benchmark Suite

2009 IEEE International Symposium on Workload Characterization (IISWC)

Pub Date : 2009-10-01 DOI: 10.1109/IISWC.2009.5306794

Sravanthi Kota Venkata, Ikkjin Ahn, Donghwan Jeon, Anshuman Gupta, Christopher M. Louie, Saturnino Garcia, Serge J. Belongie, M. Taylor

In the era of multi-core, computer vision has emerged as an exciting application area which promises to continue to drive the demand for both more powerful and more energy efficient processors. Although there is still a long way to go, vision has matured significantly over the last few decades, and the list of applications that are useful to end users continues to grow. The parallelism inherent in vision applications makes them a promising workload for multi-core and many-core processors.


Citations: 208

IISWC 2009 organizing committee

2009 IEEE International Symposium on Workload Characterization (IISWC)

Pub Date : 1900-01-01 DOI: 10.1109/iiswc.2009.5306805

Tom Conte, Georgia Tech, David August, Hillery Hunter, David Kaeli, Charles Levine

Program Committee: David August, Princeton; Leslie Barnes, AMD; Pradeep Dubey, Intel; Lieven Eeckhout, Ghent; Paolo Faraboschi, HP; Jim Held, Intel; Michael Hind, IBM Research; Hillery Hunter, IBM Research; David Kaeli, Northeastern; Hyesoon Kim, Georgia Tech; Hsien-Hsin Lee, Georgia Tech; Charles Levine, Microsoft; Markus Levy, EEMBC; Jose Martinez, Cornell; Onur Mutlu, CMU; Nacho Navarro, UPC; JoAnn Paul, Virginia Tech; Sanjay Patel, Illinois; Yale Patt, UT-Austin; Eric Rotenberg, NC State; Ravi Soundararajan, VMWare; Wayne Wolf, Georgia Tech


Citations: 0

IISWC 2009 reviewers

2009 IEEE International Symposium on Workload Characterization (IISWC)

Pub Date : 1900-01-01 DOI: 10.1109/iiswc.2009.5306802

David August, L. Barnes, Pradeep Dubey, L. Eeckhout, P. Faraboschi, J. Held, M. Hind, Sunpyo Hong, Hillery Hunter, D. Kaeli, Hyesoon Kim, Minjang Kim, Yoongu Kim, Nagesh B. Lakshminarayana, Hsien-Hsin Lee, Jaekyu Lee, Charles Levine, M. Levy, J. Martínez, Onur Mutlu, Nacho Navarro, J. Paul, S. Patel, Y. Patt, E. Rotenberg, Ravi Soundararajan

Citations: 0
