
2009 IEEE International Symposium on Workload Characterization (IISWC): Latest Papers

Message from the program chair
Pub Date : 2021-12-01 DOI: 10.1109/DEXA.2006.87
T. Conte
It is my pleasure to present to you the program of the 2009 fifth annual IEEE International Symposium on Workload Characterization. We received 56 papers, of which we accepted 23 to the symposium. Each paper received on average 3.5 reviews. The program committee worked tirelessly to do these reviews, largely by themselves, or with the help of colleagues in a few rare cases. The program committee then met in Austin, TX in person and via teleconference in June to do the hard work of deciding which papers made the cut. This was no easy process, and I am indebted to many on the program committee for their hard work. In particular, I am quite grateful to those who helped shepherd papers that needed a few light revisions. These include of IBM Research, of Georgia Tech, Leslie Barnes of AMD and David Kaeli of Northeastern University.
Citations: 0
Message from the general chair
Pub Date : 2021-11-01 DOI: 10.1109/micro.2012.5
Derek Chiou
Welcome to the fifth annual IEEE International Symposium on Workload Characterization being held from October 4th to October 6th, 2009 at the AT&T Conference Center at the edge of the University of Texas at Austin campus. IISWC started in Austin as the Workshop on Workload Characterization and has not been held in Austin since the very first IISWC held in 2005. It's good to be back.
Citations: 0
Understanding the applicability of CMP performance optimizations on data mining applications
Pub Date : 2009-10-04 DOI: 10.1109/IISWC.2009.5306779
Ivan Jibaja, K. Shaw
A major challenge to the creation of chip multiprocessors is designing the on-chip memory and communication resources to efficiently support parallel workloads. A variety of cache organizations, data management techniques, and hardware optimizations that take advantage of specific data characteristics have been developed to improve application performance. The success of these approaches depends on applications exhibiting the presumed data characteristics.
Citations: 2
Understanding PARSEC performance on contemporary CMPs
Pub Date : 2009-10-04 DOI: 10.1109/IISWC.2009.5306793
M. Bhadauria, Vincent M. Weaver, S. Mckee
PARSEC is a reference application suite used in industry and academia to assess new Chip Multiprocessor (CMP) designs. No investigation to date has profiled PARSEC on real hardware to better understand scaling properties and bottlenecks. This understanding is crucial in guiding future CMP designs for these kinds of emerging workloads. We use hardware performance counters, taking a systems-level approach and varying common architectural parameters: number of out-of-order cores, memory hierarchy configurations, number of multiple simultaneous threads, number of memory channels, and processor frequencies. We find these programs to be largely compute-bound, and thus limited by number of cores, micro-architectural resources, and cache-to-cache transfers, rather than by off-chip memory or system bus bandwidth. Half the suite fails to scale linearly with increasing number of threads, and some applications saturate performance at few threads on all platforms tested. Exploiting thread level parallelism delivers greater payoffs than exploiting instruction level parallelism. To reduce power and improve performance, we recommend increasing the number of arithmetic units per core, increasing support for TLP, and reducing support for ILP.
Citations: 86
Analyzing and improving performance scalability of commercial server workloads on a chip multiprocessor
Pub Date : 2009-10-04 DOI: 10.1109/IISWC.2009.5306781
K. Ishizaki, T. Nakatani, S. Daijavad
A chip multiprocessor (CMP) with many low performance cores can achieve high performance or high performance/power for commercial server applications. The large number of hardware threads of a CMP with many low performance cores poses significant challenges to application developers in writing scalable applications. Many papers have assessed the architectural characteristics and the performance scalability, and some of them have identified lock contention as one of the scalability bottlenecks. However, there are few studies that resolved these problems, analyzed their causes, and compared the architectural characteristics before and after the scalability limitations were addressed. We analyzed and resolved some of the problems limiting the scalability of three commercial server applications with 64 hardware threads. We also did before-and-after comparisons of the architectural characteristics affected by the scalability enhancements, supporting the development of new processors. We addressed the lock contention with changes in the Java code. Our enhancements improved the performance scalability by up to 132%. We show that though the causes of lock contention are in different software layers, they share certain similarities and can be organized in three categories. Our comparisons reveal that the CPI and data TLB miss rates decrease, but the L2 data cache miss rates, L2 instruction cache miss rates, and memory traffic increase. These results suggest that we need to address the performance scalability problems of an application before we can accurately measure the architectural characteristics of a CMP.
Citations: 8
A characterization and analysis of PTX kernels
Pub Date : 2009-10-04 DOI: 10.1109/IISWC.2009.5306801
Andrew Kerr, G. Diamos, S. Yalamanchili
General purpose application development for GPUs (GPGPU) has recently gained momentum as a cost-effective approach for accelerating data- and compute-intensive applications. It has been driven by the introduction of C-based programming environments such as NVIDIA's CUDA [1], OpenCL [2], and Intel's Ct [3]. While significant effort has been focused on developing and evaluating applications and software tools, comparatively little has been devoted to the analysis and characterization of applications to assist future work in compiler optimizations, application re-structuring, and micro-architecture design. This paper proposes a set of metrics for GPU workloads and uses these metrics to analyze the behavior of GPU programs. We report on an analysis of over 50 kernels and applications including the full NVIDIA CUDA SDK and UIUC's Parboil Benchmark Suite covering control flow, data flow, parallelism, and memory behavior. The analysis was performed using a full function emulator we developed that implements the NVIDIA virtual machine referred to as PTX (Parallel Thread eXecution architecture) - a machine model and low level virtual ISA that is representative of ISAs for data parallel execution. The emulator can execute compiled kernels from the CUDA compiler, currently supports the full PTX 1.4 specification [4], and has been validated against the full CUDA SDK. The results quantify the importance of optimizations such as those for branch reconvergence, the prevalence of sharing between threads, and highlight opportunities for additional parallelism.
Citations: 135
Characterization of DBT overhead
Pub Date : 2009-10-04 DOI: 10.1109/IISWC.2009.5306785
E. Borin, Youfeng Wu
In recent years, dynamic binary translation has emerged as an important tool with many real world applications. Besides supporting legacy binary code and ISA virtualization, it enables innovative co-designed microarchitectures and allows transparent binary instrumentation. The dynamic nature of the translation usually incurs extra execution overhead and many research works have proposed software and hardware solutions to minimize the overhead [1, 2]. In this paper, we analyze our dynamic binary translator performance and depict the main sources of overhead in detail. We classify the translation operations and associated overhead into five major categories, and quantify their contribution to the overall overhead. Based on the analysis and detailed evaluation, we identify and point out the most promising solutions to address the overhead problem. We believe this study is an important first step toward the grand goal of zero-overhead dynamic binary translation.
Citations: 26
Workload characterization and optimization of high-performance text indexing on the Cell Broadband Engine™ (Cell/B.E.)
Pub Date : 2009-10-04 DOI: 10.1109/IISWC.2009.5306798
D. Scarpazza, G. W. Braudaway
In this paper we examine text indexing on the Cell Broadband Engine™ (Cell/B.E.), an emerging workload on an emerging multicore architecture. The Cell Broadband Engine is a microprocessor jointly developed by Sony Computer Entertainment, Toshiba, and IBM (herein, we refer to it simply as the “Cell”).
Citations: 9
Green clouds and black swans in the exascale era
Pub Date : 2009-10-04 DOI: 10.1109/IISWC.2009.5306800
Parthasarathy Ranganathan
The petascale milestone is behind us, and the next grand challenge is to design systems and datacenters for the exascale era (10^18 flops). This talk will speculate on challenges and opportunities in understanding and characterizing workloads in the exascale era, with particular emphasis on potential “black swan events”.
Citations: 1
Experimental evaluation of N-tier systems: Observation and analysis of multi-bottlenecks
Pub Date : 2009-10-04 DOI: 10.1109/IISWC.2009.5306791
Simon Malkowski, Markus Hedwig, C. Pu
In many areas such as e-commerce, mission-critical N-tier applications have grown increasingly complex. They are characterized by non-stationary workloads (e.g., peak load several times the sustained load) and complex dependencies among the component servers. We have studied N-tier applications through a large number of experiments using the RUBiS and RUBBoS benchmarks. We apply statistical methods such as kernel density estimation, adaptive filtering, and change detection through multiple-model hypothesis tests to analyze more than 200GB of recorded data. Beyond the usual single-bottlenecks, we have observed more intricate bottleneck phenomena. For instance, in several configurations all system components show average resource utilization significantly below saturation, but overall throughput is limited despite addition of more resources. More concretely, our analysis shows experimental evidence of multi-bottleneck cases with low average resource utilization where several resources saturate alternately, indicating a clear lack of independence in their utilization. Our data corroborates the increasing awareness of the need for more sophisticated analytical performance models to describe N-tier applications that do not rely on independent resource utilization assumptions. We also present a preliminary taxonomy of multi-bottlenecks found in our experimentally observed data.
Citations: 69