2010 39th International Conference on Parallel Processing Workshops: Latest Publications

Integrating Power and Cooling into Parallel Performance Analysis
Pub Date : 2010-09-13 DOI: 10.1109/ICPPW.2010.72
R. Knapp, K. Karavanic, A. Márquez
We present a new, integrated approach to parallel performance analysis that combines traditional application-oriented performance data with measurements of the physical runtime environment. We have developed the infrastructure needed for combined evaluation of system, application, and machine room performance in high-end computing environments. We illustrate the utility of our approach with data from our study of how the choice of physical location for an application within the machine room affects power and cooling. We demonstrate the integration of measured performance data from the application, system, and physical room environment, and discuss the challenges encountered.
Citations: 2
The Characteristics of Cloud Computing
Pub Date : 2010-09-13 DOI: 10.1109/ICPPW.2010.45
Chunye Gong, Jie Liu, Qiang Zhang, Haitao Chen, Z. Gong
Cloud computing has emerged as one of the hottest topics in the field of information technology. It builds on several other computing research areas, such as HPC, virtualization, utility computing, and grid computing. To clarify the essence of cloud computing, we propose the characteristics that define it and distinguish it from other research areas. Cloud computing has its own conceptual, technical, economic, and user-experience characteristics. Service orientation, loose coupling, strong fault tolerance, its business model, and ease of use are the main characteristics of cloud computing. Clear insight into cloud computing will help the development and adoption of this evolving technology in both academia and industry.
Citations: 415
A Micro-benchmark Suite for AMD GPUs
Pub Date : 2010-09-13 DOI: 10.1109/ICPPW.2010.59
Ryan Taylor, Xiaoming Li
Optimizing programs for a Graphics Processing Unit (GPU) requires thorough knowledge of the values of the architectural features of this new computing platform. However, this knowledge is frequently unavailable, e.g., because of insufficient documentation, which is probably a result of the infancy of general-purpose computing on the GPU. What makes modeling program performance on the GPU even more difficult is that the exact value of some “architectural” parameters depends on how a GPU program interacts with those features. For example, AMD GPUs show different memory latencies when memory is accessed with address sequences that have different patterns. Current micro-benchmark suites such as X-Ray are unable to characterize the GPU. Clearly, a prerequisite for efficient code optimization and automatic tuning on the GPU is a systematic method to measure the architectural features and identify the most basic program characteristics that determine the performance of a program on the new GPU architectures. In this paper, we present a micro-benchmark suite for AMD GPUs that supports the AMD StreamSDK. Our model identifies and measures a series of architectural features and basic program characteristics that are most important and most predictive for program performance on the platform. These include vectorization, burst write latency, texture fetch latency, global read and write latency, ALU/Fetch operation ratio, domain size, and register usage for both AMD’s pixel shader and compute shader modes. Our performance model not only generates correct values for those parameters, but also provides a clear picture of program performance on the GPU.
Citations: 30
Sensor-Aided Personal Navigation Systems for Handheld Devices
Pub Date : 2010-09-13 DOI: 10.1109/ICPPW.2010.78
Chao-Min Su, Jiader Chou, Chih-Wei Yi, Y. Tseng, Chia-Hung Tsai
Positioning is a key technique for developing geographic applications such as location-based services. The Global Positioning System (GPS) is a common approach to positioning in vehicular navigation. Although GPS can provide absolute position information, its accuracy is not sufficient for personal navigation. Worse, GPS does not work well indoors. Inertial Measurement Units (IMUs), by contrast, can track objects with high precision, but they provide only relative position information. Integrating GPS and IMUs therefore enables positioning both indoors and outdoors. In this paper, we combine our previous work, a pedestrian tracking system for handheld devices, with GPS to obtain a personal navigation system for handheld devices. Position and heading information can be calculated by this system. The system also serves as a platform for many location-related applications.
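The idea of combining relative IMU tracking with absolute GPS fixes can be illustrated with a minimal sketch. This is not the authors' system: the step-and-heading dead-reckoning model, the function names, and the fixed blending weight `gps_weight` are all assumptions chosen only to show how a relative estimate might be corrected when an absolute fix happens to be available.

```python
import math

def dead_reckon(position, heading_rad, step_length_m):
    """Advance a relative (IMU-style) position estimate by one step."""
    x, y = position
    return (x + step_length_m * math.cos(heading_rad),
            y + step_length_m * math.sin(heading_rad))

def fuse(position, gps_fix, gps_weight=0.2):
    """Pull the dead-reckoned position toward an absolute GPS fix, if any.

    gps_weight is a hypothetical tuning constant, not a value from the paper.
    """
    if gps_fix is None:  # e.g. indoors, where no usable fix exists
        return position
    x, y = position
    gx, gy = gps_fix
    return (x + gps_weight * (gx - x), y + gps_weight * (gy - y))

def track(start, steps, gps_fixes, gps_weight=0.2):
    """steps: (heading_rad, step_length_m) pairs; gps_fixes: (x, y) or None per step."""
    position = start
    path = []
    for (heading, length), fix in zip(steps, gps_fixes):
        position = fuse(dead_reckon(position, heading, length), fix, gps_weight)
        path.append(position)
    return path
```

Without a fix the estimate relies on dead reckoning alone, so drift accumulates; each available fix pulls the estimate back toward an absolute position.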
Citations: 20
A Power-Aware Cloud Architecture with Smart Metering
Pub Date : 2010-09-13 DOI: 10.1109/ICPPW.2010.73
Che-Yuan Tu, Wen-Chieh Kuo, Wei-Hua Teng, Yao-Tsung Wang, Steven Shiau
With growing interest in cloud computing and carbon emission reduction, how to build an energy-efficient cloud architecture has become a critical issue for service providers. In this paper, we propose a power-aware cloud architecture based on DRBL (Diskless Remote Boot in Linux), cpufreqd and xenpm. We also introduce a low-cost smart metering system based on the open-hardware Arduino board. Combined with existing techniques such as Dynamic Voltage Frequency Scaling (DVFS), ACPI, diskless design and RAM disk storage, this architecture reduces energy consumption by 4 to 11% when running CPU-intensive applications, according to our experimental results. In conclusion, this paper shows that service providers could benefit from diskless design and RAM disk storage if their applications are CPU-intensive.
Citations: 30
Application Specific Instruction Accelerator for Multistandard Viterbi and Turbo Decoding
Pub Date : 2010-09-13 DOI: 10.1109/ICPPW.2010.17
Mangesh K. Kunchamwar, Durga P. Prasad, Pawan Hegde, P. Balsara, R. Sangireddy
There is an increasing demand for converged solutions for multi-standard radio processors that support existing and future standards. In this work, a heterogeneous multi-processor platform is proposed for multi-standard wireless communication systems; it is programmable and scalable so that it can adapt to future standards. Channel decoding algorithms form an important constituent of a wireless communication system because of their computational complexity. A programmable radio processor with application-specific instruction accelerators is proposed for channel decoding. Viterbi and Turbo channel decoding algorithms are analyzed for computational parallelism within the algorithms and for hardware reusability across them. The application-specific instruction accelerator is designed by exploiting the similar characteristics and computational parallelism across the algorithms. The analysis shows that a throughput of 54 Mbps for the UWB Viterbi decoder and 12 Mbps for the UMTS Turbo decoder can be achieved at 91.7 MHz using the proposed design.
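To make the computational core of Viterbi decoding concrete, the following is a minimal software sketch of the add-compare-select recursion. It is not the accelerator design from the paper: the rate-1/2, constraint-length-3 code with generators 7 and 5 (octal), hard-decision Hamming metrics, and all function names are assumptions chosen for brevity.

```python
G = (0b111, 0b101)   # assumed generator polynomials (7, 5 octal), constraint length 3
N_STATES = 4         # two memory bits

def parity(x):
    return bin(x).count("1") & 1

def branch(state, bit):
    """Return (next_state, output_symbol) for one encoder transition."""
    reg = (bit << 2) | state
    return reg >> 1, tuple(parity(reg & g) for g in G)

def encode(bits):
    state, coded = 0, []
    for b in bits:
        state, symbol = branch(state, b)
        coded.extend(symbol)
    return coded

def viterbi_decode(received):
    """Hard-decision Viterbi decoding of a bit list produced by encode()."""
    inf = float("inf")
    metrics = [0] + [inf] * (N_STATES - 1)   # encoder starts in state 0
    paths = [[] for _ in range(N_STATES)]
    for i in range(0, len(received), 2):
        symbol = received[i:i + 2]
        new_metrics = [inf] * N_STATES
        new_paths = [None] * N_STATES
        for state in range(N_STATES):
            if metrics[state] == inf:
                continue
            for bit in (0, 1):
                nxt, out = branch(state, bit)
                # Add: path metric plus Hamming branch metric.
                candidate = metrics[state] + sum(a != b for a, b in zip(out, symbol))
                # Compare-select: keep the better path into each next state.
                if candidate < new_metrics[nxt]:
                    new_metrics[nxt] = candidate
                    new_paths[nxt] = paths[state] + [bit]
        metrics, paths = new_metrics, new_paths
    best = min(range(N_STATES), key=lambda s: metrics[s])
    return paths[best]
```

The add-compare-select updates for different states within one trellis step are independent of each other, which is the kind of parallelism a hardware accelerator can exploit.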
Citations: 6
FFT Algorithms Evaluation on a Homogeneous Multi-processor System-on-Chip
Pub Date : 2010-09-13 DOI: 10.1109/ICPPW.2010.20
Roberto Airoldi, F. Garzia, J. Nurmi
This paper presents an evaluation of radix-2, radix-4 and radix-8 algorithms for N-point FFTs on a homogeneous Multi-Processor System-on-Chip prototyped on an FPGA device. The algorithms were evaluated by profiling them and comparing the results against a single-processor architecture. Performance was evaluated in terms of required clock cycles, achieved speed-up, and parallelization efficiency. For each algorithm, the analysis showed how the parallelization efficiency grows when moving from small to larger FFTs. Moreover, the comparison between the different implementations showed the parallelization properties of each algorithm. The radix-2 algorithm shows the best speed-up and parallelization efficiency, while radix-4 gives the best performance in terms of required clock cycles.
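For orientation, a textbook recursive radix-2 decimation-in-time FFT and the two derived metrics named in the abstract can be sketched as below. This is a generic sequential formulation, not the paper's multi-processor implementation. The definitions of speed-up (single-processor cycles divided by parallel cycles) and efficiency (speed-up divided by core count) are the conventional ones and are assumed here.

```python
import cmath

def fft_radix2(x):
    """Recursive radix-2 decimation-in-time FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a non-zero power of two")
    if n == 1:
        return [complex(x[0])]
    even = fft_radix2(x[0::2])
    odd = fft_radix2(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        # Butterfly: combine the two half-size transforms with a twiddle factor.
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out

def speedup(cycles_single, cycles_parallel):
    return cycles_single / cycles_parallel

def parallel_efficiency(cycles_single, cycles_parallel, n_cores):
    return speedup(cycles_single, cycles_parallel) / n_cores
```

As a purely made-up illustration (not data from the paper), 1000 single-core cycles against 250 cycles on 8 cores gives a speed-up of 4 and an efficiency of 0.5.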
Citations: 12
Message Driven Programming with S-Net: Methodology and Performance
Pub Date : 2010-09-13 DOI: 10.1109/ICPPW.2010.61
F. Penczek, S. Herhut, S. Scholz, A. Shafarenko, Jungsook Yang, Chun-Yi Chen, N. Bagherzadeh, C. Grelck
The development and implementation of the coordination language S-NET have been reported previously. In this paper we apply the S-NET design methodology to a computer graphics problem. We demonstrate (i) how a complete separation of concerns can be achieved between algorithm engineering and concurrency engineering and (ii) that the S-NET implementation is quite capable of achieving performance that matches what can be achieved using low-level tools such as MPI. We find this remarkable because, under S-NET, communication, concurrency, and synchronization are completely separated from algorithmic code. We argue that our approach delivers a flexible component technology which liberates application developers from the logistics of task and data management while at the same time making it unnecessary for a distributed computing professional to acquire detailed knowledge of the application area.
Citations: 23
Multi-layer Prefetching for Hybrid Storage Systems: Algorithms, Models, and Evaluations
Pub Date : 2010-09-13 DOI: 10.1109/ICPPW.2010.18
Mais Nijim, Ziliang Zong, X. Qin, Y. Nijim
Parallel storage systems are highly scalable and widely used in support of data-intensive applications. For future systems characterized by massive data processing and storage, hybrid storage systems offer a solution that fulfills a variety of demands, such as large storage capacity, high I/O performance, and low cost. Hybrid storage systems (HSS) contain both high-end storage components (e.g., solid-state disks and hard disk drives) to guarantee performance and low-end storage components (e.g., tapes) to reduce cost. In HSS, transferring data back and forth among solid-state disks (SSDs), hard disk drives (HDDs), and tapes plays a critical role in achieving high I/O performance. Prefetching is a promising solution for reducing the latency of data transfers in HSS. However, prefetching in the context of HSS is technically challenging because of an interesting dilemma: aggressive prefetching is required to efficiently reduce I/O latency, whereas overaggressive prefetching may waste I/O bandwidth by transferring useless data from HDDs to SSDs or from tapes to HDDs. To address this problem, we propose a multi-layer prefetching algorithm that can judiciously prefetch data from tapes to HDDs and from HDDs to SSDs. To evaluate our algorithm, we develop an analytical model, and the experimental results reveal that our prefetching algorithm improves performance in hybrid storage systems.
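The trade-off between aggressive and overaggressive prefetching can be illustrated with a small planning sketch. This is not the authors' algorithm, which the abstract does not specify. The per-tier probability thresholds, the transfer budget, and the function name `plan_prefetch` are hypothetical, introduced only to show how a multi-layer policy might promote likely-needed blocks one tier at a time while capping wasted bandwidth.

```python
def plan_prefetch(predicted, location, ssd_threshold=0.6, hdd_threshold=0.3, budget=2):
    """Plan tier promotions for blocks that are likely to be accessed soon.

    predicted: block id -> estimated access probability (assumed to be given).
    location:  block id -> current tier, one of "tape", "hdd", "ssd".
    Returns a list of (block, source_tier, target_tier) moves, at most `budget` long.
    """
    moves = []
    # Visit the most likely blocks first so the budget goes to the best candidates.
    for block, prob in sorted(predicted.items(), key=lambda kv: kv[1], reverse=True):
        if len(moves) >= budget:
            break
        tier = location[block]
        if tier == "tape" and prob >= hdd_threshold:
            moves.append((block, "tape", "hdd"))
        elif tier == "hdd" and prob >= ssd_threshold:
            moves.append((block, "hdd", "ssd"))
    return moves
```

Lowering the thresholds or raising the budget makes the policy more aggressive, at the risk of moving data that is never read.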
Citations: 9
Implementation and Performance Evaluation of XcalableMP: A Parallel Programming Language for Distributed Memory Systems
Pub Date : 2010-09-13 DOI: 10.1109/ICPPW.2010.62
Jinpil Lee, M. Sato
Although MPI is a de facto standard for parallel programming on distributed memory systems, writing MPI programs is often a time-consuming and complicated process. XcalableMP is a language extension of C and Fortran for parallel programming on distributed memory systems that helps users reduce this programming effort. XcalableMP provides two programming models. The first is the global view model, which supports typical parallelization based on the data and task parallel paradigms and enables parallelizing the original sequential code with minimal modification, using simple, OpenMP-like directives. The other is the local view model, which allows CAF-like expressions to be used to describe inter-node communication. Users can even use MPI and OpenMP explicitly in our language to optimize performance. In this paper, we introduce XcalableMP, the implementation of its compiler, and the results of a performance evaluation. For the performance evaluation, we parallelized the HPCC benchmarks in XcalableMP. The results show that users can describe parallelization for distributed memory systems with small modifications to the original sequential code.
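The global view model described above rests on distributing a global index space across nodes so that each node executes only its own share of a loop. The sketch below shows a generic block distribution in Python. It is not XcalableMP syntax or its runtime, and the bound computation and function names are assumptions; it only illustrates the bookkeeping that directive-based data parallelism is meant to hide from the programmer.

```python
def block_range(n, n_nodes, rank):
    """Half-open index range [lo, hi) owned by `rank` under a block distribution."""
    chunk = -(-n // n_nodes)          # ceiling division
    lo = min(rank * chunk, n)
    hi = min(lo + chunk, n)
    return lo, hi

def distributed_sum(data, n_nodes):
    """Simulate each node summing its own block, followed by a reduction."""
    partials = []
    for rank in range(n_nodes):       # in a real system, these run in parallel
        lo, hi = block_range(len(data), n_nodes, rank)
        partials.append(sum(data[lo:hi]))
    return sum(partials)              # stands in for the reduction across nodes
```

In a hand-written MPI program, the programmer would compute these bounds and issue the reduction explicitly; a directive-based model aims to derive both from annotations on the sequential loop.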
尽管MPI是分布式内存系统上并行编程的事实上的标准,但编写MPI程序通常是一个耗时且复杂的过程。XcalableMP是C和Fortran的一种语言扩展,用于在分布式内存系统上进行并行编程,可以帮助用户减少编程工作。XcalableMP提供了两种编程模型。第一个是全局视图模型,它支持基于数据和任务并行范式的典型并行化,并允许使用简单的、类似openmp的指令进行最小的修改来并行化原始顺序代码。另一个是本地视图模型,它允许使用类似于ca的表达式来描述节点间通信。用户甚至可以用我们的语言显式地使用MPI和OpenMP来显式地优化性能。本文介绍了XcalableMP编译器的实现和性能评估结果。为了进行性能评估,我们在XcalableMP中并行化了HPCC Benchmark。结果表明,用户只需对原始顺序代码进行少量修改,就可以描述分布式存储系统的并行化。
Citations: 53