
Latest papers from the 2008 IEEE International Symposium on Parallel and Distributed Processing with Applications

DMA Performance Analysis and Multi-core Memory Optimization for SWIM Benchmark on the Cell Processor
Y. Dou, Lin Deng, Jinhui Xu, Yi Zheng
The Cell processor is a typical heterogeneous multi-core processor with powerful computing capability. However, developing parallel applications for it runs into the 'memory wall': limited local memory capacity, limited memory bandwidth shared among the cores, and long latency for data communication. The DMA transfer mechanism is often used to hide this latency and improve the effective use of memory bandwidth. In this paper, we start with a series of DMA experiments on the Cell processor architecture and use exponential fitting to set up a unified formula for the average DMA bandwidth, which quantifies how the number of SPEs and the DMA block size dominate DMA bandwidth. Supported by this DMA performance formula, we apply four types of memory optimization while parallelizing the SWIM benchmark program into a multi-core version. We use a Sony PlayStation 3 (PS3) as our test bed. For the SWIM benchmark with 6 SPE cores, we obtain a speedup of over 13 times compared with a single PPE, and of 3.3 to 6.18 times compared with AMD and Intel CPUs.
DOI: 10.1109/ISPA.2008.54 (https://doi.org/10.1109/ISPA.2008.54), published 2008-12-10
Citations: 13
Computation with Energy-Time Trade-Offs: Models, Algorithms and Lower-Bounds
B. Bingham, M. Greenstreet
Power consumption has become one of the most critical concerns for processor design. This motivates designing algorithms for minimum execution time subject to energy constraints. We propose simple models for analysing algorithms that reflect the energy-time trade-offs of CMOS circuits. Using these models, we derive lower bounds for the energy-constrained execution time of sorting, addition and multiplication, and we present algorithms that meet these bounds. We show that minimizing time under energy constraints is not the same as minimizing operation count or computation depth.
DOI: 10.1109/ISPA.2008.127 (https://doi.org/10.1109/ISPA.2008.127), published 2008-12-10
Citations: 23
Dynamic Reconfigurable Task Schedule Support towards a Reflective Middleware for Sensor Network
E. P. Freitas, A. Binotto, C. Pereira, A. Stork, Tony Larsson
Sensor networks are being applied in several emerging sophisticated applications thanks to powerful, high-quality sensor nodes such as radars and visible-light cameras. However, these nodes need additional features to benefit fully from heterogeneous modern computing platforms. Reconfigurable computing is therefore a promising paradigm for those scenarios, as it provides the flexibility to explore the computational resources of that kind of high-performance computing system. This paper presents reconfigurable sensor node allocation support, driven by application requirements and provided by a middleware focused on heterogeneous sensor networks. To address this concern, an aspect-oriented paradigm combined with an intelligent-agents approach is proposed, followed by a UAV case study.
DOI: 10.1109/ISPA.2008.70 (https://doi.org/10.1109/ISPA.2008.70), published 2008-12-10
Citations: 2
A Novel QoS-Enable Real-Time Publish-Subscribe Service
Xinjie Lv, Tian Yang, Zaifei Liao, Xin Li, Yongyan Wang, W. Liu, Hongan Wang
Complex distributed real-time applications require complicated processing and sharing of large amounts of data under critical timing constraints. In this paper, we present a comprehensive overview of the Data Distribution Service (DDS) standard and describe its QoS (Quality of Service) features for developing real-time applications. Real-time ECA (RECA) rules are introduced to describe QoS policy efficiently in an active real-time database (ARTDB) named Agilor. We then propose a novel QoS-Enable Real-Time Publish-Subscribe (QERTPS) service, compatible with DDS, for distributed real-time data acquisition. QERTPS can support several different QoS levels for various applications at the same time. Furthermore, QERTPS is implemented with object models and RECA rules in Agilor. To illustrate the benefits of QERTPS for real-time data acquisition, an example application is presented. Experimental evaluation shows that the proposed service delivers stable and timely service across different QoS levels.
DOI: 10.1109/ISPA.2008.61 (https://doi.org/10.1109/ISPA.2008.61), published 2008-12-10
Citations: 11
A Tool for the Semiautomatic Acquisition of the Morphological Data of Blood Vessel Networks
M. Cannataro, P. Guzzi, G. Tradigo, P. Veltri
Simulating the dynamics of blood flow in the venous system of the lower limb is an important tool for supporting clinical research and for suggesting possible treatments for many diseases, e.g. for improving the surgical treatment of chronic venous insufficiency (CVI). Nevertheless, the accuracy of the blood flow simulation depends strictly on the morphological data characterizing the investigated venous system. Although some of these data can be extracted by observing the real blood flow of a patient, e.g. by acquiring a set of images, such values are often extracted manually, so the need for automatic induction of parameters arises. The paper presents a software module that allows the semiautomatic acquisition of the morphological data of a patient's venous system. The tool, developed as a plugin for the ImageJ imaging platform, receives as input a DICOM file containing the computerized tomography (CT) of the vessel network of the lower limb and produces, in a semi-automatic way, a weighted graph of the network. This model can be used as the input for a subsequent simulation of the system.
DOI: 10.1109/ISPA.2008.120 (https://doi.org/10.1109/ISPA.2008.120), published 2008-12-10
Citations: 4
An Analysis of Exclusive Control Mechanisms
Kazuaki Masamoto, Takaichi Yoshida
Spin-lock is commonly used for process serialization, where it works well on multiprocessor systems. Under some conditions, however, it may cause an unexpected increase in CPU overhead. To address this problem, a simulator has been developed that evaluates various lock/unlock algorithms in order to find a method that minimizes their effect on the stability and scalability of a system. Using the simulator, this paper analyzes the effect of spin-lock and how CPU overhead changes as a function of traffic. It demonstrates how multiple processors go into "busy wait", consuming CPU time and working for nothing, with only a few processors able to advance. The paper also shows how this problem can be solved by a capped spin-lock, where the spin is capped at a certain limit and a pause is inserted between the spins, avoiding unnecessary consumption of CPU power and maintaining scalability over a number of processors.
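The capped spin-lock idea described in this abstract can be sketched roughly as follows. This is only an illustration of one possible reading (spin in bursts of at most `spin_cap` attempts, then pause before the next burst); it is not the authors' algorithm or simulator. The class name `CappedSpinLock`, the parameters `spin_cap` and `pause_s`, and the helper `run_workers` are hypothetical, and a non-blocking `threading.Lock.acquire` stands in for an atomic test-and-set instruction.

```python
import threading
import time


class CappedSpinLock:
    """Illustrative capped spin-lock (hypothetical names and defaults)."""

    def __init__(self, spin_cap=100, pause_s=0.0005):
        # A non-blocking acquire on threading.Lock plays the role of an
        # atomic test-and-set on the lock word.
        self._flag = threading.Lock()
        self.spin_cap = spin_cap    # assumed cap on consecutive spins
        self.pause_s = pause_s      # assumed pause between spin bursts

    def acquire(self):
        while True:
            # Busy-wait, but only for a bounded number of attempts.
            for _ in range(self.spin_cap):
                if self._flag.acquire(blocking=False):
                    return
            # Cap reached: yield the CPU instead of spinning on.
            time.sleep(self.pause_s)

    def release(self):
        self._flag.release()


def run_workers(n_threads=4, increments=500):
    """Increment a shared counter under the lock and return its final value."""
    lock = CappedSpinLock()
    counter = 0

    def worker():
        nonlocal counter
        for _ in range(increments):
            lock.acquire()
            try:
                counter += 1
            finally:
                lock.release()

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter
```

Calling `run_workers(4, 500)` should return 2000, since every increment happens inside the critical section. The cap and pause values here are arbitrary; how they trade latency against wasted CPU time is exactly the kind of question the paper's simulator is said to study.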
DOI: 10.1109/ISPA.2008.101 (https://doi.org/10.1109/ISPA.2008.101), published 2008-12-10
Citations: 0
Data Aggregation and Analysis: A Grid-Based Approach for Medicine and Biology
D. Kyriazis, K. Tserpes, George Kousiouris, A. Menychtas, G. Katsaros, T. Varvarigou
A constantly increasing number of applications from various scientific sectors are adopting grid technologies in order to take advantage of their capabilities: the advent of grid environments has made it feasible to solve computationally intensive problems in a reliable and cost-effective way. In this paper we present a grid-based approach for aggregating data obtained from various sources (e.g. cameras, sensors) and analyzing them with genetic algorithms. By also taking into consideration general historical data and patient-specific medical information, we present the realization of the proposed approach in an application scenario for personalized healthcare and medicine.
DOI: 10.1109/ISPA.2008.34 (https://doi.org/10.1109/ISPA.2008.34), published 2008-12-10
Citations: 4
Parallelization with Automatic Parallelizing Compiler Generating Consumer Electronics Multicore API
Takamichi Miyamoto, Saori Asaka, Hiroki Mikami, M. Mase, Y. Wada, H. Nakano, K. Kimura, H. Kasahara
Multicore processors have been adopted for consumer electronics such as portable electronics, mobile phones, car navigation systems, digital TVs and games to obtain high performance with low power consumption. The OSCAR automatic parallelizing compiler has been developed to make these multicores easy to use. In addition, a new consumer electronics multicore application program interface (API), which lets the OSCAR compiler work with native sequential compilers for various kinds of multicores from different vendors, has been developed in the NEDO (New Energy and Industrial Technology Development Organization) "Multicore Technology for Realtime Consumer Electronics" project with six Japanese IT companies. This paper evaluates the parallel processing performance of multimedia applications parallelized by the OSCAR compiler using this API on the FR1000, a multicore processor with 4 VLIW cores developed by Fujitsu Ltd., and on the RP1, a multicore processor with 4 SH-4A cores jointly developed by Renesas Technology Corp., Hitachi Ltd. and Waseda University. As a result, the parallel codes generated by the OSCAR compiler using the API give a 3.27 times speedup on average using 4 cores against 1 core on the FR1000 multicore, and a 3.31 times speedup on average using 4 cores against 1 core on the RP1 multicore.
DOI: 10.1109/ISPA.2008.58 (https://doi.org/10.1109/ISPA.2008.58), published 2008-12-10
Citations: 9
Broadcasting in Weighted-Vertex Graphs
Hovhannes A. Harutyunyan, Shahin Kamali
In this paper, a new model for information dissemination in communication networks is presented. The model is defined on networks in which each node is assigned a weight representing the internal delay it must incur before sending data to its neighbors. The new model, called the weighted-vertex model, has real-world applications in parallel computation and satellite-terrestrial networks. As a generalization of the classical model, optimum broadcasting in the weighted-vertex model is NP-hard. The problem remains NP-hard in some classes of weighted-vertex graphs. We show the existence of approximation algorithms for the broadcasting problem in the weighted-vertex model, as well as better approximations for specific subclasses of weighted graphs.
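To make the weighted-vertex model more concrete, the sketch below computes when each node is informed under a fixed broadcast scheme. The timing rule is an assumption, since the abstract does not spell it out: a node informed at time t waits for its weight w, then informs one neighbor per time unit in a given order, so its i-th child is informed at t + w + i; the originator is assumed to incur its own delay as well. The function `broadcast_time` and the toy graph are hypothetical and only evaluate a given scheme; they do not solve the NP-hard problem of finding the optimum one.

```python
def broadcast_time(children, weight, origin):
    """Return (completion_time, informed_times) for a fixed broadcast tree.

    children: dict mapping a node to the ordered list of nodes it informs.
    weight:   dict mapping a node to its internal delay (assumed model).
    origin:   the node that holds the message at time 0.
    """
    informed = {origin: 0}
    stack = [origin]
    while stack:
        v = stack.pop()
        # Assumed rule: v may start sending only after its internal delay.
        start = informed[v] + weight[v]
        for i, child in enumerate(children.get(v, []), start=1):
            # One neighbor is informed per time unit, in the listed order.
            informed[child] = start + i
            stack.append(child)
    return max(informed.values()), informed


# Toy example: the order in which "a" calls its neighbors changes the result.
weights = {"a": 1, "b": 2, "c": 0, "d": 0}
slow_first, _ = broadcast_time({"a": ["b", "c"], "b": ["d"]}, weights, "a")
slow_last, _ = broadcast_time({"a": ["c", "b"], "b": ["d"]}, weights, "a")
print(slow_first, slow_last)  # prints: 5 6
```

In this toy case, calling the high-delay node "b" first lets its internal delay overlap with the call to "c", which shortens the total time from 6 to 5 under the assumed rule. That ordering sensitivity is one intuition for why vertex weights make the scheduling problem harder than simply counting hops.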
DOI: 10.1109/ISPA.2008.95 (https://doi.org/10.1109/ISPA.2008.95), published 2008-12-10
Citations: 5
Power-Efficient Architecture of Zigbee Security Processing
Jiho Kim, Jungyu Lee, Ohyoung Song
In general, cryptographic operations on wireless devices with little memory and low computing power cause system overhead that badly affects the performance of other tasks. It is therefore necessary to implement security hardware dedicated to cryptographic operations. Earlier research on security hardware architectures uses data throughput, gate usage, and power consumption as design metrics to demonstrate the efficiency of the architectures. In this paper, we provide an efficient hardware architecture for ZigBee security processing that satisfies the constraints required by the IEEE 802.15.4 standard. These requirements mainly consist of the critical response time, the verification delay, and the throughput. In experiments, our implementation of ZigBee security processing used fewer logic gates and consumed less power than earlier ZigBee chips, and it fulfilled the standard's requirements with considerable margins.
DOI: 10.1109/ISPA.2008.113 (https://doi.org/10.1109/ISPA.2008.113), published 2008-12-10
Citations: 3