
2010 IEEE International Symposium on Parallel & Distributed Processing, Workshops and Phd Forum (IPDPSW): Latest Publications

Lessons learned during the implementation of the BVR Wireless Sensor Network protocol on SunSPOTs
Ralph Robert Erdt, M. Gergeleit
The Beacon Vector Routing (BVR) protocol [1] is a well-known routing protocol for Wireless Sensor Networks (WSNs). Simulations have shown that the protocol scales well in an environment with perfect links and ideal circular radio coverage. However, when it comes to an implementation on embedded hardware that uses IEEE 802.15.4 2.4 GHz wireless transceivers, several problems emerge that have a significant impact on the overall performance of the protocol.
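As a rough illustration of the routing idea behind BVR, the sketch below implements greedy forwarding over beacon (hop-count) vectors using a simplified sum-of-absolute-differences metric. The actual BVR metric in [1] weights its components differently, so this is only a minimal sketch of the general mechanism, not the protocol as implemented on the SunSPOTs.

def beacon_distance(p, d):
    """Simplified distance between two beacon (hop-count) vectors.

    Illustrative only: real BVR uses an asymmetrically weighted metric."""
    return sum(abs(pi - di) for pi, di in zip(p, d))

def next_hop(neighbors, dest_vector):
    """Pick the neighbor whose beacon vector minimizes the distance to the
    destination's beacon vector (real BVR adds a fallback mode when greedy
    forwarding gets stuck at a local minimum).

    neighbors: dict mapping neighbor id -> its beacon vector
    dest_vector: beacon vector of the destination node
    """
    best_id, _ = min(neighbors.items(),
                     key=lambda kv: beacon_distance(kv[1], dest_vector))
    return best_id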
Citations: 1
Particle Swarm Optimization to solve the Vehicle Routing Problem with Heterogeneous fleet, Mixed Backhauls, and time windows
Farah Belmecheri, C. Prins, F. Yalaoui, L. Amodeo
Many distribution companies must deliver and pick up goods to satisfy customers. This problem is called the Vehicle Routing Problem with Mixed linehauls and Backhauls (VRPMB), which considers that some goods must be delivered from a depot to linehaul customers, while others must be picked up at backhaul customers to be brought to the depot. This paper studies an enriched version called the Heterogeneous fleet VRPMB with Time Windows (HVRPMBTW), which has received little attention in the literature. A Particle Swarm Optimization (PSO) heuristic is proposed to solve this problem. This approach models the social behavior of bird flocking and fish schooling. The adaptation and implementation of the PSO search strategy for HVRPMBTW are explained; the results are then compared to previous work (Ant Colony Optimization) and to the high-quality solutions obtained by an exact method (the CPLEX solver). Promising results are reported, demonstrating the effectiveness of the method.
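For reference, the sketch below shows the canonical continuous PSO velocity and position update that models this flocking behavior. The paper's adaptation to HVRPMBTW necessarily encodes routes differently; the parameter names (w, c1, c2) and their default values here are the usual PSO constants, not values taken from the paper.

import random

def pso_step(position, velocity, personal_best, global_best, w=0.7, c1=1.5, c2=1.5):
    """One canonical PSO update: each particle is pulled toward its own best
    known position (personal_best) and the swarm's best (global_best)."""
    new_position, new_velocity = [], []
    for x, v, pb, gb in zip(position, velocity, personal_best, global_best):
        r1, r2 = random.random(), random.random()
        v_next = w * v + c1 * r1 * (pb - x) + c2 * r2 * (gb - x)
        new_velocity.append(v_next)
        new_position.append(x + v_next)
    return new_position, new_velocity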
Citations: 6
Prototype for a large-scale static timing analyzer running on an IBM Blue Gene
A. Holder, C. Carothers, Kerim Kalafala
This paper focuses on parallelization of the classic static timing analysis (STA) algorithm for verifying timing characteristics of digital integrated circuits. Given ever-increasing circuit complexities, including the need to analyze circuits with billions of transistors, across potentially thousands of process corners, with accuracy tolerances down to the picosecond range, sequential execution of STA algorithms is quickly becoming a bottleneck to the overall chip design closure process. A message-passing-based parallel processing technique for performing STA, leveraging an IBM Blue Gene/L supercomputing platform, is presented. Results are collected for a small industrial 65 nm benchmarking design, where the algorithm demonstrates a speedup of nearly 39 times on 64 processors and a peak of 119 times on 1024 processors (263 times without partitioning costs). With an idealized synthetic circuit, the algorithm demonstrated a 259 times speedup on 1024 processors (925 times without partitioning overhead). To the best of our knowledge, this is the first result demonstrating scalable STA on the IBM Blue Gene.
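The underlying STA recurrence being parallelized is the classic arrival-time propagation over the timing graph; a minimal sequential sketch is given below. The Blue Gene implementation distributes this traversal with message passing, which this sketch makes no attempt to show.

from collections import defaultdict, deque

def arrival_times(arcs, primary_inputs):
    """Compute latest arrival times on a timing DAG in topological order:
    AT(v) = max over fanin arcs (u, v) of AT(u) + delay(u, v).

    arcs: list of (u, v, delay) timing arcs
    primary_inputs: dict mapping source node -> arrival time
    """
    fanout = defaultdict(list)
    indegree = defaultdict(int)
    nodes = set()
    for u, v, d in arcs:
        fanout[u].append((v, d))
        indegree[v] += 1
        nodes.update((u, v))
    at = defaultdict(lambda: float("-inf"))
    at.update(primary_inputs)
    ready = deque(n for n in nodes if indegree[n] == 0)
    while ready:
        u = ready.popleft()
        for v, d in fanout[u]:
            at[v] = max(at[v], at[u] + d)   # relax the arrival time along this arc
            indegree[v] -= 1
            if indegree[v] == 0:
                ready.append(v)
    return dict(at)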
Citations: 3
High precision integer multiplication with a graphics processing unit
Niall Emmart, C. Weems
In this paper we evaluate the potential for using an NVIDIA graphics processing unit (GPU) to accelerate high precision integer multiplication. The reported peak vector performance for a typical GPU appears to offer considerable potential for accelerating such a regular computation. Because of limitations in the on-chip memory, the high cost of kernel launches, and the particular nature of the architecture's support for parallelism, we found it necessary to use a hybrid algorithmic approach to obtain good performance. On the GPU itself we use an adaptation of the Strassen FFT algorithm to multiply 32KB chunks, while on the CPU we adapt the Karatsuba divide-and-conquer approach to optimize the application of the GPU's partial multiplies, which are viewed as “digits” by our implementation of Karatsuba. Even with this approach, the result is at best a modest increase in performance, compared with executing the same multiplication using the GMP package on a CPU at a comparable technology node. We identify the sources of this lackluster performance and discuss the likely impact of planned advances in GPU architecture.
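To illustrate the CPU-side divide-and-conquer structure, here is a minimal Karatsuba sketch over Python integers. In the paper the base-case "digits" are 32KB chunks multiplied on the GPU with the FFT-based method, whereas this sketch simply falls back to native integer multiplication; the split threshold (the bits parameter) is an arbitrary illustrative value.

def karatsuba(x, y, bits=1024):
    """Karatsuba multiplication: three recursive multiplies instead of four."""
    if x < (1 << bits) or y < (1 << bits):
        return x * y                      # base case: one "digit" multiply
    half = max(x.bit_length(), y.bit_length()) // 2
    x_hi, x_lo = x >> half, x & ((1 << half) - 1)
    y_hi, y_lo = y >> half, y & ((1 << half) - 1)
    z2 = karatsuba(x_hi, y_hi, bits)      # high parts
    z0 = karatsuba(x_lo, y_lo, bits)      # low parts
    z1 = karatsuba(x_hi + x_lo, y_hi + y_lo, bits) - z2 - z0   # cross terms
    return (z2 << (2 * half)) + (z1 << half) + z0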
Citations: 2
Modeling bounds on migration overhead for a traveling thread architecture
P. Fratta, P. Kogge
Heterogeneous multicore architectures have gained widespread use in the general purpose and scientific computing communities, and architects continue to investigate techniques for easing the burden of parallelization on the programmer. This paper presents a new class of heterogeneous multicores that leverages past work in architectures supporting the execution of traveling threads. These traveling threads execute on simple cores distributed across the chip and can move up the hierarchy and between cores based on data locality. This new design offers the benefits of improved performance at lower energy and power density than centralized counterparts through intelligent data placement and cooperative caching policies. We employ a methodology consisting of mathematical modeling and simulation to estimate the upper bounds on migration overhead for various architectural organizations. Results illustrate that the new architecture can match the performance of a conventional processor with reasonable thread sizes. We have observed that between 0.04 and 7.09 instructions per migration (IPM) (1.88 IPM on average) are sufficient to match the performance of the conventional processor. These results confirm that this distributed architecture and corresponding execution model offer promising potential in overcoming the design challenges of centralized counterparts.
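As a purely illustrative break-even argument (not the authors' actual model): assume a thread migration costs $C_{\mathrm{mig}}$ cycles and that executing near the data saves $\Delta$ cycles per instruction compared with accessing it remotely. Migration then pays off once

\[ \mathrm{IPM} \cdot \Delta \;\ge\; C_{\mathrm{mig}} \quad\Longleftrightarrow\quad \mathrm{IPM} \;\ge\; \frac{C_{\mathrm{mig}}}{\Delta}, \]

which is the kind of threshold against which reported instructions-per-migration figures can be read.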
Citations: 3
Randomized self-stabilizing leader election in preference-based anonymous trees
Daniel Fajardo-Delgado, José Alberto Fernández-Zepeda, A. Bourgeois
The performance of processors in a distributed system can be measured by parameters such as bandwidth, storage capacity, work capability, reliability, manufacturing technology, and years of usage, among others. An algorithm using a preference-based approach uses these parameters to make decisions. In this paper we introduce a randomized self-stabilizing leader election algorithm for preference-based anonymous trees. Our algorithm uses the preference of the processors as the criterion to select a leader under symmetric or non-symmetric configurations. It is partially inspired by Xu and Srimani's algorithm, but we use a distributed daemon and randomization to break symmetry. We prove that our algorithm has an optimal average time complexity, and we performed simulations to verify our results.
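The sketch below is a generic illustration (not the authors' algorithm) of the two ingredients the abstract mentions: comparing nodes by a preference value, and using randomization to break ties between anonymous nodes with equal preference.

import random

def compare_with_random_tiebreak(pref_a, pref_b, rng=random):
    """Return 'a' or 'b': the higher preference wins; equal preferences are
    resolved by repeatedly drawing random bits until the two nodes differ."""
    if pref_a != pref_b:
        return "a" if pref_a > pref_b else "b"
    while True:
        bit_a, bit_b = rng.getrandbits(1), rng.getrandbits(1)
        if bit_a != bit_b:
            return "a" if bit_a > bit_b else "b"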
Citations: 5
Flexible IP cores for the k-NN classification problem and their FPGA implementation
E. Manolakos, I. Stamoulias
The k-nearest neighbor (k-NN) is a popular non-parametric benchmark classification algorithm to which new classifiers are usually compared. It is used in numerous applications, some of which may involve thousands of data vectors in a possibly very high-dimensional feature space. For real-time classification, a hardware implementation of the algorithm can deliver high performance gains by exploiting parallel processing and block pipelining. We present two different linear array architectures that have been described as soft parameterized IP cores in VHDL. The IP cores are used to synthesize and evaluate a variety of array architectures for different k-NN problem instances and Xilinx FPGAs. It is shown that, using a medium-size FPGA device, we can efficiently solve very large classification problems, with thousands of reference data vectors or vector dimensions, while achieving very high throughput. To the best of our knowledge, this is the first effort to design flexible IP cores for the FPGA implementation of the widely used k-NN classifier.
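As a software reference for what the hardware computes, a minimal k-NN classifier is sketched below; the IP cores parallelize the distance computations and pipeline blocks of reference vectors in hardware, which this sketch makes no attempt to reproduce.

from collections import Counter

def knn_classify(query, references, labels, k=3):
    """Classify query by majority vote among its k nearest reference vectors.

    references: list of feature vectors; labels: their class labels."""
    nearest = sorted(
        range(len(references)),
        key=lambda i: sum((q - r) ** 2 for q, r in zip(query, references[i])),
    )[:k]
    votes = Counter(labels[i] for i in nearest)
    return votes.most_common(1)[0][0]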
Citations: 13
Clairvoyant site allocation of jobs with highly variable service demands in a computational grid
S. Zikos, H. Karatza
In this paper we evaluate the performance of three different site allocation policies in a two-level computational grid with heterogeneous sites. We consider schedulers that are aware of the service demands of jobs, which show high variability. A simulation model is used to evaluate performance in terms of average response time and slowdown, under medium and high load. Simulation results show that the proposed policy outperforms the other two examined, especially at high load.
Citations: 7
Scalability analysis of embarrassingly parallel applications on large clusters
Fabrício A. B. Silva, H. Senger
This work presents a scalability analysis of embarrassingly parallel applications running on cluster and multi-cluster machines. Several applications fall into this category; examples are Bag-of-Tasks (BoT) applications and some classes of online web services, such as index processing in online web search. The analysis presented here is divided into two parts: first, the impact of front-end topology on scalability is assessed through a lower-bound analysis; second, several task mapping strategies are compared from the scalability standpoint.
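As a simple illustration of why the front end matters in a lower-bound analysis (assuming a single front-end node that needs $t_d$ time to dispatch each task and $p$ workers that each need $t_s$ per task; this is not necessarily the exact model used in the paper), the sequential time for $n$ tasks is $n t_s$ while the parallel time is at least $\max(n t_d, n t_s / p)$, so the speedup is capped by

\[ S(p) \;\le\; \min\!\left(p, \; \frac{t_s}{t_d}\right), \]

meaning that beyond roughly $p = t_s / t_d$ the centralized front end, not the number of workers, limits scalability.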
Citations: 4
Parallel external sorting for CUDA-enabled GPUs with load balancing and low transfer overhead
H. Peters, Ole Schulz-Hildebrandt, N. Luttenberger
Sorting is a well-investigated topic in computer science, and by now many efficient sorting algorithms for CPUs and GPUs have been developed. GPUs provide no swapping, paging, or similar mechanisms to offer more virtual memory than is physically available; thus, if one wants to use the GPU to sort sequences that exceed GPU memory, the problem of external sorting arises.
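For orientation, the sketch below shows the generic structure of external sorting (sort memory-sized runs independently, then merge the sorted runs); it is not the authors' GPU algorithm, and the chunk_size parameter merely stands in for the amount of data that fits in GPU memory.

import heapq

def external_sort(data, chunk_size, sort_chunk=sorted):
    """Generic external sort: sort each memory-sized chunk independently
    (on a real system each chunk would be sorted on the GPU), then perform
    a k-way merge of the sorted runs on the host."""
    runs = [sort_chunk(data[i:i + chunk_size])
            for i in range(0, len(data), chunk_size)]
    return list(heapq.merge(*runs))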
Citations: 17