2015 IEEE International Parallel and Distributed Processing Symposium Workshop: Latest Papers

Accelerating Large-Scale Single-Source Shortest Path on FPGA
Pub Date : 2015-05-25 DOI: 10.1109/IPDPSW.2015.130
Shijie Zhou, C. Chelmis, V. Prasanna
Many real-world problems can be represented as graphs and solved by graph traversal algorithms. Single-Source Shortest Path (SSSP) is a fundamental graph algorithm. Today, large-scale graphs involve millions or even billions of vertices, making efficient parallel graph processing challenging. In this paper, we propose a single-FPGA design to accelerate SSSP for massive graphs. We adopt the well-known Bellman-Ford algorithm. In the proposed design, the graph is stored in external memory, which is more realistic for processing large-scale graphs. Using the available external memory bandwidth, our design achieves the maximum data parallelism, concurrently processing multiple edges in each clock cycle regardless of data dependencies. The performance of our design is also independent of the graph structure. We propose an optimized data layout to enable efficient utilization of external memory bandwidth. We prototype our design using a state-of-the-art FPGA. Experimental results show that our design is capable of processing 1.6 billion traversed edges per second (1.6 GTEPS) using a single FPGA, while simultaneously achieving a high clock rate of over 200 MHz. This would place us in the 131st position of the Graph 500 benchmark list of supercomputing systems for data-intensive applications. Our solution therefore provides performance comparable to state-of-the-art systems.
Citations: 35
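As a rough illustration of the Bellman-Ford algorithm named in the abstract above, the following sequential Python sketch relaxes every edge once per pass over a flat edge list. It is not the authors' FPGA design: the function name `bellman_ford`, the `(src, dst, weight)` tuple layout, and the early-exit check are assumptions made for this example, and negative cycles are not detected.

```python
import math


def bellman_ford(num_vertices, edges, source):
    """Return shortest distances from `source` to every vertex.

    `edges` is a flat list of (src, dst, weight) tuples (an assumed layout).
    Unreachable vertices keep the distance math.inf.
    """
    dist = [math.inf] * num_vertices
    dist[source] = 0

    # At most num_vertices - 1 passes are needed when no negative cycle exists.
    for _ in range(num_vertices - 1):
        updated = False
        # Edge-centric pass: every edge is examined, regardless of graph shape.
        for u, v, w in edges:
            if dist[u] + w < dist[v]:
                dist[v] = dist[u] + w
                updated = True
        if not updated:
            break  # Distances have converged early.
    return dist


example_edges = [(0, 1, 4), (0, 2, 1), (2, 1, 2), (1, 3, 1)]
print(bellman_ford(4, example_edges, 0))  # prints [0, 3, 1, 4]
```

Because each pass streams over the whole edge list in a fixed order, the work per pass does not depend on the graph structure, which is one reason an edge-centric formulation is often chosen for hardware pipelines.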
NIDISC Introduction and Committees
Pub Date : 2015-05-25 DOI: 10.1109/IPDPSW.2014.211
P. Bouvry, F. Seredyński, E. Talbi
This section includes the articles presented at the 18th International Workshop on Nature Inspired Distributed Computing (NIDISC 2015), held in conjunction with the 29th IEEE/ACM International Parallel and Distributed Processing Symposium (IPDPS 2015), May 25-29, 2015, Hyderabad, India. The NIDISC workshop is an opportunity for researchers to explore the connections between biology, nature-inspired techniques, metaheuristics, and the development of solutions to problems that arise in parallel and distributed processing, communications, and other application areas.
Citations: 0
A Generic Framework for Impossibility Results in Time-Varying Graphs
Pub Date : 2015-05-25 DOI: 10.1109/IPDPSW.2015.59
Nicolas Braud-Santoni, S. Dubois, Mohamed-Hamza Kaaouachi, F. Petit
We address highly dynamic distributed systems modelled by time-varying graphs (TVGs). We are interested in proofs of impossibility results, which often rely on informal arguments about convergence. First, we provide a topological distance metric over sets of TVGs to correctly define the convergence of TVG sequences in such sets. Next, we provide a general framework that formally proves the convergence of the sequence of executions of any deterministic algorithm over the TVGs of any convergent sequence of TVGs. Finally, we illustrate the relevance of the above result by proving that no deterministic algorithm exists to compute the underlying graph of any connected-over-time TVG, i.e., any TVG of the weakest class of long-lived TVGs.
Citations: 4
Distributed Scheduling Algorithm for Highly Available Component Based Applications
Pub Date : 2015-05-25 DOI: 10.1109/IPDPSW.2015.114
M. Frîncu
The emergence of multi-clouds makes it difficult for application providers to offer reliable applications to end users. The different levels of infrastructure reliability offered by various cloud providers need to be abstracted at the application level through application-aware algorithms for high availability. This task is challenging due to the closed-world approach taken by the various cloud providers. In the face of different access and management policies, orchestrated distributed management algorithms are needed instead of centralized solutions. In this paper, we present a decentralized autonomic algorithm for achieving application high availability by harnessing the properties of scalable component-based applications and the advantage of overlay networks for communication between peers. In a multi-cloud environment, the algorithm maintains cloud provider independence while achieving global application availability. The algorithm was tested on a simulator, and the results show that it performs similarly to a centralized approach without inducing much communication overhead.
Citations: 1
GPU Accelerated Molecular Dynamics with Method of Heterogeneous Load Balancing
Pub Date : 2015-05-25 DOI: 10.1109/IPDPSW.2015.41
T. Udagawa, M. Sekijima
Molecular Dynamics simulations are widely used to obtain a deeper understanding of chemical reactions, fluid flows, phase transitions, and other physical phenomena due to molecular interactions. The main problem with this method is that it is computationally demanding because of its O(N^2) amount of computation and the need for prolonged simulations. The use of Graphics Processing Units (GPUs) is an attractive solution and has been applied to this problem thus far. However, such heterogeneous approaches occasionally cause load imbalances between CPUs and GPUs and do not utilize all computational resources. We propose and implement a method of balancing the workload between CPUs and GPUs. Our method is based on formulating and observing workloads, and it statically distributes work according to spatial decomposition. We succeeded in utilizing processors more efficiently and accelerating simulations by up to 20.7% compared to the original GPU-optimized code.
Citations: 2
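The abstract above describes statically dividing work between CPUs and GPUs based on observed workloads. One simple way to picture such a split, offered here only as a sketch under stated assumptions and not as the authors' formulation, is to give each processor a share of the spatial cells inversely proportional to its measured time per cell, so that both are expected to finish together. The function name `split_cells` and its parameters are hypothetical.

```python
def split_cells(num_cells, cpu_time_per_cell, gpu_time_per_cell):
    """Return (cpu_cells, gpu_cells) for a static, throughput-proportional split.

    The per-cell times are assumed to come from a short profiling run.
    Equalizing cpu_cells * cpu_time == gpu_cells * gpu_time gives the GPU
    the fraction cpu_time / (cpu_time + gpu_time) of all cells.
    """
    if cpu_time_per_cell <= 0 or gpu_time_per_cell <= 0:
        raise ValueError("per-cell times must be positive")
    gpu_share = cpu_time_per_cell / (cpu_time_per_cell + gpu_time_per_cell)
    gpu_cells = round(num_cells * gpu_share)
    return num_cells - gpu_cells, gpu_cells


# A GPU that is four times faster per cell receives four fifths of the cells.
print(split_cells(1000, 4.0, 1.0))  # prints (200, 800)
```

A split of this kind is fixed before the run starts, which matches the "static" wording in the abstract; it would not react to load changes during the simulation.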
Enhancing Speedups for FPGA Accelerated SPICE through Frequency Scaling and Precision Reduction
Pub Date : 2015-05-25 DOI: 10.1109/IPDPSW.2015.100
L. Hui, Nachiket Kapre
Frequency scaling and precision reduction optimization of an FPGA-accelerated SPICE circuit simulator can enhance performance by 1.5x while lowering implementation cost by 15-20%. This is possible due to the inherent fault-tolerant capabilities of SPICE, which can naturally drive simulator convergence even in the presence of arithmetic errors caused by frequency scaling and precision reduction. We quantify the impact of these transformations on SPICE by analyzing the resulting convergence residue and runtime. To explain the impact of our optimizations, we develop an empirical error model derived from in-situ frequency scaling experiments and build analytical models of rounding and truncation errors using Gappa-based numerical analysis. Across a range of benchmark SPICE circuits, we are able to tolerate bit-level fault rates of 10^-4 (frequency scaling) and manage up to 8-bit loss in the least-significant digits (precision reduction) without compromising SPICE convergence quality while delivering speedups.
Citations: 1
A Genetic Algorithm Approach for Adjusting Time Series Based Load Prediction
Pub Date : 2015-05-25 DOI: 10.1109/IPDPSW.2015.96
Raed Alkharboush, R. E. Grande, A. Boukerche
Distributed virtual simulations are prone to load oscillations, as well as load imbalances, during run-time. Detecting such imbalances and responding accordingly using load redistribution can be of great utility in keeping execution performance close to the targeted optimum. A dynamic balancing scheme can introduce a reactive approach, but a predictive scheme can prevent imbalances before they occur. Several models can be employed for predicting load, but due to the way in which the load is collected and presented, time series offer reasonable load forecasting in a short time. However, Holt's model, a well-known model for time series representation, shows limitations in forecasting load. In order to correct this issue, a genetic algorithm approach is introduced to dynamically adjust the model based on recent changes in the load behaviour. The convergence of the algorithm can substantially influence the response time of the predictive balancing system, so an analysis is conducted to identify the minimum number of iterations needed to generate a reasonable adjustment.
Citations: 0
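For readers unfamiliar with Holt's model mentioned in the abstract above, the sketch below shows the standard linear-trend (double exponential smoothing) recurrences in Python. It is a minimal illustration, not the paper's implementation: the function name `holt_forecast`, the initialization from the first two observations, and the example values are assumptions. The smoothing factors `alpha` and `beta` are the kind of parameters an adjustment procedure, such as the genetic algorithm the abstract refers to, could plausibly tune, although the abstract does not say exactly what is adjusted.

```python
def holt_forecast(series, alpha, beta, horizon=1):
    """Forecast `horizon` steps ahead with Holt's linear-trend smoothing.

    level_t = alpha * y_t + (1 - alpha) * (level_{t-1} + trend_{t-1})
    trend_t = beta * (level_t - level_{t-1}) + (1 - beta) * trend_{t-1}
    """
    if len(series) < 2:
        raise ValueError("need at least two observations")

    # Assumed initialization: first value as level, first difference as trend.
    level = series[0]
    trend = series[1] - series[0]

    for y in series[1:]:
        prev_level = level
        level = alpha * y + (1 - alpha) * (prev_level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend

    return level + horizon * trend


# A perfectly linear load trace is extrapolated exactly.
print(holt_forecast([10, 12, 14, 16, 18], alpha=0.5, beta=0.5))  # prints 20.0
```

With fixed `alpha` and `beta`, the model reacts slowly or noisily when the load pattern changes, which is consistent with the limitation the abstract attributes to Holt's model.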
Performance and Energy Efficient Asymmetrically Reliable Caches for Multicore Architectures
Pub Date : 2015-05-25 DOI: 10.1109/IPDPSW.2015.113
Sanem Arslan, H. Topcuoglu, M. Kandemir, Oguz Tosun
Modern architectures are increasingly susceptible to transient and permanent faults due to continuously decreasing transistor sizes and faster operating frequencies. The probability of soft errors is relatively high in cache structures because they occupy a large area of the logic compared to other parts. Applying fault tolerance indiscriminately to all caches imposes a significant overhead on performance and energy. In this study, we propose asymmetrically reliable caches that aim to provide the required reliability using just enough extra hardware under the performance and energy constraints. In our framework, a chip multiprocessor consists of one reliability-aware core, which has ECC protection on its data cache for critical data, and a set of less reliable cores with unprotected data caches to which noncritical data is mapped. The experimental results for selected applications show that our proposed technique provides 21% better reliability for only 6% more energy consumption compared to traditional caches.
Citations: 4
Performance Modeling of Multi-tiered Web Applications with Varying Service Demands
Pub Date : 2015-05-25 DOI: 10.1109/IPDPSW.2015.28
A. Kattepur, M. Nambiar
Multi-tiered transactional web applications are frequently used in enterprise-based systems. Due to their inherently distributed nature, pre-deployment testing for high availability and varying concurrency is important for post-deployment performance. Accurate performance modeling of such applications can help estimate values for future deployment variations as well as validate experimental results. In order to theoretically model the performance of multi-tiered applications, we use queuing networks and Mean Value Analysis (MVA) models. While MVA has been shown to work well with closed queuing networks, there are particular limitations in cases where the service demands vary with concurrency. This is further complicated by the use of multi-server queues in multi-core CPUs, which are not traditionally captured in MVA. We compare the performance of a multi-server MVA model with actual performance testing measurements and demonstrate this deviation. Using spline interpolation of collected service demands, we show that a modified version of the MVA algorithm (called MVASD), which accepts an array of service demands, can provide superior estimates of maximum throughput and response time. Results are demonstrated over multi-tier vehicle insurance registration and e-commerce web applications. The mean deviations of predicted throughput and response time are shown to be less than 3% and 9%, respectively. Additionally, we analyze the effect of spline interpolation of service demands as a function of throughput on the prediction results.
Citations: 19
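To make the Mean Value Analysis (MVA) baseline from the abstract above concrete, here is a minimal Python sketch of the classic exact MVA recursion for a closed queuing network with single-server stations and fixed service demands. It deliberately does not reproduce MVASD, the multi-server extension, or the spline interpolation described in the abstract; the function name `mva`, its parameters, and the example demands are assumptions for illustration only.

```python
def mva(demands, num_users, think_time=0.0):
    """Exact MVA for a closed network of single-server queueing stations.

    `demands[k]` is the fixed service demand (seconds) at station k.
    Returns (throughput, response_time) at a population of `num_users`.
    """
    queue = [0.0] * len(demands)  # Mean queue length per station at n - 1 users.
    throughput = 0.0
    response = 0.0

    for n in range(1, num_users + 1):
        # Arrival theorem: an arriving job sees the queue of a network with n - 1 jobs.
        residence = [d * (1.0 + q) for d, q in zip(demands, queue)]
        response = sum(residence)
        throughput = n / (response + think_time)
        # Little's law applied per station.
        queue = [throughput * r for r in residence]

    return throughput, response


x, r = mva([0.1, 0.2], num_users=50, think_time=1.0)
print(f"throughput={x:.3f}/s response={r:.3f}s")
```

In this fixed-demand form, throughput saturates at the reciprocal of the largest demand (5 requests per second in the example). The abstract's point is that real service demands change with concurrency, which a single fixed `demands` list cannot express.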
Firefly Inspired Improved Distributed Proximity Algorithm for D2D Communication
Pub Date : 2015-05-25 DOI: 10.1109/IPDPSW.2015.64
A. Pratap, R. Misra
Device-to-Device (D2D) communication underlaying cellular technology not only increases system capacity but also exploits the physical proximity of communicating devices to support services such as proximity services and offloading traffic from the Base Station (BS). However, efficient proximity discovery and synchronization among devices pose new research challenges for cellular networks. Previously reported algorithms, which use bio-inspired firefly heuristics modelled on the synchronization behaviour of fireflies found in nature to synchronize devices and their service interests, have the drawbacks of long convergence times and large numbers of message exchanges. Therefore, we propose an improved O(n log n) distributed firefly algorithm for large-scale D2D networks that uses a tree-based topological mechanism with an RSSI-based ranging scheme.
Citations: 13