Predictibility of inter-component latency in a software communications architecture operating environment
Pub Date: 2010-04-19 | DOI: 10.1109/IPDPSW.2010.5470783
Gael Abgrall, F. Roy, J. Diguet, G. Gogniat, J. Delahaye
This paper presents an in-depth analysis of the behavior of an SCA component-based waveform application in terms of inter-component communication latency. The main limitation of SCA in the context of embedded systems is the additional cost introduced by the use of CORBA. Previous studies have already defined the major metrics of interest for this issue: CPU cost, memory requirements and inter-component latency. Real-time systems cannot afford high latency; consequently, this paper focuses on that metric. The starting point of this paper is to determine whether the SCA Core Framework (CF) also introduces an overhead. Measurements were performed with omniORB as the CORBA distribution and OSSIE as the SCA implementation. To carry out these measurements, an SCA waveform composed of several "empty components" was created; empty components are software components compliant with SCA but without any signal-processing part, so the study focuses only on communication between components. The same kind of inter-component link was also measured between two components using CORBA without SCA. Comparing the latency values from the two setups shows that they are approximately the same: the CORBA bus is the part that introduces the overhead to the system. The final part of this paper introduces a statistical estimation of the latency distributions, derived from measurements performed with various data packet sizes and fitted using a combination of Gaussian functions.
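The abstract mentions fitting the measured latency distributions with a combination of Gaussian functions. As a rough illustration of that kind of fit (not the authors' actual procedure, and using synthetic latency samples), a Gaussian mixture can be estimated with scikit-learn's GaussianMixture:

```python
# Illustrative only: fit a mixture of Gaussians to latency samples,
# in the spirit of the statistical estimation described above.
# The paper's actual fitting method and data are not reproduced here.
import numpy as np
from sklearn.mixture import GaussianMixture

# Synthetic "inter-component" latency samples (microseconds); placeholder data.
rng = np.random.default_rng(0)
latencies = np.concatenate([
    rng.normal(120, 5, 800),    # bulk of round trips
    rng.normal(180, 15, 200),   # slower tail (e.g. scheduling jitter)
]).reshape(-1, 1)

# Fit a 2-component Gaussian mixture and report the estimated modes.
gmm = GaussianMixture(n_components=2, random_state=0).fit(latencies)
for mean, var, weight in zip(gmm.means_.ravel(), gmm.covariances_.ravel(), gmm.weights_):
    print(f"mode: mean={mean:.1f} us, std={np.sqrt(var):.1f} us, weight={weight:.2f}")
```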
{"title":"Predictibility of inter-component latency in a software communications architecture operating environment","authors":"Gael Abgrall, F. Roy, J. Diguet, G. Gogniat, J. Delahaye","doi":"10.1109/IPDPSW.2010.5470783","DOIUrl":"https://doi.org/10.1109/IPDPSW.2010.5470783","url":null,"abstract":"This paper presents an in-depth analysis of the behavior of a SCA component-based waveform application in terms of ¿inter-component¿ communication latency. The main limitation with SCA, in the context of embedded systems, is the additional cost introduced by the use of CORBA. Previous studies have already defined the major metrics of interest regarding this issue, these are CPU cost, memory requirements and ¿inter-component¿ latency. Real-time systems can not afford high latency, in consequence, this paper focuses on this metric. The starting point of this paper is the desire of knowing if the SCA CF does not also bring an overhead. Measurements have been realized with OmniORB as CORBA distribution and OSSIE for SCA implementation. In order to perform these measurements, a SCA waveform composed of several ¿empty-components¿ have been created. ¿Empty-components¿ are software components compliant to SCA without any signal processing part. The study only focuses on communications between components. The same kind of ¿inter-component¿ link has been measured between two components using CORBA without SCA. It is possible to compare the latency values between the two measurements and to show as a result that they are approximately the same. The CORBA bus is really the part which brings an overhead to the system. The final part of this paper introduces a statistical estimation of the latency distributions. It results from measurements performed with various data packet sizes and uses a fitting method based on a combination of Gaussian functions.","PeriodicalId":329280,"journal":{"name":"2010 IEEE International Symposium on Parallel & Distributed Processing, Workshops and Phd Forum (IPDPSW)","volume":"60 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-04-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129843492","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Welcome to CAC/SSPS 2010
Pub Date: 2010-04-19 | DOI: 10.1109/IPDPSW.2010.5470843
S. Pakin, C. Stunkel, J. Flich, H. Andrade, Vibhore Kumar, D. Turaga
Efficient data motion is the cornerstone of both traditional parallel computing and more recent stream processing systems. This special combined meeting of the Communication Architecture for Clusters (CAC) and the Scalable Stream Processing Systems (SSPS) workshops showcases the latest research advances in both forms of data motion: network communication within a computer system and data streamed from a remote source and processed through a continuous processing framework supporting data analysis applications.
{"title":"Welcome to CAC/SSPS 2010","authors":"S. Pakin, C. Stunkel, J. Flich, H. Andrade, Vibhore Kumar, D. Turaga","doi":"10.1109/IPDPSW.2010.5470843","DOIUrl":"https://doi.org/10.1109/IPDPSW.2010.5470843","url":null,"abstract":"Efficient data motion is the cornerstone of both traditional parallel computing and more recent stream processing systems. This special combined meeting of the Communication Architecture for Clusters (CAC) and the Scalable Stream Processing Systems (SSPS) workshops showcases the latest research advances in both forms of data motion: network communication within a computer system and data streamed from a remote source and processed through a continuous processing framework supporting data analysis applications.","PeriodicalId":329280,"journal":{"name":"2010 IEEE International Symposium on Parallel & Distributed Processing, Workshops and Phd Forum (IPDPSW)","volume":"55 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-04-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127132000","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Dense linear algebra solvers for multicore with GPU accelerators
Pub Date: 2010-04-19 | DOI: 10.1109/IPDPSW.2010.5470941
S. Tomov, Rajib Nath, H. Ltaief, J. Dongarra
Solving dense linear systems of equations is a fundamental problem in scientific computing. Numerical simulations involving complex systems represented in terms of unknown variables and relations between them often lead to linear systems of equations that must be solved as fast as possible. We describe current efforts toward the development of these critical solvers in the area of dense linear algebra (DLA) for multicore with GPU accelerators. We describe how to code/develop solvers to effectively use the high computing power available in these new and emerging hybrid architectures. The approach taken is based on hybridization techniques in the context of Cholesky, LU, and QR factorizations. We use a high-level parallel programming model and leverage existing software infrastructure, e.g. optimized BLAS for CPU and GPU, and LAPACK for sequential CPU processing. Included also are architecture and algorithm-specific optimizations for standard solvers as well as mixed-precision iterative refinement solvers. The new algorithms, depending on the hardware configuration and routine parameters, can lead to orders of magnitude acceleration when compared to the same algorithms on standard multicore architectures that do not contain GPU accelerators. The newly developed DLA solvers are integrated and freely available through the MAGMA library.
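The hybridization described here typically splits each factorization step into a small panel factorization kept on the CPU and a large trailing-matrix update offloaded to the GPU. The minimal NumPy sketch below shows that panel/update split for a blocked right-looking Cholesky factorization; it only illustrates the structure, not the MAGMA implementation, and the GPU offload itself is not shown:

```python
# Minimal sketch of a blocked right-looking Cholesky factorization.
# In a hybrid CPU/GPU solver the small diagonal-block factorization would
# stay on the CPU while the large trailing-matrix update (the syrk/gemm-like
# step below) would be offloaded to the GPU; this sketch keeps everything
# in NumPy and is not the MAGMA code.
import numpy as np

def blocked_cholesky(A, nb=64):
    A = A.copy()
    n = A.shape[0]
    for k in range(0, n, nb):
        end = min(k + nb, n)
        # Panel: factor the diagonal block (CPU-sized work).
        A[k:end, k:end] = np.linalg.cholesky(A[k:end, k:end])
        if end < n:
            L_kk = A[k:end, k:end]
            # Solve the sub-diagonal panel against the diagonal block.
            A[end:, k:end] = np.linalg.solve(L_kk, A[end:, k:end].T).T
            # Trailing-matrix update (the bulk of the flops; GPU-friendly).
            A[end:, end:] -= A[end:, k:end] @ A[end:, k:end].T
    return np.tril(A)

# Quick check against NumPy's reference Cholesky.
M = np.random.rand(300, 300)
A = M @ M.T + 300 * np.eye(300)   # symmetric positive definite test matrix
assert np.allclose(blocked_cholesky(A), np.linalg.cholesky(A))
```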
{"title":"Dense linear algebra solvers for multicore with GPU accelerators","authors":"S. Tomov, Rajib Nath, H. Ltaief, J. Dongarra","doi":"10.1109/IPDPSW.2010.5470941","DOIUrl":"https://doi.org/10.1109/IPDPSW.2010.5470941","url":null,"abstract":"Solving dense linear systems of equations is a fundamental problem in scientific computing. Numerical simulations involving complex systems represented in terms of unknown variables and relations between them often lead to linear systems of equations that must be solved as fast as possible. We describe current efforts toward the development of these critical solvers in the area of dense linear algebra (DLA) for multicore with GPU accelerators. We describe how to code/develop solvers to effectively use the high computing power available in these new and emerging hybrid architectures. The approach taken is based on hybridization techniques in the context of Cholesky, LU, and QR factorizations. We use a high-level parallel programming model and leverage existing software infrastructure, e.g. optimized BLAS for CPU and GPU, and LAPACK for sequential CPU processing. Included also are architecture and algorithm-specific optimizations for standard solvers as well as mixed-precision iterative refinement solvers. The new algorithms, depending on the hardware configuration and routine parameters, can lead to orders of magnitude acceleration when compared to the same algorithms on standard multicore architectures that do not contain GPU accelerators. The newly developed DLA solvers are integrated and freely available through the MAGMA library.","PeriodicalId":329280,"journal":{"name":"2010 IEEE International Symposium on Parallel & Distributed Processing, Workshops and Phd Forum (IPDPSW)","volume":"18 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-04-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127221826","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Modeling memory resources distribution on multicore processors using games on cellular automata lattices
Pub Date: 2010-04-19 | DOI: 10.1109/IPDPSW.2010.5470700
Michail-Antisthenis I. Tsompanas, G. Sirakoulis, I. Karafyllidis
Nowadays, there is an increasingly recognized need for more computing power, which has led to multicore processors. However, this evolution is still restrained by the poor efficiency of memory chips. As a possible solution to the problem, this paper examines a model for redistributing the memory resources assigned to the processor, especially the on-chip memory, in order to achieve higher performance. The proposed model uses basic concepts of game theory applied to cellular automata lattices and the iterated spatial prisoner's dilemma game. A simulation was set up to evaluate the performance of this model under different circumstances. Moreover, a corresponding FPGA logic circuit was designed as part of an embedded, real-time co-circuit aimed at fair distribution of memory resources. The proposed FPGA implementation proved advantageous in terms of low cost, high speed, compactness and portability. Finally, a significant improvement in the performance of the memory resources was ascertained from the simulation results.
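For illustration, an iterated spatial prisoner's dilemma on a cellular automata lattice can be sketched as below; the payoff values and the "imitate the best neighbour" update rule are common textbook choices, not necessarily the exact model used in the paper:

```python
# Illustrative sketch of an iterated spatial prisoner's dilemma on a CA lattice.
# Payoffs and the "copy the best-scoring neighbour" rule are generic choices.
import numpy as np

T, R, P, S = 1.9, 1.0, 0.0, 0.0   # temptation, reward, punishment, sucker payoffs
rng = np.random.default_rng(1)
grid = rng.integers(0, 2, (50, 50))   # 1 = cooperate, 0 = defect

def step(grid):
    payoff = np.zeros_like(grid, dtype=float)
    shifts = [(-1, 0), (1, 0), (0, -1), (0, 1)]   # von Neumann neighbourhood
    for dy, dx in shifts:
        nb = np.roll(np.roll(grid, dy, 0), dx, 1)
        # Accumulate pairwise payoffs against each neighbour.
        payoff += np.where(grid == 1, np.where(nb == 1, R, S),
                                      np.where(nb == 1, T, P))
    new, best = grid.copy(), payoff.copy()
    for dy, dx in shifts:
        nb_strat = np.roll(np.roll(grid, dy, 0), dx, 1)
        nb_pay = np.roll(np.roll(payoff, dy, 0), dx, 1)
        better = nb_pay > best          # imitate the best-scoring neighbour
        new[better] = nb_strat[better]
        best[better] = nb_pay[better]
    return new

for _ in range(100):
    grid = step(grid)
print("cooperator fraction:", grid.mean())
```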
{"title":"Modeling memory resources distribution on multicore processors using games on cellular automata lattices","authors":"Michail-Antisthenis I. Tsompanas, G. Sirakoulis, I. Karafyllidis","doi":"10.1109/IPDPSW.2010.5470700","DOIUrl":"https://doi.org/10.1109/IPDPSW.2010.5470700","url":null,"abstract":"Nowadays, there is an increasingly recognized need for more computing power, which has led to multicore processors. However, this evolution is still restrained by the poor efficiency of memory chips. As a possible solution to the problem, this paper examines a model of re-distributing the memory resources assigned to the processor, especially the on-chip memory, in order to achieve higher performance. The proposed model uses the basic concepts of game theory applied to cellular automata lattices and the iterated spatial prisoner's dilemma game. A simulation was established in order to evaluate the performance of this model under different circumstances. Moreover, a corresponding FPGA logic circuit was designed as a part of an embedded, real-time co-circuit, aiming at memory resources fair distribution. The proposed FPGA implementation proved advantageous in terms of low-cost, high-speed, compactness and portability features. Finally, a significant improvement on the performance of the memory resources was ascertained from simulation results.","PeriodicalId":329280,"journal":{"name":"2010 IEEE International Symposium on Parallel & Distributed Processing, Workshops and Phd Forum (IPDPSW)","volume":"34 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-04-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127400010","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Characterizing energy efficiency of I/O intensive parallel applications on power-aware clusters
Pub Date: 2010-04-19 | DOI: 10.1109/IPDPSW.2010.5470904
Rong Ge, Xizhou Feng, Sindhu Subramanya, Xian-He Sun
Energy efficiency and parallel I/O performance have become two critical measures in high performance computing (HPC). However, there is little empirical data characterizing the energy-performance behavior of parallel I/O workloads. In this paper, we present a methodology to profile the performance, energy, and energy efficiency of parallel I/O access patterns, and we report our findings on the factors affecting parallel I/O energy efficiency. Our study shows that choosing the right buffer size can change the energy-performance efficiency by up to 30 times. High spatial and temporal spacing can also lead to a significant improvement in energy-performance efficiency (about 2X). We observe that CPU frequency has a more complex impact, depending on the I/O operations, their spatial and temporal characteristics, and the memory buffer size. The presented methodology and findings are useful for evaluating the energy efficiency of I/O-intensive applications and provide a guideline for developing energy-efficient parallel I/O technology.
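As a rough sketch of the kind of measurement such a methodology implies, the snippet below times a buffered sequential read and converts it into an energy-performance figure (bytes per joule) given an externally measured average power draw; the metric, file path, buffer sizes and power value are illustrative assumptions, not the paper's instrumentation:

```python
# Hedged sketch: compare energy-performance efficiency across I/O buffer sizes.
# "Efficiency" here is taken as bytes moved per joule; the exact metric,
# power-measurement hook, and numbers in the paper may differ.
import time

def read_file(path, buffer_size):
    """Read a file sequentially with a given buffer size; return bytes and seconds."""
    total = 0
    start = time.perf_counter()
    with open(path, "rb") as f:
        while chunk := f.read(buffer_size):
            total += len(chunk)
    return total, time.perf_counter() - start

def efficiency(nbytes, seconds, avg_power_watts):
    """Bytes per joule, assuming an externally measured average power draw."""
    return nbytes / (avg_power_watts * seconds)

# Example sweep (hypothetical file path and power-meter reading):
# for buf in (4 << 10, 64 << 10, 1 << 20, 16 << 20):
#     nbytes, secs = read_file("/tmp/testfile", buf)
#     print(buf, efficiency(nbytes, secs, avg_power_watts=95.0))
```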
{"title":"Characterizing energy efficiency of I/O intensive parallel applications on power-aware clusters","authors":"Rong Ge, Xizhou Feng, Sindhu Subramanya, Xian-He Sun","doi":"10.1109/IPDPSW.2010.5470904","DOIUrl":"https://doi.org/10.1109/IPDPSW.2010.5470904","url":null,"abstract":"Energy efficiency and parallel I/O performance have become two critical measures in high performance computing (HPC). However, there is little empirical data that characterize the energy-performance behaviors of parallel I/O workload. In this paper, we present a methodology to profile the performance, energy, and energy efficiency of parallel I/O access patterns and report our findings on the impacting factors of parallel I/O energy efficiency. Our study shows that choosing the right buffer size can change the energy-performance efficiency by up to 30 times. High spatial and temporal spacing can also lead to significant improvement in energy-performance efficiency (about 2X). We observe CPU frequency has a more complex impact, depending on the IO operations, spatial and temporal, and memory buffer size. The presented methodology and findings are useful for evaluating the energy efficiency of I/O intensive applications and for providing a guideline to develop energy efficient parallel I/O technology.","PeriodicalId":329280,"journal":{"name":"2010 IEEE International Symposium on Parallel & Distributed Processing, Workshops and Phd Forum (IPDPSW)","volume":"76 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-04-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129149623","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A supplying partner strategy for mobile networks-based 3D streaming - proof of concept
Pub Date: 2010-04-19 | DOI: 10.1109/IPDPSW.2010.5470795
H. Maamar, R. Pazzi, A. Boukerche, E. Petriu
With the advances in wireless communication and mobile computing, there is growing interest among researchers in augmented reality and streaming 3D graphics on mobile devices for training first responders to be better prepared for disaster scenarios. However, several challenges need to be resolved before this technology becomes a commodity. One of the major difficulties in 3D streaming to thin mobile devices is the supplying partner strategy: it is not easy to discover a peer that has the correct information and possesses enough bandwidth to send the required data quickly and efficiently to the peers in need. In this paper, we propose a new supplying partner strategy for mobile networks-based 3D streaming. The primary goals of the work presented in this paper are, first, to address the low storage capabilities of thin mobile devices and, second, to avoid the flooding problem that most wireless mobile networks suffer from. Our proposed protocol is based on the quick discovery of multiple supplying partners, optimizing the time required by peers to acquire data, avoiding unnecessary message propagation and network congestion, and decreasing latency and network bandwidth over-utilization.
{"title":"A supplying partner strategy for mobile networks-based 3D streaming - proof of concept","authors":"H. Maamar, R. Pazzi, A. Boukerche, E. Petriu","doi":"10.1109/IPDPSW.2010.5470795","DOIUrl":"https://doi.org/10.1109/IPDPSW.2010.5470795","url":null,"abstract":"With the advances of wireless communication and mobile computing, there is a growing interest among researchers about augmented reality and streaming 3D graphics on mobile devices for training first responders to be better prepared in a case of disaster scenarios. However, several challenges need to be resolved before this technology become a commodity. One of the major difficulties in 3D streaming over thin mobile devices is related to the supplying partner strategy as it is not easy to discover the peer that has the correct information and that posses enough bandwidth to send the required data quickly and efficiently to the peers in need. In this paper, we propose a new supplying partner strategy for mobile networks-based 3D streaming. The primary goal of the work presented in this paper is first to address the thin mobile devices low storage capabilities; and second to avoid the flooding problem that most wireless mobile networks suffer from. Our proposed protocol is based on the quick discovery of multiple supplying partners, by optimizing the time required by peers to acquire data, avoiding unnecessary messages propagation and network congestion, and decreasing the latency and the network bandwidth over utilization.","PeriodicalId":329280,"journal":{"name":"2010 IEEE International Symposium on Parallel & Distributed Processing, Workshops and Phd Forum (IPDPSW)","volume":"14 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-04-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130601019","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
pALS: An object-oriented framework for developing parallel cooperative metaheuristics
Pub Date: 2010-04-19 | DOI: 10.1109/IPDPSW.2010.5470697
Andres Bernal, H. Castro
pALS, an acronym for parallel Adaptive Learning Search, is an object-oriented computational framework for the development of parallel and cooperative metaheuristics for solving complex optimization problems. The library exploits parallelization, mainly allowing the deployment of two models: the parallel execution of operators and the execution of separate instances (multi-start models). pALS also allows cooperation strategies to be included in the design of a problem's solution, such as the islands model for genetic algorithms or the parallel exploration of neighborhoods in metaheuristics derived from local searches, together with a broad set of topologies associated with these models. pALS has been successfully used on different optimization problems and has proven to be a flexible, extensible and powerful library for promptly developing prototypes, offering a collection of ready-to-use operators that form the nucleus of many metaheuristics, including hybrid metaheuristics.
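The islands model mentioned above can be sketched generically: several sub-populations evolve independently and periodically migrate their best individuals. The toy genetic algorithm below illustrates that cooperation strategy only; it does not reflect the pALS API:

```python
# Generic sketch of the islands model for genetic algorithms: independent
# sub-populations evolve in parallel and periodically exchange their best
# individuals over a ring topology. Not the pALS API.
import random

def fitness(x):                       # toy objective: maximise -(x - 3)^2
    return -(x - 3.0) ** 2

def evolve(pop, generations=20):
    for _ in range(generations):
        pop = sorted(pop, key=fitness, reverse=True)[: len(pop) // 2]  # selection
        pop += [p + random.gauss(0, 0.1) for p in pop]                 # mutation
    return pop

islands = [[random.uniform(-10, 10) for _ in range(20)] for _ in range(4)]
for epoch in range(5):
    islands = [evolve(pop) for pop in islands]                 # independent evolution
    # Ring migration: each island receives the previous island's best individual.
    bests = [max(pop, key=fitness) for pop in islands]
    for i, pop in enumerate(islands):
        pop[pop.index(min(pop, key=fitness))] = bests[i - 1]   # replace the worst
best = max((max(pop, key=fitness) for pop in islands), key=fitness)
print("best solution:", best)
```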
{"title":"pALS: An object-oriented framework for developing parallel cooperative metaheuristics","authors":"Andres Bernal, H. Castro","doi":"10.1109/IPDPSW.2010.5470697","DOIUrl":"https://doi.org/10.1109/IPDPSW.2010.5470697","url":null,"abstract":"pALS acronym for parallel Adaptive Learning Search is a computational object oriented framework for the development of parallel and cooperative metaheuristics for solving complex optimization problems. The library exploits the paralellization allowing the deployment of mainly two models: the parallel execution of operators and the execution of separate instances or multi-start models. pALS also allows to include in the design of the problem's solution cooperation strategies such as the islands model for genetic algorithms or the parallel exploration of neighborhoods in metaheuristics derived from local searches, including a broad set of topologies associated with these models. pALS has been successfully used in different optimization problems and has proven to be a flexible, extensible and commanding library to promptly develop prototypes offering a collection of ready to use operators that encompass the nucleus of many metaheuristics including hybrid metaheuristics.","PeriodicalId":329280,"journal":{"name":"2010 IEEE International Symposium on Parallel & Distributed Processing, Workshops and Phd Forum (IPDPSW)","volume":"25 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-04-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132982872","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A new probabilistic Linear Exponential Backoff scheme for MANETs
Pub Date: 2010-04-19 | DOI: 10.1109/IPDPSW.2010.5470789
M. B. Yassein, S. Manaseer, Asmahan Abu Al-hassan, Zeinab Abu Taye, A. Al-Dubai
Broadcasting is an essential operation in Mobile Ad hoc Network (MANET) environments. It is used in the initial phase of the route discovery process in many reactive protocols. Although broadcasting is simple, it causes the well-known broadcast storm problem, which is a result of packet redundancy, contention and collision. A probabilistic scheme has been proposed to overcome this problem. This work studies the effect of network density and network mobility on probabilistic schemes using different thresholds (fixed, 2p, 3p and 4p) with the Pessimistic Linear Exponential Backoff (PLEB) algorithm, and compares the results with the standard MAC. A number of simulation experiments have been conducted to examine the performance of the proposed PLEB under different operating conditions. The simulation results show that in dense networks the normalized routing load, delay and routing packets are high, and that PLEB outperforms the standard MAC in terms of delay.
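A hedged sketch of a linear-exponential backoff window is shown below, assuming exponential growth of the contention window for the first few retransmission attempts and linear growth afterwards; the exact schedule, thresholds and constants of PLEB are defined in the paper and may differ:

```python
# Hedged sketch of a linear-exponential backoff window: exponential growth for
# the first few retransmission attempts, linear growth afterwards. The exact
# PLEB schedule, switch point and constants are assumptions here.
import random

CW_MIN = 32          # initial contention window (slots); illustrative value
SWITCH_ATTEMPT = 4   # attempt index at which growth switches to linear
LINEAR_STEP = 64     # linear increment per attempt after the switch

def contention_window(attempt):
    if attempt <= SWITCH_ATTEMPT:
        return CW_MIN * (2 ** attempt)                          # exponential phase
    exp_part = CW_MIN * (2 ** SWITCH_ATTEMPT)
    return exp_part + LINEAR_STEP * (attempt - SWITCH_ATTEMPT)  # linear phase

def backoff_slots(attempt):
    """Uniformly chosen backoff, as in CSMA/CA, from the current window."""
    return random.randint(0, contention_window(attempt) - 1)

for attempt in range(8):
    print(attempt, contention_window(attempt))
```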
{"title":"A new probabilistic Linear Exponential Backoff scheme for MANETs","authors":"M. B. Yassein, S. Manaseer, Asmahan Abu Al-hassan, Zeinab Abu Taye, A. Al-Dubai","doi":"10.1109/IPDPSW.2010.5470789","DOIUrl":"https://doi.org/10.1109/IPDPSW.2010.5470789","url":null,"abstract":"Broadcasting is an essential operation in Mobile ad hoc Networks (MANETs) environments. It is used in the initial phase of route discovery process in many reactive protocols. Although broadcasting is simple, it causes the well known broadcast storm problem, which is a result of packet redundancy, contention and collision. A probabilistic scheme has been proposed to overcome this problem. This work aims to study the effect of network density and network mobility on probabilistic schemes using different thresholds (fixed, 2p, 3p and 4p) with the Pessimistic Linear Exponential Backoff (PLEB) algorithm and compare the results with the standard MAC. A number of simulation experiments have been conducted to examine the performance of the proposed PLEP under different operating conditions. The simulation results show that in dense networks the normalized routing load, delay and routing packets are high and the PLEB outperforms the standard MAC in terms of delay.","PeriodicalId":329280,"journal":{"name":"2010 IEEE International Symposium on Parallel & Distributed Processing, Workshops and Phd Forum (IPDPSW)","volume":"35 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-04-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132999734","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Estimating operating conditions in a Peer-to-Peer Session Initiation Protocol overlay network
Pub Date: 2010-04-19 | DOI: 10.1109/IPDPSW.2010.5470935
Jouni Mäenpää, G. Camarillo
Distributed Hash Table (DHT) based peer-to-peer overlays are decentralized, scalable, and fault tolerant. However, due to their decentralized nature, it is very hard to know the state and prevailing operating conditions of a running overlay. If the system could determine its operating conditions, it would be easier to monitor it and re-configure it in response to changing conditions. Many DHT-based systems, such as the Peer-to-Peer Session Initiation Protocol (P2PSIP), would benefit from the ability to accurately estimate the prevailing operating conditions of the overlay. In this paper, we evaluate mechanisms that can be used to do this, focusing on network size, join rate, and leave rate. We start from existing mechanisms and show that their accuracy is not sufficient. Next, we show how the mechanisms can be improved to achieve a higher level of accuracy. The improvements we study include various mechanisms for improving the accuracy of leave rate estimation, the use of a secondary network size estimate, the sharing of estimates between peers, and statistical mechanisms to process shared estimates.
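One common, textbook way to estimate overlay size in a Chord-like DHT is from the density of identifiers among a peer's known successors; the sketch below illustrates that idea only and is not one of the specific mechanisms evaluated in the paper (the identifier-space size and example values are assumptions):

```python
# Hedged sketch of a density-based network size estimate in a Chord-like DHT:
# if a peer knows its d nearest successors in an identifier space of size 2^m,
# the average gap between them suggests roughly (2^m) / gap peers overall.
ID_SPACE = 2 ** 32   # illustrative identifier-space size

def estimate_network_size(own_id, successor_ids):
    """Estimate total peers from the spacing of known successor identifiers."""
    ids = sorted((s - own_id) % ID_SPACE for s in successor_ids)
    span = ids[-1]                       # distance covered by the known successors
    avg_gap = span / len(successor_ids)  # average inter-peer distance
    return ID_SPACE / avg_gap

# Example: 8 successors spread over ~0.8% of the ring suggests roughly 1000 peers.
print(estimate_network_size(0, [4_295_000 * i for i in range(1, 9)]))
```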
{"title":"Estimating operating conditions in a Peer-to-Peer Session Initiation Protocol overlay network","authors":"Jouni Mäenpää, G. Camarillo","doi":"10.1109/IPDPSW.2010.5470935","DOIUrl":"https://doi.org/10.1109/IPDPSW.2010.5470935","url":null,"abstract":"Distributed Hash Table (DHT) based peer-to-peer overlays are decentralized, scalable, and fault tolerant. However, due to their decentralized nature, it is very hard to know the state and prevailing operating conditions of a running overlay. If the system could figure out the operating conditions, it would be easier to monitor the system and re-configure it in response to changing conditions. Many DHT-based system such as the Peer-to-Peer Session Initiation Protocol (P2PSIP) would benefit from the ability to accurately estimate the prevailing operating conditions of the overlay. In this paper, we evaluate mechanisms that can be used to do this. We focus on network size, join rate, and leave rate. We start from existing mechanisms and show that their accuracy is not sufficient. Next, we show how the mechanisms can be improved to achieve a higher level of accuracy. The improvements we study include various mechanisms improving the accuracy of leave rate estimation, use of a secondary network size estimate, sharing of estimates between peers, and statistical mechanisms to process shared estimates.","PeriodicalId":329280,"journal":{"name":"2010 IEEE International Symposium on Parallel & Distributed Processing, Workshops and Phd Forum (IPDPSW)","volume":"50 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-04-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132135533","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Fast Smith-Waterman hardware implementation
Pub Date: 2010-04-19 | DOI: 10.1109/IPDPSW.2010.5470748
Z. Nawaz, K. Bertels, H. Sumbul
The Smith-Waterman (SW) algorithm is one of the most widely used algorithms for sequence alignment in computational biology. With the growing size of sequence databases, there is always a need for ever faster implementations of SW. In this paper, we implement two techniques based on Recursive Variable Expansion (RVE), which are shown to give better speedup than the best dataflow approach at the cost of extra area. Compared to the dataflow approach, our hardware implementation is 2.29 times faster at the expense of 2.82 times more area.
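For reference, the basic Smith-Waterman local-alignment recurrence with a linear gap penalty is sketched below in software; the paper's contribution is the RVE-based hardware design, which is not reproduced here:

```python
# Reference software sketch of the Smith-Waterman local-alignment recurrence
# (linear gap penalty). This is only the baseline dynamic-programming algorithm
# that the RVE-based hardware design accelerates.
def smith_waterman(a, b, match=2, mismatch=-1, gap=-1):
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            score = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(0,                        # local-alignment floor
                          H[i - 1][j - 1] + score,  # match/mismatch
                          H[i - 1][j] + gap,        # gap in b
                          H[i][j - 1] + gap)        # gap in a
            best = max(best, H[i][j])
    return best   # best local-alignment score

print(smith_waterman("ACACACTA", "AGCACACA"))   # prints the best score for this pair
```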
{"title":"Fast Smith-Waterman hardware implementation","authors":"Z. Nawaz, K. Bertels, H. Sumbul","doi":"10.1109/IPDPSW.2010.5470748","DOIUrl":"https://doi.org/10.1109/IPDPSW.2010.5470748","url":null,"abstract":"The Smith-Waterman (SW) algorithm is one of the widely used algorithms for sequence alignment in computational biology. With the growing size of the sequence database, there is always a need for even faster implementation of SW. In this paper, we have implemented two Recursive Variable Expansion (RVE) based techniques, which are proved to give better speedup than any best dataflow approach at the cost of extra area. Compared to dataflow approach, our HW implementation is 2.29 times faster at the expense of 2.82 times more area.","PeriodicalId":329280,"journal":{"name":"2010 IEEE International Symposium on Parallel & Distributed Processing, Workshops and Phd Forum (IPDPSW)","volume":"4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-04-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125489368","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}