D. Chavarría-Miranda, M. Halappanavar, S. Krishnamoorthy, J. Manzano, Abhinav Vishnu, A. Hoisie
Efficient utilization of high-performance computing (HPC) platforms is an important and complex problem. Execution models, abstract descriptions of the dynamic runtime behavior of the execution stack, have a significant impact on the utilization of HPC systems. Using a computational chemistry kernel as a case study and a wide variety of execution models combined with load balancing techniques, we explore the impact of execution models on the utilization of an HPC system. We demonstrate a 50 percent improvement in performance by using work stealing relative to a more traditional static scheduling approach. We also use a novel semi-matching technique for load balancing that has performance comparable to a traditional hypergraph-based partitioning implementation, which is computationally expensive. Through this study, we found that execution model design choices and assumptions can limit critical optimizations, such as global, dynamic load balancing, and can make it harder to find the correct balance between available work units and the various system and runtime overheads. With the emergence of multi- and many-core architectures and the consequent growth in the complexity of HPC platforms, we believe that these lessons will be beneficial to researchers tuning diverse applications on modern HPC platforms, especially on emerging dynamic platforms with energy-induced performance variability.
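The gap between static scheduling and work stealing described above can be illustrated with a toy scheduler simulation. This is a minimal sketch under assumed per-task costs, not the paper's runtime: `run_static` fixes a round-robin partition up front, while `run_work_stealing` lets an idle worker steal a task from the back of another worker's deque (both function names are illustrative).

```python
from collections import deque

def run_static(tasks, n_workers):
    """Static round-robin scheduling: each worker keeps its initial share."""
    loads = [0] * n_workers
    for i, t in enumerate(tasks):
        loads[i % n_workers] += t
    return max(loads)                     # makespan = most-loaded worker

def run_work_stealing(tasks, n_workers):
    """Simulated work stealing: workers own deques of task costs; an idle
    worker steals one task from the back of the fullest victim's deque."""
    deques = [deque() for _ in range(n_workers)]
    for i, t in enumerate(tasks):         # same round-robin initial partition
        deques[i % n_workers].append(t)
    clock = [0] * n_workers               # per-worker accumulated busy time
    remaining = len(tasks)
    while remaining:
        w = min(range(n_workers), key=lambda i: clock[i])  # next idle worker
        if deques[w]:
            clock[w] += deques[w].popleft()                # run own task
        else:
            victims = [v for v in range(n_workers) if deques[v]]
            deques[w].append(deques[victims and max(victims,
                             key=lambda i: len(deques[i]))].pop())
            continue                      # stole a task; run it next turn
        remaining = sum(len(d) for d in deques)
    return max(clock)
```

With one large task and several small ones, stealing lets the other worker absorb the small tasks, shrinking the makespan relative to the static partition.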
"On the Impact of Execution Models: A Case Study in Computational Chemistry." D. Chavarría-Miranda, M. Halappanavar, S. Krishnamoorthy, J. Manzano, Abhinav Vishnu, A. Hoisie. 2015 IEEE International Parallel and Distributed Processing Symposium Workshop, 2015-05-25. DOI: 10.1109/IPDPSW.2015.111.
Several recent papers have introduced a periodic verification mechanism to detect silent errors in iterative solvers. Chen [PPoPP'13, pp. 167–176] has shown how to combine such a verification mechanism (a stability test checking the orthogonality of two vectors and recomputing the residual) with checkpointing: the idea is to verify every d iterations, and to checkpoint every c × d iterations. When a silent error is detected by the verification mechanism, one can roll back to, and re-execute from, the last checkpoint. In this paper, we also propose to combine checkpointing and verification, but we use ABFT rather than stability tests. ABFT can be used for error detection alone, or for error detection and correction, allowing a forward recovery (with no rollback or re-execution) when a single error is detected. We introduce an abstract performance model to compute the performance of all schemes, and we instantiate it using the Conjugate Gradient algorithm. Finally, we validate our new approach through a set of simulations.
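The verify-every-d, checkpoint-every-c×d pattern of Chen's backward-recovery scheme can be sketched with a toy iterative solver. The function name and the single-injection fault model are illustrative assumptions, not the paper's model:

```python
def solve_with_verification(n_iters, d, c, inject_at=None):
    """Toy iterative solver with periodic verification and checkpointing:
    verify every d iterations, checkpoint every c*d iterations, and on a
    detected silent error roll back to the last checkpoint and re-execute."""
    state, ckpt = 0, 0        # "state" counts correctly executed iterations
    corrupted = False
    rollbacks = 0
    i = 0
    while i < n_iters:
        i += 1
        state += 1
        if inject_at is not None and i == inject_at:
            corrupted = True   # a silent error strikes once
            inject_at = None
        if i % d == 0:         # periodic verification (e.g. a stability test)
            if corrupted:
                state, i = ckpt, ckpt   # rollback and re-execute
                corrupted = False
                rollbacks += 1
                continue
            if i % (c * d) == 0:
                ckpt = state   # a just-verified state is safe to checkpoint
    return state, rollbacks
```

With d = 2 and c = 3, an error injected at iteration 7 is caught at the iteration-8 verification and the run resumes from the checkpoint taken at iteration 6, so only two iterations are repeated.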
"Combining Backward and Forward Recovery to Cope with Silent Errors in Iterative Solvers." M. Fasi, Y. Robert, B. Uçar. 2015 IEEE International Parallel and Distributed Processing Symposium Workshop, 2015-05-25. DOI: 10.1109/IPDPSW.2015.22.
Recommendation is an indispensable technique, especially in e-commerce services such as Amazon or Netflix, for suggesting items that users will prefer. Matrix factorization is a well-known recommendation algorithm that estimates affinities between users and items solely from ratings explicitly given by users. To handle large amounts of data, stochastic gradient descent (SGD), an online loss-minimization algorithm, can be applied to matrix factorization. SGD is effective in terms of both convergence speed and memory consumption, but it is difficult to parallelize due to its essentially sequential nature. FPSGD by Zhuang et al. is an existing parallel SGD method for matrix factorization that divides the rating matrix into many small blocks. Threads work on blocks chosen so that they do not update the same rows or columns of the factor matrices. Because of this technique, FPSGD achieves higher convergence speed than other existing methods. Still, as we demonstrate in this paper, FPSGD does not scale beyond 32 cores on the 1.4 GB Netflix dataset, because assigning non-conflicting blocks to threads requires a lock operation. In this work, we propose an alternative SGD approach for matrix factorization using a task-parallel programming model. As a result, we have successfully overcome the bottleneck of FPSGD and achieved higher scalability with 64 cores.
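The per-rating SGD updates that FPSGD and the proposed task-parallel scheme both parallelize can be sketched sequentially. This is a minimal pure-Python sketch with illustrative function names and hyperparameters; the parallel variants run these same updates concurrently on disjoint blocks of the rating matrix so that no two threads touch the same row of P or Q:

```python
import random

def sgd_mf(ratings, n_users, n_items, k=4, lr=0.05, reg=0.02, epochs=300):
    """SGD for matrix factorization: fit R ~ P * Q^T from (user, item, rating)
    triples by stepping both factor rows against the prediction error."""
    random.seed(0)
    P = [[random.uniform(-0.1, 0.1) for _ in range(k)] for _ in range(n_users)]
    Q = [[random.uniform(-0.1, 0.1) for _ in range(k)] for _ in range(n_items)]
    for _ in range(epochs):
        for u, i, r in ratings:
            e = r - sum(P[u][f] * Q[i][f] for f in range(k))  # prediction error
            for f in range(k):
                pu, qi = P[u][f], Q[i][f]
                P[u][f] += lr * (e * qi - reg * pu)   # regularized gradient step
                Q[i][f] += lr * (e * pu - reg * qi)
    return P, Q

def rmse(ratings, P, Q):
    se = sum((r - sum(pu * qi for pu, qi in zip(P[u], Q[i]))) ** 2
             for u, i, r in ratings)
    return (se / len(ratings)) ** 0.5
```

Training on even a handful of ratings drives the RMSE well below that of the random initialization; the scalability question the paper addresses is how to run many such updates in parallel without conflicting writes.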
"Scalable Task-Parallel SGD on Matrix Factorization in Multicore Architectures." Yusuke Nishioka, K. Taura. 2015 IEEE International Parallel and Distributed Processing Symposium Workshop, 2015-05-25. DOI: 10.1109/IPDPSW.2015.135.
Storm Pub-Sub is a novel high-performance publish-subscribe system designed to efficiently match events against subscriptions with high throughput. We move a content-based pub-sub system first to a local cluster and then to a distributed cluster framework for performance and scalability. We depart from the use of broker overlays, in which each server must support the whole range of operations of a pub-sub service as well as overlay management and routing functionality. In our system, the different operations involved in pub-sub are separated into Storm bolts to exploit their natural potential for parallelization. We compare Storm Pub-Sub with Siena, a traditional broker-based pub-sub system. Through experiments on both a local cluster and a distributed cluster, we show that our Storm-based publish-subscribe design scales well for high volumes of data, processing approximately 2200 events/s on the distributed cluster. In this paper we describe the design and implementation of Storm Pub-Sub and evaluate it in terms of scalability and throughput.
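Content-based matching, the operation at the heart of such a system, can be sketched as predicate evaluation over event attributes. This is an illustrative stand-in (all names and the predicate encoding are assumptions), not Storm Pub-Sub's implementation; in the Storm design, bolts would evaluate such predicates in parallel over partitions of the subscription set:

```python
def matches(event, subscription):
    """A subscription maps attribute -> (operator, value); an event (a dict of
    attribute values) matches when every predicate holds."""
    ops = {"=": lambda a, b: a == b,
           "<": lambda a, b: a < b,
           ">": lambda a, b: a > b}
    return all(attr in event and ops[op](event[attr], val)
               for attr, (op, val) in subscription.items())

def match_event(event, subscriptions):
    """Return the ids of all subscriptions the event matches."""
    return [sid for sid, sub in subscriptions.items() if matches(event, sub)]
```

For example, a stock-tick event `{"symbol": "IBM", "price": 120}` matches a subscription `{"symbol": ("=", "IBM"), "price": (">", 100)}` but not one requiring `price < 50`.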
"Storm Pub-Sub: High Performance, Scalable Content Based Event Matching System Using Storm." M. Shah, D. Kulkarni. 2015 IEEE International Parallel and Distributed Processing Symposium Workshop, 2015-05-25. DOI: 10.1109/IPDPSW.2015.95.
We present a method for solving nonconvex mixed-integer nonlinear programs using a branch-and-bound framework. At each node in the search tree, we solve the continuous nonlinear relaxation multiple times using an existing nonlinear solver. Since the relaxation we create is in general not convex, this method may not find an optimal solution. To mitigate this difficulty, we solve the relaxation multiple times in parallel, starting from different initial points. Our preliminary computational experiments show that this approach gives optimal or near-optimal solutions on benchmark problems, and that the method benefits substantially from parallelism.
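The multistart idea can be sketched on a one-dimensional nonconvex function: plain gradient descent stands in for the NLP solver's local solve, and the different starts would run in parallel in the actual method (all names here are illustrative):

```python
def local_minimize(grad, x0, lr=0.01, steps=2000):
    """Plain gradient descent: a toy stand-in for a local NLP solve, which
    converges only to whichever local minimum the start lies near."""
    x = x0
    for _ in range(steps):
        x -= lr * grad(x)
    return x

def multistart(f, grad, starts):
    """Solve the nonconvex relaxation from several initial points and keep
    the best local solution found."""
    return min((local_minimize(grad, x0) for x0 in starts), key=f)
```

On f(x) = x^4 - 3x^2 + x, a descent started at x = 1 gets stuck in the shallow local minimum near x ≈ 1.13, while multistart over a spread of initial points also finds the global minimum near x ≈ -1.30 and keeps it.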
"A Branch-and-Estimate Heuristic Procedure for Solving Nonconvex Integer Optimization Problems." Prashant Palkar, Ashutosh Mahajan. 2015 IEEE International Parallel and Distributed Processing Symposium Workshop, 2015-05-25. DOI: 10.1109/IPDPSW.2015.43.
Uniform Reliable Broadcast (URB) is an important abstraction in distributed systems, offering delivery guarantees when spreading messages among processes. Informally, URB guarantees that if a process (correct or not) delivers a message m, then all correct processes deliver m. This abstraction has been extensively investigated in distributed systems where all processes have distinct identifiers. Furthermore, most papers in the literature assume that the communication channels of the system are reliable, which is not always the case in real systems. In this paper, the URB abstraction is investigated in anonymous asynchronous message-passing systems with fair lossy communication channels. First, a simple algorithm is given to solve URB in such a system model, assuming a majority of correct processes. Then a new failure detector class AT is proposed; with AT, URB can be implemented with any number of correct processes. Because of the message loss caused by fair lossy communication channels, every correct process in this first algorithm has to rebroadcast all URB-delivered messages forever, which makes the algorithm non-quiescent. To obtain a quiescent URB algorithm in anonymous asynchronous systems, a perfect anonymous failure detector AP* is proposed. Finally, a quiescent URB algorithm using AT and AP* is given.
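A toy round-based simulation conveys the flavor of the first (majority-based, non-quiescent) algorithm: every process that has seen message m keeps re-forwarding it over lossy links, and m is URB-delivered once a majority of processes is known to have received it. This is a simplification for intuition only, not the paper's algorithm; it ignores anonymity and failure detectors, and all names are illustrative:

```python
import random

def urb_simulate(n, crashed, drop_prob=0.3, rounds=200, seed=1):
    """Round-based sketch of majority-based URB over fair-lossy links:
    each round, every non-crashed process that has seen m re-sends it to
    everyone (each send may be dropped); a process URB-delivers m once it
    has seen m and a majority of processes is known to have received it."""
    random.seed(seed)
    seen = {0}            # process 0 URB-broadcasts m
    received = set()      # processes known to have received m ("acks")
    delivered = set()
    for _ in range(rounds):
        for p in list(seen):
            if p in crashed:
                continue
            for q in range(n):
                if q in crashed or random.random() < drop_prob:
                    continue   # fair-lossy channel: this copy is dropped
                seen.add(q)
                received.add(q)
        for p in range(n):
            if p not in crashed and p in seen and len(received) > n // 2:
                delivered.add(p)
    return delivered
```

Because correct processes re-send forever, fair-lossy channels eventually get m through, and all correct processes deliver it; the algorithm's non-quiescence is visible in the unconditional re-forwarding loop.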
"Implementing Uniform Reliable Broadcast in Anonymous Distributed Systems with Fair Lossy Channels." Jian Tang, M. Larrea, S. Arévalo, Ernesto Jiménez. 2015 IEEE International Parallel and Distributed Processing Symposium Workshop, 2015-05-25. DOI: 10.1109/IPDPSW.2015.23.
Nentawe Gurumdimma, A. Jhumka, Maria Liakata, Edward Chuah, J. Browne
The ability to automatically detect faults or fault patterns is important for system administrators seeking to enhance system reliability and reduce failures. To achieve this objective, the message logs from a cluster system are typically augmented with failure information, i.e., the raw log data is labelled. However, tagging or labelling raw log data is very costly. In this paper, our objective is to detect failure patterns in message logs using unlabelled data. To achieve this aim, we propose a methodology in which a pre-processing step first removes redundant data. A clustering algorithm is then executed on the resulting logs, and we further develop an unsupervised algorithm that detects failure patterns in the clustered logs by harnessing the characteristics of these sequences. We evaluated our methodology on large production data, and the results show that, on average, an f-measure of 78% can be obtained without data labels. The implication of our methodology is that a system administrator with little knowledge of the system can detect failure runs with reasonably high accuracy.
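The pre-processing and clustering steps can be sketched with simple stand-ins: consecutive-duplicate removal for redundancy filtering, and greedy token-overlap clustering in place of the paper's clustering algorithm (both are illustrative assumptions, not the paper's method):

```python
def dedup(logs):
    """Pre-processing: drop consecutive duplicate log messages."""
    out = []
    for line in logs:
        if not out or out[-1] != line:
            out.append(line)
    return out

def jaccard(a, b):
    """Token-set similarity between two log messages."""
    ta, tb = set(a.split()), set(b.split())
    return len(ta & tb) / len(ta | tb)

def cluster(logs, threshold=0.5):
    """Greedy single-pass clustering: a message joins the first cluster whose
    representative (first member) is similar enough, else starts a new one."""
    clusters = []
    for line in dedup(logs):
        for c in clusters:
            if jaccard(line, c[0]) >= threshold:
                c.append(line)
                break
        else:
            clusters.append([line])
    return clusters
```

On a handful of syslog-like lines, messages sharing a template ("disk error on node X") land in one cluster while unrelated messages start new ones, giving the grouped sequences on which a failure-pattern detector could then operate.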
"Towards Detecting Patterns in Failure Logs of Large-Scale Distributed Systems." Nentawe Gurumdimma, A. Jhumka, Maria Liakata, Edward Chuah, J. Browne. 2015 IEEE International Parallel and Distributed Processing Symposium Workshop, 2015-05-25. DOI: 10.1109/IPDPSW.2015.109.
RSA is one of the most well-known public-key cryptosystems, widely used for secure data transfer. An RSA encryption key includes a modulus n that is the product of two large prime numbers p and q. If an RSA modulus n can be decomposed into p and q, the corresponding decryption key can be computed easily from them, and the original message can be recovered. The RSA cryptosystem thus relies on the hardness of factoring the modulus. Suppose that we have many encryption keys collected from the Web. If some of them were generated inappropriately, so that two moduli share the same prime number, then both can be decomposed by computing their GCD (greatest common divisor). Indeed, a previously published investigation showed that a certain fraction of the RSA moduli in encryption keys on the Web share prime numbers. We can find such weak RSA moduli by computing the GCD of many pairs of RSA moduli. The main contribution of this paper is a new Euclidean algorithm for computing the GCD of all pairs of encryption moduli. The idea of our new algorithm, which we call the Approximate Euclidean algorithm, is to compute an approximation of the quotient with just one 64-bit division and to use it to reduce the number of iterations of the Euclidean algorithm. We also present an implementation of the Approximate Euclidean algorithm optimized for CUDA-enabled GPUs. The experimental results show that our implementation of 1024-bit GCD on a GeForce GTX 780 Ti runs more than 80 times faster than an Intel Xeon CPU implementation. Further, our GPU implementation is more than 9 times faster than the best previously published GCD computation using the same generation of GPU.
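The underlying attack is easy to state: if two moduli share a prime factor, a single integer GCD recovers it, and dividing it out yields the other factor of each modulus. A naive all-pairs scan, which the paper's GPU Approximate Euclidean algorithm is designed to accelerate at scale, might look like this (the function name is illustrative):

```python
from math import gcd

def find_shared_primes(moduli):
    """All-pairs GCD scan over RSA moduli: any pair sharing a prime factor is
    broken, since gcd(n1, n2) reveals that prime and n // p gives the other.
    Returns {index: (p, q)} for every modulus found to be factorable."""
    broken = {}
    for i in range(len(moduli)):
        for j in range(i + 1, len(moduli)):
            g = gcd(moduli[i], moduli[j])
            if g > 1:                       # shared prime found
                for idx in (i, j):
                    n = moduli[idx]
                    broken[idx] = (g, n // g)   # recovered factors
    return broken
```

With small demonstration primes, three moduli built from overlapping primes are all factored, while a modulus sharing no prime with any other would stay safe; real 1024-bit moduli need the multi-precision, GPU-parallel GCD the paper develops.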
"Bulk GCD Computation Using a GPU to Break Weak RSA Keys." Toru Fujita, K. Nakano, Yasuaki Ito. 2015 IEEE International Parallel and Distributed Processing Symposium Workshop, 2015-05-25. DOI: 10.1109/IPDPSW.2015.54.
Interdependent cyber-physical systems (CPS) connect physical resources between independently motivated actors who seek to maximize profits while providing physical services to consumers. Cyber attacks in seemingly distant parts of these systems have local consequences, and techniques are needed to analyze and optimize defensive costs in the face of increasing cyber threats. This paper presents a technique for transforming physical interconnections between independent actors into a dependency analysis that can be applied to find optimal defensive investment strategies to protect assets from financially motivated adversaries in electric power grids.
"Optimizing Defensive Investments in Energy-Based Cyber-Physical Systems." Paul C. Wood, S. Bagchi, Alefiya Hussain. 2015 IEEE International Parallel and Distributed Processing Symposium Workshop, 2015-05-25. DOI: 10.1109/IPDPSW.2015.112.
G. Siddesh, K. Srinivasa, Ishank Mishra, Abhinav Anurag, E. Uppal
Phylogenetic analysis has become an essential part of research on the evolutionary tree of life. Distance-matrix methods of phylogenetic analysis rely explicitly on a measure of "genetic distance" between the sequences being classified, and therefore require multiple sequence alignments as input. Distance methods attempt to construct an all-to-all matrix from the sequence query set describing the distance between each sequence pair. Dynamic-programming algorithms like the Needleman-Wunsch algorithm (NWA) and the Smith-Waterman algorithm (SWA) produce accurate alignments, but are computation intensive and are limited by the number and size of the sequences. This paper focuses on optimizing phylogenetic analysis of large quantities of data using the Hadoop MapReduce programming model. The proposed approach relies on NWA to produce sequence alignments and on neighbor-joining methods, specifically UPGMA (Unweighted Pair Group Method with Arithmetic mean), to produce rooted trees. The experimental results demonstrate that the proposed solution achieves significant improvements in performance and throughput. The dynamic-programming nature of NWA, coupled with the data and computational parallelism of the Hadoop MapReduce programming model, improves the throughput and accuracy of sequence alignment. Hence the proposed approach aims to carve out a new methodology for optimizing phylogenetic analysis by achieving significant performance gains.
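The Needleman-Wunsch step that each MapReduce task would run on a sequence pair is standard dynamic programming. A minimal scoring-only sketch follows; the gap, match, and mismatch penalties are illustrative assumptions, not the paper's parameters:

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Global alignment score of sequences a and b by dynamic programming:
    F[i][j] is the best score aligning a[:i] with b[:j]."""
    n, m = len(a), len(b)
    F = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        F[i][0] = i * gap                    # a[:i] aligned against gaps
    for j in range(1, m + 1):
        F[0][j] = j * gap                    # gaps aligned against b[:j]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            F[i][j] = max(F[i - 1][j - 1] + s,   # align a[i-1] with b[j-1]
                          F[i - 1][j] + gap,     # gap in b
                          F[i][j - 1] + gap)     # gap in a
    return F[n][m]
```

In a distance-matrix pipeline, a mapper would emit such a score (or a distance derived from it) for every sequence pair, and the all-to-all matrix feeding UPGMA is assembled in the reduce phase.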
"Phylogenetic Analysis Using MapReduce Programming Model." G. Siddesh, K. Srinivasa, Ishank Mishra, Abhinav Anurag, E. Uppal. 2015 IEEE International Parallel and Distributed Processing Symposium Workshop, 2015-05-25. DOI: 10.1109/IPDPSW.2015.57.