Xiangfei Jia, A. Trotman, Richard A. O'Keefe, Zhiyi Huang
Operating systems provide only general-purpose I/O optimisation because they have to service many types of applications. Application-level I/O optimisation can achieve better performance because an application knows best how to optimise its own disk I/O. In this paper we present an application-specific I/O solution for optimising a search engine. It shows a 28% improvement over the general-purpose I/O optimisation of Linux. Our results also show an 11% improvement when the Linux I/O optimisation is bypassed.
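The abstract does not say how the application-level I/O is implemented. As a minimal sketch of the general idea, assuming a search engine that already knows the byte offset and length of each postings list in its index file, the application can fetch exactly that range with one positioned read and tell the kernel not to apply sequential read-ahead. The function names `open_index` and `read_postings` are hypothetical, and the advice call is only a hint that the kernel may ignore.

```python
import os


def open_index(path):
    """Open an index file read-only and hint that access will be random.

    posix_fadvise is only a hint, and is not available on every platform,
    so its absence or failure is tolerated.
    """
    fd = os.open(path, os.O_RDONLY)
    if hasattr(os, "posix_fadvise"):
        try:
            os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_RANDOM)
        except OSError:
            pass  # the hint is optional; reads still work without it
    return fd


def read_postings(fd, offset, length):
    """Read `length` bytes starting at `offset`, stopping early at end of file.

    os.pread may return fewer bytes than requested, so loop until done.
    """
    chunks = []
    remaining = length
    while remaining > 0:
        chunk = os.pread(fd, remaining, offset)
        if not chunk:
            break  # reached end of file
        chunks.append(chunk)
        offset += len(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)
```

Whether this beats the kernel's default behaviour depends on the workload and must be measured; the figures in the abstract come from the authors' own system, not from this sketch.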
{"title":"Application-Specific Disk I/O Optimisation for a Search Engine","authors":"Xiangfei Jia, A. Trotman, Richard A. O'Keefe, Zhiyi Huang","doi":"10.1109/PDCAT.2008.61","DOIUrl":"https://doi.org/10.1109/PDCAT.2008.61","url":null,"abstract":"Operating systems only provide general-purpose I/O optimisation since they have to service various types of applications. However, application level I/O optimisation can achieve better performance since an application has a better knowledge of how to optimise disk I/O for the application. In this paper we provide a solution for application-specific I/O for optimising a search engine. It shows a 28% improvement when compared to the general-purpose I/O optimisation of Linux. Our result also shows a 11% improvement when the Linux I/O optimisation is bypassed.","PeriodicalId":282779,"journal":{"name":"2008 Ninth International Conference on Parallel and Distributed Computing, Applications and Technologies","volume":"11 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121218544","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
With the resurgence of virtualization technology, today's Internet data centers are shifting towards virtualized data centers. Internet applications tend to see dynamically varying workloads. To address the problem of performance management for multi-tier applications hosted in virtualized Internet data centers, we propose a three-level automatic provisioning framework based on feedback control for multi-tier applications. Experiments demonstrate the effectiveness of our technique in meeting SLA guarantees while improving resource utilization.
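The abstract does not describe the three levels or the controller itself. Purely as an illustration of feedback-controlled provisioning, the sketch below uses a single proportional controller for one tier: it compares measured response time with an SLA target and scales the number of virtual machines accordingly. The function name, the gain, and the VM limits are all assumptions made for this example.

```python
def next_allocation(current_vms, measured_ms, target_ms,
                    gain=0.5, min_vms=1, max_vms=16):
    """Return the VM count for the next control interval.

    A proportional controller: the relative SLA error (positive when the
    tier is too slow) scales the current allocation up or down, and the
    result is clamped to the allowed range.
    """
    error = (measured_ms - target_ms) / target_ms
    proposed = current_vms * (1.0 + gain * error)
    return max(min_vms, min(max_vms, round(proposed)))
```

For instance, a tier running 4 VMs at 300 ms against a 200 ms target has a relative error of 0.5, so the controller proposes 4 × 1.25 = 5 VMs. A lower gain reacts more slowly but oscillates less under bursty load.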
{"title":"A Dynamic Provisioning Framework for Multi-tier Internet Applications in Virtualized Data Center","authors":"Yi Jin, Xu Liu, Jianfeng Zhan, Shuang Gao","doi":"10.1109/PDCAT.2008.74","DOIUrl":"https://doi.org/10.1109/PDCAT.2008.74","url":null,"abstract":"With the resurgence of virtualization technology, todaypsilas Internet data centers are shifting towards virtualized data centers. Internet applications tend to see dynamically varying workloads. To address the problem of performance management for multi-tier applications hosted in virtualized Internet data center, we propose a three-level automatic provisioning framework based on feedback control for multi-tier applications. Experiments demonstrate the effectiveness of our technique in SLA guarantees while obtaining improved resource utilization.","PeriodicalId":282779,"journal":{"name":"2008 Ninth International Conference on Parallel and Distributed Computing, Applications and Technologies","volume":"49 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126824007","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A parallel algorithm, the parallel block diagonal dominant (PBDD) algorithm, is proposed to solve block tridiagonal linear systems on multi-computers. The algorithm is based on the divide-and-conquer idea of the PDD method. When the system is strictly block diagonally dominant, PBDD is highly parallel and provides approximate solutions that equal the exact solutions to within machine accuracy. The PBDD method has been implemented on a 64-node multi-computer. The analytic results match closely the results measured in the numerical experiments.
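For orientation, the sketch below shows the problem class rather than PBDD itself: a sequential Thomas solver for a scalar (not block) tridiagonal system, plus a check for strict diagonal dominance, the condition the abstract relies on. The storage convention is an assumption of this example: `lower[i]` multiplies `x[i-1]` in row `i` (so `lower[0]` is unused) and `upper[i]` multiplies `x[i+1]` (so `upper[-1]` is unused).

```python
def is_strictly_diagonally_dominant(lower, diag, upper):
    """True if |diag[i]| exceeds the sum of the off-diagonal magnitudes in every row."""
    n = len(diag)
    for i in range(n):
        off = 0.0
        if i > 0:
            off += abs(lower[i])
        if i < n - 1:
            off += abs(upper[i])
        if abs(diag[i]) <= off:
            return False
    return True


def thomas_solve(lower, diag, upper, rhs):
    """Solve a scalar tridiagonal system by forward elimination and back substitution."""
    n = len(diag)
    c = [0.0] * n  # modified super-diagonal
    d = [0.0] * n  # modified right-hand side
    c[0] = upper[0] / diag[0]
    d[0] = rhs[0] / diag[0]
    for i in range(1, n):
        denom = diag[i] - lower[i] * c[i - 1]
        c[i] = upper[i] / denom if i < n - 1 else 0.0
        d[i] = (rhs[i] - lower[i] * d[i - 1]) / denom
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x
```

The forward sweep is inherently sequential because each row depends on the previous one, which is the motivation for divide-and-conquer methods that partition the system across processors.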
{"title":"A Parallel Algorithm for Block Tridiagonal Systems","authors":"Heng Zhang, Wu Zhang, Xian-He Sun","doi":"10.1109/PDCAT.2008.21","DOIUrl":"https://doi.org/10.1109/PDCAT.2008.21","url":null,"abstract":"A parallel algorithm, namely parallel block diagonal dominant (PBDD) algorithm, is proposed to solve block tridiagonal linear systems on multi-computers. This algorithm is based on divided-and-conquer idea of the PDD method. When the systems is strictly block diagonal dominant, the PBDD is highly parallel and provides approximate solutions that equals to the exact solutions within machine accuracy. The PBDD method has been implemented on a 64-node multi-computer. The analytic results match closely with the results measured from the numerical experiments.","PeriodicalId":282779,"journal":{"name":"2008 Ninth International Conference on Parallel and Distributed Computing, Applications and Technologies","volume":"202 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126535359","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The development of numerical simulation software tools for the solution of real-world problems usually calls for domain experts in modeling. The GraPA framework, as an abstraction layer on top of hardware characteristics, supports modelers in two respects: one is the built-in support for co-processing of multiple models and the other is the generically delivered high performance achieved by implementing concurrency features of multicore and distributed memory architectures. Technically, GraPA is designed as a C++ template framework, where the modeler's data structures and algorithms instantiate the framework. Using this approach, we handle parallel processing of lock-free data structures and message passing transparently to the modelers. In this paper, we report on the status of the implementation of GraPA and on its performance characteristics.
{"title":"A Framework for Concurrency in Numerical Simulations Using Lock Free Data Structures: The Graph Parallel Architecture GraPA","authors":"P. Klein, Dimo Maleshkov, D. Asenov","doi":"10.1109/PDCAT.2008.32","DOIUrl":"https://doi.org/10.1109/PDCAT.2008.32","url":null,"abstract":"The development of numerical simulation software tools for the solution of real-world problems usually calls for domain experts in modeling. The GraPA framework, as an abstraction layer on top of hardware characteristics, supports modelers in two respects: one is the built-in support for co-processing of multiple models and the other is the generically delivered high performance achieved by implementing concurrency features of multicore and distributed memory architectures. Technically, GraPA is designed as a C++ template framework, where the modeler`s data structures and algorithms instantiate the framework. Using this approach, we handle parallel processing of lock-free data structures and message passing transparently to the modelers. In this paper, we report on the status of the implementation of GraPA and on its performance characteristics.","PeriodicalId":282779,"journal":{"name":"2008 Ninth International Conference on Parallel and Distributed Computing, Applications and Technologies","volume":"41 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125670247","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The main contribution of this paper is to present hardware algorithms for the redundant radix-2^r number system on the FPGA to speed up arithmetic operations on numbers with many bits, which have applications in security systems such as RSA encryption and decryption. Our hardware algorithms accelerate arithmetic operations including addition, multiplication, and Montgomery modulo multiplication. Quite surprisingly, our hardware algorithms for multiplication and Montgomery multiplication of two 1024-bit numbers run in only 64 clock cycles using the redundant radix-2^16 number system. Also, the experimental results for the Xilinx Virtex-II Pro Family FPGA XC2VP100-6 show that the clock frequency of our circuit is independent of the number of bits. The speed-up factors of our hardware algorithms using the redundant number system over those using the conventional number system are 8.3 for 1024-bit addition, 3.4 for 1024-bit multiplication, and 2.5 for 1024-bit Montgomery modulo multiplication. Further, for 256-bit Montgomery modulo multiplication, our hardware algorithm runs in 0.38 μs, while a previously known implementation runs in 1.22 μs. Thus, our approach using the redundant number system for arithmetic operations is very efficient.
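To make the arithmetic concrete, here is a software sketch of textbook Montgomery modular multiplication on ordinary Python integers. It illustrates only the operation being accelerated; it does not model the redundant radix-2^r representation or the FPGA circuit from the paper. The function names are chosen for this example, the modulus is assumed to be odd and smaller than 2^bits, and `pow(x, -1, m)` requires Python 3.8 or later.

```python
def montgomery_setup(modulus, bits):
    """Return (R, n_prime) with R = 2**bits and n_prime = -modulus^-1 mod R."""
    r = 1 << bits
    n_prime = (-pow(modulus, -1, r)) % r  # inverse exists because modulus is odd
    return r, n_prime


def redc(t, modulus, bits, n_prime):
    """Montgomery reduction: return t * R^-1 mod modulus, for 0 <= t < R * modulus."""
    mask = (1 << bits) - 1
    m = ((t & mask) * n_prime) & mask
    u = (t + m * modulus) >> bits  # exact division: the low `bits` bits are zero
    return u - modulus if u >= modulus else u


def mont_mul(a, b, modulus, bits):
    """Compute a * b mod modulus via Montgomery form."""
    _, n_prime = montgomery_setup(modulus, bits)
    a_bar = (a << bits) % modulus  # a * R mod N
    b_bar = (b << bits) % modulus  # b * R mod N
    prod_bar = redc(a_bar * b_bar, modulus, bits, n_prime)  # a * b * R mod N
    return redc(prod_bar, modulus, bits, n_prime)  # leave Montgomery form
```

The point of the method is that `redc` replaces division by the modulus with masking and shifting by a power of two, which is what makes it attractive for hardware. In RSA-style exponentiation the conversion into and out of Montgomery form is done once, not per multiplication as in `mont_mul` above.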
{"title":"Redundant Radix-2r Number System for Accelerating Arithmetic Operations on the FPGAs","authors":"K. Kawakami, K. Shigemoto, K. Nakano","doi":"10.1109/PDCAT.2008.13","DOIUrl":"https://doi.org/10.1109/PDCAT.2008.13","url":null,"abstract":"The main contribution of this paper is to present hardware algorithms for redundant radix-2r number system in the FPGA to speed the arithmetic operations for numbers with many bits, which have applications in security systems such as RSA encryption and decryption. Our hardware algorithms accelerate arithmetic operations including addition, multiplication, and Montgomery modulo multiplication.Quite surprisingly, our hardware algorithms of the multiplication and Montgomery multiplication for two 1024-bit numbers runs only 64 clock cycles using redundant radix-216 number system. Also, the experimental results for Xilinx Virtex-II Pro Family FPGA XC2VP100-6 show that the clock frequency of our circuit is independent of the number of bits. The speed up factors of our hardware algorithm using the redundant number system over those using the conventional number system are 8.3 for 1024-bit addition, 3.4 for 1024-bit multiplication, and 2.5 for 1024-bit Montgomery modulo multiplication. Further, for 256-bit Montgomery modulo multiplication, our hardware algorithm runs in 0.38 mus, while a previously known implementation runs in 1.22 mus. 
Thus, our approach using redundant number system for arithmetic operations is very efficient.","PeriodicalId":282779,"journal":{"name":"2008 Ninth International Conference on Parallel and Distributed Computing, Applications and Technologies","volume":"186 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123282467","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
This paper presents a novel video service system, DCSVS (distributed collaborative set-top-box video service), which encompasses several practical and effective solutions for both live and VoD (video-on-demand) services. DCSVS is built on an overlay DHT (distributed hash table) network that improves the Kademlia protocol to suit real-time applications. We use several types of pre-fetching to enhance the continuity rate of the VoD service, and we employ an effective scheduling algorithm based on transfer priorities. We design the inner-first and proxy-forward strategies to reduce transfer failures, and we use window-based storage and hashing to achieve better system performance. Both theoretical analysis and experimental results show that the strategies in DCSVS perform well in terms of efficiency and robustness, and that the system can remain stable and scalable in large-scale networks.
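As background for the DHT layer, the following sketch shows the XOR distance metric that standard Kademlia uses to decide which nodes are "close" to a key. It covers only the unmodified protocol's metric, not the real-time improvements the paper makes, and the helper names are chosen for this example.

```python
import hashlib


def node_id(name, bits=160):
    """Derive an identifier from a name by truncating its SHA-1 digest to `bits` bits."""
    digest = hashlib.sha1(name.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") >> (160 - bits)


def xor_distance(a, b):
    """Kademlia distance between two identifiers: their bitwise XOR as an integer."""
    return a ^ b


def bucket_index(self_id, other_id):
    """Index of the k-bucket for other_id: position of the highest differing bit.

    Returns -1 when the two identifiers are equal.
    """
    return xor_distance(self_id, other_id).bit_length() - 1


def closest_nodes(target, known_ids, k=3):
    """Return the k known identifiers nearest to target under XOR distance."""
    return sorted(known_ids, key=lambda n: xor_distance(n, target))[:k]
```

A lookup repeatedly asks the closest known nodes for nodes that are closer still, so each step roughly halves the remaining distance to the target key.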
{"title":"DCSVS: Distributed Collaborative Set-Top-Box Video Service","authors":"Chao Liu, Hao Chen, D. Ye","doi":"10.1109/PDCAT.2008.17","DOIUrl":"https://doi.org/10.1109/PDCAT.2008.17","url":null,"abstract":"This paper presents a novel video service system DCSVS (distributed collaborative set-top-box video service), which encompasses several practical and effective solutions to both live and VoD (video-on-demand) services. DCSVS is established on an overlay DHT (distributed hash table) network, which improves Kademlia protocol to fit for real-time application. We use several types of pre-fetching to enhance continuity rate of VoD service, and we employ an effective scheduling algorithm based on transferring priorities. We design the inner-first and proxy-forward strategy to relieve transferring failure, and we use Window-based storage and hashing to achieve better system performance. Both theoretical analysis and experimental results show that strategies in DCSVS perform well in terms of efficiency and robustness, and could maintain a fine state and scalability in large-scale networking.","PeriodicalId":282779,"journal":{"name":"2008 Ninth International Conference on Parallel and Distributed Computing, Applications and Technologies","volume":"27 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115063211","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Wireless sensor networks (WSNs) usually employ ranging techniques to measure the distance between an unknown node and its neighboring anchor nodes, and then estimate the position of the unknown node from the measured distances. This paper presents an effective Particle Swarm Optimization (PSO)-based localization scheme using the Radio Signal Strength (RSS) ranging technique. Modified from the iterative multilateration algorithm, our scheme is unique in adopting the location data of remote anchors, provided by the closest neighbor anchors of an unknown node, to estimate the unknown node's position, and in using the PSO algorithm to further reduce error accumulation. The new scheme also incorporates a modified DV-distance approach to raise the success ratio of locating unknown nodes. Compared with related schemes, our scheme is shown through simulations to perform consistently better in increasing localization success ratios and decreasing location errors, at reduced cost.
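The sketch below illustrates the two generic ingredients named in the abstract, not the authors' scheme: converting an RSS reading to a distance, and using a basic PSO to find the position that best fits the anchor distances. The log-distance path-loss model, its constants, and the PSO coefficients are assumptions of this example, and the remote-anchor and DV-distance refinements are omitted. `math.dist` requires Python 3.8 or later.

```python
import math
import random


def rss_to_distance(rss_dbm, rss_at_1m=-40.0, path_loss_exp=2.0):
    """Invert an assumed log-distance path-loss model to get a distance in metres."""
    return 10 ** ((rss_at_1m - rss_dbm) / (10 * path_loss_exp))


def ranging_error(pos, anchors, distances):
    """Sum of squared differences between geometric and measured anchor distances."""
    return sum((math.dist(pos, a) - d) ** 2 for a, d in zip(anchors, distances))


def pso_localize(anchors, distances, bounds, particles=30, iters=100, seed=0):
    """Minimise ranging_error over a 2-D box with a basic global-best PSO."""
    rng = random.Random(seed)
    (xmin, xmax), (ymin, ymax) = bounds
    pos = [[rng.uniform(xmin, xmax), rng.uniform(ymin, ymax)]
           for _ in range(particles)]
    vel = [[0.0, 0.0] for _ in range(particles)]
    best = [p[:] for p in pos]  # personal best positions
    best_err = [ranging_error(p, anchors, distances) for p in pos]
    g = min(range(particles), key=best_err.__getitem__)
    gbest, gbest_err = best[g][:], best_err[g]
    w, c1, c2 = 0.7, 1.5, 1.5  # inertia, cognitive and social weights (assumed)
    for _ in range(iters):
        for i in range(particles):
            for k in range(2):
                vel[i][k] = (w * vel[i][k]
                             + c1 * rng.random() * (best[i][k] - pos[i][k])
                             + c2 * rng.random() * (gbest[k] - pos[i][k]))
                pos[i][k] += vel[i][k]
            err = ranging_error(pos[i], anchors, distances)
            if err < best_err[i]:
                best[i], best_err[i] = pos[i][:], err
                if err < gbest_err:
                    gbest, gbest_err = pos[i][:], err
    return tuple(gbest), gbest_err
```

With three non-collinear anchors and noise-free distances the error surface has its minimum, zero, at the true position; with noisy RSS readings the swarm settles on a compromise position instead.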
{"title":"An Effective PSO-Based Node Localization Scheme for Wireless Sensor Networks","authors":"Po-Jen Chuang, Cheng-Pei Wu","doi":"10.1109/PDCAT.2008.73","DOIUrl":"https://doi.org/10.1109/PDCAT.2008.73","url":null,"abstract":"Wireless sensor networks (WSNs) usually employ different ranging techniques to measure the distance between an unknown node and its neighboring anchor nodes, and based on the measured distance to estimate the position of the unknown node. This paper presents an effective Particle Swarm Optimization (PSO)-based Localization Scheme using the Radio Signal Strength (RSS) ranging technique. Modified from the iterative multilateration algorithm, our scheme is unique in adopting the location data of remote anchors provided by the closest neighbor anchors of an unknown node to estimate the unknown nodepsilas position and using the PSO algorithm to further reduce error accumulation. The new scheme meanwhile takes in a modified DV-distance approach to raise the success ratios of locating unknown nodes. Compared with related schemes, our scheme is shown through simulations to perform constantly better in increasing localization success ratios and decreasing location errors -- at reduced cost.","PeriodicalId":282779,"journal":{"name":"2008 Ninth International Conference on Parallel and Distributed Computing, Applications and Technologies","volume":"30 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132228725","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Iterative methods such as BiCGStab for solving electromagnetic field integral equations have a complexity of O(N^2), which can be reduced to O(N log N) by the multilevel fast multipole method (MLFMM). For large-scale problems, MLFMM should be parallelized, and the iterative convergence can be accelerated by preconditioners such as the incomplete inverse triangular factorization preconditioner. The interpolation based on the spherical harmonic transform at each level of MLFMM's octree can be further accelerated by FFT. Tests of this acceleration scheme on a distributed cluster show that the algorithm is feasible.
{"title":"Parallelization and Acceleration Scheme of Multilevel Fast Multipole Method","authors":"Wu Wang, Yangde Feng, Xue-bin Chi","doi":"10.1109/PDCAT.2008.34","DOIUrl":"https://doi.org/10.1109/PDCAT.2008.34","url":null,"abstract":"The iterative methods such as BiCGStab for solving electromagnetic field integer equations have a complexity of O(N2), which can be reduced to O(N logN) by multilevel fast multipole method (MLFMM). For large scale problems, MLFMM should be parallelized, and the iterative convergence can be accelerated by preconditioners such as incomplete inverse triangular factorization preconditioner. The interpolation based on spherical harmonic transform at each level of MLFMMpsilas octree can be further accelerated by FFT. Based on this acceleration scheme tested on distributed cluster, the results show this algorithm is feasible.","PeriodicalId":282779,"journal":{"name":"2008 Ninth International Conference on Parallel and Distributed Computing, Applications and Technologies","volume":"4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132249532","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Recent technological advances have established the potential benefits of deploying large populations of wireless sensor nodes in agricultural, industrial and environmental areas to analyse and predict the behaviour of physical attributes such as temperature or gas. This work focuses on the three-dimensional temperature distribution of a specified field, based on virtually deployed sensor nodes in a simulation environment. The simulation model considers two parameters, temperature and location. We evaluate the minimum number of nodes required to map the given space. Modeling and simulation are used to test the effect of network density on space coverage. This work exploits the spatial correlation of temperature data in a given space. Finally, the paper discusses extensions of the approach, which lead to new research challenges arising from the relationships between obstacles within the environment.
{"title":"Portable Object Thermal Awareness: Modeling Intelligent Sensor Networks for Cool Store Applications","authors":"N. Yamani, A. Al-Anbuky, A. Gyasi-Agyei","doi":"10.1109/PDCAT.2008.37","DOIUrl":"https://doi.org/10.1109/PDCAT.2008.37","url":null,"abstract":"The recent technological advances have entrenched the potential benefits, when large population of wireless sensor nodes deployed in agricultural, industrial and environmental areas to predict the behavioral analysis of physical attributes such as temperature or gas. This work mainly focuses on the three dimensional temperature distribution of a specified field based on virtually deployed sensor nodes in a simulation environment. The parameters temperature and location are considered in the simulation model. In this work, we have evaluated the minimum number of nodes that are required to map the given space. Modeling & simulation has been dealt with in testing the network density on the space coverage. This work exploits a spatial correlation of temperature data in a given space. Finally the paper discusses the extension of approaches that leads to new research challenges due to the relationships between the obstacles within the environment.","PeriodicalId":282779,"journal":{"name":"2008 Ninth International Conference on Parallel and Distributed Computing, Applications and Technologies","volume":"7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114931090","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Mohammad Shorfuzzaman, P. Graham, Mehmet Rasit Eskicioglu
Data grids provide geographically distributed storage for large-scale data-intensive applications. Ensuring efficient access to such large and widely distributed datasets is hindered by high latencies. To speed up data access, data grid systems replicate data in multiple locations so a user can access the data from a nearby site. In addition to reducing data access time, replication also aims to use network and storage resources efficiently. While replication is a well-known technique, the problem of replica placement has not been widely studied for data grid environments. To obtain the best possible gains from replication, strategic placement of the replicas is critical. In a grid environment, resource availability, network latency, and users' requests can vary. To address these issues, a placement strategy is needed that adapts to dynamic behavior. This paper proposes a new dynamic replica placement algorithm for hierarchical data grids based on file "popularity". Our goal is to place replicas close to the clients to reduce access time while using the network and storage efficiently, thereby effectively balancing storage cost and access latency. We evaluate our algorithm using OptorSim, which shows that our approach outperforms other techniques in terms of access time and bandwidth used.
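The abstract does not give the placement rule, so the following is only one plausible reading of "popularity-driven" placement in a tree-shaped grid. It assumes that popularity is a per-file access count, that counts from clients are summed at every ancestor, and that a replica goes to the deepest non-client node whose aggregated count reaches a threshold. The function name, the `parent` map, and the threshold are hypothetical.

```python
from collections import Counter


def place_replicas(parent, client_accesses, threshold):
    """Choose replica sites per file in a hierarchical grid.

    parent:          maps each node to its parent (the root maps to None)
    client_accesses: maps each client to {file_name: access_count}
    threshold:       minimum aggregated access count that justifies a replica
    Returns {file_name: set of nodes}, keeping only the deepest qualifying nodes.
    """
    # Sum every client's access counts at each of its ancestors.
    counts = {}
    for client, files in client_accesses.items():
        node = parent.get(client)
        while node is not None:
            counts.setdefault(node, Counter()).update(files)
            node = parent.get(node)

    # Collect, per file, every node whose aggregated count meets the threshold.
    candidates = {}
    for node, files in counts.items():
        for name, hits in files.items():
            if hits >= threshold:
                candidates.setdefault(name, set()).add(node)

    # Drop any candidate that is an ancestor of another candidate, so the
    # replica sits as close to the requesting clients as possible.
    placement = {}
    for name, nodes in candidates.items():
        ancestors = set()
        for node in nodes:
            up = parent.get(node)
            while up is not None:
                ancestors.add(up)
                up = parent.get(up)
        placement[name] = nodes - ancestors
    return placement
```

Raising the threshold saves storage by creating fewer replicas higher in the tree, while lowering it pushes more replicas towards the clients; that is the storage-versus-latency trade-off the abstract describes. A dynamic scheme would recompute the placement as access counts change.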
{"title":"Popularity-Driven Dynamic Replica Placement in Hierarchical Data Grids","authors":"Mohammad Shorfuzzaman, P. Graham, Mehmet Rasit Eskicioglu","doi":"10.1109/PDCAT.2008.64","DOIUrl":"https://doi.org/10.1109/PDCAT.2008.64","url":null,"abstract":"Data grids provide geographically distributed storage for large-scale data-intensive applications. Ensuring efficient access to such large and widely distributed datasets is hindered by high latencies. To speed up data access, data grid systems replicate data in multiple locations so a user can access the data from a nearby site. In addition to reducing data access time, replication also aims to use network and storage resources efficiently. While replication is a well-known technique, the problem of replica placement has not been widely studied for data grid environments. To obtain the best possible gains from replication, strategic placement of the replicas is critical. In a grid environment resource availability, network latency, and userspsila requests can vary. To address these issues a placement strategy is needed that adapts to dynamic behavior. This paper proposes a new dynamic replica placement algorithm for hierarchical data grids based on file ldquopopularityrdquo. Our goal is to place replicas close to the clients to reduce access time while using the network and storage efficiently thereby effectively balancing storage cost and access latency. 
We evaluate our algorithm using OptorSim which shows that our approach outperforms other techniques in terms of access time and bandwidth used.","PeriodicalId":282779,"journal":{"name":"2008 Ninth International Conference on Parallel and Distributed Computing, Applications and Technologies","volume":"21 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121729598","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}