Radio Frequency Identification (RFID) techniques are widely used in ubiquitous computing applications. The most important use of RFID is to read the tags within a reader's interrogation area so that the objects to which those tags are attached can be identified. In practice, a tag is often physically within the interrogation range but cannot be read by the reader because of multipath effects and other interference. This phenomenon, known as the hidden tag problem, is a major obstacle to achieving a high identification rate. To address it, most prior work relies on empirical or measurement-based methods to tune the readers' transmission power. Such case-by-case solutions are impractical for generic deployment. In this paper, we theoretically and experimentally explore why the hidden tag problem occurs. To alleviate its impact, we propose a unified and measurable model, PAL, that formulates the problem and its impact. Unlike previous work, our solution is generic and fully compatible with the existing EPC C1G2 protocol. The analysis and measurements based on our model can help in designing and deploying RFID systems with a high identification rate.
{"title":"Nowhere to hide: An empirical study on hidden UHF RFID tags","authors":"Rui Li, H. Ding, Jinsong Han, Shaoping Li, Xing Wang, Hui Liu, Jizhong Zhao","doi":"10.1109/PADSW.2014.7097860","DOIUrl":"https://doi.org/10.1109/PADSW.2014.7097860","journal":{"name":"2014 20th IEEE International Conference on Parallel and Distributed Systems (ICPADS)"},"publicationDate":"2014-12-01"}
Pub Date : 2014-12-01 DOI: 10.1109/PADSW.2014.7097885
Tzu-Chi Huang, Kuo-Chih Chu, Ming-Fong Tsai
Cloud computing is an emerging and attractive technology that provides users with various services in a pay-as-you-go manner. Today, the resources behind a cloud's services are no longer limited to computers that sit far from users and are connected to each other by high-speed networks in a data center at a single geographic location. A cloud may instead be presented to users by connecting resources at multiple geographic locations. A cloud organized this way may suffer from communication interception, congestion, and interruption, so it needs a way to apply extra processing on demand to certain links between geographically separated computers. Because a MapReduce cloud is key to the success of large-scale computation, this paper proposes the Smart MapReduce Cloud (SMRC), which applies extra processing to intermediate data on demand while that data is delivered among computers in the MapReduce cloud. In experiments, SMRC is tested with several popular MapReduce applications to observe the performance of data encryption and compression via XOR and GZIP functions.
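As a rough illustration of the kind of on-demand processing this abstract mentions, the following Python sketch compresses a chunk of intermediate data with GZIP and then masks it with a repeating-key XOR. The function names (`xor_bytes`, `pack_intermediate`, `unpack_intermediate`) and the compress-then-XOR order are assumptions made for this example, not details taken from SMRC. A repeating-key XOR is only a lightweight obfuscation, not strong encryption.

```python
import gzip
from itertools import cycle


def xor_bytes(data: bytes, key: bytes) -> bytes:
    """XOR every byte of data with a repeating key (its own inverse)."""
    if not key:
        raise ValueError("key must not be empty")
    return bytes(b ^ k for b, k in zip(data, cycle(key)))


def pack_intermediate(data: bytes, key: bytes) -> bytes:
    """Compress first, then XOR: masked data would no longer compress well."""
    return xor_bytes(gzip.compress(data), key)


def unpack_intermediate(blob: bytes, key: bytes) -> bytes:
    """Undo pack_intermediate: remove the XOR mask, then decompress."""
    return gzip.decompress(xor_bytes(blob, key))


# Hypothetical intermediate data: repeated key/value lines from a map task.
records = b"word\t1\n" * 100
blob = pack_intermediate(records, b"k3y")
restored = unpack_intermediate(blob, b"k3y")
```

Compressing before masking is a design choice of this sketch: highly repetitive map output shrinks considerably under GZIP, whereas XOR-masked bytes generally would not.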
{"title":"Smart MapReduce cloud: Applying extra processing to intermediate data on demand","authors":"Tzu-Chi Huang, Kuo-Chih Chu, Ming-Fong Tsai","doi":"10.1109/PADSW.2014.7097885","DOIUrl":"https://doi.org/10.1109/PADSW.2014.7097885","journal":{"name":"2014 20th IEEE International Conference on Parallel and Distributed Systems (ICPADS)"},"publicationDate":"2014-12-01"}
Frequent patterns in biological sequences usually correspond to important functions (or structures). With the rapid growth of biological sequence data, it is important to find frequent patterns in a large bio-sequence efficiently. However, most existing algorithms need to produce many short patterns or projected databases, which badly hurts efficiency and increases the space cost. Graphics processing units (GPUs), which are many-core computing devices, have been extensively applied to accelerate computation in many areas. To meet the demands of biologists, we redefine the frequent pattern problem with length constraints. We present a pruning optimization method for the serial algorithm (POSA) and, based on this technique, propose a parallel algorithm (POPA) that not only reduces the time complexity at a low space cost but also obtains better performance on CUDA. To validate the presented algorithms, we implemented them on a multi-core CPU and various GPU devices. CUDA optimization techniques are also applied to speed up the calculation. Finally, experimental results show that, compared with the serial algorithm on a six-core CPU, POSA achieves a 1.2x to 4.5x speedup and POPA a 3x to 20x speedup.
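To make the length-constrained problem statement concrete, here is a brute-force Python baseline. It is not POSA or POPA (the abstract does not give their details); it simply counts every substring whose length lies between `min_len` and `max_len` and keeps those that occur at least `min_support` times. The parameter names and the choice to count overlapping occurrences are assumptions of this sketch.

```python
from collections import Counter


def frequent_patterns(seq: str, min_len: int, max_len: int, min_support: int) -> dict:
    """Return {pattern: count} for substrings with min_len <= length <= max_len
    that occur at least min_support times (overlapping occurrences counted)."""
    counts = Counter()
    for length in range(min_len, max_len + 1):
        # Slide a window of the current length over the whole sequence.
        for start in range(len(seq) - length + 1):
            counts[seq[start:start + length]] += 1
    return {p: c for p, c in counts.items() if c >= min_support}


# Toy DNA sequence: only "AC" appears three times among patterns of length 2..3.
result = frequent_patterns("ACGTACGTAC", min_len=2, max_len=3, min_support=3)
```

A baseline like this enumerates every window for every length, which is the cost that pruning and GPU parallelization aim to reduce.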
{"title":"GPU acceleration of finding frequent patterns over large biological sequence","authors":"Shufang Du, Longjiang Guo, Chunyu Ai, Jinbao Li, Meirui Ren, Yahong Guo","doi":"10.1109/PADSW.2014.7097865","DOIUrl":"https://doi.org/10.1109/PADSW.2014.7097865","journal":{"name":"2014 20th IEEE International Conference on Parallel and Distributed Systems (ICPADS)"},"publicationDate":"2014-12-01"}
Pub Date : 2014-12-01 DOI: 10.1109/PADSW.2014.7097859
A. Jouy, Jianguo Yao, G. Zhu
Avionic networks that use the Avionics Full-Duplex Switched Ethernet (AFDX) protocol use only a small fraction of the bandwidth to transmit critical traffic. As demand for data exchange by non-critical applications grows, it is of great interest to exploit the physically available capacity of the network through optimal bandwidth allocation. In this paper, the problem of bandwidth allocation in AFDX networks is treated in the framework of Network Utility Maximization (NUM). Multi-path routing is used for non-critical applications to exploit the available bandwidth and to improve system performance. The optimization problem is decomposed into a rate update subproblem and a traffic routing subproblem, linked together by a pricing dynamic system. A distributed algorithm for bandwidth allocation with multi-path routing is developed, and its convergence is proven using Lyapunov stability theory. Some issues related to implementing the developed algorithm in real AFDX networks are addressed, and the corresponding solutions are provided. Finally, TrueTime-based simulations confirm the viability and applicability of the proposed approach.
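The interplay between a rate update and link prices can be sketched with the textbook price-based (dual) iteration for a simplified single-path NUM problem with logarithmic utilities. This is not the paper's algorithm: the multi-path routing subproblem is not modelled, and the function name, step size, iteration count, and `max_rate` cap are assumptions chosen for illustration.

```python
def price_based_rates(weights, paths, capacities, step=0.01, iters=2000, max_rate=1e3):
    """Dual iteration for: maximise sum_f w_f * log(x_f) subject to link capacities.

    weights[f]    -- utility weight of flow f
    paths[f]      -- list of link indices used by flow f (fixed, single path)
    capacities[l] -- capacity of link l
    """
    prices = [1.0] * len(capacities)
    rates = [0.0] * len(weights)
    for _ in range(iters):
        # Rate update: each flow maximises w*log(x) - x*q, giving x = w / q,
        # where q is the sum of the prices along its path.
        for f, links in enumerate(paths):
            q = sum(prices[l] for l in links)
            rates[f] = min(max_rate, weights[f] / q) if q > 0 else max_rate
        # Price update: raise the price of overloaded links, lower it otherwise,
        # and never let a price become negative.
        for l, cap in enumerate(capacities):
            load = sum(rates[f] for f, links in enumerate(paths) if l in links)
            prices[l] = max(0.0, prices[l] + step * (load - cap))
    return rates, prices


# Two equal-weight flows sharing one link of capacity 10 settle near 5 each.
rates, prices = price_based_rates([1.0, 1.0], [[0], [0]], [10.0])
```

Each flow needs only the total price along its own path and each link needs only its own load, which is what makes this style of update suitable for a distributed implementation.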
{"title":"Optimal bandwidth allocation with dynamic multi-path routing for non-critical traffic in AFDX networks","authors":"A. Jouy, Jianguo Yao, G. Zhu","doi":"10.1109/PADSW.2014.7097859","DOIUrl":"https://doi.org/10.1109/PADSW.2014.7097859","journal":{"name":"2014 20th IEEE International Conference on Parallel and Distributed Systems (ICPADS)"},"publicationDate":"2014-12-01"}
Pub Date : 2014-12-01 DOI: 10.1109/PADSW.2014.7097858
Guoxing Luo, Zhigang Han, Li Lu, M. Hussain
The wormhole attack is one of the severe threats to wireless sensor and ad hoc networks. Most existing countermeasures either require specialized hardware or incur high network overhead in order to capture the specific symptoms induced by wormholes, which limits their applicability. In this paper, we exploit an inevitable symptom of wormholes and present Pworm, a passive wormhole detection and localization system based on the key observation that wormholes attract a large amount of network traffic. The proposed passive, real-time scheme silently observes variations in network topology to infer the existence of a wormhole. Our approach relies solely on network routing information and neither requires specialized hardware nor imposes rigorous assumptions on network features. We evaluate system performance through extensive simulations of networks with 100 to 500 nodes and show that Pworm performs well in terms of false alarms, scalability, and time delay.
{"title":"Real-time and passive wormhole detection for wireless sensor networks","authors":"Guoxing Luo, Zhigang Han, Li Lu, M. Hussain","doi":"10.1109/PADSW.2014.7097858","DOIUrl":"https://doi.org/10.1109/PADSW.2014.7097858","journal":{"name":"2014 20th IEEE International Conference on Parallel and Distributed Systems (ICPADS)"},"publicationDate":"2014-12-01"}
Pub Date : 2014-12-01 DOI: 10.1109/PADSW.2014.7097883
Xing Tang, Haijian Zhang, Kunxiao Zhou, Jing Wang
In multi-hop WLANs, it is more efficient to let some clients relay traffic for clients that are far from the AP, which increases transmission efficiency. Existing solutions for multi-hop WLANs try to choose one or two fixed best relays to forward packets. However, the links between far-away clients and the AP are intermittently connected, and end-to-end paths may not exist. To fill this gap, our solution uses opportunistic forwarding and does not fix a relay: the source simply broadcasts its signal, and any client that receives the signal relays the traffic. In this paper, our objective is to extend connectivity for far-away clients in multi-hop WLAN access through opportunistic forwarding. The original contributions of this paper include: 1) We develop a general model to analyze the throughput of opportunistic forwarding in multi-hop WLANs. 2) We classify the clients into three groups and propose a centralized opportunistic routing relay algorithm for each group to achieve the optimal system throughput. 3) We propose a distributed opportunistic routing relay protocol that handles clients' disconnection and connection actions. Extensive simulations have been carried out in NS-2. The simulation results show that our protocol significantly increases the service coverage and connectivity of the entire network compared with traditional one-hop and multi-hop relay methods.
{"title":"Extending access point service coverage area through opportunistic forwarding in multi-hop collaborative relay WLANs","authors":"Xing Tang, Haijian Zhang, Kunxiao Zhou, Jing Wang","doi":"10.1109/PADSW.2014.7097883","DOIUrl":"https://doi.org/10.1109/PADSW.2014.7097883","journal":{"name":"2014 20th IEEE International Conference on Parallel and Distributed Systems (ICPADS)"},"publicationDate":"2014-12-01"}
Pub Date : 2014-12-01 DOI: 10.1109/PADSW.2014.7097810
Xiaonan Guo, Kaishun Wu, Qian Zhang
Recently, directional antennas have shown great potential for deployment in indoor environments, where they can significantly improve wireless network capacity. Because of their directionality, antenna orientation becomes a critical issue, especially for moving devices exchanging data with a directional AP. Previous work largely focused on scenarios in which the clients are stationary most of the time, and handled moving devices by re-measuring network-wide information such as received signal strength and the conflict graph. However, re-collecting such information introduces significant overhead. To adjust a directional antenna quickly and wisely in real time, we introduce ADAS, a system that adjusts the orientation of a directional antenna using sensor hints from mobile devices. The key idea of ADAS is to obtain movement behavior and location information from the sensors in mobile devices, so that the AP can adapt its orientation accordingly. We implement ADAS on a commercial directional antenna and evaluate its performance under different configurations. The experimental results demonstrate that a directional AP in ADAS can adapt to mobility with less overhead.
{"title":"ADAS: Adjust directional antenna with sensor hints","authors":"Xiaonan Guo, Kaishun Wu, Qian Zhang","doi":"10.1109/PADSW.2014.7097810","DOIUrl":"https://doi.org/10.1109/PADSW.2014.7097810","journal":{"name":"2014 20th IEEE International Conference on Parallel and Distributed Systems (ICPADS)"},"publicationDate":"2014-12-01"}
Pub Date : 2014-12-01 DOI: 10.1109/PADSW.2014.7097877
Yuhua Lin, Haiying Shen
Question and Answer (Q&A) systems aggregate the collective intelligence of all users to provide satisfying answers to questions. A well-developed Q&A system should offer a high question response rate, high answer quality, and a spam-free environment for users. Previous work uses reputation systems to achieve these goals. However, these reputation systems evaluate a user with a single overall rating across all questions the user has answered, regardless of question category, so the reputation score does not accurately reflect the user's ability to answer a question in a specific category. We propose SmartQ, a reputation-based Q&A system. SmartQ employs a category- and theme-based reputation management system to evaluate users' willingness and capability to answer various kinds of questions. The reputation system facilitates forwarding a question to suitable experts, which improves the question response rate and answer quality. SmartQ also incorporates a lightweight spammer detection method to identify potential spammers. Our trace-driven simulation on PeerSim demonstrates the effectiveness of SmartQ in providing a good user experience. We then develop a real application of SmartQ and deploy it for use by a student group at Clemson University. The user feedback shows that SmartQ can provide high-quality answers for users in a community.
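The difference between one overall rating and a per-category reputation can be illustrated with a small Python sketch. The class name `CategoryReputation`, its methods, and the use of a plain average of ratings are hypothetical choices for this example; the abstract does not specify how SmartQ computes or stores reputation.

```python
from collections import defaultdict


class CategoryReputation:
    """Keeps a separate average rating per (user, category) pair."""

    def __init__(self):
        self._ratings = defaultdict(list)  # (user, category) -> list of ratings

    def rate(self, user, category, score):
        """Record one rating for an answer the user gave in this category."""
        self._ratings[(user, category)].append(score)

    def score(self, user, category):
        """Average rating in the category, or 0.0 if the user has none."""
        ratings = self._ratings.get((user, category))
        return sum(ratings) / len(ratings) if ratings else 0.0

    def top_experts(self, category, k=1):
        """Users with the highest score in the category (ties broken by name)."""
        users = {u for (u, c) in self._ratings if c == category}
        return sorted(users, key=lambda u: (-self.score(u, category), u))[:k]


rep = CategoryReputation()
rep.rate("alice", "python", 5)
rep.rate("alice", "cooking", 1)
rep.rate("bob", "python", 3)
rep.rate("bob", "cooking", 5)
# A question about cooking would be forwarded to bob, one about python to alice.
```

With a single overall score, alice (average 3) and bob (average 4) would be ranked the same way for every question, which is the inaccuracy the abstract describes.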
{"title":"SmartQ: A question and answer system for supplying high-quality and trustworthy answers","authors":"Yuhua Lin, Haiying Shen","doi":"10.1109/PADSW.2014.7097877","DOIUrl":"https://doi.org/10.1109/PADSW.2014.7097877","journal":{"name":"2014 20th IEEE International Conference on Parallel and Distributed Systems (ICPADS)"},"publicationDate":"2014-12-01"}
Pub Date : 2014-12-01 DOI: 10.1109/PADSW.2014.7097852
Yiqung Liu, Xianyi Zhang, Chao Yang, Fangfang Liu, Yutong Lu
In this paper, we propose a hybrid algorithm to enable and accelerate the High Performance Conjugate Gradient (HPCG) benchmark on a heterogeneous node with an arbitrary number of accelerators. In the hybrid algorithm, each subdomain is assigned to a node after a three-dimensional domain decomposition. The subdomain is further divided into several regular inner blocks and an outer part using a flexible inner-outer partitioning strategy. Each inner task is assigned to a MIC device, and its size is adjustable to match the accelerator's computational power. The single outer part is assigned to the CPU, and the thickness of the boundary is also adjustable to maintain load balance between the CPU and the MICs. By properly fusing the computational kernels with preceding ones, we present an asynchronous data transfer scheme that better overlaps local computation with PCI Express data transfer. All basic HPCG kernels, especially the time-consuming sparse matrix-vector multiplication (SpMV) and symmetric Gauss-Seidel relaxation (SymGS), are extensively optimized for both the CPU and the MIC at both the algorithmic and architectural levels. On a single node of Tianhe-2, which is composed of an Intel Xeon processor and three Intel Xeon Phi coprocessors, we obtain an aggregate performance of 50.2 Gflops, which is around 1.5% of the peak performance.
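For readers unfamiliar with the SymGS kernel named in this abstract, the following Python sketch shows one symmetric Gauss-Seidel sweep: a forward pass over the rows followed by a backward pass. It uses a small dense matrix stored as nested lists purely for readability, which is an assumption of this example; HPCG itself operates on a large sparse matrix, and none of the paper's CPU or MIC optimizations appear here.

```python
def symgs_sweep(A, b, x):
    """One symmetric Gauss-Seidel sweep for A x = b, updating x in place.

    Rows are visited in order 0..n-1 (forward) and then n-1..0 (backward);
    each update uses the most recent values of the other unknowns.
    """
    n = len(b)
    for i in list(range(n)) + list(range(n - 1, -1, -1)):
        off_diag = sum(A[i][j] * x[j] for j in range(n) if j != i)
        x[i] = (b[i] - off_diag) / A[i][i]
    return x


# Small symmetric positive definite system with exact solution (1/11, 7/11).
A = [[4.0, 1.0], [1.0, 3.0]]
b = [1.0, 2.0]
x = [0.0, 0.0]
for _ in range(50):
    symgs_sweep(A, b, x)
```

Because every row update depends on values written by earlier updates in the same sweep, the kernel is hard to parallelize directly, which helps explain why it is singled out as time-consuming.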
{"title":"Accelerating HPCG on Tianhe-2: A hybrid CPU-MIC algorithm","authors":"Yiqung Liu, Xianyi Zhang, Chao Yang, Fangfang Liu, Yutong Lu","doi":"10.1109/PADSW.2014.7097852","DOIUrl":"https://doi.org/10.1109/PADSW.2014.7097852","journal":{"name":"2014 20th IEEE International Conference on Parallel and Distributed Systems (ICPADS)"},"publicationDate":"2014-12-01"}
Pub Date : 2014-12-01 DOI: 10.1109/PADSW.2014.7097846
Deng Chen, L. Du, Zhiping Jiang, Wei Xi, Jinsong Han, K. Zhao, Jizhong Zhao, Zhi Wang, Rui Li
Although fingerprint-based localization is promising for indoor applications, its accuracy remains a major challenge. Most existing approaches rely on the Radio Signal Strength (RSS) to generate fingerprints. However, RSS alone cannot localize objects accurately, since such a one-dimensional fingerprint is seriously affected by interference and multipath effects in indoor environments. In this paper, we propose a new localization approach based on a multidimensional Wi-Fi fingerprint. Instead of using only RSS, we combine RSS, transmitted power, and channel information into an integrated fingerprint. The extended fingerprint enables fine-grained localization and tracking services. We also design a cosine-similarity-based matching algorithm and an enhanced particle filter mechanism to achieve accurate localization and tracking. Extensive experiments and implementation results show that the new fingerprint and the proposed algorithms achieve an accuracy within two meters at 90% of testing points, while adapting well to complex indoor environments.
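Cosine-similarity matching of a measured fingerprint against a stored database can be sketched as follows. The vector layout, the location labels, and the helper names `cosine_similarity` and `best_match` are assumptions made for this example; the abstract does not describe how the paper encodes RSS, transmitted power, and channel information into a vector, and the particle filter stage is not shown.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def best_match(query, database):
    """Return the stored location whose fingerprint is most similar to the query."""
    return max(database, key=lambda loc: cosine_similarity(query, database[loc]))


# Hypothetical three-dimensional fingerprints for two surveyed locations.
database = {
    "A": [1.0, 0.0, 0.0],
    "B": [0.0, 1.0, 1.0],
}
location = best_match([0.9, 0.1, 0.0], database)  # closest in direction to "A"
```

Cosine similarity compares the direction of two vectors rather than their magnitude, so a uniform scaling of all fingerprint components does not change the match.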
{"title":"A fine-grained indoor localization using multidimensional Wi-Fi fingerprinting","authors":"Deng Chen, L. Du, Zhiping Jiang, Wei Xi, Jinsong Han, K. Zhao, Jizhong Zhao, Zhi Wang, Rui Li","doi":"10.1109/PADSW.2014.7097846","DOIUrl":"https://doi.org/10.1109/PADSW.2014.7097846","journal":{"name":"2014 20th IEEE International Conference on Parallel and Distributed Systems (ICPADS)"},"publicationDate":"2014-12-01"}