Pub Date: 2014-12-01 DOI: 10.1109/PADSW.2014.7097908
Jr-Jung Chen, Yi-Fan Chung, Chiu-Ping Chang, C. King, Cheng-Hsin Hsu
Marathon running has become very popular in recent years. However, finishing a race is no easy task, especially for beginners; regular practice and training are needed. With the availability of wearable devices, it is possible to develop virtual coaches that closely monitor the progress of individual runners and guide them through tailor-made training schedules. Unfortunately, most existing wearable devices only record the runners' physiological signals and rely on off-line processing to provide feedback. In this paper, we present the design and development of an on-line virtual coach, which performs real-time tracking and analysis of the runner's physiological status and suggests appropriate adjustments to the exercise intensity. The proposed virtual coach is a pure software solution and can work with any wearable device that monitors the runner's heart rate and running speed. The main challenge for our system is to predict when the runner will reach the various running states and to instruct the runner to adjust speed just ahead of time, so that her/his body can react in time to maintain the required training intensity. Experiments with real users show that our proposed algorithms can correctly predict the runners' running states and help them better maintain the required intensity to maximize the training effect.
Title: A wearable virtual coach for Marathon beginners. In: 2014 20th IEEE International Conference on Parallel and Distributed Systems (ICPADS).
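As a rough illustration of instructing the runner "just ahead of time", the sketch below linearly extrapolates recent heart-rate samples to estimate when a target heart rate will be reached, and issues a cue once that moment falls within a lead time. The linear model, the names `seconds_until` and `coach_cue`, the 15-second lead time, and the sample readings are all assumptions made for illustration; the abstract does not specify the paper's prediction algorithm.

```python
LEAD_TIME_S = 15.0  # hypothetical: how early the coach should warn the runner

def seconds_until(samples, target_bpm):
    """Estimate seconds until heart rate reaches target_bpm.

    samples: list of (time_s, heart_rate_bpm), oldest first, at least two entries.
    Returns 0.0 if the target is already reached, None if heart rate is not rising.
    """
    (t0, h0), (t1, h1) = samples[0], samples[-1]
    if h1 >= target_bpm:
        return 0.0
    slope = (h1 - h0) / (t1 - t0)  # bpm per second across the window
    if slope <= 0:
        return None
    return (target_bpm - h1) / slope

def coach_cue(samples, target_bpm):
    """Return an instruction string for the runner."""
    eta = seconds_until(samples, target_bpm)
    if eta is not None and eta <= LEAD_TIME_S:
        return "slow down"
    return "keep pace"

window = [(0.0, 120.0), (20.0, 130.0)]  # made-up readings
eta = seconds_until(window, 140.0)      # 10 bpm to go at 0.5 bpm/s
cue = coach_cue(window, 135.0)          # 5 bpm to go at 0.5 bpm/s
```

A real coach would likely smooth noisy readings and account for the body's delayed response to a speed change; this sketch only shows the shape of the "predict, then warn early" loop.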
Pub Date: 2014-12-01 DOI: 10.1109/PADSW.2014.7097804
Haimiao Ding, Xiaofei Liao, Hai Jin, Xinqiao Lv, Rentong Guo
As multi-core platforms with hundreds of cores or more become common, system optimization issues, including lock contention, increasingly trouble programmers who work on these platforms. For multi-core programmers, locks are more convenient and clearer than lock-free operations (for example, transactional memory). However, lock contention has been recognized as a typical impediment to the performance of shared-memory parallel programs. This paper discusses two important causes of lock contention: large critical sections and frequent lock requests. With current solutions, it is hard for programmers to locate large critical sections or to find a good scheme for reducing lock contention on hot critical sections. This paper proposes FFlocker, a set of runtime solutions that reduce the lock contention caused by these two issues. FFlocker includes a profiling algorithm that locates large critical sections. Based on the profiling results, it binds threads that acquire the same lock onto the same core. We evaluate our techniques with three benchmarks. The results show that FFlocker offers better performance than Function Flow and OpenMP.
Title: Reducing lock contention on multi-core platforms. In: 2014 20th IEEE International Conference on Parallel and Distributed Systems (ICPADS).
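The idea of binding threads that acquire the same lock onto the same core can be sketched as a small planning step over a profiling trace. This is a minimal illustration under stated assumptions, not FFlocker's algorithm: the trace format, the function name `plan_affinity`, and the "most frequently acquired lock wins" rule are all hypothetical.

```python
from collections import Counter, defaultdict

def plan_affinity(trace, num_cores):
    """Map each thread to a core so that threads sharing a hot lock share a core.

    trace: iterable of (thread_id, lock_id) acquisition events (hypothetical format).
    """
    per_thread = defaultdict(Counter)
    for thread_id, lock_id in trace:
        per_thread[thread_id][lock_id] += 1

    core_of_lock = {}  # lock_id -> core index, assigned round-robin on first use
    plan = {}
    for thread_id in sorted(per_thread):
        # The lock this thread acquires most often decides its placement.
        hot_lock = per_thread[thread_id].most_common(1)[0][0]
        core = core_of_lock.setdefault(hot_lock, len(core_of_lock) % num_cores)
        plan[thread_id] = core
    return plan

# Made-up trace: threads 1 and 2 mostly take lock "A", threads 3 and 4 take "B".
trace = [(1, "A"), (2, "A"), (3, "B"), (1, "A"),
         (3, "B"), (4, "B"), (2, "B"), (2, "A")]
plan = plan_affinity(trace, num_cores=2)
```

On Linux, a plan like this could in principle be applied with an affinity call such as Python's `os.sched_setaffinity`; how FFlocker itself binds threads is not described in the abstract.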
Pub Date: 2014-12-01 DOI: 10.1109/PADSW.2014.7097854
Dongliang Chu, C. Wu, Zongmin Wang, Yongqiang Wang
The over operator is commonly used for α-blending in various visualization techniques. In its current form, it is a binary operator and must respect the restriction of order dependency, which imposes a significant performance limit. This paper proposes a fully generalized version of this operator. Compared with its predecessor, the fully generalized over operator is not only n-operator compatible but also any-order friendly. To demonstrate the advantages of the proposed operator, we apply it to the asynchronous and order-dependent image composition problem in parallel visualization for big data science and further parallelize it for performance improvement. We conduct theoretical analyses to establish the performance superiority of the proposed over operator in comparison with its original form, which is further validated by extensive experimental results in the context of real-life scientific visualization.
Title: A fully generalized over operator with applications to image composition in parallel visualization for big data science. In: 2014 20th IEEE International Conference on Parallel and Distributed Systems (ICPADS).
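For readers unfamiliar with the order dependency mentioned in the abstract, the sketch below implements only the classic binary over operator on premultiplied RGBA values; it is not the generalized operator the paper proposes. The function name `over` and the three sample fragments are illustrative assumptions. The example shows that regrouping the operands leaves the result unchanged, while swapping two of them changes it.

```python
def over(front, back):
    """Classic binary 'over' for premultiplied (r, g, b, a) tuples."""
    k = 1.0 - front[3]  # fraction of the back fragment still visible
    return tuple(f + k * b for f, b in zip(front, back))

# Three semi-transparent fragments, listed front to back (premultiplied colour).
a = (0.5, 0.0, 0.0, 0.5)
b = (0.0, 0.25, 0.0, 0.25)
c = (0.0, 0.0, 0.5, 0.5)

left = over(over(a, b), c)     # (a over b) over c
right = over(a, over(b, c))    # a over (b over c): same grouping-independent result
swapped = over(over(b, a), c)  # a and b exchanged: a different image
```

Because grouping does not matter, partial composites can be computed in parallel; because ordering does matter, the fragments must still be combined in depth order, which is the restriction the abstract refers to.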
Pub Date: 2014-12-01 DOI: 10.1109/PADSW.2014.7097871
Di Yang, J. Liao, Q. Qi, Jingyu Wang, Haifeng Sun, Shantao Jiang
The emergence of cloud datacenters makes an enormous amount of resources easily accessible to people, which makes it challenging to provide an efficient search framework in such a distributed environment. Traditional search techniques only allow users to search images by exact-match keywords through a centralized index. These methods are insufficient to meet the requirements of content-based image retrieval (CBIR), and more powerful search frameworks are needed. In this paper, we present LCFIR, an effective image retrieval framework for fast content location in a distributed setting. It adopts the peer-to-peer paradigm and combines color and edge features. The basic idea is to construct multiple replicas of an image's index by exploiting the properties of Locality Sensitive Hashing (LSH). Thus, the indexes of similar images are probabilistically gathered onto the same node without knowledge of any global information. The empirical results show that the system achieves high accuracy with load balancing while contacting only a small number of the participating nodes.
Title: Combination feature for image retrieval in the distributed datacenter. In: 2014 20th IEEE International Conference on Parallel and Distributed Systems (ICPADS).
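The way LSH can route similar feature vectors to the same node without global knowledge can be sketched with random-hyperplane hashing. This is a generic illustration, not LCFIR's actual scheme: the hash family, the 4-dimensional feature vector, the node count, and the names `make_hash` and `replica_nodes` are all assumptions.

```python
import random

def make_hash(dim, bits, seed):
    """Build one random-hyperplane LSH function (one bit per hyperplane)."""
    rng = random.Random(seed)
    planes = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(bits)]

    def h(vec):
        key = 0
        for plane in planes:
            dot = sum(p * v for p, v in zip(plane, vec))
            key = (key << 1) | (1 if dot >= 0 else 0)
        return key

    return h

NUM_NODES = 16  # hypothetical number of peers
# Three independent hash functions give three index replicas per image.
hashes = [make_hash(dim=4, bits=4, seed=s) for s in range(3)]

def replica_nodes(vec):
    """Nodes that should store (or be queried for) this feature vector."""
    return [h(vec) % NUM_NODES for h in hashes]

img = [0.9, 0.1, 0.4, 0.7]        # made-up combined colour/edge feature
brighter = [2 * x for x in img]   # same direction, different magnitude
nodes = replica_nodes(img)
```

Every peer that shares the seeds computes the same mapping locally, so a query only needs to contact the few nodes returned by `replica_nodes`. Vectors pointing in the same direction, such as `img` and `brighter`, always hash to the same nodes; vectors that are merely close do so with high probability, which is why several replicas are kept.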
Pub Date: 2014-12-01 DOI: 10.1109/PADSW.2014.7097896
Changlong Li, Xuehai Zhou, Mingming Sun, Kun Lu, Jinhong Zhou, Hang Zhuang, Dong Dai
With the development of cloud computing, more and more applications are moving to a distributed fashion to solve problems. These applications usually contain complex iterative or incremental procedures and have a more urgent requirement for low latency. Thus, many event-driven cloud frameworks have been proposed. To optimize this kind of framework, an efficient strategy that minimizes execution time by redistributing workloads is needed. Load balancing is a critical issue for the efficient operation of cloud platforms, and many centralized schemes have already been proposed. However, few of them have been designed to support event-driven frameworks. Besides, as the cluster size and the volume of tasks increase, a centralized scheme makes the master node a bottleneck. In this paper, we present a decentralized load balancing scheme named DLBS for event-driven cloud frameworks, along with two techniques to optimize it. In our design, schedulers are placed on every node for independent load monitoring, autonomous decision making, and parallel task scheduling. With the help of DLBS, the master is freed from this burden and tasks are executed with lower latency. We analyze the advantages of DLBS theoretically and confirm them through simulation. Finally, we implement and deploy it on a 64-machine cluster and demonstrate that it performs within 20% of an ideal scheme, which is consistent with the simulation results.
Title: DLBS: Decentralized load balancing scheme for event-driven cloud frameworks. In: 2014 20th IEEE International Conference on Parallel and Distributed Systems (ICPADS).
Pub Date: 2014-12-01 DOI: 10.1109/PADSW.2014.7097868
Zuhran Khan Khattak, M. Awais, Adnan Iqbal
Enormous amounts of data have resulted in large data centres. Virtual machines and virtual networks are an integral part of large data centres. As a result, software-defined network controllers have emerged as a viable solution for managing such networks. The performance analysis of network controllers is generally carried out through benchmarking. Although several benchmarking studies exist, the recently launched OpenDaylight SDN Controller has not yet been considered in any of them. In this paper, we present initial results of benchmarking the OpenDaylight SDN Controller against the well-known Floodlight controller. We present latency and throughput results for the OpenDaylight SDN Controller and Floodlight under different scenarios. We find that the OpenDaylight SDN Controller has lower average responses than Floodlight. We also note that the standard benchmarking tool, Cbench, has no support for real traffic patterns in a data centre, since data centre traffic is considerably complex. In addition to benchmarking the OpenDaylight SDN Controller, we propose modifications to Cbench to accommodate models of real traffic in data centres. We also discuss our initial implementation.
Title: Performance evaluation of OpenDaylight SDN controller. In: 2014 20th IEEE International Conference on Parallel and Distributed Systems (ICPADS).
Pub Date: 2014-12-01 DOI: 10.1109/PADSW.2014.7097887
Tanapat Anusas-Amornkul
In a disaster-hit area, infrastructure networks may be disconnected, leaving the main communication channels between victims and rescuers unavailable. Two important questions are how many people are trapped in a collapsed building and how close they are to a nearby rescuer's position. Smartphones are widely used today and normally have a Wi-Fi function for connecting to the Internet, so a phone can be used to help save victims' lives. In this paper, a communication model is proposed for a victim to communicate with others and for a rescuer to communicate with victims. State diagrams of the victim and rescuer models are proposed, along with pseudocode for the model operation. An example communication scenario is presented to demonstrate how the model works.
Title: A victim and rescuer communication model in collapsed buildings/structures. In: 2014 20th IEEE International Conference on Parallel and Distributed Systems (ICPADS).
Pub Date: 2014-12-01 DOI: 10.1109/PADSW.2014.7097875
N. Ivaki, Filipe Araújo, F. Barros
Despite offering reliability against dropped and reordered packets, the widely adopted Transmission Control Protocol (TCP) provides nearly no recovery options for long-term network outages. When the network fails, developers must roll back the application to some coherent state on their own, using error-prone solutions. Overcoming this limitation is, therefore, a deeply investigated and challenging problem. Existing solutions range from transport-layer to application-layer protocols, including additions to TCP, usually transparent to the application. None of these solutions is perfect, because each of them compromises TCP's simplicity, performance, or ubiquity, if not all three. To avoid these shortcomings, we contain TCP connection crashes inside a single session layer exposed as a sockets interface. Based on this interface, we create a blocking and a non-blocking fault-tolerant design pattern. We explore the blocking design in an open source File Transfer Protocol (FTP) server and perform a thorough evaluation of the performance, complexity, and overhead of both designs. Our results show that using one of the patterns to tolerate TCP connection crashes, in new or existing applications, involves very limited effort and negligible penalties.
Title: Session-based fault-tolerant design patterns. In: 2014 20th IEEE International Conference on Parallel and Distributed Systems (ICPADS).
Pub Date: 2014-12-01 DOI: 10.1109/PADSW.2014.7097840
Changmao Wu, Yunquan Zhang, Congli Yang, Yutong Lu
Developing an efficient and highly scalable ray tracer for the Metropolis light transport algorithm is becoming increasingly important as the demand for photorealistic images grows. Although the Metropolis light transport algorithm has produced some of the most realistic images to date, it usually takes a great amount of time to render an image. Developing an efficient and highly scalable ray tracer for the algorithm is hard, due in large part to irregular memory access patterns, the imbalanced workload of light-carrying paths, and the complicated mathematical model and complex physical processes involved. In this paper, we present a highly scalable, physically based parallel ray tracer for the Metropolis light transport algorithm. First, we present the idea of snapshots and sub-snapshots, and then propose a novel assignment partitioning algorithm for compute nodes and CPU cores, since demand-driven assignment partitioning algorithms do not work. Second, we propose a physically based parallel ray tracing framework for the Metropolis light transport algorithm, built on a master-worker architecture. Finally, we discuss the granularity of the assignment partitioning and some optimization strategies for improving overall performance, and then describe a hybrid scheduling strategy that combines static and dynamic scheduling. Experiments show that our physically based ray tracer achieves almost linear speedup using 26,400 CPU cores on the Tianhe-2 supercomputer. The results indicate that our ray tracer is both efficient and highly scalable.
Title: Physically based parallel ray tracer for the Metropolis light transport algorithm on the Tianhe-2 supercomputer. In: 2014 20th IEEE International Conference on Parallel and Distributed Systems (ICPADS).
The ocean covers nearly 71% of our planet's surface, yet 95% of it remains unexplored by human beings, and wireless sensor networks are envisioned to perform monitoring tasks over this large portion of our world. However, deploying wireless sensor networks at sea poses many challenges, and for maritime surveillance security applications we may need to deploy sensors both on the sea surface and underwater for three-dimensional detection. In this paper, we propose a hybrid ocean sensor network called Double-head maritime Sensor Networks (DSNs), which combines the advantages of wireless sensor networks and underwater acoustic sensor networks. By leveraging the unique characteristics of DSNs, we design a localization scheme, LDSN, which consists of two algorithms, SML and FLA. We first use SML to localize moored anchor nodes as seed nodes. After the underwater sensor network has been localized, the floating double-head nodes can determine their instantaneous positions via the FLA algorithm. We evaluate the scheme through simulations, and the results show that it achieves high localization accuracy.
Title: LDSN: Localization scheme for double-head maritime Sensor Networks. In: 2014 20th IEEE International Conference on Parallel and Distributed Systems (ICPADS).
Hanjiang Luo, Kaishun Wu, Jiang Xiao, Zhongwen Guo
Pub Date: 2014-12-01 DOI: 10.1109/PADSW.2014.7097813
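Range-based localization from already-localized anchors is often reduced to trilateration, and the sketch below shows that generic step in two dimensions. It is an assumption-laden illustration, not the SML or FLA algorithm, neither of which is specified in the abstract; the function name `trilaterate`, the anchor coordinates, and the noise-free distances are all hypothetical.

```python
import math

def trilaterate(anchors, dists):
    """Locate a node in 2D from three anchor positions and measured ranges."""
    (x1, y1), (x2, y2), (x3, y3) = anchors
    d1, d2, d3 = dists
    # Subtracting circle 1 from circles 2 and 3 leaves two linear equations
    # a11*x + a12*y = b1 and a21*x + a22*y = b2.
    a11, a12 = 2 * (x2 - x1), 2 * (y2 - y1)
    a21, a22 = 2 * (x3 - x1), 2 * (y3 - y1)
    b1 = d1**2 - d2**2 + x2**2 - x1**2 + y2**2 - y1**2
    b2 = d1**2 - d3**2 + x3**2 - x1**2 + y3**2 - y1**2
    det = a11 * a22 - a12 * a21
    if det == 0:
        raise ValueError("anchors are collinear; position is ambiguous")
    # Cramer's rule for the 2x2 system.
    return ((b1 * a22 - a12 * b2) / det, (a11 * b2 - a21 * b1) / det)

anchors = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]  # made-up anchor positions
true_pos = (3.0, 4.0)
ranges = [math.dist(true_pos, a) for a in anchors]  # ideal, noise-free ranges
estimate = trilaterate(anchors, ranges)
```

With noisy acoustic ranges or more than three anchors, a least-squares fit would usually replace the exact solve, and a three-dimensional deployment needs a fourth anchor or a known depth.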