In recent years, the Hadoop Distributed File System (HDFS) has been deployed as the bedrock for many parallel big data processing systems, such as graph processing systems, MPI-based parallel programs, and Scala/Java-based Spark frameworks, which can efficiently support iterative and interactive data analysis in memory. The first part of my dissertation focuses on studying parallel data access in distributed file systems, e.g., HDFS. Since the distributed I/O resources and global data distribution are often not taken into consideration, the data requests from parallel processes/executors will unfortunately be served in a remote and imbalanced fashion on the storage servers. To address these problems, we develop I/O middleware systems and matching-based algorithms to map parallel data requests to storage servers such that local and balanced data access can be achieved. The last part of my dissertation presents our plans to improve the performance of interactive data access in big data analysis. Specifically, most interactive analysis programs scan through the entire data set regardless of which data is actually required. We plan to develop a content-aware method to quickly access the required data without this laborious scanning process.
Jiangling Yin, Jun Wang. "Optimize Parallel Data Access in Big Data Processing." 2015 15th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing, pp. 721-724, May 2015. doi:10.1109/CCGrid.2015.168
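The mapping idea above can be sketched as a small greedy assignment. This is an illustrative sketch, not the dissertation's actual matching algorithm: each process's block request goes to the least-loaded server holding a replica of that block, falling back to the globally least-loaded server (a remote read) only when every replica holder is at capacity. All names and the capacity model are assumptions.

```python
# Greedy locality- and balance-aware assignment of block requests to servers.
def assign_requests(requests, replicas, capacity):
    """requests: list of block ids (one per process);
    replicas: block id -> set of servers holding a replica;
    capacity: server -> max requests it may serve."""
    load = {s: 0 for s in capacity}
    placement = {}
    for i, block in enumerate(requests):
        # Prefer a server that holds the block locally and still has capacity.
        local = [s for s in replicas[block] if load[s] < capacity[s]]
        if local:
            target = min(local, key=lambda s: load[s])  # local, balanced read
        else:
            target = min(load, key=lambda s: load[s])   # remote read, last resort
        load[target] += 1
        placement[i] = target
    return placement, load
```

For example, two requests for the same single-replica block spill the second request to a remote server once the replica holder is saturated, keeping the per-server load balanced.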
In some applications of wireless sensor networks (WSNs), sensor nodes are mobile while the sinks are static. In such a dynamic environment, situations may arise where many sensor nodes forward data through the same sink node, resulting in sink overloading. One obvious effect of sink overloading is packet loss; it also indirectly affects the network lifetime in loss-sensitive WSN applications. Therefore, proper placement of sinks in such a dynamic environment has a great impact on the performance of WSN applications. Placing multiple sinks may also not work in some situations, as node density may not be uniform. This paper introduces a sink placement scheme that gathers experience about sensor node density in each region at different times and, based on these observations, proposes candidate sink locations in order to reduce sink overloading. Next, based on the current sensor node density pattern, sinks at some of these locations are scheduled to active mode, while sinks at the remaining candidate locations are scheduled to sleep mode. This second phase is repeated periodically. The scheme is implemented in a simulation environment and compared with another well-known strategy, Geographic Sink Placement (GSP). The proposed scheme exhibits better performance with respect to sink overloading and packet loss in comparison with GSP.
Subhra Banerjee, S. Bhunia, N. Mukherjee. "Experience Based Sink Placement in Mobile Wireless Sensor Network." 2015 15th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing, pp. 898-907, May 2015. doi:10.1109/CCGrid.2015.57
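The periodic activation phase can be sketched very simply. This is a hypothetical reduction of the idea, not the paper's scheme: given observed node counts per candidate sink location, wake the k most loaded candidates and put the rest to sleep.

```python
# Activate the k candidate sinks covering the densest regions; sleep the rest.
def schedule_sinks(density, k):
    """density: candidate location -> observed node count; k: sinks to keep active."""
    ranked = sorted(density, key=lambda loc: density[loc], reverse=True)
    active = set(ranked[:k])
    return {loc: ("active" if loc in active else "sleep") for loc in density}
```

Re-running this on each observation window approximates the paper's periodically repeated second phase.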
Providing resource allocation with performance-predictability guarantees is increasingly important in cloud platforms, especially for data-intensive applications, whose performance depends greatly on the available rates of data transfer between the various computing/storage hosts underlying the virtualized resources assigned to the application. Existing resource allocation solutions assume either that applications manage the data transfer between their virtualized resources, or that cloud providers manage their internal networking resources. With the increased prevalence of brokerage services in cloud platforms, there is a need for resource allocation solutions that provide predictability guarantees in settings where neither application scheduling nor cloud provider resources can be managed/controlled by the broker. This paper addresses this problem: we define the Network-Constrained Packing (NCP) problem of finding the optimal mapping of brokered resources to applications with guaranteed performance predictability. We prove that NCP is NP-hard, and we define two special instances of the problem for which exact solutions can be found efficiently. We develop a greedy heuristic to solve the general instance of the problem, and we evaluate its efficiency using simulations on various application workloads and network models.
Christine Bassem, Azer Bestavros. "Network-Constrained Packing of Brokered Workloads in Virtualized Environments." 2015 15th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing, pp. 149-158, May 2015. doi:10.1109/CCGRID.2015.110
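A minimal greedy packing sketch in the spirit of such a heuristic (the details are assumptions, not the paper's algorithm): workloads are placed largest bandwidth demand first, each onto the host with the most residual network capacity.

```python
# Greedy first-fit-decreasing packing under per-host bandwidth capacities.
def pack_workloads(demands, capacity):
    """demands: workload -> required bandwidth; capacity: host -> available bandwidth."""
    residual = dict(capacity)
    mapping = {}
    for w in sorted(demands, key=lambda w: demands[w], reverse=True):
        host = max(residual, key=lambda h: residual[h])  # most residual capacity
        if residual[host] < demands[w]:
            mapping[w] = None  # unplaceable under current capacities
        else:
            residual[host] -= demands[w]
            mapping[w] = host
    return mapping, residual
```

Sorting by decreasing demand is the usual bin-packing trick: large items placed early are less likely to strand capacity.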
Yiannis Georgiou, David Glesser, K. Rządca, D. Trystram
Energy consumption has become one of the most important factors in High Performance Computing platforms. However, while there are various algorithmic and programming techniques to save energy, a user currently has no incentive to employ them, as they might result in worse performance. We propose to manage the energy budget of a supercomputer through EnergyFairShare (EFS), a FairShare-like scheduling algorithm. FairShare is a classic scheduling rule that prioritizes jobs belonging to users who were assigned a small amount of CPU-seconds in the past. Similarly, EFS keeps track of users' consumption of watt-seconds and prioritizes those whose jobs consumed less energy. EFS therefore incentivizes users to optimize their code for energy efficiency. Having higher priority, their jobs have smaller queuing times and, thus, smaller turn-around times. To validate this principle, we implemented EFS in a scheduling simulator and processed workloads from various HPC centers. The results show that, by reducing their energy consumption, users reduce their stretch (slowdown), compared to increasing their energy consumption. To validate the general feasibility of our approach, we also implemented EFS as an extension for SLURM, a popular HPC resource and job management system. We validated our plugin both by emulating a large-scale platform and by experiments on a real cluster with monitored energy consumption. We observed smaller waiting times for energy-efficient users.
Yiannis Georgiou, David Glesser, K. Rządca, D. Trystram. "A Scheduler-Level Incentive Mechanism for Energy Efficiency in HPC." 2015 15th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing, pp. 617-626, May 2015. doi:10.1109/CCGrid.2015.101
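The core EFS ordering can be sketched in a few lines. This is only an illustration of the incentive, assuming the simplest possible priority rule (the paper's actual priority formula and decay are not reproduced): jobs of users with lower accumulated watt-second consumption are dispatched first.

```python
# Order queued jobs so that jobs of energy-frugal users run first.
def efs_order(queue, energy_used):
    """queue: list of (job, user) pairs in arrival order;
    energy_used: user -> accumulated watt-seconds consumed so far."""
    # Stable sort: ties between jobs of the same user keep arrival order.
    return [job for job, user in sorted(queue, key=lambda ju: energy_used[ju[1]])]
```

Because a user's accumulated watt-seconds directly worsens the queue position of their next jobs, saving energy translates into shorter waiting times, which is exactly the incentive the abstract describes.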
Xuezhi Zeng, R. Ranjan, P. Strazdins, S. Garg, Lizhe Wang
As we come to terms with various big data challenges, one vital issue remains largely untouched: service level agreement (SLA) management to deliver strong Quality of Service (QoS) guarantees for big data analytics applications (BDAAs) sharing the same underlying infrastructure, for example, a public cloud platform. Although SLAs and QoS are not new concepts, having originated well before the cloud computing and big data era, their importance is amplified and their complexity aggravated by the emergence of time-sensitive BDAAs such as social-network-based stock recommendation and environmental monitoring. These applications require strong QoS guarantees and dependability from the underlying cloud computing platform to accommodate real-time responses while handling ever-increasing complexities and uncertainties. Hence, the overarching goal of this PhD research is to develop novel simulation, modelling and benchmarking tools and techniques that can aid researchers and practitioners in studying the impact of uncertainties (contention, failures, anomalies, etc.) on the final SLA and QoS of a cloud-hosted BDAA.
Xuezhi Zeng, R. Ranjan, P. Strazdins, S. Garg, Lizhe Wang. "Cross-Layer SLA Management for Cloud-hosted Big Data Analytics Applications." 2015 15th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing, pp. 765-768, May 2015. doi:10.1109/CCGrid.2015.175
MapReduce enables parallel and distributed processing of vast amounts of data on a cluster of machines. However, this computing paradigm is subject to threats posed by malicious and cheating nodes or compromised user-submitted code that could tamper with data and computation, since users maintain little control as the computation is carried out in a distributed fashion. In this paper, we focus on the analysis and detection of anomalies during the process of MapReduce computation. Accordingly, we develop a computational provenance system that captures provenance data related to MapReduce computation within the MapReduce framework in Hadoop. In particular, we identify a set of invariants over aggregated provenance information, which are later analyzed to uncover anomalies indicating possible tampering with data and computation. We conduct a series of experiments to show the efficiency and effectiveness of our proposed provenance system.
C. Liao, A. Squicciarini. "Towards Provenance-Based Anomaly Detection in MapReduce." 2015 15th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing, pp. 647-656, May 2015. doi:10.1109/CCGrid.2015.16
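One natural invariant of the kind the abstract mentions can be sketched as follows. This is an illustrative example, not one of the paper's actual invariants: for every key, the record count reducers consumed should equal the record count mappers emitted; any mismatch flags possible tampering.

```python
# Cross-check aggregated provenance counters between the map and reduce phases.
def check_invariant(map_emitted, reduce_consumed):
    """Both arguments: key -> record count aggregated from provenance logs.
    Returns the set of keys whose counts disagree (candidate anomalies)."""
    return {k for k in set(map_emitted) | set(reduce_consumed)
            if map_emitted.get(k, 0) != reduce_consumed.get(k, 0)}
```

A key missing from one side (e.g. records injected at the reducer that no mapper emitted) is treated as a count of zero and therefore also flagged.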
Molecular dynamics (MD) is the computer simulation of the physical movements of atoms and molecules, and it is a very important research technique for the study of biological and chemical systems at the micro-scale. Assisted Model Building with Energy Refinement (AMBER) is one of the most commonly used software packages for MD. However, microsecond MD simulation of large-scale atom systems requires a great deal of computing power. In this paper, we propose mAMBER: an implementation of explicit-solvent all-atom classical molecular dynamics within the AMBER program package, accelerated by Intel Xeon Phi Many-Integrated Core (MIC) coprocessors. mAMBER also includes a new parallel algorithm using the CPUs and MIC coprocessors of the Tianhe-2 supercomputer. With several optimization techniques, including CPU/MIC collaborative parallelization, factorization, and an asynchronous data-transfer framework, we accelerate the sander program of AMBER (version 12) in 'offload' mode and achieve a 4.17-fold overall speedup compared with the CPU-only sander program.
Xin Liu, Shaoliang Peng, Canqun Yang, Chengkun Wu, Haiqiang Wang, Qian Cheng, Weiliang Zhu, Jinan Wang. "mAMBER: Accelerating Explicit Solvent Molecular Dynamic with Intel Xeon Phi Many-Integrated Core Coprocessors." 2015 15th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing, pp. 729-732, May 2015. doi:10.1109/CCGrid.2015.66
"Green" computing and its "low power" cousin are the new hot spots in computing. In cloud data centers, at scale, ideas of deploying low-power ARM architectures or even large numbers of extremely "wimpy" nodes [1, 2] seem increasingly appealing. Skeptics, on the other hand, maintain that we cannot get more than what we pay for and that no free lunches can be had. In this paper we explore these theses and provide insights into the power-performance trade-off at scale for "wimpy", back-to-basics, power-efficient RISC architectures. We use ARM as a modern proxy for these and quantify the cost/performance ratio precisely enough to allow for a broader conclusion. We then offer an intuition as to why this may still hold in 2030.
Cheng Chen, M. Ehsan, R. Sion. "Quantitative Musings on the Feasibility of Smartphone Clouds." 2015 15th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing, pp. 535-544, May 2015. doi:10.1109/CCGrid.2015.115
This paper investigates the performance of flash solid-state drives (SSDs) as an extension to main memory, using a locality-aware algorithm for stencil computations. We propose three different configurations, swap, mmap, and aio, for accessing the flash media, combined with data-structure blocking techniques. Our results indicate that hierarchical blocking optimizations across three tiers, flash SSD, DRAM, and cache, perform well enough to bridge the DRAM-flash latency divide. Using only 32 GiB of DRAM and a flash SSD, 7-point stencil computations on a 512 GiB problem (16 times the size of the DRAM) attained 87% of the Mflops execution performance achieved with DRAM alone.
H. Midorikawa, Hideyuki Tan. "Locality-Aware Stencil Computations Using Flash SSDs as Main Memory Extension." 2015 15th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing, pp. 1163-1168, May 2015. doi:10.1109/CCGrid.2015.126
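One tier of the blocking idea can be sketched as a tiled 7-point Jacobi sweep. This is a toy sketch of a single blocking level (the paper applies it hierarchically across flash, DRAM, and cache, and the tile size here is an arbitrary assumption): the grid is updated tile by tile so each tile stays resident in the faster tier while it is processed.

```python
import numpy as np

# One Jacobi iteration of a 7-point 3D stencil, swept tile by tile.
def stencil_step_blocked(u, tile=4):
    out = u.copy()  # boundaries are carried over unchanged
    n = u.shape[0]
    for i0 in range(1, n - 1, tile):
        for j0 in range(1, n - 1, tile):
            for k0 in range(1, n - 1, tile):
                i1, j1, k1 = (min(i0 + tile, n - 1),
                              min(j0 + tile, n - 1),
                              min(k0 + tile, n - 1))
                # Update one tile; all reads come from u, so tile order
                # does not change the result of the Jacobi sweep.
                for i in range(i0, i1):
                    for j in range(j0, j1):
                        for k in range(k0, k1):
                            out[i, j, k] = (u[i-1, j, k] + u[i+1, j, k] +
                                            u[i, j-1, k] + u[i, j+1, k] +
                                            u[i, j, k-1] + u[i, j, k+1] +
                                            u[i, j, k]) / 7.0
    return out
```

Because Jacobi reads only the old array, the blocked sweep produces bit-identical results to an unblocked one; blocking changes only the memory access order, which is what lets a slow tier such as flash hide behind the faster ones.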
We present a structured-light measurement system that collects high-accuracy surface information of the measured object with good real-time performance. Using the phase-shifting method in conjunction with a matching method proposed in this paper, which significantly reduces noisy points, we can obtain a high-accuracy, noiseless point cloud in a complex industrial environment. Thanks to the heterogeneous parallel computation model, the parallelism of the algorithm is thoroughly exploited. An OpenMP+CUDA hybrid computing model is then used in the system to achieve better real-time performance.
Xiaoyu Liu, Hao Sheng, Yang Zhang, Z. Xiong. "A Structured Light 3D Measurement System Based on Heterogeneous Parallel Computation Model." 2015 15th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing, pp. 1027-1036, May 2015. doi:10.1109/CCGrid.2015.69
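The phase-shifting core is the textbook four-step formula; the sketch below shows it for a single pixel (the paper's matching and denoising steps are not reproduced, and a four-step variant is an assumption). Given four fringe images shifted by 0, pi/2, pi, and 3*pi/2, the wrapped phase at a pixel is atan2(I4 - I2, I1 - I3).

```python
import math

# Four-step phase-shifting: recover the wrapped phase of one pixel from
# its intensities under the four shifted fringe patterns.
def wrapped_phase(i1, i2, i3, i4):
    """i1..i4: pixel intensities at shifts 0, pi/2, pi, 3*pi/2.
    With I_n = A + B*cos(phi + delta_n):
      i4 - i2 = 2*B*sin(phi),  i1 - i3 = 2*B*cos(phi)."""
    return math.atan2(i4 - i2, i1 - i3)
```

The ambient term A and the modulation B cancel out of the two differences, which is why phase shifting is robust to uneven illumination; recovering depth then only requires phase unwrapping and triangulation against the projector.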