Case study: Multi-role shadow robotic system for independent living
Pub Date: 2012-07-02 | DOI: 10.1109/HPCSim.2012.6266950
The project focuses on the development and prototyping of remotely controlled, semi-autonomous robotic solutions for domestic environments to support elderly people. In particular, the SRS project demonstrates an innovative, practical and efficient system, the “SRS robot”, for personalised home care and assisted living.
Supervised brain segmentation and classification in diagnostic of Attention-Deficit/Hyperactivity Disorder
Pub Date: 2012-07-02 | DOI: 10.1109/HPCSim.2012.6266909
L. Igual, J. Soliva, Antonio Hernández-Vela, Sergio Escalera, Ó. Vilarroya, P. Radeva
This paper presents an automatic method for external and internal segmentation of the caudate nucleus in Magnetic Resonance Images (MRI) based on statistical and structural machine learning approaches. The method is applied to Attention-Deficit/Hyperactivity Disorder (ADHD) diagnosis. The external segmentation method adapts the Graph Cut energy-minimization model to make it suitable for segmenting small, low-contrast structures such as the caudate nucleus. In particular, new data and boundary potentials are defined for the energy function, and a supervised energy term based on contextual brain structures is added. Furthermore, the internal segmentation method learns a classifier based on shape features of the Region of Interest (ROI) in MRI slices. The results show accurate external and internal caudate segmentation on a real data set, and the ADHD diagnostic test performs comparably to manual annotation.
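For context, the standard Graph Cut segmentation energy that such adaptations start from has the following form (a reference template only; the paper's specific data and boundary potentials and its supervised contextual term refine it):

$$E(L) = \sum_{p \in \mathcal{P}} D_p(L_p) + \lambda \sum_{(p,q) \in \mathcal{N}} B_{p,q}\,\delta(L_p \neq L_q)$$

where $L$ assigns each voxel $p$ a label (caudate vs. background), $D_p$ measures how well a label fits the observed intensity, $B_{p,q}$ penalizes assigning different labels to similar neighbouring voxels, and $\lambda$ balances the two terms; the minimizer is computed exactly with a max-flow/min-cut algorithm.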
Improving the program performance through prioritized disk operation
Pub Date: 2012-07-02 | DOI: 10.1109/HPCSim.2012.6266976
Tsozen Yeh, Shuwen Yang
Computers are indispensable to modern human society, and they often host multiple programs running simultaneously. Among those programs, however, some may be more time-critical to users than others, so users want those time-critical programs to finish their execution as soon as possible. Generally speaking, program execution comprises CPU operation and hard disk operation (disk I/O). For CPU operation, modern computer systems can adjust the CPU scheduling sequence according to program priority. The same is not true of disk I/O: most computer systems have no effective way to conduct disk I/O based on program priority. Compared with the CPU, disk I/O speed still lags by about six orders of magnitude, making it hard for time-critical, high-priority programs involving disk I/O to achieve the performance users expect. Currently, Completely Fair Queuing (CFQ) is the default disk scheduler in the Linux operating system, but it offers prioritized disk I/O only to some extent. We propose and implement a new disk scheduler, Prioritized Complete Fair Queuing (PCFQ), by adding schemes supporting truly prioritized disk I/O to CFQ in the Linux kernel. We compare the performance of PCFQ and CFQ under different situations. Our experimental results demonstrate that, for programs with high priority, PCFQ outperforms CFQ in all cases, reducing program execution time by up to an additional 59.7% beyond what CFQ accomplishes.
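The abstract does not describe PCFQ's internals, so the sketch below is only a toy illustration of what "truly prioritized" dispatch means: pending requests from a higher-priority program are always served before lower-priority ones, FIFO within a level. All names are ours, not the kernel patch's.

```python
import heapq
from dataclasses import dataclass, field

@dataclass(order=True)
class Request:
    priority: int   # lower value = higher priority, as with ionice levels
    arrival: int    # tie-breaker: FIFO within the same priority
    name: str = field(compare=False)

def dispatch(requests):
    """Serve requests strictly by priority, FIFO within a priority level."""
    heap = list(requests)
    heapq.heapify(heap)
    order = []
    while heap:
        order.append(heapq.heappop(heap).name)
    return order

reqs = [Request(2, 0, "backup"), Request(0, 1, "db-commit"),
        Request(1, 2, "browser"), Request(0, 3, "db-commit-2")]
print(dispatch(reqs))  # ['db-commit', 'db-commit-2', 'browser', 'backup']
```

On a stock Linux kernel of that era, CFQ's partial priority support was exposed through the ionice(1) utility (a scheduling class plus a per-class level); judging from the abstract, this is the baseline behaviour PCFQ strengthens.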
Verification of P2P live streaming systems using symmetry-based semiautomatic abstractions
Pub Date: 2012-07-02 | DOI: 10.1109/HPCSim.2012.6266935
P. Gomes, S. Campos, A. Vieira
P2P systems are among the most efficient data transport technologies in use today, and P2P live streaming systems in particular have been growing in popularity. However, analyzing such systems is difficult: developers cannot test them completely because of their size and complex dynamic behavior, which can lead to protocols that contain errors, behave unfairly, or perform poorly. One way of performing such an analysis is with formal methods. Model checking is one such method that can be used for the formal verification of P2P systems, but it suffers from the combinatorial explosion of states. The problem can be mitigated with techniques such as abstraction and symmetry reduction. This work combines both techniques to produce reduced models that can be verified in feasible time. We present a methodology to generate abstract models of reactive systems semi-automatically, based on the model's symmetry. It defines modeling premises that make the abstraction procedure semi-automatic, i.e., requiring no modification of the model. Moreover, it presents abstraction patterns based on the system's symmetry and shows which properties are consistent with each pattern. The reductions obtained by the methodology were significant: in our test case of a P2P network, it enabled the verification of liveness properties over the abstract models, whereas the same verification on the original model had not finished after more than two weeks of intensive computation. Our results indicate that the use of model checking for the verification of P2P systems is feasible, and that our modeling methodology can increase the efficiency of the verification algorithms enough to enable the analysis of real, complex P2P live streaming systems.
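A minimal sketch of the symmetry-reduction idea (our own illustration, not the paper's methodology): when peers are interchangeable, a global state can be identified with the multiset of peer-local states, so it suffices to explore one canonical representative per equivalence class.

```python
def canonical(state):
    """Canonical form of a global state: the sorted tuple of peer-local states."""
    return tuple(sorted(state))

def reachable(initial, step):
    """BFS over canonical states; `step` yields successor global states."""
    seen = {canonical(initial)}
    frontier = [initial]
    while frontier:
        nxt = []
        for s in frontier:
            for t in step(s):
                c = canonical(t)
                if c not in seen:
                    seen.add(c)
                    nxt.append(t)
        frontier = nxt
    return seen

# Toy model: each of 4 interchangeable peers is buffering (0) or playing (1),
# and any peer may toggle its state in one step.
def step(state):
    for i, v in enumerate(state):
        yield state[:i] + (1 - v,) + state[i + 1:]

print(len(reachable((0, 0, 0, 0), step)))  # 5 canonical states instead of 2**4 = 16
```

With n symmetric two-state peers the canonical state space grows as n + 1 instead of 2^n, which is the kind of reduction that can turn a weeks-long liveness check into a feasible one.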
Selective methodology based on user criteria to explore the relationship between performance and computation analysis
Pub Date: 2012-07-02 | DOI: 10.1109/HPCSim.2012.6266989
L. Djoudi, M. Achab
Automatic analysis and quick resolution of performance problems depend on a precise methodology for developing tools that explore the relationship between performance and computation analysis. In this paper, we propose a strategy that allows choosing the analysis type, the code level, and the tool used by focusing on the hot parts of the code. Taking user criteria into account, our system generates a precise static/dynamic analysis for the selected part of the code. It requires less computation time and can be applied systematically without user intervention.
A neural network approach to online Devanagari handwritten character recognition
Pub Date: 2012-07-02 | DOI: 10.1109/HPCSim.2012.6266913
Shruthi S. Kubatur, M. Sid-Ahmed, M. Ahmadi
This paper proposes a neural-network-based framework to classify online Devanagari characters into one of 46 characters in the alphabet set. The uniqueness of this work is three-fold: (1) the feature extraction is simply the Discrete Cosine Transform (DCT) of the temporal sequence of character points, exploiting the nature of online data input; we show that, used correctly, this simple DCT feature set can be very reliable for accurate recognition of handwriting; (2) characters are input with a computer mouse; and (3) we built the online handwritten database of Devanagari characters from scratch, with some unique features in the way the database was assembled. Testing was carried out on 2,760 characters, and recognition rates of up to 97.2% were achieved.
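The abstract specifies only that the features are DCT coefficients of the temporal sequence of character points; the sketch below shows what such an extractor could look like (the normalization, coefficient count, and zero-padding are our assumptions, not the paper's):

```python
import numpy as np
from scipy.fft import dct

def dct_features(points, n_coeffs=32):
    """points: (N, 2) array of (x, y) pen positions in temporal order.
    Returns a fixed-length vector of low-frequency DCT coefficients."""
    pts = np.asarray(points, dtype=float)
    pts -= pts.mean(axis=0)                 # translation invariance
    scale = np.abs(pts).max()
    if scale > 0:
        pts /= scale                        # size invariance
    cx = dct(pts[:, 0], type=2, norm="ortho")[:n_coeffs]
    cy = dct(pts[:, 1], type=2, norm="ortho")[:n_coeffs]
    cx = np.pad(cx, (0, n_coeffs - len(cx)))  # zero-pad short strokes
    cy = np.pad(cy, (0, n_coeffs - len(cy)))
    return np.concatenate([cx, cy])         # 64-dim input for the classifier

t = np.linspace(0, 1, 50)                   # a straight diagonal stroke
print(dct_features(np.column_stack([t, t])).shape)  # (64,)
```

Keeping only the first few coefficients acts as a low-pass filter over the pen trajectory, which is one reason a small DCT feature set can be reliable: it discards jitter while preserving the character's overall shape.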
Effective detection of a mobile intruder in a partially connected wireless sensor networks
Pub Date: 2012-07-02 | DOI: 10.1109/HPCSim.2012.6266952
Yun Wang, Brendan M. Kelly, S. Dolins
For a wireless sensor network (WSN) to operate successfully in an application that detects a mobile intruder, the sensors' sensing of the intruder and the communication between the tasking sensor(s) and the base station should be considered jointly for effective intrusion detection. Most related work either treats the two tasks separately or considers both only for general-purpose WSN applications, in terms of sensing coverage and network connectivity. This work instead investigates the effective intrusion detection problem in a partially connected random WSN from modeling, analysis, and simulation perspectives by integrating the K-sensing and communication tasks. Upper and lower bounds on the effective K-sensing intrusion detection probability are mathematically formulated and theoretically derived. Monte Carlo simulations are conducted, and the outcomes support the theoretical analysis.
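A hedged illustration of the kind of Monte Carlo estimate involved, under a deliberately simplified model (uniform deployment in a unit square, disk sensing of radius r, and no connectivity constraint, unlike the paper's partially connected setting):

```python
import numpy as np

def detection_prob(n_sensors=200, r=0.08, k=3, trials=10_000, seed=0):
    """Estimate P(intruder is sensed by >= k sensors) in a unit square."""
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(trials):
        sensors = rng.uniform(0.0, 1.0, size=(n_sensors, 2))
        intruder = rng.uniform(0.0, 1.0, size=2)
        dists = np.linalg.norm(sensors - intruder, axis=1)
        if np.count_nonzero(dists <= r) >= k:
            hits += 1
    return hits / trials

print(detection_prob())
```

Ignoring border effects, each sensor covers the intruder independently with probability p = pi*r^2 for a unit area, so the K-sensing detection probability is approximately sum over i >= k of C(N, i) p^i (1-p)^(N-i); the simulation should track this binomial estimate closely, which gives a quick sanity check on derived upper and lower bounds.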
Applications resilience on clouds
Pub Date: 2012-07-02 | DOI: 10.1109/HPCSim.2012.6266891
T. Nguyen, J. Désidéri, L. Trifan
Cloud computing infrastructures support system and network fault-tolerance: they transparently repair and prevent communication and software errors, and they allow duplication and migration of jobs and data to mask hardware failures. However, only limited work has been done so far on application resilience, i.e., the ability to resume normal execution after errors and abnormal executions in distributed environments and clouds. This paper addresses open issues and solutions for application error detection and management. It also overviews a testbed used to design, deploy, execute, monitor, restart, and resume distributed applications on cloud infrastructures in case of failures.
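Application-level restart-and-resume typically rests on periodic checkpointing. The sketch below is a generic, minimal illustration of that idea (our own, not the paper's testbed, which operates on distributed workflows):

```python
import os
import pickle

CKPT = "state.ckpt"

def run(total_steps=1000):
    # Resume from the last checkpoint if a previous run died mid-way.
    if os.path.exists(CKPT):
        with open(CKPT, "rb") as f:
            step, acc = pickle.load(f)
    else:
        step, acc = 0, 0
    while step < total_steps:
        acc += step          # stand-in for one unit of real work
        step += 1
        if step % 100 == 0:  # periodic checkpoint
            with open(CKPT + ".tmp", "wb") as f:
                pickle.dump((step, acc), f)
            os.replace(CKPT + ".tmp", CKPT)  # atomic rename: no torn checkpoints
    return acc

print(run())
```

Writing to a temporary file and renaming it keeps the checkpoint consistent even if the process is killed during the write, which is the property a resume step depends on.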
Prototype of grid environment for earth system models
Pub Date: 2012-07-02 | DOI: 10.1109/HPCSim.2012.6266886
I. Epicoco, M. Mirto, S. Mocavero, G. Aloisio
Within the EU IS-ENES project, an e-infrastructure must be deployed that provides climate scientists with efficient virtual proximity to distributed data and distributed computing resources. The access point of this infrastructure is the v.E.R.C. (virtual Earth system modelling Resource Centre) web portal, which allows Earth System Model (ESM) scientists to run complex distributed workflows for executing ESM experiments and accessing ESM data. This work describes the deployment of a prototype grid environment for running multi-model ensemble experiments. Given existing grid infrastructures and services, the design of this prototype has been led by the need for a framework that leverages the external services offered within the European HPC ecosystem, e.g., DEISA and PRACE. The prototype exploits advanced grid services, namely the GRB services developed at the University of Salento, Italy, and basic grid services offered by the Globus Toolkit middleware for submitting and monitoring the ensemble runs. The prototype has been deployed across three sites: CMCC, DKRZ, and BSC. A case study based on HRT159, a global coupled ocean-atmosphere general circulation model (AOGCM) developed by CMCC-INGV, has been considered.
High performance computing and simulations on the GPU using CUDA
Pub Date: 2012-07-02 | DOI: 10.1109/HPCSim.2012.6266884
M. Ujaldón
The computational power and memory bandwidth of graphics processing units (GPUs) have turned them into attractive platforms for general-purpose applications, with significant speed gains over their CPU counterparts [1]. In addition, an increasing number of today's state-of-the-art supercomputers include commodity GPUs, delivering unprecedented levels of performance in terms of raw GFLOPS and GFLOPS/cost. In this paper, we provide an introduction to the CUDA programming paradigm, with an emphasis on simulations that can exploit SIMD parallelism and the high memory bandwidth of GPUs. OpenCL is also briefly described, as a recent standardization effort to establish an open standard API for general-purpose manycore architectures.
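To make the SIMD-style data parallelism concrete, the sketch below emulates CUDA's block/thread indexing in plain Python for a SAXPY computation (serialized host code, not actual device code; a real CUDA kernel would run the per-thread body concurrently on the GPU):

```python
def saxpy_kernel(block_idx, block_dim, thread_idx, a, x, y):
    i = block_idx * block_dim + thread_idx   # global thread id, as in CUDA
    if i < len(x):                           # guard threads past the array end
        y[i] = a * x[i] + y[i]

def launch(kernel, grid_dim, block_dim, *args):
    # A GPU runs these iterations concurrently; here they are serialized.
    for b in range(grid_dim):
        for t in range(block_dim):
            kernel(b, block_dim, t, *args)

n = 1000
x, y = [1.0] * n, [2.0] * n
launch(saxpy_kernel, (n + 255) // 256, 256, 3.0, x, y)  # grid sized to cover n
print(y[0], y[-1])  # 5.0 5.0
```

The per-element guard and the grid-sizing arithmetic, (n + block_dim - 1) // block_dim, are the same idioms a real CUDA kernel launch uses when n is not a multiple of the block size.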