Exploiting distributed and shared memory hierarchies with Hitmap
Pub Date: 2014-07-21 | DOI: 10.1109/HPCSim.2014.6903696 | Pages: 278-286
Ana Moreton-Fernandez, Arturo González-Escribano, D. Ferraris
Current multicomputers are typically built as interconnected clusters of shared-memory multicore computers. A common programming approach for these clusters is simply to use a message-passing paradigm, launching as many processes as there are cores. Nevertheless, to better exploit the scalability of these clusters and highly parallel multicore systems, their distributed- and shared-memory hierarchies must be used efficiently, which implies combining different programming paradigms and tools at different levels of the program design. This paper presents an approach to ease programming for mixed distributed- and shared-memory parallel computers. Coordination at the distributed-memory level is simplified using Hitmap, a library for distributed computing based on hierarchical tiling of data structures. We show how this tool can be integrated with shared-memory programming models and automatic code-generation tools to efficiently exploit the multicore environment of each multicomputer node. This approach makes it possible to apply the most appropriate techniques for each model, easily generating multilevel parallel programs that automatically adapt their communication and synchronization structures to the target machine. Our experimental results show how this approach matches or even improves on the best performance results obtained with manually optimized codes using pure MPI or OpenMP models.
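To make the two-level structure concrete, below is a minimal hybrid sketch in plain C: MPI processes coordinate across nodes while OpenMP threads exploit the cores within each node. This is generic MPI+OpenMP code, not the Hitmap API; the loop body and problem size are illustrative assumptions.

/* Distributed level: one MPI process per node. Shared-memory level:
 * OpenMP threads inside each process. Not Hitmap's API, just the
 * paradigm combination the paper targets. */
#include <mpi.h>
#include <omp.h>
#include <stdio.h>

int main(int argc, char **argv) {
    int provided, rank, nprocs;
    /* Ask for thread support so OpenMP regions can coexist with MPI. */
    MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    double local = 0.0;
    /* Threads split this process's tile of the iteration space. */
    #pragma omp parallel for reduction(+:local)
    for (int i = rank; i < 1000000; i += nprocs)
        local += 1.0 / (1.0 + i);

    /* Communication step: combine per-node partial results. */
    double global = 0.0;
    MPI_Reduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
    if (rank == 0)
        printf("global sum = %f from %d processes\n", global, nprocs);

    MPI_Finalize();
    return 0;
}

Compiled with an MPI wrapper and OpenMP enabled (e.g. mpicc -fopenmp), this runs one process per node and as many threads per process as cores, the exact layering the paper's approach automates.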
{"title":"Exploiting distributed and shared memory hierarchies with Hitmap","authors":"Ana Moreton-Fernandez, Arturo González-Escribano, D. Ferraris","doi":"10.1109/HPCSim.2014.6903696","DOIUrl":"https://doi.org/10.1109/HPCSim.2014.6903696","url":null,"abstract":"Current multicomputers are typically built as interconnected clusters of shared-memory multicore computers. A common programming approach for these clusters is to simply use a message-passing paradigm, launching as many processes as cores available. Nevertheless, to better exploit the scalability of these clusters and highly-parallel multicore systems, it is needed to efficiently use their distributed- and shared-memory hierarchies. This implies to combine different programming paradigms and tools at different levels of the program design. This paper presents an approach to ease the programming for mixed distributed and shared memory parallel computers. The coordination at the distributed memory level is simplified using Hitmap, a library for distributed computing using hierarchical tiling of data structures. We show how this tool can be integrated with shared-memory programming models and automatic code generation tools to efficiently exploit the multicore environment of each multicomputer node. This approach allows to exploit the most appropriate techniques for each model, easily generating multilevel parallel programs that automatically adapt their communication and synchronization structures to the target machine. Our experimental results show how this approach mimics or even improves the best performance results obtained with manually optimized codes using pure MPI or OpenMP models.","PeriodicalId":6469,"journal":{"name":"2014 International Conference on High Performance Computing & Simulation (HPCS)","volume":"83 1","pages":"278-286"},"PeriodicalIF":0.0,"publicationDate":"2014-07-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"88604536","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Security and privacy of location-based services for in-vehicle device systems
Pub Date: 2014-07-21 | DOI: 10.1109/HPCSim.2014.6903777 | Pages: 841-848
Marcello Missiroli, Fabio Pierazzi, M. Colajanni
Location-based services relying on in-vehicle devices are becoming so common that, in the near future, devices of some sort will likely be installed on new vehicles by default. The pressure for rapid adoption of these devices and services is not yet counterbalanced by adequate awareness of system security and data privacy issues. For example, service providers might collect, process, and sell data about cars, drivers, and locations to a plethora of organizations interested in acquiring such personal information. We propose a comprehensive scenario describing the entire process of data gathering, management, and transmission related to in-vehicle devices, and for each phase we point out the most critical security and privacy threats. By referring to this scenario, we outline issues and challenges that the academic and industry communities should address for the sound adoption of in-vehicle devices and related services.
{"title":"Security and privacy of location-based services for in-vehicle device systems","authors":"Marcello Missiroli, Fabio Pierazzi, M. Colajanni","doi":"10.1109/HPCSim.2014.6903777","DOIUrl":"https://doi.org/10.1109/HPCSim.2014.6903777","url":null,"abstract":"Location-based services relying on in-vehicle devices are becoming so common that it is likely that, in the near future, devices of some sorts will be installed on new vehicles by default. The pressure for a rapid adoption of these devices and services is not yet counterbalanced by an adequate awareness about system security and data privacy issues. For example, service providers might collect, elaborate and sell data belonging to cars, drivers and locations to a plethora of organizations that may be interested in acquiring such personal information. We propose a comprehensive scenario describing the entire process of data gathering, management and transmission related to in-vehicle devices, and for each phase we point out the most critical security and privacy threats. By referring to this scenario, we can outline issues and challenges that should be addressed by the academic and industry communities for a correct adoption of in-vehicle devices and related services.","PeriodicalId":6469,"journal":{"name":"2014 International Conference on High Performance Computing & Simulation (HPCS)","volume":"14 1","pages":"841-848"},"PeriodicalIF":0.0,"publicationDate":"2014-07-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"81835965","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Ophidia: A full software stack for scientific data analytics
Pub Date: 2014-07-21 | DOI: 10.1109/HPCSim.2014.6903706 | Pages: 343-350
S. Fiore, Alessandro D'Anca, D. Elia, Cosimo Palazzo, Dean N. Williams, Ian T Foster, G. Aloisio
The Ophidia project aims to provide a big data analytics platform that addresses scientific use cases involving large volumes of multidimensional data. In this work, the Ophidia software infrastructure is discussed in detail, presenting the entire software stack from level-0 (the Ophidia data store) to level-3 (the Ophidia web service front end). In particular, this paper presents the big data cube primitives provided by the Ophidia framework, discussing in detail the most relevant of the available data cube manipulation operators. These primitives form the foundation for building more complex data cube operators, such as the apex operator presented in this paper. A massive data reduction experiment on a 1 TB climate dataset is also presented to demonstrate the apex workflow in the context of the proposed framework.
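As a rough illustration of what a cube reduction primitive does (not Ophidia's actual operator syntax, with purely synthetic sizes and data), the sketch below collapses a small 3-D [time][lat][lon] cube into one maximum per time step, the general shape of aggregation that higher-level operators such as apex build upon.

/* Illustrative only: a data cube reduction that aggregates away the
 * spatial dimensions of a [time][lat][lon] cube. Ophidia performs the
 * analogous operation at scale on its distributed data store. */
#include <stdio.h>
#include <float.h>

#define NT 4
#define NLAT 3
#define NLON 3

int main(void) {
    double cube[NT][NLAT][NLON];
    /* Fill the cube with synthetic values. */
    for (int t = 0; t < NT; t++)
        for (int i = 0; i < NLAT; i++)
            for (int j = 0; j < NLON; j++)
                cube[t][i][j] = t + 0.1 * i + 0.01 * j;

    /* Reduce over lat/lon, keeping only the time dimension. */
    for (int t = 0; t < NT; t++) {
        double mx = -DBL_MAX;
        for (int i = 0; i < NLAT; i++)
            for (int j = 0; j < NLON; j++)
                if (cube[t][i][j] > mx)
                    mx = cube[t][i][j];
        printf("t=%d spatial max=%.2f\n", t, mx);
    }
    return 0;
}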
{"title":"Ophidia: A full software stack for scientific data analytics","authors":"S. Fiore, Alessandro D'Anca, D. Elia, Cosimo Palazzo, Dean N. Williams, Ian T Foster, G. Aloisio","doi":"10.1109/HPCSim.2014.6903706","DOIUrl":"https://doi.org/10.1109/HPCSim.2014.6903706","url":null,"abstract":"The Ophidia project aims to provide a big data analytics platform solution that addresses scientific use cases related to large volumes of multidimensional data. In this work, the Ophidia software infrastructure is discussed in detail, presenting the entire software stack from level-0 (the Ophidia data store) to level-3 (the Ophidia web service front end). In particular, this paper presents the big data cube primitives provided by the Ophidia framework, discussing in detail the most relevant and available data cube manipulation operators. These primitives represent the proper foundations to build more complex data cube operators like the apex one presented in this paper. A massive data reduction experiment on a 1TB climate dataset is also presented to demonstrate the apex workflow in the context of the proposed framework.","PeriodicalId":6469,"journal":{"name":"2014 International Conference on High Performance Computing & Simulation (HPCS)","volume":"30 1","pages":"343-350"},"PeriodicalIF":0.0,"publicationDate":"2014-07-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"84606883","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Personalized management of semantic, dynamic data in pervasive systems: Context-ADDICT revisited
Pub Date: 2014-07-21 | DOI: 10.1109/HPCSim.2014.6903703 | Pages: 323-326
Emanuele Panigati
Due to the high information load to which everyone is exposed in everyday life, the rise of new systems fully supporting pervasive information distribution, analysis, and sharing becomes a key factor in enabling correct and useful interaction between humans and computer systems. Such systems must be able to manage, integrate, analyze, and possibly reason over a large and heterogeneous set of data. The SuNDroPS system, briefly described in this paper, applies context-aware techniques to data gathering, shared services, and information distribution. Its context-aware approach, applied to these tasks, reduces the so-called information noise by delivering to users only the portion of information that is useful in their current context.
{"title":"Personalized management of semantic, dynamic data in pervasive systems: Context-ADDICT revisited","authors":"Emanuele Panigati","doi":"10.1109/HPCSim.2014.6903703","DOIUrl":"https://doi.org/10.1109/HPCSim.2014.6903703","url":null,"abstract":"Due to the high information load to which everyone is exposed in her everyday life, the rise of new, systems fully supporting pervasive information distribution, analysis and sharing becomes a key factor to allow a correct and useful interaction among humans and computer systems. This kind of systems must allow to manage, integrate, analyze, and possibly reason on, a large and heterogeneous set of data. The SuNDroPS system, briefly described in this paper, applies context-aware techniques to data gathering, shared services, and information distribution; the system is based on a context-aware approach that, applied to these tasks, leads to the reduction of the so-called information noise, delivering to the users only the portion of information that is useful in their current context.","PeriodicalId":6469,"journal":{"name":"2014 International Conference on High Performance Computing & Simulation (HPCS)","volume":"4 1","pages":"323-326"},"PeriodicalIF":0.0,"publicationDate":"2014-07-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"84629804","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
DWPE, a new data center energy-efficiency metric bridging the gap between infrastructure and workload
Pub Date: 2014-07-21 | DOI: 10.1109/HPCSim.2014.6903784 | Pages: 893-901
T. Wilde, A. Auweter, M. Patterson, H. Shoukourian, Herbert Huber, A. Bode, D. Labrenz, C. Cavazzoni
To determine whether a High-Performance Computing (HPC) data center is energy efficient, various aspects have to be taken into account: the data center's power distribution and cooling infrastructure, the HPC system itself, the influence of the system management software, and the HPC workloads; all contribute to the overall energy efficiency of the data center. Currently, two well-established metrics are used to determine energy efficiency for HPC data centers and systems: Power Usage Effectiveness (PUE) and FLOPS per watt (as defined by the Green500 in their ranking list). PUE evaluates the overhead of running a data center, and FLOPS per watt characterizes the energy efficiency of a system running the High-Performance Linpack (HPL) benchmark, i.e., the floating-point operations per second achieved with one watt of electrical power. Unfortunately, under closer examination even the combination of both metrics does not characterize the overall energy efficiency of an HPC data center. First, HPL is not a representative workload for most of today's HPC applications, and the rev 0.9 Green500 run rules for power measurements allow subsystems (e.g., networking, storage, cooling) to be excluded. Second, even combining PUE with the FLOPS-per-watt metric neglects the fact that the total energy efficiency of a system can vary with the characteristics of the data center in which it is operated, owing to the different cooling technologies implemented in HPC systems and the different costs the data center incurs to remove heat with those technologies. To address these issues, this paper introduces two metrics: system PUE (sPUE) and Data center Workload Power Efficiency (DWPE). sPUE calculates the overhead of operating a given system in a certain data center. DWPE is then calculated by determining the energy efficiency of a specific workload and dividing it by the sPUE. DWPE thus defines the energy efficiency of running a given workload on a specific HPC system in a specific data center, and it is currently the only fully integrated metric suitable for rating an HPC data center's energy efficiency. In addition, DWPE allows the energy efficiency of different HPC systems in existing HPC data centers to be predicted, making it an ideal approach for guiding HPC system procurement. The paper concludes with a demonstration of DWPE using a set of representative HPC workloads.
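Reading the definitions in the abstract literally, the two quantities compose as a simple quotient; in LaTeX form (the symbols are ours, not necessarily the paper's notation):

% sPUE plays the role of PUE for one specific system operated in one
% specific data center, so sPUE >= 1; EE_workload is the measured
% workload energy efficiency (e.g. FLOPS per watt at the system level).
\[
  \mathrm{DWPE} \;=\; \frac{EE_{\mathrm{workload}}}{\mathrm{sPUE}}
\]
% DWPE is therefore the workload's energy efficiency as seen at the
% facility level: the same system can score differently in data centers
% whose cooling removes its heat at different cost.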
{"title":"DWPE, a new data center energy-efficiency metric bridging the gap between infrastructure and workload","authors":"T. Wilde, A. Auweter, M. Patterson, H. Shoukourian, Herbert Huber, A. Bode, D. Labrenz, C. Cavazzoni","doi":"10.1109/HPCSim.2014.6903784","DOIUrl":"https://doi.org/10.1109/HPCSim.2014.6903784","url":null,"abstract":"To determine whether a High-Performance Computing (HPC) data center is energy efficient, various aspects have to be taken into account: the data center's power distribution and cooling infrastructure, the HPC system itself, the influence of the system management software, and the HPC workloads; all can contribute to the overall energy efficiency of the data center. Currently, two well-established metrics are used to determine energy efficiency for HPC data centers and systems: Power Usage Effectiveness (PUE) and FLOPS per Watt (as defined by the Green500 in their ranking list). PUE evaluates the overhead for running a data center and FLOPS per Watt characterizes the energy efficiency of a system running the High-Performance Linpack (HPL) benchmark, i.e. floating point operations per second achieved with 1 watt of electrical power. Unfortunately, under closer examination even the combination of both metrics does not characterize the overall energy efficiency of a HPC data center. First, HPL does not constitute a representative workload for most of today's HPC applications and the rev 0.9 Green500 run rules for power measurements allows for excluding subsystems (e.g. networking, storage, cooling). Second, even a combination of PUE with FLOPS per Watt metric neglects that the total energy efficiency of a system can vary with the characteristics of the data center in which it is operated. This is due to different cooling technologies implemented in HPC systems and the difference in costs incurred by the data center removing the heat using these technologies. To address these issues, this paper introduces the metrics system PUE (sPUE) and Data center Workload Power Efficiency (DWPE). sPUE calculates the overhead for operating a given system in a certain data center. DWPE is then calculated by determining the energy efficiency of a specific workload and dividing it by the sPUE. DWPE can then be used to define the energy efficiency of running a given workload on a specific HPC system in a specific data center and is currently the only fully-integrated metric suitable for rating an HPC data center's energy efficiency. In addition, DWPE allows for predicting the energy efficiency of different HPC systems in existing HPC data centers, thus making it an ideal approach for guiding HPC system procurement. This paper concludes with a demonstration of the application of DWPE using a set of representative HPC workloads.","PeriodicalId":6469,"journal":{"name":"2014 International Conference on High Performance Computing & Simulation (HPCS)","volume":"1 1","pages":"893-901"},"PeriodicalIF":0.0,"publicationDate":"2014-07-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"79841866","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A HPC infrastructure for processing and visualizing neuro-anatomical images obtained by Confocal Light Sheet Microscopy
Pub Date: 2014-07-21 | DOI: 10.1109/HPCSim.2014.6903741 | Pages: 592-599
A. Bria, G. Iannello, P. Soda, Hanchuan Peng, G. Erbacci, G. Fiameni, Giacomo Mariani, R. Mucci, M. Rorro, F. Pavone, L. Silvestri, P. Frasconi, Roberto Cortini
Scientific problems dealing with the processing of large amounts of data require effort in integrating proper services and applications that facilitate research activity and interact with high-performance computing resources. Easier access to these resources has a profound impact on research in neuroscience, leading to advances in the management and processing of neuro-anatomical images. An ever-increasing amount of data is constantly collected, with a consequent demand for top-class computational resources to process it. In this paper, an HPC infrastructure for the management and processing of neuro-anatomical images is presented, introducing the effort made to optimize and integrate specific applications in order to fully exploit the available resources.
{"title":"A HPC infrastructure for processing and visualizing neuro-anatomical images obtained by Confocal Light Sheet Microscopy","authors":"A. Bria, G. Iannello, P. Soda, Hanchuan Peng, G. Erbacci, G. Fiameni, Giacomo Mariani, R. Mucci, M. Rorro, F. Pavone, L. Silvestri, P. Frasconi, Roberto Cortini","doi":"10.1109/HPCSim.2014.6903741","DOIUrl":"https://doi.org/10.1109/HPCSim.2014.6903741","url":null,"abstract":"Scientific problems dealing with the processing of large amounts of data require efforts in the integration of proper services and applications to facilitate the research activity, interacting with high performance computing resources. Easier access to these resources have a profound impact on research in neuroscience, leading to advances in the management and processing of neuro-anatomical images. An ever increasing amount of data are constantly collected with a consequent demand of top-class computational resources to process them. In this paper, a HPC infrastructure for the management and the processing of neuro-anatomical images is presented, introducing the effort made to optimize and integrate specific applications in order to fully exploit the available resources.","PeriodicalId":6469,"journal":{"name":"2014 International Conference on High Performance Computing & Simulation (HPCS)","volume":"7 1","pages":"592-599"},"PeriodicalIF":0.0,"publicationDate":"2014-07-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"83601573","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Analysing Hadoop performance in a multi-user IaaS Cloud
Pub Date: 2014-07-21 | DOI: 10.1109/HPCSIM.2014.6903713 | Pages: 399-406
Javier Conejero, María Blanca Caminero, C. Carrión
Over the last few years, Big Data analysis (i.e., crunching enormous amounts of data from different sources to extract useful knowledge for improving business objectives) has attracted huge attention from enterprises and research institutions. One of the most successful paradigms to gain popularity for analysing such huge amounts of data is MapReduce (and in particular Hadoop, its open-source implementation). However, Hadoop-based applications require massive amounts of resources to conduct their analyses of large amounts of data. These growing requirements that research and enterprise place on existing computing infrastructures encourage the use of Cloud computing, where there is an increasing demand for Hadoop as a Service. Since Hadoop requires a distributed environment in order to operate, a significant problem is where resources are located; in Cloud environments, this problem lies mainly in the criteria for Virtual Machine (VM) placement. The work presented in this paper focuses on the analysis of performance, power consumption, and resource usage of Hadoop applications when Hadoop is deployed on Virtual Clusters (VCs) within a private IaaS Cloud. More precisely, the impact of different VM placement strategies on Hadoop-based application performance, power consumption, and resource usage is measured. As a result, some conclusions on the optimal criteria for VM deployment are provided.
{"title":"Analysing Hadoop performance in a multi-user IaaS Cloud","authors":"Javier Conejero, María Blanca Caminero, C. Carrión","doi":"10.1109/HPCSIM.2014.6903713","DOIUrl":"https://doi.org/10.1109/HPCSIM.2014.6903713","url":null,"abstract":"Over the last few years, Big Data analysis (i.e., crunching enormous amounts of data from different sources to extract useful knowledge for improving business objectives) has attracted huge attention from enterprises and research institutions. One of the most successful paradigms that has gained popularity in order to analyse this huge amount of data, is MapReduce (and particularly Hadoop, its open source implementation). However, Hadoop-based applications require massive amounts of resources in order to conduct different analysis of large amounts of data. This growing requirements that research and enterprises demand from the actual computing infrastructures empowers the Cloud computing utilization, where there is an increasing demand of Hadoop as a Service. Since Hadoop requires a distributed environment in order to operate, a significant problem is where resources are located. Focusing in Cloud environments, this problem lays mainly on the criteria for Virtual Machine (VM) placement. The work presented in this paper focuses on the analysis of performance, power consumption and resource usage by Hadoop applications when deploying Hadoop on Virtual Clusters (VCs) within a private IaaS Cloud. More precisely, the impact of different VM placement strategies on Hadoop-based application performance, power consumption and resource usage is measured. As a result, some conclusions on the optimal criteria for VM deployment are provided.","PeriodicalId":6469,"journal":{"name":"2014 International Conference on High Performance Computing & Simulation (HPCS)","volume":"6 1","pages":"399-406"},"PeriodicalIF":0.0,"publicationDate":"2014-07-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"90071729","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Supercomputer simulations of platelet activation in blood plasma at multiple scales
Pub Date: 2014-07-21 | DOI: 10.1109/HPCSim.2014.6903802 | Pages: 1011-1013
Seetha Pothapragada
Thrombogenicity in cardiovascular devices and pathologies is associated with flow-induced shear-stress activation of platelets resulting from pathological flow patterns. This platelet activation process poses a major modeling challenge, as it covers disparate spatiotemporal scales, from the flow down to the cellular, subcellular, and molecular scales. This challenge can be resolved by multiscale simulations feasible only on supercomputers. The simulation must couple the macroscopic effects of blood plasma flow and stresses to microscopic platelet dynamics. In an attempt to model this complex multiscale behavior, we first developed a phenomenological three-dimensional coarse-grained molecular dynamics (CGMD) particle-based model. This model depicts resting platelets and simulates the characteristic filopodia formation observed during activation. Simulation results are compared with in vitro measurements of activated-platelet morphological changes, such as core axes and filopodia thicknesses and lengths, after exposure to prescribed flow-induced shear stresses. More recently, we extended this model by embedding the platelet in a Dissipative Particle Dynamics (DPD) blood plasma flow and developed a dynamic coupling scheme that allows the simulation of flow-induced shear-stress platelet activation. This portion of the research is in progress.
{"title":"Supercomputer simulations of platelet activation in blood plasma at multiple scales","authors":"Seetha Pothapragada","doi":"10.1109/HPCSim.2014.6903802","DOIUrl":"https://doi.org/10.1109/HPCSim.2014.6903802","url":null,"abstract":"Thrombogenicity in cardiovascular devices and pathologies is associated with flow-induced shear stress activation of platelets resulting from pathological flow patterns. This platelet activation process poses a major modeling challenge as it covers disparate spatiotemporal scales, from flow down to cellular, to subcellular, and to molecular scales. This challenge can be resolved by implementing multiscale simulations feasible only on supercomputers. The simulation must couple the macroscopic effects of blood plasma flow and stresses to a microscopic platelet dynamics. In an attempt to model this complex and multiscale behavior we have first developed a phenomenological three-dimensional coarse-grained molecular dynamics (CGMD) particle-based model. This model depicts resting platelets and simulates their characteristic filopodia formation observed during activation. Simulations results are compared with in vitro measurements of activated platelet morphological changes, such as the core axes and filopodia thicknesses and lengths, after exposure to the prescribed flow-induced shear stresses. More recently, we extended this model by incorporating the platelet in Dissipative Particle Dynamics (DPD) blood plasma flow and developed a dynamic coupling scheme that allows the simulation of flow-induced shear stress platelet activation. This portion of research is in progress.","PeriodicalId":6469,"journal":{"name":"2014 International Conference on High Performance Computing & Simulation (HPCS)","volume":"60 1","pages":"1011-1013"},"PeriodicalIF":0.0,"publicationDate":"2014-07-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"72731147","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Evaluation of Intel Xeon E5-2600v2 based cluster for technical computing workloads
Pub Date: 2014-07-21 | DOI: 10.1109/HPCSim.2014.6903787 | Pages: 919-926
P. Gepner, V. Gamayunov, Wieslawa Litke, L. Sauge, C. Mazauric
In Intel's CPU release model, the new Ivy Bridge is a "TICK" that follows the Sandy Bridge ("TOCK") microarchitecture principles but, after a die shrink, is manufactured at 22 nm; it also incorporates new microarchitectural upgrades. In this paper we evaluate the performance of a 16-node cluster of dual-socket machines based on this 3rd-generation Intel Xeon Processor E5-2697v2, aimed at the server and workstation market. The new architectural improvements are assessed via the High Performance Computing Challenge (HPCC) benchmarks and the NAS Parallel Benchmarks (NPB), and the interconnect technology is exercised with the standard Intel® MPI Benchmark suite. Finally, we test the performance of the new system using a subset of the benchmarks from the PRACE consortium. We compare the achieved results against the outcomes of the same tests performed on clusters based on previous generations of Intel Xeon processors: the Intel Xeon E5-2680 ("Sandy Bridge-EP"), Intel Xeon 5680 ("Westmere-EP"), and Intel Xeon 5570 ("Nehalem-EP"), respectively.
{"title":"Evaluation of Intel Xeon E5-2600v2 based cluster for technical computing workloads","authors":"P. Gepner, V. Gamayunov, Wieslawa Litke, L. Sauge, C. Mazauric","doi":"10.1109/HPCSim.2014.6903787","DOIUrl":"https://doi.org/10.1109/HPCSim.2014.6903787","url":null,"abstract":"In Intel's CPU releasing model, the new Ivy Bridge is a “TICK” that follows Sandy Bridge's (“TOCK”) microarchitecture principles, however, after undergoing a die shrink it is manufactured at 22nm. It also incorporates new micro-architectural upgrades. In this paper we shall evaluate the performance of a 16 bi-socket node cluster based on this 3rd generation Intel Xeon Processor E5-2697v2 meant for server and workstation market. The new architectural improvements are assessed via High Performance Computing Challenge (HPCC) benchmarks and NAS Parallel Benchmarks (NPB) where the interconnect technology is challenged by the standard Intel® MPI Benchmark suite performance evaluator. Finally we tested performance of the new system using the subset of the benchmark from PRACE consortium. We compare achieved results against the outcomes of the tests performed on clusters based on previous generations of Intel Xeon processors: Intel Xeon E5-2680 (“Sandy Bridge-EP”), Intel Xeon 5680 (“Westmere-EP”) and Intel Xeon 5570 (“Nehalem-EP”) respectively.","PeriodicalId":6469,"journal":{"name":"2014 International Conference on High Performance Computing & Simulation (HPCS)","volume":"5 1","pages":"919-926"},"PeriodicalIF":0.0,"publicationDate":"2014-07-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"79609984","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A GPU accelerated hybrid lattice-grid algorithm for options pricing
Pub Date: 2014-07-21 | DOI: 10.1109/HPCSim.2014.6903765 | Pages: 758-765
Joan O. Omeru, David B. Thomas
The pricing of financial derivatives is an important problem in risk analysis and real-time trading. The need for faster and more accurate pricing has led financial institutions to adopt GPU technology, which calls for new pricing algorithms designed specifically for GPU architectures. This research tackles the design of adaptable algorithms for option evaluation using lattices, a commonly used numerical technique. Usually, lattice nodes are placed on a fixed grid at high resolution; by coarsening the grid in areas of low error, we can reduce run-time without reducing accuracy. We show that this adaptable grid can be designed to map onto the underlying architecture of warp-based GPUs, offering a tradeoff: faster execution at the same error, or lower error at the same execution speed. We implemented this algorithm in platform-independent OpenCL and evaluated it on the Nvidia Quadro K4000 across different option classes. We present accuracy and speed-up results for our hybrid lattice-grid model against an equivalent standard lattice implementation.
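For reference, the fixed-resolution baseline that the adaptive scheme coarsens is the standard binomial (Cox-Ross-Rubinstein) lattice with backward induction; a minimal C sketch follows. The parameters are illustrative, and this is the uniform-grid method, not the paper's hybrid GPU algorithm.

/* Standard CRR lattice for a European call: build terminal payoffs at
 * the finest level, then step backward to the root. The paper's
 * contribution is coarsening this grid in low-error regions and
 * mapping it onto GPU warps. */
#include <stdio.h>
#include <math.h>

double crr_european_call(double S0, double K, double r,
                         double sigma, double T, int steps) {
    double dt   = T / steps;
    double u    = exp(sigma * sqrt(dt));       /* up factor           */
    double d    = 1.0 / u;                     /* down factor         */
    double p    = (exp(r * dt) - d) / (u - d); /* risk-neutral prob.  */
    double disc = exp(-r * dt);
    double v[steps + 1];

    /* Terminal payoffs across the last lattice level. */
    for (int i = 0; i <= steps; i++) {
        double ST = S0 * pow(u, i) * pow(d, steps - i);
        v[i] = fmax(ST - K, 0.0);
    }
    /* Backward induction: each node discounts its two children. */
    for (int n = steps - 1; n >= 0; n--)
        for (int i = 0; i <= n; i++)
            v[i] = disc * (p * v[i + 1] + (1.0 - p) * v[i]);
    return v[0];
}

int main(void) {
    printf("CRR call price: %.4f\n",
           crr_european_call(100.0, 100.0, 0.05, 0.2, 1.0, 512));
    return 0;
}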
{"title":"A GPU accelerated hybrid lattice-grid algorithm for options pricing","authors":"Joan O. Omeru, David B. Thomas","doi":"10.1109/HPCSim.2014.6903765","DOIUrl":"https://doi.org/10.1109/HPCSim.2014.6903765","url":null,"abstract":"The pricing of financial derivatives is an important problem in risk analysis and real-time trading. The need for faster and more accurate pricing has led financial institutions to adopt GPU technology, but this means we need new pricing algorithms designed specifically for GPU architectures. This research tackles the design of adaptable algorithms for option evaluation using lattices, a commonly used numerical technique. Usually lattice nodes are placed on a fixed grid at a high resolution, but by coarsening the grid in areas of low error, we can reduce run-time without a reduction in accuracy. We show that this adaptable grid can be designed to map onto the underlying architecture of warp-based GPUs, providing a tradeoff between faster execution at the same error, or lower error for the same execution speed. We implemented this algorithm in platform-independent OpenCL, and evaluated it on the Nvidia Quadro K4000, across different option classes. We present accuracy and speed-up results from using our hybrid lattice mesh model over an equivalent standard lattice implementation.","PeriodicalId":6469,"journal":{"name":"2014 International Conference on High Performance Computing & Simulation (HPCS)","volume":"4 12 1","pages":"758-765"},"PeriodicalIF":0.0,"publicationDate":"2014-07-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"78474644","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}