A fault-tolerant acoustic sensor network for monitoring underwater pipelines
Pub Date: 2014-07-21 | DOI: 10.1109/HPCSim.2014.6903782 | HPCS 2014, pp. 877-884
N. Mohamed, Latifa Al-Muhairi, J. Al-Jaroodi, I. Jawhar
Underwater Acoustic Sensor Networks (UASNs) can be used to monitor long underwater pipeline structures for oil, gas, and water. In this case, a special type of UASN, the UASN-P (UASN for long pipelines), is used. One of the main challenges of using a UASN-P is the reliability of the connections among the nodes. Faults in a few contiguous nodes may create holes that divide the network into multiple disconnected segments. As a result, sensor nodes located between holes may not be able to deliver their sensed information, which negatively affects the network's sensing coverage. This paper provides an analysis of the different types of faults in a UASN-P and studies their negative impact on sensing coverage. We utilize Autonomous Underwater Vehicles (AUVs) and develop two models to overcome these faults and enhance coverage. The first model uses AUVs as mobile sensor nodes that cover the network holes, while the second uses AUVs to deliver and deploy fixed sensor nodes in the network holes to replace faulty nodes. In both models, the placed nodes provide additional sensing coverage as well as connectivity among the disconnected segments of the UASN-P. A strategy for the best allocation of a limited number of sensor nodes or sensing vehicles is developed. In addition, evaluations and a comparison between the two models are provided.
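The paper's allocation strategy is not reproduced in this listing; as a rough illustration of the problem it addresses, the following sketch greedily assigns a limited budget of replacement nodes (AUVs or AUV-deployed fixed sensors) to the largest coverage holes along a linear pipeline. The greedy policy and all names are assumptions for illustration, not the authors' method.

```python
# Illustrative sketch only (not the paper's algorithm): patch the largest
# coverage holes along a linear pipeline first, with a limited node budget.

def find_holes(alive):
    """Return (start, length) of each maximal run of failed nodes."""
    holes, start = [], None
    for i, ok in enumerate(alive):
        if not ok and start is None:
            start = i
        elif ok and start is not None:
            holes.append((start, i - start))
            start = None
    if start is not None:
        holes.append((start, len(alive) - start))
    return holes

def allocate(alive, budget):
    """Fill the longest holes first: they disconnect more segments
    and cost the most sensing coverage."""
    placements = []
    for start, length in sorted(find_holes(alive), key=lambda h: -h[1]):
        for offset in range(length):
            if budget == 0:
                return placements
            placements.append(start + offset)
            budget -= 1
    return placements

alive = [True, False, False, True, True, False, True]
print(allocate(alive, budget=2))  # patches the two-node hole first -> [1, 2]
```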
A context-aware system for personalized and accessible pedestrian paths
Pub Date: 2014-07-21 | DOI: 10.1109/HPCSim.2014.6903776 | HPCS 2014, pp. 833-840
S. Mirri, Catia Prandi, P. Salomoni
This work presents mPASS (mobile Pervasive Accessibility Social Sensing), a social and ubiquitous context-aware system that provides users with personalized and accessible pedestrian paths and maps. In order to collect a complete data set, our system gathers information from different sources: sensing, crowdsourcing, and data produced by local authorities and disability organizations. The gathered information is tailored to the user's needs and preferences on the basis of his/her context, defined by his/her location, his/her profile, and the quality of the data about the personalized path. To support the effectiveness of our approach, we have developed a prototype, which is described in this paper together with some results of the context-based adaptation.
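As a hypothetical illustration of such tailoring (this is not the mPASS implementation; the profile table, penalty values, and function names are all assumed), one can weight each sidewalk segment by how strongly its reported barriers penalize a given user profile, discounted by confidence in the crowdsourced data, so that a shortest-path search then favors accessible routes:

```python
# Hypothetical sketch: profile-dependent segment cost for accessible routing.

PROFILE_PENALTIES = {          # assumed profile -> barrier penalty table
    "wheelchair": {"steps": 1000.0, "steep_slope": 50.0, "narrow": 20.0},
    "low_vision": {"no_tactile_paving": 30.0, "obstacle": 15.0},
}

def segment_cost(length_m, barriers, profile, data_quality):
    """Base cost is length; each reported barrier adds a profile-specific
    penalty, scaled by report confidence (0..1) and data quality (0..1)."""
    penalties = PROFILE_PENALTIES.get(profile, {})
    cost = length_m
    for barrier, confidence in barriers.items():
        cost += penalties.get(barrier, 0.0) * confidence * data_quality
    return cost

# An 80 m segment with a confidently reported flight of steps becomes
# prohibitively expensive for a wheelchair user:
print(segment_cost(80.0, {"steps": 0.9}, "wheelchair", data_quality=0.8))
```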
Analysis of classic algorithms on GPUs
Pub Date: 2014-07-21 | DOI: 10.1109/HPCSim.2014.6903670 | HPCS 2014, pp. 65-73
Lin Ma, R. Chamberlain, Kunal Agrawal
The recently developed Threaded Many-core Memory (TMM) model provides a framework for analyzing algorithms for highly-threaded many-core machines such as GPUs. In particular, it tries to capture the fact that these machines hide memory latency via the use of a large number of threads and large memory bandwidth. The TMM model analysis contains two components: computational complexity and memory complexity. A model is only useful if it can explain and predict empirical data. In this work, we investigate the effectiveness of the TMM model. We analyze algorithms for five classic problems - suffix tree/array for string matching, fast Fourier transform, merge sort, list ranking, and all-pairs shortest paths - under this model, and compare the results of the analysis with the experimental findings of ours and of other researchers who have implemented and measured the performance of these algorithms on a spectrum of diverse GPUs. We find that the TMM model is able to predict important and sometimes previously unexplained trends and artifacts in the experimental data.
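The paper's exact TMM cost expressions are not reproduced in this listing; the sketch below only illustrates the two-component style of analysis it describes, where predicted time is whichever of computation or memory traffic the hardware cannot hide. All parameter names and the latency-hiding term are simplifying assumptions.

```python
# Simplified two-component estimate in the spirit of the TMM model
# (not the paper's formula): time = max(compute time, memory time).

def predicted_time(work, mem_accesses, cores, latency, threads_per_core):
    """work: total operations; mem_accesses: total memory transactions;
    latency: memory latency in cycles. Resident threads overlap memory
    latency, so more threads per core shrink the effective memory cost."""
    compute_time = work / cores
    hiding = min(threads_per_core, latency)   # latency fully hidden at best
    memory_time = mem_accesses * latency / (cores * hiding)
    return max(compute_time, memory_time)

# A kernel with heavy memory traffic ends up memory-bound: the second
# term dominates despite the large core count.
print(predicted_time(work=1e9, mem_accesses=1e8, cores=2048,
                     latency=400, threads_per_core=32))
```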
Efficient analysis methodology for huge application traces
Pub Date: 2014-07-21 | DOI: 10.1109/HPCSim.2014.6903791 | HPCS 2014, pp. 951-958
D. Dosimont, Generoso Pagano, Guillaume Huard, Vania Marangozova-Martin, J. Vincent
The growing complexity of computer system hardware and software makes their behavior analysis a challenging task. In this context, tracing appears to be a promising solution, as it provides relevant information about the system execution. However, trace analysis techniques and tools fail to provide the analyst with an efficient analysis flow, for several reasons. First, traces contain a huge volume of data that is difficult to store, load into memory, and work with. Then, the analysis flow is hindered by the various, often incompatible result formats produced by different analysis techniques. Last, analysis frameworks lack an entry point for understanding the general behavior of the traced application. Indeed, traditional visualization techniques suffer from time and space scalability issues due to screen size, and are not able to represent the full trace. In this article, we present how to perform an efficient analysis by following Shneiderman's mantra: “Overview first, zoom and filter, then details on demand”. Our methodology is based on FrameSoC, a trace management infrastructure that provides solutions for trace storage, data access, and analysis flow, managing analysis results and tools. Ocelotl, a visualization tool, takes advantage of FrameSoC and shows a synthetic representation of a trace by using time aggregation. This visualization solves the scalability issues and provides an entry point for the analysis by showing phases and behavior disruptions, with the objective of getting more details by focusing on the interesting parts of the trace.
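A minimal sketch of time aggregation as an overview mechanism (assumed for illustration, not Ocelotl's actual aggregation algorithm): bucketing trace events into a fixed number of time slices guarantees that an arbitrarily long trace always fits the screen, while phase changes remain visible as shifts in the per-slice state mix.

```python
# Sketch: aggregate (timestamp, state) trace events into n_slices buckets.

from collections import Counter

def aggregate(events, t_start, t_end, n_slices):
    """Return one Counter of state occurrences per time slice."""
    width = (t_end - t_start) / n_slices
    slices = [Counter() for _ in range(n_slices)]
    for ts, state in events:
        idx = min(int((ts - t_start) / width), n_slices - 1)
        slices[idx][state] += 1
    return slices

events = [(0.1, "run"), (0.2, "run"), (0.7, "wait"), (0.9, "run")]
for i, c in enumerate(aggregate(events, 0.0, 1.0, 2)):
    print(f"slice {i}: {dict(c)}")
# slice 0: {'run': 2}    slice 1: {'wait': 1, 'run': 1}
```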
A benchmark-based performance model for memory-bound HPC applications
Pub Date: 2014-07-21 | DOI: 10.1109/HPCSim.2014.6903790 | HPCS 2014, pp. 943-950
B. Putigny, Brice Goglin, Denis Barthou
The increasing computation capability of servers comes with a dramatic increase in their complexity through many cores, multiple levels of caches, and NUMA architectures. Exploiting the computing power is increasingly hard, and programmers need ways to understand the performance behavior. We present an innovative approach for predicting the performance of memory-bound multi-threaded applications. It relies on micro-benchmarks and a compositional model that combines measures of micro-benchmarks in order to model larger codes. Our memory model takes into account cache sizes and cache coherence protocols, which have a large impact on the performance of multi-threaded codes. Applying this model to real-world HPC kernels shows that it can predict their performance with good accuracy, helping to make optimization decisions that increase application performance.
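The abstract does not give the model's equations; the following is a hedged sketch of the compositional idea only. The cost table, its keys, and the assumption that kernel time is a plain sum of per-access costs are all simplifications introduced here.

```python
# Sketch of a compositional memory model: measure elementary accesses once
# with micro-benchmarks, then predict a kernel as the sum of its accesses.

# Assumed micro-benchmark results: ns per access, keyed by the cache level
# the working set fits in and the coherence state of the line.
MICRO = {
    ("L1", "exclusive"): 1.2,
    ("L2", "exclusive"): 4.0,
    ("L3", "shared"):    18.5,
    ("DRAM", "remote"):  95.0,   # line homed on a remote NUMA node
}

def predict_ns(op_counts):
    """op_counts: {(level, state): number_of_accesses} for one kernel."""
    return sum(MICRO[key] * n for key, n in op_counts.items())

# A kernel streaming 1e6 lines from shared L3 plus 1e5 remote DRAM misses:
print(predict_ns({("L3", "shared"): 1_000_000, ("DRAM", "remote"): 100_000}))
```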
An approach for scalable parallel execution of ant algorithms
Pub Date: 2014-07-21 | DOI: 10.1109/HPCSim.2014.6903683 | HPCS 2014, pp. 170-177
F. Cicirelli, Agostino Forestiero, Andrea Giordano, C. Mastroianni
This paper presents an approach, based on multi-agent systems, for the efficient parallel/distributed execution of ant algorithms. A very popular clustering problem, i.e., the spatial sorting of items belonging to a number of predefined classes, is taken as a use case. The approach consists in partitioning the problem space among a number of parallel nodes. Data consistency and conflict issues, which may arise when multiple agents concurrently access shared data, are handled transparently using a purposely developed notion of logical time. The developer remains in charge only of defining the behavior of the agents modeling the ants, without having to cope with issues related to parallel/distributed programming and performance optimization. Experimental results show that the approach is scalable and can be adopted to speed up ant algorithm execution when the problem size is large, as may be the case in massive data analysis and clustering.
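The agent behavior the developer would define is, in the classic ant-based spatial sorting scheme (Deneubourg-style pick/drop rules), a pair of probabilistic decisions driven by local similarity. The sketch below shows that classic rule; the paper's exact parameters, neighborhood definition, and any deviations from it are not reproduced here.

```python
# Classic ant-clustering pick/drop rule (illustrative constants assumed):
# isolated items tend to be picked up, items near kin tend to be dropped.

K_PICK, K_DROP = 0.1, 0.3   # assumed sensitivity constants

def local_similarity(item, neighborhood):
    """Fraction of neighboring cells holding an item of the same class."""
    if not neighborhood:
        return 0.0
    same = sum(1 for other in neighborhood if other == item)
    return same / len(neighborhood)

def p_pick(item, neighborhood):
    f = local_similarity(item, neighborhood)
    return (K_PICK / (K_PICK + f)) ** 2

def p_drop(item, neighborhood):
    f = local_similarity(item, neighborhood)
    return (f / (K_DROP + f)) ** 2

print(p_pick("A", ["B", "B", "A"]))   # low: item already has some kin
print(p_drop("A", ["A", "A", "B"]))   # high: good place to drop an "A"
```

Under the paper's approach, concurrent pick/drop operations near partition borders are the accesses that the logical-time mechanism must order consistently.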
EMS@CNR: An Energy monitoring sensor network infrastructure for in-building location-based services
Pub Date: 2014-07-21 | DOI: 10.1109/HPCSim.2014.6903779 | HPCS 2014, pp. 857-862
P. Barsocchi, E. Ferro, L. Fortunati, Fabio Mavilia, Filippo Palumbo
The increasing demand for building services and comfort levels, together with the rise in time spent inside buildings, assures an upward trend in energy demand for the future. In this paper we present a long-term energy monitoring system called EMS@CNR that is able to measure the energy consumed by end users in office environments. The system has been tested by monitoring the power consumption of a testbed room in the CNR research area in Pisa. The proposed infrastructure stands as an enabling technology for future in-building location-based services. As preliminary results, we show the potential of EMS@CNR for long-term monitoring of user working behaviors.
Insertion of PETSc in the NEMO stack software driving NEMO towards exascale computing
Pub Date: 2014-07-21 | DOI: 10.1109/HPCSim.2014.6903761 | HPCS 2014, pp. 724-731
L. D’Amore, A. Murli, V. Boccia, L. Carracciuolo
This paper addresses the scientific challenges related to high-level implementation strategies that steer the NEMO (Nucleus for European Modelling of the Ocean) code toward effective exploitation of the opportunities offered by exascale systems. We consider, as case studies, two components of the NEMO ocean model (OPA - Ocean PArallelization): the Sea Surface Height equation solver and the Variational Data Assimilation module. The advantages arising from the insertion of consolidated scientific libraries into the NEMO code are highlighted: such advantages concern both the improvement of “software quality” (in terms of parameters such as robustness, portability, and resilience) and the reduction of time spent on software development and maintenance. Finally, we consider the Shallow Water equations as a toy model for the NEMO ocean model, to show how the use of PETSc objects predisposes the application to a good level of scalability and efficiency when the most suitable level of abstraction is used.
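To make the "level of abstraction" point concrete, here is a minimal sketch, not the paper's NEMO/Fortran code, of solving a small tridiagonal (1D Laplacian-like) system through PETSc's linear-solver objects via the petsc4py bindings. The problem and sizes are placeholders; the point is that the solver type, preconditioner, and parallel layout remain runtime-configurable instead of being hard-wired into the application.

```python
# Hedged petsc4py sketch: assemble a 1D Laplacian and solve with KSP.
from petsc4py import PETSc

n = 8
A = PETSc.Mat().createAIJ([n, n], nnz=3)   # tridiagonal preallocation
rstart, rend = A.getOwnershipRange()       # rows owned by this MPI rank
for i in range(rstart, rend):              # 1D Laplacian stencil
    if i > 0:
        A.setValue(i, i - 1, -1.0)
    A.setValue(i, i, 2.0)
    if i < n - 1:
        A.setValue(i, i + 1, -1.0)
A.assemble()

b = A.createVecLeft()                      # right-hand side
b.set(1.0)
x = A.createVecRight()                     # solution vector

ksp = PETSc.KSP().create()
ksp.setOperators(A)
ksp.setType(PETSc.KSP.Type.CG)             # overridable with -ksp_type ...
ksp.setFromOptions()                       # pick up command-line options
ksp.solve(b, x)
PETSc.Sys.Print("iterations:", ksp.getIterationNumber())
```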
Forensic disk image indexing and search in an HPC environment
Pub Date: 2014-07-21 | DOI: 10.1109/HPCSim.2014.6903735 | HPCS 2014, pp. 558-565
M. Bernaschi, Marco Cianfriglia, Antonio Di Marco, A. Sabellico, G. Me, Giancarlo Carbone, G. Totaro
We describe a solution for fast indexing and searching within large heterogeneous data sets, whose main purpose is to support investigators who need to analyze forensic disk images originated by seizures or created from bodies of evidence. Our approach is based on a combination of techniques aimed at improving the efficiency and reliability of the indexing process. We do not rely on existing frameworks like Hadoop, but borrow concepts from different contexts, including High Performance Computing and Database management.
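The core data structure behind any such index-then-search tool is an inverted index mapping each term to the artifacts containing it. The sketch below is a generic single-node illustration of that structure, not the paper's distributed pipeline; identifiers and tokenization are assumptions.

```python
# Generic inverted index sketch: term -> set of artifact identifiers.
import re
from collections import defaultdict

index = defaultdict(set)

def add_document(doc_id, text):
    # Index each distinct lowercase alphanumeric token once per artifact.
    for term in set(re.findall(r"[a-z0-9]+", text.lower())):
        index[term].add(doc_id)

def search(*terms):
    """AND query: artifacts containing every term."""
    sets = [index.get(t.lower(), set()) for t in terms]
    return set.intersection(*sets) if sets else set()

add_document("img1/partition0/mail.pst", "transfer to account 4711")
add_document("img1/partition1/note.txt", "account closed")
print(search("account"))            # both artifacts
print(search("account", "4711"))    # only the mail archive
```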
Managing the topology of heterogeneous cluster nodes with hardware locality (hwloc)
Pub Date: 2014-07-21 | DOI: 10.1109/HPCSim.2014.6903671 | HPCS 2014, pp. 74-81
Brice Goglin
Modern computing platforms are increasingly complex, with multiple cores, shared caches, and NUMA architectures. Parallel application developers have to take locality into account before they can expect good efficiency on these platforms. Thus there is a strong need for a portable tool that gathers and exposes this information. The Hardware Locality project (hwloc) offers a tree representation of the hardware based on the inclusion and locality of CPU and memory resources. It is already widely used for affinity-based task placement in high performance computing. In this article we present how hwloc is extended to describe more than computing and memory resources. Indeed, I/O device locality is becoming another important aspect of locality, since high performance GPUs and network or InfiniBand interfaces possess privileged access to some of the cores and memory banks. hwloc integrates this knowledge into its topology representation and offers an interoperability API to extend existing libraries such as CUDA with locality information. We also describe how hwloc now helps process managers and batch schedulers to deal with the topology of multiple cluster nodes, together with compression for better scalability up to thousands of nodes.
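hwloc itself is a C library; as a hedged, language-neutral way to inspect the inclusion-based tree described above, the sketch below shells out to hwloc's lstopo-no-graphics tool for an XML export and prints the object hierarchy, where I/O objects (PCI and OS devices) appear attached under the resources they are local to. This assumes the hwloc command-line utilities are installed, and XML attribute details vary across hwloc versions.

```python
# Sketch: dump the hwloc topology tree via lstopo's XML export.
import subprocess
import xml.etree.ElementTree as ET

xml_text = subprocess.run(
    ["lstopo-no-graphics", "--of", "xml"],
    check=True, capture_output=True, text=True,
).stdout

def walk(obj, depth=0):
    # Each hwloc object is an <object> element; children nest inside it,
    # mirroring the inclusion-based tree (Machine > NUMANode > ... > PU).
    print("  " * depth + obj.get("type", "?"), obj.get("name", ""))
    for child in obj.findall("object"):
        walk(child, depth + 1)

root = ET.fromstring(xml_text)          # <topology> wrapping the Machine
for obj in root.findall("object"):
    walk(obj)
```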