Sai Prashanth Muralidhara, M. Kandemir, Orhan Kislal
Modern multicore architectures have multiple cores connected to a hierarchical cache structure, resulting in heterogeneity in cache sharing across different subsets of cores. In these systems, overall throughput and efficiency depend heavily on a careful mapping of applications to available cores. In this paper, we study the problem of application-to-core mapping with the goal of improving the overall cache performance in the presence of a hierarchical multi-level cache structure. We propose to sample the memory access patterns of individual applications and build their reuse distance distributions. Further, we propose to use these reuse distance distributions to compute an application-to-core mapping that tries to improve the overall cache performance and, consequently, the overall throughput. We show that our proposed mapping scheme is very effective in practice, yielding throughput benefits of about 39% over the worst-case mapping and about 30% over the default operating-system-based mapping. We believe that, as larger chip multiprocessors with deeper cache hierarchies are projected to be the norm in the future, efficient mapping of applications to cores will become a vital requirement for extracting the maximum possible performance from these systems.
{"title":"Reuse distance based performance modeling and workload mapping","authors":"Sai Prashanth Muralidhara, M. Kandemir, Orhan Kislal","doi":"10.1145/2212908.2212936","DOIUrl":"https://doi.org/10.1145/2212908.2212936","url":null,"abstract":"Modern multicore architectures have multiple cores connected to a hierarchical cache structure resulting in heterogeneity in cache sharing across different subsets of cores. In these systems, overall throughput and efficiency depends heavily on a careful mapping of applications to available cores. In this paper, we study the problem of application-to-core mapping with the goal of trying to improve the overall cache performance in the presence of a hierarchical multi-level cache structure. We propose to sample the memory access patterns of individual applications and build their reuse distance distributions. Further, we propose to use these reuse distance distributions to compute an application-to-core mapping that tries to improve the overall cache performance, and consequently, the overall throughput. We show that our proposed mapping scheme is very effective in practice yielding throughput benefits of about 39% over the worst case mapping and about 30% over the default operating system based mapping. 
We believe, as larger chip multiprocessors with deeper cache hierarchies are projected to be the norm in the future, efficient mapping of applications to cores will become a vital requirement to extract the maximum possible performance from these systems.","PeriodicalId":430420,"journal":{"name":"ACM International Conference on Computing Frontiers","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-05-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130310935","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
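As a rough illustration of the reuse distance idea in the abstract above, the following sketch computes the reuse distance of every access in a short address trace and uses it to estimate the miss ratio of a fully associative LRU cache. It is only a minimal sketch under stated assumptions: the function names, the toy trace, and the fully associative LRU model are all hypothetical and are not taken from the paper, which samples real access patterns and targets a multi-level shared cache hierarchy.

```python
def reuse_distances(trace):
    """Return one reuse distance per access (None for a first-time access).

    The reuse distance is the number of distinct addresses touched
    between two consecutive accesses to the same address.
    """
    stack = []       # LRU stack: most recently used address is last
    distances = []
    for addr in trace:
        if addr in stack:
            pos = stack.index(addr)
            # Everything above addr in the stack was touched since its last use.
            distances.append(len(stack) - 1 - pos)
            stack.pop(pos)
        else:
            distances.append(None)  # cold (first) access
        stack.append(addr)
    return distances


def miss_ratio(distances, cache_lines):
    """Estimate the miss ratio of a fully associative LRU cache (assumption).

    An access misses if it is cold or if its reuse distance is at least
    the number of lines the cache can hold.
    """
    misses = sum(1 for d in distances if d is None or d >= cache_lines)
    return misses / len(distances)


# Hypothetical toy trace of cache-line addresses.
trace = ["a", "b", "c", "a", "b", "b"]
dists = reuse_distances(trace)
print(dists)
print(miss_ratio(dists, cache_lines=2), miss_ratio(dists, cache_lines=3))
```

Comparing the estimated miss ratios of candidate application groupings for a given shared cache size is one plausible way such distributions could feed a mapping decision; the paper's actual mapping algorithm is not described in this abstract.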
While artificial intelligence (AI) in games is often associated with enhancing the behavior of non-player characters, at its cutting edge AI offers the potential for entirely new kinds of gaming experiences. In this talk I will focus on this frontier of AI in games through three examples of games from my research that are not only enhanced by AI, but would not even be possible without the unique AI techniques behind them. In these experimental games, called NERO, Galactic Arms Race, and Petalz, players become teachers, AI creates its own content, and unique creations are explicitly bred and traded by the players themselves. The discussion will focus on the inspiration for the technologies behind these games (including some related applications) and the long-term implications of unique and creative AI algorithms for gaming.
{"title":"How AI can change the way we play games","authors":"Kenneth O. Stanley","doi":"10.1145/2212908.2212956","DOIUrl":"https://doi.org/10.1145/2212908.2212956","url":null,"abstract":"While artificial intelligence (AI) in games is often associated with enhancing the behavior of non-player characters, at its cutting edge AI offers the potential for entirely new kinds of gaming experiences. In this talk I will focus on this frontier of AI in games through three examples of games from my research that are not only enhanced by AI, but would not even be possible without the unique AI techniques behind them. In these experimental games, called NERO, Galactic Arms Race, and Petalz, players become teachers, AI creates its own content, and unique creations are explicitly bred and traded by the players themselves. The discussion will focus on the inspiration for the technologies behind these games (including some related applications) and the long-term implications of unique and creative AI algorithms for gaming.","PeriodicalId":430420,"journal":{"name":"ACM International Conference on Computing Frontiers","volume":"25 4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-05-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130415218","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The long-heralded transition of photonic technology from a rack-to-rack interconnect to an integral part of the system architecture is underway. Silicon photonics, where the optical communications devices are fabricated using the same materials and processes as CMOS logic, will allow 3D or monolithically integrated devices to be created, minimizing the overhead for moving between the electronic and photonic domains. System architects will then be free to exploit the unique characteristics of photonic communications such as broadband switching and distance independence. Photonic interconnects are very sensitive to the performance of connectors, and so may favor architectures where redundancy and reconfiguration are used in preference to replacement.
{"title":"Towards truly integrated photonic and electronic computing","authors":"M. McLaren","doi":"10.1145/2212908.2212910","DOIUrl":"https://doi.org/10.1145/2212908.2212910","url":null,"abstract":"The long heralded transition of photonic technology from a rack to rack interconnect to an integral part of the system architecture is underway. Silicon photonics, where the optical communications devices are fabricated using the same materials and processes as CMOS logic, will allow 3D or monolithically integrated devices to be created, minimizing the overhead for moving between the electronic and photonic domains. System architects will then be free to exploit the unique characteristics of photonic communications such as broadband switching and distance independence. Photonic interconnects are very sensitive to the performance of connectors, and so may favor architectures where redundancy and reconfiguration are used in preference to replacement.","PeriodicalId":430420,"journal":{"name":"ACM International Conference on Computing Frontiers","volume":"23 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-05-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126652058","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
DEEP is a multipartner international cooperation project supported by the EU FP7 that introduces a flexible global system architecture using general-purpose and manycore processor architectures (based on Intel MIC, the Many Integrated Core architecture). With XTOLL, DEEP uses a very powerful interconnection structure, which allows different application-oriented ratios of general-purpose processors to accelerators to be arranged. The project includes research and development on program technologies, tools, and applications, and looks at energy-efficient computing methodologies.
{"title":"DEEP: an exascale prototype architecture based on a flexible configuration","authors":"A. Bode","doi":"10.1145/2212908.2212960","DOIUrl":"https://doi.org/10.1145/2212908.2212960","url":null,"abstract":"DEEP is a multipartner international cooperation project supported by the EU FP7 that introduces a flexible global system architecture using general purpose and manycore processor architectures (based on IntelMIC: many integrated core architecture). With XTOLL, DEEP uses a very powerful interconnection structure, which allows for the arrangement of different application oriented ratios between general purpose processor and accelerator. The project includes research and development on program technologies, tools, applications, and looks at energy efficient computing methodologies.","PeriodicalId":430420,"journal":{"name":"ACM International Conference on Computing Frontiers","volume":"26 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-05-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114204704","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The current trend in high performance computing (HPC) systems is to deploy parallel computers equipped with general purpose multi-core processors and possibly multi-core streaming accelerators. However, the performance of these multi-cores is often constrained by the limited external bandwidth or by badly matching data access patterns. The latter reduces the size of useful data during memory transactions. A change in the application algorithm can improve the memory accesses but a hardware support mechanism for an application specific data arrangement in the memory hierarchy can significantly boost the performance for many application domains. In this work, we present a conceptual computing architecture named BSArc (Blacksmith Streaming Architecture). BSArc introduces a forging front-end to efficiently distribute data to a large set of simple streaming processors in the back-end. We apply this concept to a SIMT execution model and present a design space exploration in the context of a GPU-like streaming architecture with a reconfigurable application specific front-end. These design space explorations are carried out on a streaming architectural simulator that models BSArc. We evaluate the performance advantages for the BSArc design against a standard L2 cache in a GPU-like device. In our evaluations we use three application kernels: 2D-FFT, Matrix-Matrix Multiplication and 3D-Stencil. The results show that employing an application specific arrangement of data on these kernels achieves an average speedup of 2.3× compared to a standard cache for a GPU-like streaming device.
{"title":"BSArc: blacksmith streaming architecture for HPC accelerators","authors":"M. Shafiq, M. Pericàs, N. Navarro, E. Ayguadé","doi":"10.1145/2212908.2212914","DOIUrl":"https://doi.org/10.1145/2212908.2212914","url":null,"abstract":"The current trend in high performance computing (HPC) systems is to deploy parallel computers equipped with general purpose multi-core processors and possibly multi-core streaming accelerators. However, the performance of these multi-cores is often constrained by the limited external bandwidth or by badly matching data access patterns. The latter reduces the size of useful data during memory transactions. A change in the application algorithm can improve the memory accesses but a hardware support mechanism for an application specific data arrangement in the memory hierarchy can significantly boost the performance for many application domains.\u0000 In this work, we present a conceptual computing architecture named BSArc (Blacksmith Streaming Architecture). BSArc introduces a forging front-end to efficiently distribute data to a large set of simple streaming processors in the back-end. We apply this concept to a SIMT execution model and present a design space exploration in the context of a GPU-like streaming architecture with a reconfigurable application specific front-end. These design space explorations are carried out on a streaming architectural simulator that models BSArc. We evaluate the performance advantages for the BSArc design against a standard L2 cache in a GPU-like device. In our evaluations we use three application kernels: 2D-FFT, Matrix-Matrix Multiplication and 3D-Stencil. 
The results show that employing an application specific arrangement of data on these kernels achieves an average speedup of 2.3× compared to a standard cache for a GPU-like streaming device.","PeriodicalId":430420,"journal":{"name":"ACM International Conference on Computing Frontiers","volume":"43 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-05-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125588631","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
More than a decade after the early research efforts on the use of artificial intelligence (AI) in computer games and the establishment of a new AI domain, the term "game AI" needs to be redefined. Traditionally, the tasks associated with game AI revolved around non-player character (NPC) behavior at different levels of control, varying from navigation and pathfinding to decision making. Commercial-standard games developed over the last 15 years and current game productions, however, suggest that the traditional challenges of game AI have been well addressed via the use of sophisticated AI approaches, not necessarily following or inspired by advances in academic practices. The marginal penetration of traditional academic game AI methods in industrial productions has been mainly due to the lack of constructive communication between academia and industry in the early days of academic game AI, and the inability of academic game AI to propose methods that would significantly advance existing development processes or provide scalable solutions to real-world problems. Recently, however, there has been a shift of research focus as the current plethora of AI uses in games is breaking the non-player character AI tradition. A number of those alternative AI uses have already shown significant potential for the design of better games. This paper presents four key game AI research areas that are currently reshaping the research roadmap in the game AI field and evidently put the term "game AI" in a new perspective. These game AI flagship research areas include the computational modeling of player experience, the procedural generation of content, the mining of player data on a massive scale, and the alternative AI research foci for enhancing NPC capabilities.
{"title":"Game AI revisited","authors":"Georgios N. Yannakakis","doi":"10.1145/2212908.2212954","DOIUrl":"https://doi.org/10.1145/2212908.2212954","url":null,"abstract":"More than a decade after the early research efforts on the use of artificial intelligence (AI) in computer games and the establishment of a new AI domain the term ``game AI'' needs to be redefined. Traditionally, the tasks associated with game AI revolved around non player character (NPC) behavior at different levels of control, varying from navigation and pathfinding to decision making. Commercial-standard games developed over the last 15 years and current game productions, however, suggest that the traditional challenges of game AI have been well addressed via the use of sophisticated AI approaches, not necessarily following or inspired by advances in academic practices. The marginal penetration of traditional academic game AI methods in industrial productions has been mainly due to the lack of constructive communication between academia and industry in the early days of academic game AI, and the inability of academic game AI to propose methods that would significantly advance existing development processes or provide scalable solutions to real world problems. Recently, however, there has been a shift of research focus as the current plethora of AI uses in games is breaking the non-player character AI tradition. A number of those alternative AI uses have already shown a significant potential for the design of better games.\u0000 This paper presents four key game AI research areas that are currently reshaping the research roadmap in the game AI field and evidently put the game AI term under a new perspective. 
These game AI flagship research areas include the computational modeling of player experience, the procedural generation of content, the mining of player data on massive-scale and the alternative AI research foci for enhancing NPC capabilities.","PeriodicalId":430420,"journal":{"name":"ACM International Conference on Computing Frontiers","volume":"28 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-05-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115463508","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Massively Parallel Systems-on-Chip represent the new frontier of integrated computing systems for general-purpose computing. The integration of a huge number of cores poses several issues, such as the efficiency and flexibility of the interconnection network, which must serve the different traffic patterns that can arise as well as possible. In this paper we present the CYBER architecture, an advanced Network-on-Chip (NoC) for concurrent hybrid switching with prioritized best-effort Quality of Service. Compared to similar architectures, CYBER allows the simultaneous exploitation of packet switching and circuit switching, providing two different priorities to packets in order to be able to transmit urgent messages (e.g. signalling) while long-lasting transactions and congestion from huge packets are present. In terms of the typical NoC metrics, evaluated on synthetic traffic representative of several application categories, the standard trend is degraded when serving both circuit and packet switching simultaneously, but the architecture preserves a predictable behaviour. A 90 nm CMOS implementation reveals a maximum operating frequency of about 1 GHz.
{"title":"Concurrent hybrid switching for massively parallel systems-on-chip: the CYBER architecture","authors":"F. Palumbo, D. Pani, A. Congiu, L. Raffo","doi":"10.1145/2212908.2212933","DOIUrl":"https://doi.org/10.1145/2212908.2212933","url":null,"abstract":"Massively Parallel Systems-on-chip represent the new frontier of integrated computing systems for general purpose computing. The integration of a huge number of cores poses several issues such as the efficiency and flexibility of the interconnection network in order to serve in the best way the different traffic patterns that can arise.\u0000 In this paper we present the CYBER architecture, an advanced Network-on-Chip (NoC) for concurrent hybrid switching with prioritized best effort Quality of Service. Compared to similar architectures, CYBER allows the simultaneous exploitation of packet switching and circuit switching, providing two different priorities to packets in order to be able to transmit urgent messages (e.g. signalling) while long-lasting transactions and huge packets congestion are present. In terms of the typical NoC metrics, evaluated on synthetic traffic representative of several application categories, their standard trend is degraded while serving both circuit and packet switching simultaneously but the architecture preserves a predictable behaviour. 
A CMOS 90nm implementation reveals a maximum operating frequency of about 1GHz.","PeriodicalId":430420,"journal":{"name":"ACM International Conference on Computing Frontiers","volume":"35 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-05-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114501396","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
C. Bertolli, A. Betts, P. Kelly, G. Mudalige, M. Giles
Applications based on unstructured meshes are typically compute intensive, leading to long running times. In principle, state-of-the-art hardware, such as multi-core CPUs and many-core GPUs, could be used for their acceleration, but these esoteric architectures require specialised knowledge to achieve optimal performance. OP2 is a parallel programming layer which attempts to ease this programming burden by allowing programmers to express parallel iterations over elements in the unstructured mesh through an API call, a so-called OP2-loop. The OP2 compiler infrastructure then uses source-to-source transformations to realise a parallel implementation of each OP2-loop and discover opportunities for optimisation. In this paper, we describe how several compiler techniques can be effectively utilised in tandem to increase the performance of unstructured mesh applications. In particular, we show how whole-program analysis, which is often inhibited by the size of the control flow graph, often becomes feasible as a result of the OP2 programming model, facilitating aggressive optimisation. We subsequently show how whole-program analysis then becomes an enabler of OP2-loop optimisations. Based on this, we show how a classical technique, namely loop fusion, which is typically difficult to apply to unstructured mesh applications, can be defined at compile time. We examine the limits of its application and show experimental results on a computational fluid dynamics application benchmark, assessing the performance gains due to loop fusion.
{"title":"Mesh independent loop fusion for unstructured mesh applications","authors":"C. Bertolli, A. Betts, P. Kelly, G. Mudalige, M. Giles","doi":"10.1145/2212908.2212917","DOIUrl":"https://doi.org/10.1145/2212908.2212917","url":null,"abstract":"Applications based on unstructured meshes are typically compute intensive, leading to long running times. In principle, state-of-the-art hardware, such as multi-core CPUs and many-core GPUs, could be used for their acceleration but these esoteric architectures require specialised knowledge to achieve optimal performance. OP2 is a parallel programming layer which attempts to ease this programming burden by allowing programmers to express parallel iterations over elements in the unstructured mesh through an API call, a so-called OP2-loop. The OP2 compiler infrastructure then uses source-to-source transformations to realise a parallel implementation of each OP2-loop and discover opportunities for optimisation.\u0000 In this paper, we describe how several compiler techniques can be effectively utilised in tandem to increase the performance of unstructured mesh applications. In particular, we show how whole-program analysis --- which is often inhibited due to the size of the control flow graph - often becomes feasible as a result of the OP2 programming model, facilitating aggressive optimisation. We subsequently show how whole-program analysis then becomes an enabler to OP2-loop optimisations. Based on this, we show how a classical technique, namely loop fusion, which is typically difficult to apply to unstructured mesh applications, can be defined at compile-time. 
We examine the limits of its application and show experimental results on a computational fluid dynamic application benchmark, assessing the performance gains due to loop fusion.","PeriodicalId":430420,"journal":{"name":"ACM International Conference on Computing Frontiers","volume":"71 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-05-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132022786","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
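To make the classical transformation named in the abstract above concrete, here is a minimal sketch of loop fusion on two loops over the same iteration set. It is an assumption-laden illustration in plain Python, not OP2 code: the function names and the arithmetic are hypothetical, and it only covers directly accessed data, whereas the difficulty the paper addresses comes from the indirect accesses of unstructured meshes.

```python
def unfused(x):
    """Two separate loops over the same set; `a` is written, then re-read."""
    n = len(x)
    a = [0.0] * n
    for i in range(n):          # loop 1: produce a
        a[i] = 2.0 * x[i]
    b = [0.0] * n
    for i in range(n):          # loop 2: consume a
        b[i] = a[i] + 1.0
    return a, b


def fused(x):
    """Fused version: each element of `a` is consumed right after it is produced."""
    n = len(x)
    a = [0.0] * n
    b = [0.0] * n
    for i in range(n):
        a[i] = 2.0 * x[i]
        b[i] = a[i] + 1.0       # legal here because iteration i only needs a[i]
    return a, b


print(fused([1.0, 2.0, 3.0]) == unfused([1.0, 2.0, 3.0]))
```

Fusion is safe in this sketch because the second loop's iteration i depends only on data produced by the first loop's iteration i. With indirect accesses through a mesh map, that dependence has to be established by analysis before the loops can be merged.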
Multi-core and many-core processors have been major trends for the past six years and are expected to remain so for the next decade. With these trends in parallel computing, it becomes increasingly difficult to decide on which processor to run a given application, mainly because programming these processors has become increasingly challenging. In this work, we present a model to predict the performance of a given application on a multi-core or many-core processor. Since programming these processors can be challenging and time consuming, our model does not require source code to be available for the target processor. This is in contrast to existing performance prediction techniques such as mathematical models and simulators, which require code to be available and optimized for the target architecture. To enable performance prediction prior to algorithm implementation, we classify algorithms using an existing algorithm classification. For each class, we create a specific instance of the roofline model, resulting in a new class-specific model. This new model, named the boat hull model, enables performance prediction and processor selection prior to the development of architecture-specific code. We demonstrate the boat hull model using GPUs and CPUs as target architectures. We show that performance is accurately predicted for an example real-life application.
{"title":"The boat hull model: enabling performance prediction for parallel computing prior to code development","authors":"C. Nugteren, H. Corporaal","doi":"10.1145/2212908.2212937","DOIUrl":"https://doi.org/10.1145/2212908.2212937","url":null,"abstract":"Multi-core and many-core were already major trends for the past six years and are expected to continue for the next decade. With these trends of parallel computing, it becomes increasingly difficult to decide on which processor to run a given application, mainly because the programming of these processors has become increasingly challenging.\u0000 In this work, we present a model to predict the performance of a given application on a multi-core or many-core processor. Since programming these processors can be challenging and time consuming, our model does not require source code to be available for the target processor. This is in contrast to existing performance prediction techniques such as mathematical models and simulators, which require code to be available and optimized for the target architecture.\u0000 To enable performance prediction prior to algorithm implementation, we classify algorithms using an existing algorithm classification. For each class, we create a specific instance of the roofline model, resulting in a new class-specific model. This new model, named the boat hull model, enables performance prediction and processor selection prior to the development of architecture specific code.\u0000 We demonstrate the boat hull model using GPUs and CPUs as target architectures. 
We show that performance is accurately predicted for an example real-life application.","PeriodicalId":430420,"journal":{"name":"ACM International Conference on Computing Frontiers","volume":"7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-05-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131571201","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
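The abstract above builds on the roofline model, which bounds attainable performance by the lesser of peak compute throughput and memory bandwidth times operational intensity. The sketch below shows that basic bound and uses it to compare processors. It is the plain roofline model, not the class-specific boat hull model, and the processor names and figures are hypothetical placeholders rather than measurements.

```python
def attainable_gflops(peak_gflops, bandwidth_gbs, intensity):
    """Roofline bound: min(peak compute, bandwidth * operational intensity).

    intensity is in floating-point operations per byte moved from memory.
    """
    return min(peak_gflops, bandwidth_gbs * intensity)


def pick_processor(processors, intensity):
    """Return the name of the processor with the highest roofline bound."""
    return max(
        processors,
        key=lambda name: attainable_gflops(*processors[name], intensity),
    )


# Hypothetical (peak GFLOP/s, memory bandwidth GB/s) pairs, for illustration only.
processors = {"cpu": (100.0, 25.0), "gpu": (1000.0, 150.0)}

for intensity in (0.5, 8.0):
    for name, (peak, bw) in processors.items():
        print(name, intensity, attainable_gflops(peak, bw, intensity))
    print("best:", pick_processor(processors, intensity))
```

At low intensity both processors in this toy setup are bandwidth bound, so the ranking follows memory bandwidth; at high intensity the compute peak takes over.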
F. Galluppi, Sergio Davies, Alexander D. Rast, T. Sharp, L. Plana, S. Furber
Simulation of large networks of neurons is a powerful and increasingly prominent methodology for investigating brain functions and structures. Dedicated parallel hardware is a natural candidate for simulating the dynamic activity of many non-linear units communicating asynchronously. It is only scientifically useful, however, if the simulation tools can be configured and run easily and quickly. We present a method to map network models to computational nodes on the SpiNNaker system, a programmable parallel neurally-inspired hardware architecture, by exploiting the hierarchies built into the model. This PArtitioning and Configuration MANager (PACMAN) system supports arbitrary network topologies and arbitrary membrane potential and synapse dynamics, and (most importantly) decouples the model from the device, allowing a variety of languages (PyNN, Nengo, etc.) to drive the simulation hardware. Model representation operates on a Population/Projection level rather than a single-neuron and connection level, exploiting hierarchical properties to lower the complexity of allocating resources and mapping the model onto the system. PACMAN can thus be used to generate structures coming from different models and front-ends, either with a host-based process or by parallelising it on the SpiNNaker machine itself to speed up the generation process greatly. We describe the approach with a first implementation of the framework used to configure the current generation of SpiNNaker machines and present results from a set of key benchmarks. The system allows researchers to exploit dedicated simulation hardware which may otherwise be difficult to program. In effect, PACMAN provides automated hardware acceleration for some commonly used network simulators while also pointing towards the advantages of hierarchical configuration for large, domain-specific hardware systems.
{"title":"A hierachical configuration system for a massively parallel neural hardware platform","authors":"F. Galluppi, Sergio Davies, Alexander D. Rast, T. Sharp, L. Plana, S. Furber","doi":"10.1145/2212908.2212934","DOIUrl":"https://doi.org/10.1145/2212908.2212934","url":null,"abstract":"Simulation of large networks of neurons is a powerful and increasingly prominent methodology for investigate brain functions and structures. Dedicated parallel hardware is a natural candidate for simulating the dynamic activity of many non-linear units communicating asynchronously. It is only scientifically useful, however, if the simulation tools can be configured and run easily and quickly. We present a method to map network models to computational nodes on the SpiNNaker system, a programmable parallel neurally-inspired hardware architecture, by exploiting the hierarchies built in the model. This PArtitioning and Configuration MANager (PACMAN) system supports arbitrary network topologies and arbitrary membrane potential and synapse dynamics, and (most importantly) decouples the model from the device, allowing a variety of languages (PyNN, Nengo, etc.) to drive the simulation hardware. Model representation operates on a Population/Projection level rather than a single-neuron and connection level, exploiting hierarchical properties to lower the complexity of allocating resources and mapping the model onto the system. PACMAN can be thus be used to generate structures coming from different models and front-ends, either with a host-based process, or by parallelising it on the SpiNNaker machine itself to speed up the generation process greatly. We describe the approach with a first implementation of the framework used to configure the current generation of SpiNNaker machines and present results from a set of key benchmarks. The system allows researchers to exploit dedicated simulation hardware which may otherwise be difficult to program. 
In effect, PACMAN provides automated hardware acceleration for some commonly used network simulators while also pointing towards the advantages of hierarchical configuration for large, domain-specific hardware systems.","PeriodicalId":430420,"journal":{"name":"ACM International Conference on Computing Frontiers","volume":"44 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-05-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114774521","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
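The abstract above describes mapping at the Population level rather than neuron by neuron. The sketch below illustrates one way that could look: whole populations are split into contiguous slices that each fit on one core. Everything here is hypothetical, including the function name, the per-core neuron limit, and the example populations; it is not PACMAN's actual algorithm or API, which the abstract does not specify.

```python
def partition_populations(populations, max_neurons_per_core):
    """Split each population into contiguous slices that fit on one core.

    populations: dict mapping a population name to its neuron count.
    Returns a list of (name, start, stop) slices, one per core,
    with stop exclusive.
    """
    if max_neurons_per_core <= 0:
        raise ValueError("max_neurons_per_core must be positive")
    slices = []
    for name, size in populations.items():
        # Work on whole ranges of neurons, never on individual neurons.
        for start in range(0, size, max_neurons_per_core):
            stop = min(start + max_neurons_per_core, size)
            slices.append((name, start, stop))
    return slices


# Hypothetical network: 250 excitatory and 100 inhibitory neurons.
cores = partition_populations({"exc": 250, "inh": 100}, max_neurons_per_core=100)
for core_id, piece in enumerate(cores):
    print(core_id, piece)
```

Because the allocation works on ranges, its cost grows with the number of populations and cores rather than with the number of neurons, which is the kind of complexity reduction the abstract attributes to the hierarchical representation.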