The Active Classroom: Students and Instructors Parallel Programming in Parallel
Nasser Giacaman, Simar Kalra, O. Sinnen (doi: 10.1109/IPDPSW.2015.24)

The biggest difficulty that students face when learning programming is developing the cognitive skills that allow them to apply what they have learnt. It is generally accepted that programming can only be learnt by doing and by actively engaging with the material. Parallel programming is a prime example of an area that students commonly struggle with: many of its concepts are abstract, making it difficult to convey a true understanding of the underlying principles in a traditional classroom setting. This paper discusses the principles that motivated the development of Active Classroom Programmer (ACP), a tool for students to learn effective programming strategies under the guidance of their instructor. ACP aims to strengthen students' ability to apply programming topics by engaging them with newly introduced material immediately. This is especially important in parallel programming, as the topics quickly progress to the many parallelisation caveats (such as thread-safety and race conditions). While laboratory or homework exercises provide students with valuable hands-on experience in applying newly taught concepts, this opportunity generally arrives too long after the material is presented in the lesson. To address this, a collection of parallel programming exercises is being developed with the help of ACP for the NSF/IEEE-TCPP Curriculum Initiative on Parallel and Distributed Computing (as an Early Adopter award). Instructors are welcome to use any of the developed exercises, or to request a private ACP account for their own courses to program with their students.
PCO Keynote
A. Pothen (doi: 10.1109/IPDPSW.2015.178)

Computing a matching in a graph is one of "the hardest simple problems" in discrete mathematics and computer science. It is simple since most variants of matching can be solved in polynomial time, yet hard because the running times are high and the algorithms are complex. It is even more challenging to design parallel algorithms for matching, since many algorithms rely on searching for long paths in a graph, or implicitly communicate information along long paths, and thus have little concurrency. However, in the last fifteen years there has been much work in developing parallel matching algorithms via approximation: we do not find optimal matchings, but look for matchings that are guaranteed to be within a constant factor of being optimal. There has been a flurry of activity in designing and implementing such algorithms, and now we have efficient algorithms for computing matchings on multicore shared memory computers. This talk will survey this body of work in matching algorithms.
Considerations on Distributed Load Balancing for Fully Heterogeneous Machines: Two Particular Cases
Nathanaël Cheriere, Erik Saule (doi: 10.1109/IPDPSW.2015.36)

As parallel systems grow, centralized task-scheduling algorithms can induce significant overhead, which is why decentralized scheduling algorithms have been developed. The most popular is certainly work-stealing, thanks to its attractive theoretical guarantees. Parallel systems have evolved from homogeneous clusters to fully heterogeneous ones, such as GPU-accelerated clusters. In this paper we investigate decentralized scheduling algorithms for heterogeneous systems. The guarantees of work-stealing no longer hold on such systems, because it is an a posteriori algorithm that depends heavily on the initial distribution of work. We therefore focus on a priori decentralized scheduling for heterogeneous systems and propose two distributed algorithms that balance load on unrelated machines in two particular cases. The first exploits low heterogeneity in the task set and reaches an approximation ratio linear in the number of task types. The second addresses systems with only two types of machines, and we show that it is a 2-approximation when the system converges. When it does not converge, we study the dynamic equilibrium of the system: in the homogeneous case we numerically compute the probability density function of the load imbalance and show that the imbalance is low on average, and simulations show that the heterogeneous case behaves similarly, with low imbalance in both cases.
Trapezoid Quorum Protocol Dedicated to Erasure Resilient Coding Based Schemes
T. J. R. Relaza, J. Jorda, A. Mzoughi (doi: 10.1109/IPDPSW.2015.108)

In distributed storage systems such as parallel file systems or storage virtualization middleware, data replication is the most widely used mechanism for data availability: the more replicas are distributed among nodes, the more robust the storage system is. However, the price paid for this dependability becomes significant, due to both direct costs (the price of disks) and indirect costs (the energy consumed by the large number of disks needed). To lower the disk space needed for a given availability, Erasure Resilient Codes (ERC) are of interest and are starting to be deployed in this context. However, the use of such codes raises new data management problems. While some constraints, like data concurrency, can be handled in classical ways, others, like coherency protocols, require adaptation to fit this context. In this paper, we present an adaptation of the trapezoid protocol to ERC schemes (instead of full replication). This new quorum protocol improves storage space efficiency while maintaining a high level of availability for read and write operations.
Declarative Patterns for Imperative Distributed Graph Algorithms
Marcin Zalewski, N. Edmonds, A. Lumsdaine (doi: 10.1109/IPDPSW.2015.78)

We provide an abstraction for expressing graph algorithms in which the vertices and edges of the graph provide locality and communication structure, and graph data are represented by property maps that associate vertices and edges with arbitrary user-defined data. Operations on the graph are expressed as patterns, which allow limited traversal of the graph and modification of property maps for the traversed fragments of the graph. Traversal is implicit and is computed automatically from the pattern's accesses to property map values. Patterns are declarative, but they can be used in imperative algorithms through strategies that run in epochs. Strategies are user-defined programs that apply patterns in a certain way (we provide fixed-point, once, and Δ-stepping strategies, for example), including chaining patterns arbitrarily. Patterns are applied in epochs, which provide synchronization across a distributed system, guaranteeing that all patterns have been applied by the end of an epoch.
Adaptive Resource and Job Management for Limited Power Consumption
Yiannis Georgiou, David Glesser, D. Trystram (doi: 10.1109/IPDPSW.2015.118)

The last decades have been characterized by ever-growing requirements for computing and storage resources. This trend has recently put pressure on the ability to efficiently manage the power required to operate the huge number of electrical components in state-of-the-art high-performance computing systems. The power consumption of a supercomputer needs to be adjusted to a varying power budget or electricity availability. As a consequence, Resource and Job Management Systems have to be adapted to schedule jobs with optimized performance while limiting power usage whenever needed. In this paper we introduce a new scheduling strategy that adapts the executed workload to a limited power budget. The originality of this approach lies in its combination of speed-scaling and node-shutdown techniques for power reduction. It is implemented in the widely used resource and job management system SLURM, and it is validated through large-scale emulations using real production workload traces from the Curie supercomputer.
Energy Prediction of OpenMP Applications Using Random Forest Modeling Approach
S. Benedict, R. Rejitha, P. Gschwandtner, R. Prodan, T. Fahringer (doi: 10.1109/IPDPSW.2015.12)

OpenMP, with its extended parallelism features and support for radically changing HPC architectures, has spurred a surge in parallel application development among the HPC community, and with it serious energy consumption concerns. Consequently, the notion of addressing the energy consumption of HPC applications in an automated fashion has gained traction among compiler developers, even though the underlying optimization search space can grow tremendously. This paper proposes a Random Forest Modeling (RFM) approach for predicting the energy consumption of OpenMP applications in compilers. The approach was tested on OpenMP applications such as the NAS benchmarks, matrix multiplication, n-body simulations, and stencil applications, while tuning the applications for energy, problem size, and other performance concerns. The proposed RFM approach predicted the energy consumption of code variants with a Mean Square Error (MSE) below 0.699 and an R² value of 0.998 on a testing dataset with energy variations between 0.024 and 150.23 joules. In addition, we discuss the influence of energy variations, the number of independent variables used, and the proportion of the dataset used for testing during the RFM modeling process.
Performance Analysis for Target Devices with the OpenMP Tools Interface
Tim Cramer, R. Dietrich, C. Terboven, Matthias S. Müller, W. Nagel (doi: 10.1109/IPDPSW.2015.27)

The requirement for large compute capabilities has led to the wide use of accelerated high-performance computing systems. To lower the burden of programming these new architectures, user-friendly programming paradigms such as OpenACC and OpenMP have emerged. They offer pragmas that shift effort from the programmer to the compiler and runtime system, particularly for data management. However, further improving usability also requires adequate tool support. In this work we present in detail a general extension of the upcoming OpenMP tools interface (OMPT) for the new OpenMP 4.0 target constructs. This extension aims to be a portable, vendor- and platform-independent interface that enables the use of performance analysis tools with OpenMP for accelerators. Finally, we evaluate the approach in a reference implementation, demonstrating its validity and usability with an instrumented OpenMP runtime and the Score-P measurement infrastructure.
Communication Pattern-Based Distributed Snapshots in Large-Scale Systems
Salem Saker, A. Agbaria (doi: 10.1109/IPDPSW.2015.117)

Large-scale systems (LSSs) continue to attract attention from the scientific community for high-performance computing. Providing fault tolerance in distributed systems is a challenge, and one that undoubtedly becomes harder in LSSs. Distributed snapshots are an important building block for distributed systems and, among other applications, are useful for providing fault tolerance. This paper motivates the need for fault tolerance in LSSs, examines the obstacles to providing it, and then presents an innovative and scalable distributed snapshot approach for LSSs. In this approach, upon a new snapshot, a process coordinates only with the processes it has communicated with since the last snapshot. Our protocol improves on the distributed snapshot protocol presented by Chandy and Lamport in 1985, and this improvement may encourage system developers and planners to adopt it. We compare the performance of our approach with well-known existing distributed snapshot approaches using stochastic models; the results show that our approach achieves significantly lower overhead.
Mini-NOVA: A Lightweight ARM-based Virtualization Microkernel Supporting Dynamic Partial Reconfiguration
Tian Xia, Jean-Christophe Prévotet, F. Nouvel (doi: 10.1109/IPDPSW.2015.72)

Today, ARM is becoming the mainstream processor family in the high-performance embedded systems domain. In this context, combining a run-time reconfigurable FPGA with an ARM processor on a single chip makes it possible to obtain both high performance and flexibility. In this paper, we propose a low-complexity system virtualization design running on the Zynq platform. Virtualization of software and hardware resources is managed by a custom microkernel. The features dedicated to efficiently managing dynamic partial reconfiguration (DPR) are described in detail, and the performance of the DPR management is evaluated at the end of the paper.