Big Data allied to the Internet of Things nowadays provides a powerful resource that various organizations are increasingly exploiting for applications ranging from decision support, predictive and prescriptive analytics, to knowledge extraction and intelligence discovery. In analytics and data mining processes, it is usually desirable to have as much data as possible, though it is often more important that the data be of high quality; two of the most important problems raised when handling large datasets are therefore sampling and feature selection. This paper addresses the sampling problem and presents a heuristic method to find the "critical sampling" of big datasets. The concept of the critical sampling size of a dataset D is that there is a minimum number of samples of D required for a given data analytic task to achieve satisfactory performance. The problem is very important in data mining, as the size of a dataset directly relates to the cost of executing the data mining task. Since the problem of determining the critical sampling size is intractable, we study heuristic methods to find the critical sampling. Several datasets were used to conduct experiments with three versions of the heuristic sampling method. Preliminary results show an apparent critical sampling size for all the datasets tested, which is generally much smaller than the size of the whole dataset. Further, the proposed heuristic method provides a practical solution to find a useful critical sampling for data mining tasks.
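The grow-until-plateau idea behind such a sampling heuristic can be sketched as follows. This is a hypothetical illustration, not the authors' method: `evaluate`, the growth factor, and the tolerance are all assumptions.

```python
import random

def critical_sample_size(dataset, evaluate, start=100, growth=2.0, tolerance=0.01):
    """Grow a random sample until the analytic task's score stops improving.

    `evaluate(sample)` runs the data analytic task (e.g. trains a model)
    and returns a quality score; higher is better. The size at which the
    score gain falls below `tolerance` approximates a critical sampling.
    """
    size = min(start, len(dataset))
    prev_score = evaluate(random.sample(dataset, size))
    while size < len(dataset):
        size = min(int(size * growth), len(dataset))
        score = evaluate(random.sample(dataset, size))
        if score - prev_score < tolerance:  # performance has plateaued
            return size, score
        prev_score = score
    return size, prev_score
```

With a quality curve that saturates at 400 samples, the heuristic stops well before consuming the full dataset, which matches the abstract's observation that the critical size is typically much smaller than the whole dataset.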
{"title":"Finding the Critical Sampling of Big Datasets","authors":"José Silva, B. Ribeiro, A. Sung","doi":"10.1145/3075564.3078886","DOIUrl":"https://doi.org/10.1145/3075564.3078886","url":null,"abstract":"Big Data allied to the Internet of Things nowadays provides a powerful resource that various organizations are increasingly exploiting for applications ranging from decision support, predictive and prescriptive analytics, to knowledge extraction and intelligence discovery. In analytics and data mining processes, it is usually desirable to have as much data as possible, though it is often more important that the data is of high quality thereby two of the most important problems are raised when handling large datasets: sampling and feature selection. This paper addresses the sampling problem and presents a heuristic method to find the \"critical sampling\" of big datasets. The concept of the critical sampling size of a dataset D is that there is a minimum number of samples of D that is required for a given data analytic task to achieve satisfactory performance. The problem is very important in data mining, as the size of data sets directly relates to the cost of executing the data mining task. Since the problem of determining the critical sampling size is intractable, in this paper we study heuristic methods to find the critical sampling. Several datasets were used to conduct experiments using three versions of the heuristic sampling method for evaluation. Preliminary results obtained have shown the existence of an apparent critical sampling size for all the datasets being tested, which is generally much smaller than the size of the whole dataset. Further, the proposed heuristic method provides a practical solution to find a useful critical sampling for data mining tasks.","PeriodicalId":398898,"journal":{"name":"Proceedings of the Computing Frontiers Conference","volume":"35 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-05-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121871016","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
This work applies side-channel analysis to hardware implementations of two CAESAR candidates, Keyak and Ascon. Both algorithms are cryptographic sponges with an iterated permutation. The algorithms share an S-box, so attacks on the non-linear step of the permutation are similar. This work presents the first results of a DPA attack on Keyak using traces generated by an FPGA. A new attack is crafted for a larger sensitive variable to reduce the number of traces required. It also presents and applies the first CPA attack on Ascon. Using a toy-sized threshold implementation of Ascon, we give insight into the ordering of the steps of a permutation.
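For context, a CPA attack ranks key guesses by correlating a leakage model, commonly the Hamming weight of an S-box output, against measured power samples. A minimal single-sample-per-trace sketch, using a hypothetical 4-bit S-box rather than the Keyak/Ascon S-box:

```python
def hamming_weight(x):
    """Number of set bits, a standard power-leakage model."""
    return bin(x).count("1")

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy) if sx and sy else 0.0

def cpa_best_key(plaintexts, traces, sbox):
    """Return the key guess whose predicted leakage correlates best with
    the measured samples (one power sample per trace in this sketch)."""
    best_key, best_corr = None, -1.0
    for guess in range(len(sbox)):
        model = [hamming_weight(sbox[p ^ guess]) for p in plaintexts]
        corr = abs(pearson(model, traces))
        if corr > best_corr:
            best_key, best_corr = guess, corr
    return best_key
```

With noise-free simulated traces the correct key guess produces a correlation of 1 and is recovered immediately; real FPGA traces require many more traces and a per-sample search over the trace length.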
{"title":"DPA on hardware implementations of Ascon and Keyak","authors":"Niels Samwel, J. Daemen","doi":"10.1145/3075564.3079067","DOIUrl":"https://doi.org/10.1145/3075564.3079067","url":null,"abstract":"This work applies side channel analysis on hardware implementations of two CAESAR candidates, Keyak and Ascon. Both algorithms are cryptographic sponges with an iterated permutation. The algorithms share an s-box so attacks on the non-linear step of the permutation are similar. This work presents the first results of a DPA attack on Keyak using traces generated by an FPGA. A new attack is crafted for a larger sensitive variable to reduce the number of traces. It also presents and applies the first CPA attack on Ascon. Using a toy-sized threshold implementation of Ascon we try to give insight in the order of the steps of a permutation.","PeriodicalId":398898,"journal":{"name":"Proceedings of the Computing Frontiers Conference","volume":"72 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-05-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126289652","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Ao Mo-Hellenbrand, Isaías A. Comprés Ureña, O. Meister, H. Bungartz, M. Gerndt, M. Bader
Realizing resource awareness and elasticity in hardware and software is an answer to many of the problems and challenges we face in High Performance Computing (HPC) today. Resource utilization inefficiency is a real problem in current HPC systems due to their static, inflexible resource assignment. One way to resolve this problem is to change the static resource assignment setting---by introducing runtime resource elasticity, which requires both malleability in the software implementation and support for runtime resource adaptation in the system infrastructure. In this paper, we present a successful implementation of a malleable tsunami simulation realized on an elastic MPI infrastructure we previously proposed. We also show that introducing malleability to such a tightly coupled parallel application can be beneficial.
{"title":"A Large-Scale Malleable Tsunami Simulation Realized on an Elastic MPI Infrastructure","authors":"Ao Mo-Hellenbrand, Isaías A. Comprés Ureña, O. Meister, H. Bungartz, M. Gerndt, M. Bader","doi":"10.1145/3075564.3075585","DOIUrl":"https://doi.org/10.1145/3075564.3075585","url":null,"abstract":"Realization of resource awareness and elasticity in hardware and software is an answer to many problems and challenges we are facing in High Performance Computing (HPC) today. Resource utilization inefficiency is a real problem in current HPC systems due to the current static, inflexible resource assignment configuration. One way to resolve this problem is to change the static resource assignment setting---by introducing runtime resource elasticity, which requires both malleability in software implementation and support for runtime resource adaptation in the system infrastructure. In this paper, we show a successful implementation of a malleable tsunami simulation realized on an elastic MPI infrastructure we previously proposed. We also prove that introducing malleability to such a tightly coupled parallel application can be beneficial.","PeriodicalId":398898,"journal":{"name":"Proceedings of the Computing Frontiers Conference","volume":"68 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-05-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130322005","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Increasing the performance of computing systems necessitates solutions that improve scalability and productivity. In recent times, data-driven Program eXecution Models (PXMs) have gained popularity due to their superior support compared to traditional von Neumann execution models. However, exposing the benefits of such PXMs within a high-level programming language remains a challenge. Although many high-level programming languages and APIs support concurrency and multi-threading (e.g., C++11, Java, OpenMP, MPI, etc.), their synchronisation models make heavy use of mutexes and locks, generally leading to poor system performance. Conversely, one major appeal of the Go programming language is the way it supports concurrency: goroutines (tagged functions) are mapped onto OS threads and communicate with each other through data structures that buffer input data (channels). By forcing goroutines to exchange data only through channels, it is possible to enable a data-driven execution. This paper proposes a first attempt to map goroutines onto a data-driven PXM. The Go compilation procedure and the run-time library are modified to exploit the execution of fine-grain threads on an abstracted parallel machine model.
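The channel-driven firing this approach builds on can be mimicked outside Go. A rough Python analogy, using `queue.Queue` as a buffered channel between two threads; this is an illustration of the communication pattern only, not the authors' modified Go runtime:

```python
import threading
import queue

def producer(out_ch, items):
    """Send work items on the channel, then a close signal."""
    for x in items:
        out_ch.put(x)        # send on the channel
    out_ch.put(None)         # sentinel: channel closed

def consumer(in_ch, results):
    """Fire only when data arrives on the channel (data-driven execution)."""
    while True:
        x = in_ch.get()      # block until a value is available
        if x is None:
            break
        results.append(x * x)

ch = queue.Queue(maxsize=4)  # buffered channel of capacity 4
results = []
t1 = threading.Thread(target=producer, args=(ch, range(5)))
t2 = threading.Thread(target=consumer, args=(ch, results))
t1.start(); t2.start()
t1.join(); t2.join()
```

Because the consumer runs only when data is present in the channel, synchronisation is implicit in the data flow, with no explicit mutex around `results`.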
{"title":"Let's Go: a Data-Driven Multi-Threading Support","authors":"A. Scionti, Somnath Mazumdar","doi":"10.1145/3075564.3075596","DOIUrl":"https://doi.org/10.1145/3075564.3075596","url":null,"abstract":"Increasing performance of computing systems necessitates providing solutions for improving scalability and productivity. In recent times, data-driven Program eXecution Models (PXMs) are gaining popularity due to their superior support compared to traditional von Neumann execution models. However, exposing the benefits of such PXMs within a high-level programming language remains a challenge. Although many high-level programming languages and APIs support concurrency and multi-threading (e.g., C++11, Java, OpenMP, MPI, etc.), their synchronisation models make large use of mutex and locks, generally leading to poor system performance. Conversely, one major appeal of Go programming language is the way it supports concurrency: goroutines (tagged functions) are mapped on OS threads and communicate each other through data structures buffering input data (channels). By forcing goroutines to exchange data only through channels, it is possible to enable a data-driven execution. This paper proposes a first attempt to map goroutines on a data-driven based PXM. Go compilation procedure and the run-time library are modified to exploit the execution of fine-grain threads on an abstracted parallel machine model.","PeriodicalId":398898,"journal":{"name":"Proceedings of the Computing Frontiers Conference","volume":"19 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-05-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132374452","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Joshua Landwehr, Joshua D. Suetterlein, J. Manzano, A. Márquez, K. Barker, G. Gao
One promising effort as we progress toward exascale is the development of fine-grain execution models. These models display an innate agility, providing new avenues to address the challenges presented by future systems such as extreme parallelism, restrictive power constraints, and fault tolerance. These opportunities, however, may be prematurely abandoned if the system software, particularly a distributed runtime, is incapable of scaling. One potentially limiting factor is the enforcement of the memory model in a runtime. In a shared memory environment, weaker memory models are preferred since they promote parallelism and optimizations. This is not necessarily the case for distributed systems, as a weaker model may lead to increased coherency operations and memory usage, depending on the application's communication patterns and memory requirements. Moreover, unlike shared memory models, which rely on hardware to lessen the costs of coherence, distributed memory models are forced to rely on expensive runtime calls and network operations. This paper presents the design and implementation of a distributed memory coherency model in a high-performance implementation of the Open Community Runtime as an exemplar fine-grain execution model. We compare the performance and number of coherence operations of an instance of the OCR standard with our novel model, called Cache DAG consistency (CDAG). Leveraging CDAG consistency, we demonstrate up to a 3.7x reduction in messages and an 11x increase in performance for select benchmarks running at scale.
{"title":"Designing Scalable Distributed Memory Models: A Case Study","authors":"Joshua Landwehr, Joshua D. Suetterlein, J. Manzano, A. Márquez, K. Barker, G. Gao","doi":"10.1145/3075564.3077425","DOIUrl":"https://doi.org/10.1145/3075564.3077425","url":null,"abstract":"One promising effort as we progress toward exascale is the development of fine grain execution models. These models display an innate agility providing new avenues to address the challenges presented by futures systems such as extreme parallelism, restrictive power constraints, and fault tolerance. These opportunities however, may be prematurely abandoned if the system software, particularly a distributed runtime, is incapable of scaling. One potentially limiting factor is the enforcement of the memory model in a runtime. In a shared memory environment, weaker memory models are preferred since they promote parallelism and optimizations. This is not necessarily the case for distributed systems as a weaker model may lead to increased coherency operations and memory usage based on the application's communication patterns and memory requirements. Moreover, unlike shared memory models which rely on hardware to lessen the costs of coherence, distributed memory models are forced to rely on expensive runtime calls and network operations. This paper presents the design and implementation of a distributed memory coherency model in a high performance implementation of the Open Community Runtime as an exemplar fine grain execution model. We compare the performance and number of coherence operations of an instance of the OCR standard with our novel model, called Cache DAG consistency (CDAG). Leveraging CDAG consistency, we demonstrate up to a 3.7X reduction in messages and 11x increase in performance for select benchmarks running at scale.","PeriodicalId":398898,"journal":{"name":"Proceedings of the Computing Frontiers Conference","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-05-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130836171","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A. Akula, P. Calyam, Ronny Bazan Antequera, Raymond E. Leto
Advanced Manufacturing Apps that perform complex modeling and simulation are now becoming available in Marketplaces. App stakeholders face two fundamental challenges in cloud engineering of the App Marketplaces: (i) orchestration of service chaining through an App Runtime, and (ii) finding a suitable cost model to evaluate App pricing strategies. In this paper, we address these challenges by proposing a new cloud architecture that aims at supporting an 'App Marketplace' that thrives on agile development, organic collaboration and scalable sales of next-generation Manufacturing Apps requiring high-performance simulation and modeling. We describe how we are realizing the vision of this architecture through an App Runtime we have developed that leverages a Resource Brokering Service to create App chaining mechanisms. We also detail a new cost model that could be part of an Accounting Service in our proposed architecture to address the issues of cost accounting and pricing of the Apps faced by the App developers while using cloud infrastructures for hosting their Apps. Lastly, we describe experiments with a real-world implementation of our App Runtime and cost model in a WheelSim testbed that uses NSF GENI Cloud and Ohio Supercomputer Center resources. Our results show benefits to an App developer in terms of satisfactory user experience, lower design time, and lower cost per simulation.
{"title":"Advanced Manufacturing Collaboration in a Cloud-based App Marketplace","authors":"A. Akula, P. Calyam, Ronny Bazan Antequera, Raymond E. Leto","doi":"10.1145/3075564.3077547","DOIUrl":"https://doi.org/10.1145/3075564.3077547","url":null,"abstract":"Advanced Manufacturing Apps that perform complex modeling and simulation are now becoming available in Marketplaces. App stakeholders face two fundamental challenges in cloud engineering of the App Marketplaces: (i) orchestration of service chaining through an App Runtime, and (ii) finding a suitable cost model to evaluate App pricing strategies. In this paper, we address these challenges by proposing a new cloud architecture that aims at supporting an 'App Marketplace' that thrives on agile development, organic collaboration and scalable sales of next generation Manufacturing Apps requiring high-performance simulation and modeling. We describe how we are realizing the vision of this architecture through an App Runtime we have developed that leverages a Resource Brokering Service to create App chaining mechanisms. We also detail a new cost model that could be part of an Accounting Service in our proposed architecture to address the issues of cost accounting and pricing of the Apps faced by the App developers while using cloud infrastructures for hosting their Apps. Lastly, we describe experiments with a real-world implementation of our App Runtime and cost model in a WheelSim testbed that uses NSF GENI Cloud and Ohio Supercomputer Center resources. Our results show benefits to an App developer in terms of: satisfactory user experience, lower design time and lower cost/simulation.","PeriodicalId":398898,"journal":{"name":"Proceedings of the Computing Frontiers Conference","volume":"8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-05-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131948853","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Optimizing memory access behavior is an important challenge in improving the performance and energy consumption of parallel applications on shared memory architectures. Modern systems contain complex memory hierarchies with multiple memory controllers and several levels of caches. In such machines, analyzing the affinity between threads and data to map them to the hardware hierarchy reduces the cost of memory accesses. In this paper, we introduce a hybrid technique to optimize the memory access behavior of parallel applications. It is based on a compiler optimization that inserts code to predict, at runtime, the memory access behavior of the application, and an OS mechanism that uses this information to optimize the mapping of threads and data. In contrast to previous work, our proposal uses a proactive technique that improves future memory access behavior using predictions rather than past behavior. Our mechanism achieves substantial performance gains for a variety of parallel applications.
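The mapping step can be illustrated with a toy greedy policy: given a predicted thread-to-node access matrix, place each thread on the NUMA node it is predicted to access most often. This is a deliberate simplification of the compiler/OS mechanism described above; the matrix and the policy are assumptions for illustration.

```python
def map_threads_to_nodes(access, num_nodes):
    """Greedy affinity mapping.

    `access[t][n]` is the predicted number of accesses by thread t to
    memory currently placed on NUMA node n (a stand-in for the
    compiler-inserted runtime prediction). Returns {thread: node}.
    """
    mapping = {}
    for t, row in enumerate(access):
        # Pin thread t to the node holding most of its predicted accesses.
        mapping[t] = max(range(num_nodes), key=lambda n: row[n])
    return mapping
```

A real mechanism would also migrate pages, balance load across nodes, and refresh the mapping as predictions change; the sketch only shows the affinity criterion.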
{"title":"Optimizing memory affinity with a hybrid compiler/OS approach","authors":"M. Diener, E. Cruz, M. Alves, E. Borin, P. Navaux","doi":"10.1145/3075564.3075566","DOIUrl":"https://doi.org/10.1145/3075564.3075566","url":null,"abstract":"Optimizing the memory access behavior is an important challenge to improve the performance and energy consumption of parallel applications on shared memory architectures. Modern systems contain complex memory hierarchies with multiple memory controllers and several levels of caches. In such machines, analyzing the affinity between threads and data to map them to the hardware hierarchy reduces the cost of memory accesses. In this paper, we introduce a hybrid technique to optimize the memory access behavior of parallel applications. It is based on a compiler optimization that inserts code to predict, at runtime, the memory access behavior of the application and an OS mechanism that uses this information to optimize the mapping of threads and data. In contrast to previous work, our proposal uses a proactive technique to improve the future memory access behavior using predictions instead of the past behavior. Our mechanism achieves substantial performance gains for a variety of parallel applications.","PeriodicalId":398898,"journal":{"name":"Proceedings of the Computing Frontiers Conference","volume":"29 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-05-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116967896","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
C. Aprile, J. Wüthrich, Luca Baldassarre, Y. Leblebici, V. Cevher
This work presents an area- and power-efficient encoding system for wireless implantable devices capable of monitoring the electrical activity of the brain. Such devices are becoming an important tool for understanding, real-time monitoring, and potentially treating mental diseases such as epilepsy and depression. Recent advances in compressive sensing (CS) have shown huge potential for sub-Nyquist sampling of neuronal signals. However, its implementation still faces critical issues in delivering sufficient performance and in hardware complexity. In this work, we explore the trade-offs between area and power requirements by applying a novel DCT Learning-Based Compressive Subsampling approach to a human iEEG dataset. The proposed method achieves compression rates of up to 64x, increasing the reconstruction performance and reducing the wireless transmission costs with respect to the recent state of the art. This new fully digital architecture handles the data compression of each individual neural acquisition channel within an area of 490 × 650 μm in 0.18 μm CMOS technology, and a power dissipation of only 2 μW.
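As background, DCT-domain compression exploits the fact that smooth signals concentrate their energy in few transform coefficients. A pure-Python sketch of this generic idea (keep the largest-magnitude coefficients, zero the rest); the paper's learning-based subsampling matrix is considerably more involved than this:

```python
import math

def dct(signal):
    """Orthonormal DCT-II of a 1-D signal."""
    n = len(signal)
    out = []
    for k in range(n):
        s = sum(x * math.cos(math.pi * (i + 0.5) * k / n)
                for i, x in enumerate(signal))
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(scale * s)
    return out

def idct(coeffs):
    """Inverse of the orthonormal DCT-II above (DCT-III)."""
    n = len(coeffs)
    out = []
    for i in range(n):
        s = coeffs[0] * math.sqrt(1.0 / n)
        s += sum(coeffs[k] * math.sqrt(2.0 / n) *
                 math.cos(math.pi * (i + 0.5) * k / n)
                 for k in range(1, n))
        out.append(s)
    return out

def compress(signal, keep):
    """Reconstruct the signal from its `keep` largest DCT coefficients."""
    c = dct(signal)
    top = set(sorted(range(len(c)), key=lambda k: abs(c[k]), reverse=True)[:keep])
    sparse = [c[k] if k in top else 0.0 for k in range(len(c))]
    return idct(sparse)
```

For a signal that matches a single DCT basis function, one coefficient suffices for near-exact reconstruction; for iEEG data, quality degrades gracefully as fewer coefficients (higher compression rates) are kept.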
{"title":"DCT Learning-Based Hardware Design for Neural Signal Acquisition Systems","authors":"C. Aprile, J. Wüthrich, Luca Baldassarre, Y. Leblebici, V. Cevher","doi":"10.1145/3075564.3078890","DOIUrl":"https://doi.org/10.1145/3075564.3078890","url":null,"abstract":"This work presents an area and power efficient encoding system for wireless implantable devices capable of monitoring the electrical activity of the brain. Such devices are becoming an important tool for understanding, real-time monitoring, and potentially treating mental diseases such as epilepsy and depression. Recent advances on compressive sensing (CS) have shown a huge potential for sub-Nyquist sampling of neuronal signals. However, its implementation is still facing critical issues in delivering sufficient performance and in hardware complexity. In this work, we explore the tradeoffs between area and power requirements applying a novel DCT Learning-Based Compressive Subsampling approach on a human iEEG dataset. The proposed method achieves compression rates up to 64x, increasing the reconstruction performance and reducing the wireless transmission costs with respect to recent state-of-art. This new fully digital architecture handles the data compression of each individual neural acquisition channel with an area of 490 x 650/μm in 0.18 μm CMOS technology, and a power dissipation of only 2μW.","PeriodicalId":398898,"journal":{"name":"Proceedings of the Computing Frontiers Conference","volume":"36 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-05-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115589762","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
In 2007, under the auspices of the Industry/University Cooperative Research Centers (I/URC) program of the National Science Foundation, we established the Center for High-performance Reconfigurable Computing (CHREC) to facilitate scientific and engineering research in architectures, algorithms, software, services, applications, and performance optimization and evaluation for the advancement of multi-paradigm reconfigurable computing --- "reconfigurable" in both hardware and software. Each of the university sites in CHREC --- University of Pittsburgh, University of Florida, Brigham Young University, and Virginia Tech --- contributes unique expertise and capabilities for research in this critical field. Reflecting upon our ten-year odyssey with CHREC, we achieved the following successes in collaborative partnership with our CHREC members from industry and other government agencies: (1) established the nation's first multidisciplinary research center in reconfigurable high-performance computing as a basis for long-term partnership and collaboration amongst industry, academe, and government; (2) directly supported the research needs of our center members in a cost-effective manner with pooled and leveraged resources and maximized synergy; (3) enhanced the educational experience for a diverse set of top-quality graduate and undergraduate students; and (4) advanced the knowledge and technologies in this field and ensured commercial relevance of the research with rapid and effective technology transfer.
{"title":"Center for High-Performance Reconfigurable Computing (CHREC): A Ten-Year Odyssey","authors":"W. Feng, A. George, H. Lamm, M. Wirthlin","doi":"10.1145/3075564.3095082","DOIUrl":"https://doi.org/10.1145/3075564.3095082","url":null,"abstract":"In 2007, under the auspices of the Industry/University Cooperative Research Centers (I/URC) program of the National Science Foundation, we established the Center for High-performance Reconfigurable Computing (CHREC) to facilitate scientific and engineering research in architectures, algorithms, software, services, applications, and performance optimization and evaluation for the advancement of multi-paradigm reconfigurable computing --- \"reconfigurable\" in both hardware or software. Each of the university sites in CHREC --- University of Pittsburgh, University of Florida, Brigham Young University, and Virginia Tech --- contributes unique expertise and capabilities for research in this critical field. Reflecting upon our ten-year odyssey with CHREC, we achieved the following successes in collaborative partnership with our CHREC members from industry and other government agencies: (1) established the nations first multidisciplinary research center in reconfigurable high-performance computing as a basis for long-term partnership and collaboration amongst industry, academe, and government; (2) directly supported the research needs of our center members in a cost-effective manner with pooled and leveraged resources and maximized synergy; (3) enhanced the educational experience for a diverse set of top-quality graduate and undergraduate students; and (4) advanced the knowledge and technologies in this field and ensured commercial relevance of the research with rapid and effective technology transfer.","PeriodicalId":398898,"journal":{"name":"Proceedings of the Computing Frontiers Conference","volume":"11 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-05-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127787555","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The inherent resilience of applications enables the design paradigm of approximate computing, which exploits computational inexactness by trading off output quality for runtime system resources. When executing such quality-scalable applications on multiprocessor embedded systems, the goal is not only to achieve the highest possible output quality, but also to handle the critical thermal challenge spurred by vastly increased chip density. While rising temperature causes significant quality distortion at runtime, existing thermal-management techniques, such as dynamic frequency scaling, rarely take into account the possible trade-offs between output quality and thermal budget. In this paper, we exploit the application-level quality-scaling features of resilient applications to achieve effective temperature control as well as quality maximization. We propose an efficient iterative pseudo-quadratic programming heuristic to decide the optimal frequency and application execution cycles, in order to achieve quality optimization under temperature, timing, and energy constraints. Our approaches are evaluated using realistic benchmarks with known platform thermal parameters. The proposed methods show a 98.5% quality improvement with temperature-violation awareness.
{"title":"Quality Optimization of Resilient Applications under Temperature Constraints","authors":"Heng Yu, Y. Ha, Jing Wang","doi":"10.1145/3075564.3075577","DOIUrl":"https://doi.org/10.1145/3075564.3075577","url":null,"abstract":"Inherent resilience of applications enables the design paradigm of approximate computing that exploits computation in-exactness by trading off output quality for runtime system resources. When executing such quality-scalable applications on multiprocessor embedded systems, it is expected not only to achieve the highest possible output quality, but also to handle the critical thermal challenge spurred by vastly increased chip density. While the rising temperature causes significant quality distortion at runtime, existing thermal-management techniques, such as dynamic frequency scaling, rarely take into account the trade-off possibilities between output quality and thermal budget. In this paper, we explore the application-level quality-scaling features of resilient applications to achieve effective temperature control as well as quality maximization. We propose an efficient iterative pseudo quadratic programming heuristic to decide the optimal frequency and application execution cycles, in order to achieve quality optimization, under temperature, timing, and energy constraints. Our approaches are evaluated using realistic benchmarks with known platform thermal parameters. The proposed methods show a 98.5% quality improvement with temperature violation awareness.","PeriodicalId":398898,"journal":{"name":"Proceedings of the Computing Frontiers Conference","volume":"38 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-05-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125312724","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}