
Proceedings of the Computing Frontiers Conference: Latest Publications

Finding the Critical Sampling of Big Datasets
Pub Date: 2017-05-15 | DOI: 10.1145/3075564.3078886
José Silva, B. Ribeiro, A. Sung
Big Data allied to the Internet of Things nowadays provides a powerful resource that various organizations are increasingly exploiting for applications ranging from decision support, predictive and prescriptive analytics, to knowledge extraction and intelligence discovery. In analytics and data mining processes it is usually desirable to have as much data as possible, though it is often more important that the data be of high quality; thus two of the most important problems arise when handling large datasets: sampling and feature selection. This paper addresses the sampling problem and presents a heuristic method to find the "critical sampling" of big datasets. The critical sampling size of a dataset D is the minimum number of samples of D required for a given data analytic task to achieve satisfactory performance. The problem is very important in data mining, as the size of a dataset directly relates to the cost of executing the data mining task. Since determining the critical sampling size exactly is intractable, in this paper we study heuristic methods to find the critical sampling. Experiments were conducted on several datasets using three versions of the heuristic sampling method for evaluation. Preliminary results show the existence of an apparent critical sampling size for all the datasets tested, which is generally much smaller than the size of the whole dataset. Further, the proposed heuristic method provides a practical solution for finding a useful critical sampling for data mining tasks.
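The plateau idea behind such a heuristic can be sketched as follows. This is a minimal illustration, not the authors' actual method: the `performance` curve, the doubling schedule, and all parameter values are hypothetical stand-ins for running the real data-mining task on subsets of increasing size.

```python
import math

def performance(n_samples, plateau=1000, max_acc=0.95):
    # Hypothetical stand-in for running the data-mining task on an
    # n-sample subset and measuring its quality (e.g. accuracy).
    return max_acc * (1 - math.exp(-n_samples / plateau))

def critical_sampling(total_size, tol=0.005, start=100):
    # Double the sample size until extra data stops helping:
    # a simple plateau heuristic in the spirit of the paper.
    n, prev = start, performance(start)
    while n * 2 <= total_size:
        cur = performance(n * 2)
        if cur - prev < tol:   # performance gain has plateaued
            return n
        n, prev = n * 2, cur
    return total_size          # no plateau found below the full size
```

With the toy curve above, the plateau is detected long before the full dataset size is reached, mirroring the paper's observation that the critical sampling size is generally much smaller than the whole dataset.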
Citations: 5
DPA on hardware implementations of Ascon and Keyak
Pub Date: 2017-05-15 | DOI: 10.1145/3075564.3079067
Niels Samwel, J. Daemen
This work applies side-channel analysis to hardware implementations of two CAESAR candidates, Keyak and Ascon. Both algorithms are cryptographic sponges with an iterated permutation. The algorithms share an s-box, so attacks on the non-linear step of the permutation are similar. This work presents the first results of a DPA attack on Keyak using traces generated by an FPGA. A new attack is crafted for a larger sensitive variable to reduce the number of traces required. It also presents and applies the first CPA attack on Ascon. Using a toy-sized threshold implementation of Ascon, we try to give insight into the order of the steps of a permutation.
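As a rough illustration of how a CPA attack ranks key guesses, here is a minimal sketch using a Hamming-weight leakage model. It is not the Keyak/Ascon attack from the paper: the s-box here is the 4-bit PRESENT s-box (chosen only as a small nonlinear example), and the traces, noise level, and key are simulated.

```python
import math
import random

# 4-bit PRESENT s-box, used here only as a small nonlinear example.
SBOX = [0xC, 5, 6, 0xB, 9, 0, 0xA, 0xD, 3, 0xE, 0xF, 8, 4, 7, 1, 2]

def hw(x):
    # Hamming weight: number of set bits.
    return bin(x).count("1")

def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sx = math.sqrt(sum((a - mx) ** 2 for a in xs))
    sy = math.sqrt(sum((b - my) ** 2 for b in ys))
    return cov / (sx * sy)

def simulate_trace(pt, key, noise=0.5):
    # Leakage model: power ~ Hamming weight of the s-box output.
    return hw(SBOX[pt ^ key]) + random.gauss(0, noise)

def cpa_recover_key(plaintexts, traces):
    # Rank each key guess by how well its predicted leakage
    # correlates with the measured traces; return the best guess.
    def score(guess):
        hyp = [hw(SBOX[p ^ guess]) for p in plaintexts]
        return abs(pearson(hyp, traces))
    return max(range(16), key=score)
```

With a few hundred simulated traces, the correct 4-bit key nibble produces a markedly higher correlation than any wrong guess, which is the core effect a DPA/CPA attack exploits.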
Citations: 25
A Large-Scale Malleable Tsunami Simulation Realized on an Elastic MPI Infrastructure
Pub Date: 2017-05-15 | DOI: 10.1145/3075564.3075585
Ao Mo-Hellenbrand, Isaías A. Comprés Ureña, O. Meister, H. Bungartz, M. Gerndt, M. Bader
Realization of resource awareness and elasticity in hardware and software is an answer to many problems and challenges we face in High Performance Computing (HPC) today. Resource utilization inefficiency is a real problem in current HPC systems due to their static, inflexible resource assignment configuration. One way to resolve this problem is to change the static resource assignment setting by introducing runtime resource elasticity, which requires both malleability in the software implementation and support for runtime resource adaptation in the system infrastructure. In this paper, we show a successful implementation of a malleable tsunami simulation realized on an elastic MPI infrastructure we previously proposed. We also demonstrate that introducing malleability to such a tightly coupled parallel application can be beneficial.
Citations: 8
Let's Go: a Data-Driven Multi-Threading Support
Pub Date: 2017-05-15 | DOI: 10.1145/3075564.3075596
A. Scionti, Somnath Mazumdar
Increasing the performance of computing systems necessitates solutions for improving scalability and productivity. In recent times, data-driven Program eXecution Models (PXMs) have been gaining popularity due to their superior support compared to traditional von Neumann execution models. However, exposing the benefits of such PXMs within a high-level programming language remains a challenge. Although many high-level programming languages and APIs support concurrency and multi-threading (e.g., C++11, Java, OpenMP, MPI), their synchronisation models make heavy use of mutexes and locks, generally leading to poor system performance. Conversely, one major appeal of the Go programming language is the way it supports concurrency: goroutines (tagged functions) are mapped onto OS threads and communicate with each other through data structures that buffer input data (channels). By forcing goroutines to exchange data only through channels, it is possible to enable data-driven execution. This paper proposes a first attempt to map goroutines onto a data-driven PXM. The Go compilation procedure and the run-time library are modified to exploit the execution of fine-grain threads on an abstracted parallel machine model.
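The channel-only communication style that makes a data-driven mapping possible can be imitated outside Go as well. The sketch below uses Python threads and FIFO queues as stand-in "channels"; it is a hypothetical analogue of the pipeline style, not the authors' modified Go runtime. Each stage fires only when data arrives on its input channel.

```python
import queue
import threading

def producer(out_ch):
    # A stage fires by pushing data into its output channel.
    for i in range(5):
        out_ch.put(i)
    out_ch.put(None)                # sentinel: channel closed

def square(in_ch, out_ch):
    # This stage runs only when data arrives: data-driven execution.
    while (v := in_ch.get()) is not None:
        out_ch.put(v * v)
    out_ch.put(None)

# Wire the stages together: all communication goes through the queues,
# never through shared mutable state guarded by locks.
a, b = queue.Queue(), queue.Queue()
threading.Thread(target=producer, args=(a,)).start()
threading.Thread(target=square, args=(a, b)).start()

results = []
while (v := b.get()) is not None:
    results.append(v)
```

Because each value flows through exactly one FIFO channel per hop, the output order is deterministic even though the stages run concurrently.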
Citations: 2
Designing Scalable Distributed Memory Models: A Case Study
Pub Date: 2017-05-15 | DOI: 10.1145/3075564.3077425
Joshua Landwehr, Joshua D. Suetterlein, J. Manzano, A. Márquez, K. Barker, G. Gao
One promising effort as we progress toward exascale is the development of fine-grain execution models. These models display an innate agility, providing new avenues to address the challenges presented by future systems such as extreme parallelism, restrictive power constraints, and fault tolerance. These opportunities, however, may be prematurely abandoned if the system software, particularly a distributed runtime, is incapable of scaling. One potentially limiting factor is the enforcement of the memory model in a runtime. In a shared memory environment, weaker memory models are preferred since they promote parallelism and optimizations. This is not necessarily the case for distributed systems, as a weaker model may lead to increased coherency operations and memory usage depending on the application's communication patterns and memory requirements. Moreover, unlike shared memory models, which rely on hardware to lessen the costs of coherence, distributed memory models are forced to rely on expensive runtime calls and network operations. This paper presents the design and implementation of a distributed memory coherency model in a high performance implementation of the Open Community Runtime as an exemplar fine-grain execution model. We compare the performance and number of coherence operations of an instance of the OCR standard with our novel model, called Cache DAG consistency (CDAG). Leveraging CDAG consistency, we demonstrate up to a 3.7X reduction in messages and an 11X increase in performance for select benchmarks running at scale.
Citations: 4
Advanced Manufacturing Collaboration in a Cloud-based App Marketplace
Pub Date: 2017-05-15 | DOI: 10.1145/3075564.3077547
A. Akula, P. Calyam, Ronny Bazan Antequera, Raymond E. Leto
Advanced Manufacturing Apps that perform complex modeling and simulation are now becoming available in Marketplaces. App stakeholders face two fundamental challenges in cloud engineering of the App Marketplaces: (i) orchestration of service chaining through an App Runtime, and (ii) finding a suitable cost model to evaluate App pricing strategies. In this paper, we address these challenges by proposing a new cloud architecture that aims at supporting an 'App Marketplace' that thrives on agile development, organic collaboration and scalable sales of next-generation Manufacturing Apps requiring high-performance simulation and modeling. We describe how we are realizing the vision of this architecture through an App Runtime we have developed that leverages a Resource Brokering Service to create App chaining mechanisms. We also detail a new cost model that could be part of an Accounting Service in our proposed architecture, addressing the cost accounting and pricing issues App developers face when using cloud infrastructures to host their Apps. Lastly, we describe experiments with a real-world implementation of our App Runtime and cost model in a WheelSim testbed that uses NSF GENI Cloud and Ohio Supercomputer Center resources. Our results show benefits to an App developer in terms of satisfactory user experience, lower design time, and lower cost per simulation.
Citations: 2
Optimizing memory affinity with a hybrid compiler/OS approach
Pub Date: 2017-05-15 | DOI: 10.1145/3075564.3075566
M. Diener, E. Cruz, M. Alves, E. Borin, P. Navaux
Optimizing the memory access behavior is an important challenge for improving the performance and energy consumption of parallel applications on shared memory architectures. Modern systems contain complex memory hierarchies with multiple memory controllers and several levels of caches. In such machines, analyzing the affinity between threads and data to map them to the hardware hierarchy reduces the cost of memory accesses. In this paper, we introduce a hybrid technique to optimize the memory access behavior of parallel applications. It is based on a compiler optimization that inserts code to predict, at runtime, the memory access behavior of the application, and an OS mechanism that uses this information to optimize the mapping of threads and data. In contrast to previous work, our proposal is proactive: it improves future memory access behavior using predictions rather than past behavior. Our mechanism achieves substantial performance gains for a variety of parallel applications.
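A toy version of affinity-based data mapping can make the idea concrete: given per-thread access counts for each memory page, place each page on the NUMA node whose threads touch it most. This is an illustrative heuristic only; the paper's mechanism works from compiler-inserted runtime predictions, which this sketch does not model, and the data layout here is invented.

```python
def map_pages(access, thread_node):
    # access[t][p]: number of accesses by thread t to page p
    #               (in the paper's setting, predicted at runtime).
    # thread_node[t]: NUMA node that thread t is pinned to.
    n_pages = len(access[0])
    placement = []
    for p in range(n_pages):
        score = {}
        for t, row in enumerate(access):
            node = thread_node[t]
            score[node] = score.get(node, 0) + row[p]
        # Place the page on the node whose threads access it most.
        placement.append(max(score, key=score.get))
    return placement
```

For example, with two threads on nodes 0 and 1, a page accessed mostly by thread 0 lands on node 0 and a page accessed mostly by thread 1 lands on node 1, cutting remote-memory traffic.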
Citations: 1
DCT Learning-Based Hardware Design for Neural Signal Acquisition Systems
Pub Date: 2017-05-15 | DOI: 10.1145/3075564.3078890
C. Aprile, J. Wüthrich, Luca Baldassarre, Y. Leblebici, V. Cevher
This work presents an area- and power-efficient encoding system for wireless implantable devices capable of monitoring the electrical activity of the brain. Such devices are becoming an important tool for understanding, real-time monitoring, and potentially treating mental diseases such as epilepsy and depression. Recent advances in compressive sensing (CS) have shown huge potential for sub-Nyquist sampling of neuronal signals. However, its implementation still faces critical issues in delivering sufficient performance and in hardware complexity. In this work, we explore the tradeoffs between area and power requirements by applying a novel DCT Learning-Based Compressive Subsampling approach to a human iEEG dataset. The proposed method achieves compression rates of up to 64x, increasing reconstruction performance and reducing wireless transmission costs relative to the recent state of the art. This new fully digital architecture handles the data compression of each individual neural acquisition channel with an area of 490 × 650 μm in 0.18 μm CMOS technology and a power dissipation of only 2 μW.
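The core idea of learning-based DCT subsampling (learn which DCT coefficients carry most of the signal energy on training data, then keep only those coefficients for new signals) can be sketched in software. This is a generic illustration under that reading, not the paper's hardware encoder; the signal length and sparsity pattern are invented.

```python
import math

def dct(x):
    # Orthonormal DCT-II.
    N = len(x)
    out = []
    for k in range(N):
        s = sum(x[n] * math.cos(math.pi * (n + 0.5) * k / N) for n in range(N))
        scale = math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N)
        out.append(scale * s)
    return out

def idct(X):
    # Inverse of the orthonormal DCT-II (i.e. DCT-III).
    N = len(X)
    out = []
    for n in range(N):
        s = X[0] / math.sqrt(N)
        s += sum(X[k] * math.sqrt(2 / N) * math.cos(math.pi * (n + 0.5) * k / N)
                 for k in range(1, N))
        out.append(s)
    return out

def learn_support(training, m):
    # Pick the m DCT indices with the highest average energy
    # over the training signals (the "learning" step).
    N = len(training[0])
    energy = [0.0] * N
    for sig in training:
        for k, c in enumerate(dct(sig)):
            energy[k] += c * c
    return sorted(range(N), key=lambda k: -energy[k])[:m]

def compress(sig, support):
    # Keep only the learned coefficients: m values instead of N samples.
    X = dct(sig)
    return [X[k] for k in support]

def reconstruct(coeffs, support, N):
    # Zero-fill the discarded coefficients and invert the transform.
    X = [0.0] * N
    for k, c in zip(support, coeffs):
        X[k] = c
    return idct(X)
```

When the signals really are sparse in the DCT domain, a small learned support reconstructs them almost exactly, which is what makes the large compression rates reported in the paper plausible.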
Citations: 4
Center for High-Performance Reconfigurable Computing (CHREC): A Ten-Year Odyssey
Pub Date: 2017-05-15 | DOI: 10.1145/3075564.3095082
W. Feng, A. George, H. Lamm, M. Wirthlin
In 2007, under the auspices of the Industry/University Cooperative Research Centers (I/UCRC) program of the National Science Foundation, we established the Center for High-performance Reconfigurable Computing (CHREC) to facilitate scientific and engineering research in architectures, algorithms, software, services, applications, and performance optimization and evaluation for the advancement of multi-paradigm reconfigurable computing --- "reconfigurable" in both hardware and software. Each of the university sites in CHREC --- University of Pittsburgh, University of Florida, Brigham Young University, and Virginia Tech --- contributes unique expertise and capabilities for research in this critical field. Reflecting upon our ten-year odyssey with CHREC, we achieved the following successes in collaborative partnership with our CHREC members from industry and other government agencies: (1) established the nation's first multidisciplinary research center in reconfigurable high-performance computing as a basis for long-term partnership and collaboration amongst industry, academe, and government; (2) directly supported the research needs of our center members in a cost-effective manner with pooled and leveraged resources and maximized synergy; (3) enhanced the educational experience for a diverse set of top-quality graduate and undergraduate students; and (4) advanced the knowledge and technologies in this field and ensured commercial relevance of the research with rapid and effective technology transfer.
Citations: 0
Quality Optimization of Resilient Applications under Temperature Constraints
Pub Date : 2017-05-15 DOI: 10.1145/3075564.3075577
Heng Yu, Y. Ha, Jing Wang
Inherent resilience of applications enables the design paradigm of approximate computing, which exploits computational inexactness by trading off output quality for runtime system resources. When executing such quality-scalable applications on multiprocessor embedded systems, it is expected not only to achieve the highest possible output quality, but also to handle the critical thermal challenge spurred by vastly increased chip density. While rising temperature causes significant quality distortion at runtime, existing thermal-management techniques, such as dynamic frequency scaling, rarely take into account the trade-off possibilities between output quality and thermal budget. In this paper, we explore the application-level quality-scaling features of resilient applications to achieve effective temperature control as well as quality maximization. We propose an efficient iterative pseudo quadratic programming heuristic to decide the optimal frequency and application execution cycles that maximize output quality under temperature, timing, and energy constraints. Our approaches are evaluated using realistic benchmarks with known platform thermal parameters. The proposed methods show a 98.5% quality improvement when temperature violations are taken into account.
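The frequency/cycle selection the abstract describes can be illustrated with a small sketch. This is not the authors' pseudo quadratic programming heuristic; the frequency levels, energy and thermal numbers, and the saturating quality model below are all invented for illustration. The idea shown is only the constraint structure: pick the frequency level and execution-cycle count that maximize a monotonic quality function while respecting temperature, deadline, and energy limits.

```python
# Illustrative sketch (assumed numbers): choose frequency and cycles for a
# quality-scalable task under temperature, timing, and energy constraints.

# Candidate frequency levels (GHz) with per-cycle energy (nJ) and
# steady-state temperature rise (deg C) -- invented for illustration.
LEVELS = [
    {"freq": 0.8, "energy_per_cycle": 0.4, "temp_rise": 8.0},
    {"freq": 1.2, "energy_per_cycle": 0.7, "temp_rise": 15.0},
    {"freq": 1.6, "energy_per_cycle": 1.1, "temp_rise": 24.0},
]

AMBIENT = 45.0          # deg C, assumed baseline die temperature
TEMP_CAP = 65.0         # deg C, thermal constraint
DEADLINE = 2.0e-3       # s, timing constraint
ENERGY_BUDGET = 1.5e6   # nJ, energy constraint
CYCLE_STEP = 100_000    # granularity of the execution-cycle search

def quality(cycles):
    """Monotonic, saturating quality model: more cycles -> better output."""
    return 1.0 - 1.0 / (1.0 + cycles / 500_000)

def best_config():
    """Return (quality, freq, cycles) of the best feasible configuration."""
    best = None
    for lv in LEVELS:
        # Reject frequency levels whose steady-state temperature alone
        # already violates the thermal cap.
        if AMBIENT + lv["temp_rise"] > TEMP_CAP:
            continue
        # Max cycles allowed by the deadline at this frequency.
        max_by_time = int(DEADLINE * lv["freq"] * 1e9)
        # Max cycles allowed by the energy budget.
        max_by_energy = int(ENERGY_BUDGET / lv["energy_per_cycle"])
        cycles = min(max_by_time, max_by_energy) // CYCLE_STEP * CYCLE_STEP
        if cycles <= 0:
            continue
        q = quality(cycles)
        if best is None or q > best[0]:
            best = (q, lv["freq"], cycles)
    return best

q, f, c = best_config()
print(f"freq={f} GHz, cycles={c}, quality={q:.3f}")
```

Note how the fastest level is infeasible thermally, while the mid level wins because it can afford more cycles within the energy budget than the slow level can fit before the deadline; the paper's heuristic additionally iterates over multiple tasks and processors rather than a single task as here.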
Citations: 7
Journal
Proceedings of the Computing Frontiers Conference