
2015 25th International Workshop on Power and Timing Modeling, Optimization and Simulation (PATMOS): Latest Publications

Adaptive energy minimization of embedded heterogeneous systems using regression-based learning
Sheng Yang, R. Shafik, G. Merrett, Edward A. Stott, Joshua M. Levine, James J. Davis, B. Al-Hashimi
Modern embedded systems consist of heterogeneous computing resources with diverse energy and performance trade-offs. This is because these resources exercise the application tasks differently, generating varying workloads and energy consumption. As a result, minimizing energy consumption in these systems is challenging, as continuous adaptation between application task mapping (i.e. allocating tasks among the computing resources) and dynamic voltage/frequency scaling (DVFS) is required. Existing approaches are limited by a lack of such adaptation with practical validation (Table I). This paper addresses this limitation and proposes a novel adaptive energy minimization approach for embedded heterogeneous systems. Fundamental to this approach is a runtime model, generated through regression-based learning of the energy/performance trade-offs between the different computing resources in the system. Using this model, an application task is mapped onto a suitable computing resource at runtime, ensuring minimum energy consumption for a given application performance requirement. This mapping is coupled with DVFS control to adapt to performance and workload variations. The proposed approach is designed, engineered and validated on a Zynq-ZC702 platform consisting of CPU, DSP and FPGA cores. Using several image processing applications as case studies, it is demonstrated that the proposed approach can achieve energy savings of more than 70% compared with existing approaches.
DOI: 10.1109/PATMOS.2015.7347594
Published: 2015-12-07
Citations: 50
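The regression-plus-mapping idea in the abstract above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the sample numbers are invented, the model forms (execution time linear in 1/frequency, energy linear in frequency) are deliberately crude, and the names `fit_line`, `learn` and `choose` are hypothetical. It is not the authors' runtime model.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y ~ slope * x + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def learn(samples):
    """samples: list of (freq_mhz, time_ms, energy_mj) observations.

    Returns two fitted lines: time vs. 1/frequency and energy vs. frequency.
    Both model forms are simplifying assumptions.
    """
    freqs = [s[0] for s in samples]
    inv_freqs = [1.0 / s[0] for s in samples]
    time_model = fit_line(inv_freqs, [s[1] for s in samples])
    energy_model = fit_line(freqs, [s[2] for s in samples])
    return time_model, energy_model


def choose(models, candidate_freqs, deadline_ms):
    """Pick the (resource, frequency) with the lowest predicted energy
    among those predicted to meet the deadline; None if nothing fits."""
    best = None
    for name, (time_model, energy_model) in models.items():
        for f in candidate_freqs[name]:
            t = time_model[0] / f + time_model[1]
            e = energy_model[0] * f + energy_model[1]
            if t <= deadline_ms and (best is None or e < best[2]):
                best = (name, f, e)
    return best


# Invented observations for two resources (not measured data).
models = {
    "cpu": learn([(200, 50.0, 10.0), (400, 25.0, 14.0), (600, 10000 / 600, 18.0)]),
    "fpga": learn([(100, 20.0, 4.0), (200, 10.0, 5.0)]),
}
candidate_freqs = {"cpu": [200, 400, 600], "fpga": [100, 200]}
best = choose(models, candidate_freqs, deadline_ms=15.0)
print(best)
```

With these made-up samples, no CPU setting is predicted to meet the 15 ms deadline, so the sketch selects the FPGA at its higher frequency. A real system would refit the models as new measurements arrive.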
A versatile and reliable glitch filter for clocks
Robert Najvirt, A. Steininger
In today's complex system-on-chip architectures, protecting the clock(s) against glitches introduced by environmental disturbances, attackers, or gating measures is becoming increasingly important. Glitch protection is a delicate issue in the digital domain, as it is inherently coupled with metastability issues. The circuit we propose in this paper outputs a clock that strictly follows an input reference clock in the regular case, but guarantees a minimum output pulse width even in case of arbitrary behavior of the reference. We give a thorough analysis showing that, unlike most existing solutions, our circuit can handle metastability without any residual risk of upsets. Still, its implementation is very simple. Our theoretical claims are supported by simulation results. Furthermore, we give some examples of possible use cases for such a circuit, such as clock gating, clock self-repair, or defense against clock attacks.
DOI: 10.1109/PATMOS.2015.7347599
Published: 2015-12-07
Citations: 3
Efficient parallelization of the Discrete Wavelet Transform algorithm using memory-oblivious optimizations
A. Keliris, Vasilis Dimitsas, O. Kremmyda, D. Gizopoulos, M. Maniatakos
As the rate of single-thread CPU performance improvement per generation has diminished due to slower transistor-speed scaling and energy-related issues, researchers and industry have shifted their interest towards multi-core and many-core architectures for improving performance. Comparisons between optimized applications for parallel architectures have been quantified many times in the literature, but contradictory results have been reported, mainly due to biased methods of evaluating and comparing these architectures. In this paper, we present memory-oblivious optimizations of the widely used Discrete Wavelet Transform (DWT), and provide detailed comparisons of the algorithm on Intel and AMD multi-core CPUs, Nvidia many-core GPUs, and Intel's Xeon Phi many-core coprocessor. Our results indicate that, compared to their respective non-optimized single-threaded implementations, memory-oblivious optimization delivers performance improvements of up to 17.9× to 197.2×, depending on the architecture examined. Furthermore, the presented CPU and GPU memory-oblivious implementations are 2.6× and 1.3× faster, respectively, than the fastest implementations of DWT currently available in the literature. No comparison to the state of the art can be made for the Xeon Phi because, to the best of our knowledge, this is the first study that optimizes the DWT for this new architecture.
DOI: 10.1109/PATMOS.2015.7347583
Published: 2015-12-07
Citations: 1
Constructing stability-based clock gating with hierarchical clustering
Bao Le, Djordje Maksimovic, D. Sengupta, Erhan Ergin, Ryan Berryhill, A. Veneris
In modern designs, a complex clock distribution network is employed to distribute the clock signal(s) to all the sequential elements. As the functionality of these sequential elements depends heavily on usage scenarios, it is vital that the clock network is optimized for these scenarios. This paper introduces a clock network power optimization methodology based on design usage patterns and stability-based clock gating. Specifically, whenever a register retains its value from the previous cycle, a clock gating implementation shuts off its clock and disables data loading to reduce power. We first introduce the notion of a stability pattern and its correlation with clock gating efficiency. Next, we introduce a methodology to identify efficient clock gating implementations. In this framework, a clustering algorithm leveraging stability patterns iteratively computes more effective gating implementations. Each implementation is further evaluated on area overhead and critical path delay. If it satisfies all criteria, it is implemented in the design; otherwise, it is sent back to the clustering algorithm to compute new clock gating implementations. Empirical results show a 22.6% reduction in clock network power and a 16.0% reduction in total power consumption. This confirms the practicality and robustness of the proposed methodology.
DOI: 10.1109/PATMOS.2015.7347593
Published: 2015-12-07
Citations: 2
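The notion of a stability pattern in the abstract above can be made concrete with a small sketch. The definitions below are one plausible reading of the abstract (a register is "stable" in a cycle when it holds its previous-cycle value, and a gate shared by several registers can only block the clock when all of them are stable); the function names and the register traces are hypothetical, and the paper's actual clustering algorithm is not reproduced here.

```python
def stability_pattern(trace):
    """Return 1 for each cycle in which the register keeps the value
    it held in the previous cycle, else 0 (one entry per cycle transition)."""
    return [int(cur == prev) for prev, cur in zip(trace, trace[1:])]


def cluster_pattern(patterns):
    """A gate shared by a cluster can shut off the clock only in cycles
    where every register in the cluster is stable (bitwise AND)."""
    return [int(all(bits)) for bits in zip(*patterns)]


def gated_fraction(pattern):
    """Fraction of cycles in which the clock could be gated."""
    return sum(pattern) / len(pattern)


# Invented per-cycle register values for two registers.
r0 = stability_pattern([0, 0, 0, 1, 1, 1])   # [1, 1, 0, 1, 1]
r1 = stability_pattern([5, 5, 5, 5, 7, 7])   # [1, 1, 1, 0, 1]
shared = cluster_pattern([r0, r1])           # [1, 1, 0, 0, 1]
print(gated_fraction(r0), gated_fraction(r1), gated_fraction(shared))
```

The example shows the trade-off a clustering step has to weigh: sharing one gate between both registers saves gating logic, but the shared gate is active in fewer cycles (0.6) than either register's own gate would be (0.8 each).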
Inferring custom architectures from OpenCL
Krzysztof Kepa, Ritesh Soni, P. Athanas
OpenCL has emerged as the de facto cross-platform standard in the GPU-based HPC domain. However, in FPGA-based HPC systems, OpenCL-to-FPGA compilers often yield suboptimal results due to the rigid architecture, limited shared memory, and non-existent inter-work-item communication pathways implied by the OpenCL model. In this work, a methodology for inferring application-specific OpenCL "work-item" interfaces based on kernel code analysis is explored. A proof-of-concept prototype is implemented using an OpenCL source-to-source translator, which allows automated generation of FPGA-based hardware accelerators directly from the OpenCL sources. The type and implementation of the inferred interface are tailored to match the data access patterns within the kernel. The inferred interface overcomes the limitations of the rigid OpenCL architecture and communication model. The presented approach achieves a ~30× speedup over the generic memory-based approach for an application with 16 work-items. A set of OpenCL coding patterns targeting FPGA-based HPC systems is also introduced. This technique is demonstrated on a popular bioinformatics algorithm, yet is applicable to any such algorithm with non-standard inter-cell communications.
DOI: 10.1109/PATMOS.2015.7347581
Published: 2015-12-07
Citations: 1
Calculation of worst-case execution time for multicore processors using deterministic execution
Hamid Mushtaq, Z. Al-Ars, K. Bertels
Safety-critical real-time systems need to meet strict timing deadlines. We use a model-checking-based approach to calculate the worst-case execution time (WCET), where we apply optimizations to reduce the number of states stored by the model checker. Furthermore, we use deterministic shared memory accesses to further reduce the calculation time, memory and number of states needed for calculating the WCET. By optimizing the model checking code, we were able to complete benchmarks that otherwise suffered from state explosion. Furthermore, by using deterministic execution, we significantly reduced the calculation time (up to 158×), memory (up to 89×) and states needed (up to 188×) for calculating the WCET, with a negligible increase (up to 4%) in the calculated WCET for a multicore system with 4 cores. Lastly, unlike other state-of-the-art approaches that perform a binary search for the WCET over several iterations, our method calculates the WCET in just one iteration. Taking all these optimizations into consideration, the overall speedup ranged from 1775× to 2471× for 4 threads.
DOI: 10.1109/PATMOS.2015.7347584
Published: 2015-12-07
Citations: 3
Frequency-domain modeling of ground bounce and substrate noise for synchronous and GALS systems
M. Babić, Xin Fan, M. Krstic
In this work, ground bounce noise is modeled and analyzed in the frequency domain for both synchronous and GALS (globally asynchronous, locally synchronous) systems. The analysis is performed analytically and validated by numerical simulations in MATLAB. Package parasitics and the power distribution network are coarsely modeled by a simple lumped model, while switching currents are modeled as periodic triangular pulses. The dominant components of the spectrum are determined, and the impact of their distribution on the requirements for substrate modeling is discussed. It is concluded that the resistive substrate approximation introduces large errors for systems with small decoupling capacitances, while it can be satisfactory for systems with large decoupling capacitances.
DOI: 10.1109/PATMOS.2015.7347597
Published: 2015-12-07
Citations: 6
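As background for the abstract above, the spectrum of a periodic triangular pulse train has a standard closed form. The derivation below is a textbook result under stated assumptions, not a formula taken from the paper: each pulse is assumed to be a symmetric triangle with peak current $I_p$ and base width $\tau \le T$, repeating with clock period $T$; the symbols $I_p$, $\tau$, $T$ and $c_n$ are introduced here for illustration only.

```latex
% Fourier series of a periodic, symmetric triangular current pulse
% (assumed shape: peak I_p, base width \tau, period T, \tau \le T)
i(t) = \sum_{n=-\infty}^{\infty} c_n \, e^{\,j 2\pi n t / T},
\qquad
c_n = \frac{I_p \tau}{2T}\,
      \operatorname{sinc}^2\!\left(\frac{n\tau}{2T}\right),
\qquad
\operatorname{sinc}(x) = \frac{\sin(\pi x)}{\pi x}.
```

Under these assumptions the DC component is $c_0 = I_p \tau / (2T)$, the spectral lines sit at multiples of the clock frequency $1/T$, their envelope falls off as $1/n^2$, and nulls occur at frequencies that are multiples of $2/\tau$. An asymmetric pulse (different rise and fall times) would change the envelope but not the line spacing.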
An unconventional computing technique for ultra-fast and ultra-low power data mining
V. Canals, A. Morro, A. Oliver, M. Alomar, J. Rosselló
In this work we review the basic principles of stochastic logic and propose its application to probabilistic pattern-recognition analysis. The proposed technique implements a parallel comparison of data against various pre-stored categories. We design smart pulse-based stochastic-logic blocks to provide an efficient pattern recognition analysis. The proposed architecture can speed up the screening of huge databases by two orders of magnitude with respect to classical software-based solutions, implying a great improvement in overall performance (speed and power dissipation).
DOI: 10.1109/PATMOS.2015.7347585
Published: 2015-12-07
Citations: 1
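One basic principle of stochastic logic, which the abstract above says it reviews, can be shown in a few lines: when a value in [0, 1] is encoded as the probability of a 1 in a random bitstream, a single AND gate multiplies two independent streams. The sketch below simulates this standard unipolar encoding in software; the function names and the chosen values are illustrative assumptions, and it does not model the pulse-based blocks proposed in the paper.

```python
import random


def to_stream(p, n, rng):
    """Encode probability p in [0, 1] as n random bits with P(bit = 1) = p."""
    return [1 if rng.random() < p else 0 for _ in range(n)]


def from_stream(bits):
    """Decode a bitstream back to a value: the fraction of 1s."""
    return sum(bits) / len(bits)


def stochastic_multiply(a, b):
    """Bitwise AND of two independent streams: P(a & b) = P(a) * P(b)."""
    return [x & y for x, y in zip(a, b)]


rng = random.Random(0)  # fixed seed so the run is repeatable
n = 100_000
a = to_stream(0.5, n, rng)
b = to_stream(0.4, n, rng)
product = from_stream(stochastic_multiply(a, b))
print(f"estimated 0.5 * 0.4 = {product:.3f}")
```

The estimate converges to 0.2 only statistically, with an error that shrinks roughly as the inverse square root of the stream length. This is the usual trade-off of stochastic computing: very small logic per operation in exchange for long bitstreams and approximate results.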
Evaluation and mitigation of aging effects on a digital on-chip voltage and temperature sensor
M. Altieri, S. Lesecq, D. Puschini, O. Héron, E. Beigné, J. Rodas
Power efficiency is a tremendous challenge for high-performance embedded systems under energy constraints. Fine-grained Dynamic Voltage and Frequency Scaling approaches are usually implemented in order to meet these conflicting objectives. Moreover, these techniques can be improved if local and on-the-fly monitoring of the dynamic variations is performed. A low-cost on-chip general-purpose sensor associated with an appropriate data fusion technique has recently been developed in order to monitor local temperature and voltage conditions. However, reliability has become a major concern as technology scales below 40nm. Aging variation is no longer negligible and must be taken into account during monitor design and operation. This paper revisits such a sensor under both BTI and HCI aging effects in 28nm STMicroelectronics technology. A simple recalibration method is also proposed to mitigate the aging effects on the voltage and temperature (VT) estimation.
在能源限制下,电源效率是高性能嵌入式系统面临的巨大挑战。为了满足这些相互冲突的目标,通常采用细粒度动态电压和频率缩放方法。此外,如果对动态变化进行局部和实时监测,这些技术可以得到改进。最近开发了一种低成本的片上通用传感器,结合了适当的数据融合技术,以监测局部温度和电压条件。然而,随着技术规模低于40纳米,可靠性已成为主要问题。老化变化不再是可以忽略不计的,必须在监视器的设计和运行中加以考虑。本文回顾了28nm意法半导体技术中BTI和HCI老化效应下的传感器。提出了一种简单的再标定方法,以减轻老化对VT估计的影响。
{"title":"Evaluation and mitigation of aging effects on a digital on-chip voltage and temperature sensor","authors":"M. Altieri, S. Lesecq, D. Puschini, O. Héron, E. Beigné, J. Rodas","doi":"10.1109/PATMOS.2015.7347595","DOIUrl":"https://doi.org/10.1109/PATMOS.2015.7347595","url":null,"abstract":"Power efficiency is a tremendous challenge for high performance embedded systems under energy constraints. Fine grain Dynamic Voltage and Frequency Scaling approaches are usually implemented in order to meet these conflicting objectives. Moreover, these techniques can be improved if local and on-the-fly monitoring of the dynamic variations is performed. A low-cost onchip general purpose sensor associated with an appropriate data fusion technique has been recently developed in order to monitor local temperature and voltage conditions. However, reliability has become a major concern as the technology scales below 40nm. The aging variation is not anymore negligible and must be taken into account during the monitor design and operation. This paper revisits such a sensor under both BTI and HCI aging effects in 28nm STMicroelectronics technology. A simple recalibration method is also proposed to mitigate the aging effects on the VT estimation.","PeriodicalId":325869,"journal":{"name":"2015 25th International Workshop on Power and Timing Modeling, Optimization and Simulation (PATMOS)","volume":"9 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-12-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127605468","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 4
Unified Power Format (UPF) methodology in a vendor independent flow
Emilie Garat, David Coriat, E. Beigné, L. Stefanazzi
To provide designers with an efficient low power design flow, several methodologies have been proposed such as the Unified Power Format (UPF). The main issue faced by designers is the non-interoperability of those methods across different Computer Aided Design (CAD) tools. Although the UPF standard was originally created with interoperability in mind, few of its constructs are actually supported by all CAD vendors. In this paper, we aim at providing a UPF 2.0 methodology that is compatible with different tools. The proposed case study is a circuit with three power domains and a cross-vendor UPF specification. This paper demonstrates a full low power design flow, with formal power checking, power aware simulation, synthesis and back-end.
DOI: 10.1109/PATMOS.2015.7347591 (https://doi.org/10.1109/PATMOS.2015.7347591). Published: 2015-12-07.
Citations: 7
Published in
2015 25th International Workshop on Power and Timing Modeling, Optimization and Simulation (PATMOS)