
2008 45th ACM/IEEE Design Automation Conference: latest papers

Characterizing chip-multiprocessor variability-tolerance
Pub Date : 2008-06-08 DOI: 10.1145/1391469.1391550
S. Herbert, Diana Marculescu
Spatially-correlated intra-die process variations result in significant core-to-core frequency variations in chip-multiprocessors. An analytical model for frequency island chip-multiprocessor (FI-CMP) throughput is introduced. The improved variability-tolerance of FI-CMPs over their globally-clocked counterparts is quantified across a range of core counts and sizes under constant die area. The benefits are highest for designs consisting of many small cores, with the throughput of a globally-clocked design with 70 small cores increasing by 8.8% when per-core frequency islands are used. The small-core FI-CMP also loses only 7.2% of its nominal performance to process variations, the least among any of the designs.
Citations: 61
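The contrast between a globally-clocked design and per-core frequency islands in the abstract above can be illustrated with a toy calculation. This is a minimal sketch, not the paper's analytical model: it assumes throughput is simply proportional to clock frequency, and the function names and the normalized frequencies are hypothetical.

```python
def global_clock_throughput(core_freqs):
    # Assumption: with one shared clock, every core runs at the slowest core's frequency.
    return len(core_freqs) * min(core_freqs)


def frequency_island_throughput(core_freqs):
    # Assumption: with per-core islands, each core runs at its own maximum frequency.
    return sum(core_freqs)


def relative_gain(core_freqs):
    # Fractional throughput gain of frequency islands over the global clock.
    base = global_clock_throughput(core_freqs)
    return (frequency_island_throughput(core_freqs) - base) / base


# Hypothetical per-core maximum frequencies, normalized to the nominal value.
example_freqs = [0.92, 1.00, 0.95, 1.03]
example_gain = relative_gain(example_freqs)
```

Under these assumptions the gain grows with the spread between the slowest core and the rest, which is consistent with the abstract's observation that designs with many small cores benefit most.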
Why should we do 3D integration?
Pub Date : 2008-06-08 DOI: 10.1145/1391469.1391643
W. Haensch
3D integration offers a technology that meets the requirements of the current trend in high-performance microprocessors. It offers the opportunity to continue the performance trends the industry enjoyed in the past. To take advantage of this opportunity, system architecture and design need to exploit the new possibilities that 3D integration provides.
Citations: 11
Exploring locking & partitioning for predictable shared caches on multi-cores
Pub Date : 2008-06-08 DOI: 10.1145/1391469.1391545
Vivy Suhendra, T. Mitra
Multi-core architectures consisting of multiple processing cores on a chip have become increasingly prevalent. Synthesizing hard real-time applications onto these platforms is quite challenging, as the contention among the cores for various shared resources leads to inherent timing unpredictability. This paper proposes using the shared cache in a predictable manner through a combination of locking and partitioning mechanisms. We explore possible design choices and evaluate their effects on the worst-case application performance. Our study reveals certain design principles that strongly dictate the performance of a predictable memory hierarchy.
Citations: 175
Stochastic modeling of a thermally-managed multi-core system
Pub Date : 2008-06-08 DOI: 10.1145/1391469.1391657
Hwisung Jung, Peng Rong, Massoud Pedram
Achieving high performance under a peak temperature limit is a first-order concern for VLSI designers. This paper presents a new abstract model of a thermally-managed system, where a stochastic process model is employed to capture the system performance and thermal behavior. We formulate the problem of dynamic thermal management (DTM) as the problem of minimizing the energy cost of the system for a given level of performance under a peak temperature constraint by using a controllable Markovian decision process (MDP) model. The key rationale for utilizing MDP for solving the DTM problem is to manage the stochastic behavior of the temperature states of the system under online re-configuration of its micro-architecture and/or dynamic voltage-frequency scaling. Experimental results demonstrate the effectiveness of the modeling framework and the proposed DTM technique.
Citations: 41
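The idea of casting dynamic thermal management as a Markov decision process, as in the abstract above, can be sketched with value iteration on a two-state toy problem. Everything here is an assumption made for illustration: the states (`cool`, `hot`), the actions (`fast`, `slow`), the transition probabilities, and the cost numbers are hypothetical, and the temperature constraint is folded into a large cost for running fast while hot rather than handled as an explicit constraint as the paper describes.

```python
def q_value(state, action, value, trans, cost, gamma):
    # Immediate cost plus discounted expected cost of the successor states.
    expected = sum(p * value[nxt] for nxt, p in trans[(state, action)].items())
    return cost[(state, action)] + gamma * expected


def value_iteration(states, actions, trans, cost, gamma=0.9, sweeps=500):
    # Repeatedly apply the Bellman update, then read off the greedy policy.
    value = {s: 0.0 for s in states}
    for _ in range(sweeps):
        value = {
            s: min(q_value(s, a, value, trans, cost, gamma) for a in actions)
            for s in states
        }
    policy = {
        s: min(actions, key=lambda a: q_value(s, a, value, trans, cost, gamma))
        for s in states
    }
    return value, policy


# Hypothetical model: running fast tends to heat the chip, running slow cools it.
STATES = ["cool", "hot"]
ACTIONS = ["fast", "slow"]
TRANS = {
    ("cool", "fast"): {"hot": 0.8, "cool": 0.2},
    ("cool", "slow"): {"cool": 1.0},
    ("hot", "fast"): {"hot": 1.0},
    ("hot", "slow"): {"cool": 0.8, "hot": 0.2},
}
# Hypothetical costs: slow pays a performance cost; fast while hot pays a large penalty.
COST = {
    ("cool", "fast"): 1.0,
    ("cool", "slow"): 3.0,
    ("hot", "fast"): 11.0,
    ("hot", "slow"): 3.0,
}

dtm_value, dtm_policy = value_iteration(STATES, ACTIONS, TRANS, COST)
```

With these made-up numbers the resulting policy runs fast while cool and throttles while hot; a realistic model would have many more temperature states and would derive its transition probabilities from measurements.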
Signature based Boolean matching in the presence of don’t cares
Pub Date : 2008-06-08 DOI: 10.1145/1391469.1391635
A. Abdollahi
Boolean matching determines whether two functions are equivalent under input permutation and input/output phase assignment. This paper addresses the Boolean matching problem for incompletely specified functions. Signatures have previously been used for Boolean matching of completely specified functions. In this paper, for the first time, we use signatures to determine the equivalence of incompletely specified functions.
Citations: 12
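To make the matching problem in the abstract above concrete, here is a small brute-force sketch for incompletely specified functions. It is not the paper's algorithm: it handles input permutation only (no phase assignment), treats two functions as matching when they agree wherever both are specified (an assumed definition), and uses a single simple signature, the range of possible onset sizes, as a pruning filter. Truth tables are dictionaries from input tuples to `0`, `1`, or `None` (don't care); all names are hypothetical.

```python
from itertools import permutations, product


def permute_inputs(table, n, perm):
    # Build g'(x) = g(x reordered by perm) for every input tuple x.
    return {
        bits: table[tuple(bits[perm[i]] for i in range(n))]
        for bits in product((0, 1), repeat=n)
    }


def compatible(f, g):
    # Two tables agree if no input is specified differently in both.
    return all(f[k] is None or g[k] is None or f[k] == g[k] for k in f)


def onset_range(table):
    # Smallest and largest onset size reachable by filling in the don't cares.
    ones = sum(1 for v in table.values() if v == 1)
    dont_cares = sum(1 for v in table.values() if v is None)
    return ones, ones + dont_cares


def signature_filter(f, g):
    # Necessary condition: the onset-size ranges must overlap,
    # because permuting inputs never changes the onset size.
    lo_f, hi_f = onset_range(f)
    lo_g, hi_g = onset_range(g)
    return max(lo_f, lo_g) <= min(hi_f, hi_g)


def match(f, g, n):
    # Return an input permutation under which f and g agree, or None.
    if not signature_filter(f, g):
        return None
    for perm in permutations(range(n)):
        if compatible(f, permute_inputs(g, n, perm)):
            return perm
    return None


# f(a, b) = a AND NOT b, with input (1, 1) left as a don't care.
F = {(0, 0): 0, (0, 1): 0, (1, 0): 1, (1, 1): None}
# g(a, b) = NOT a AND b, completely specified.
G = {(0, 0): 0, (0, 1): 1, (1, 0): 0, (1, 1): 0}
# Constant 1, which the signature filter rejects without trying any permutation.
ONE = {bits: 1 for bits in product((0, 1), repeat=2)}
```

Here `match(F, G, 2)` finds the swap of the two inputs, while `match(F, ONE, 2)` is rejected by the signature alone. The brute-force loop is factorial in the number of inputs, which is why cheap signatures that prune candidates matter in practice.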
Challenges in gate level modeling for delay and SI at 65nm and below
Pub Date : 2008-06-08 DOI: 10.1145/1391469.1391590
Igor Keller, K. Tam, Vinod Kariat
In this paper we review the prior art and recent advances in the area of standard cell modeling for delay and noise analyses, suggest a taxonomy of different cell models, and discuss their strengths and weaknesses. We also discuss challenges in cell modeling for delay and noise analyses arising in new submicron process nodes.
Citations: 28
SHIELD: A software hardware design methodology for security and reliability of MPSoCs
Pub Date : 2008-06-08 DOI: 10.1145/1391469.1391686
K. Patel, S. Parameswaran
Security of MPSoCs is an emerging area of concern in embedded systems. Security is jeopardized by code injection attacks, which are the most common types of software attacks. Previous attempts to detect code injection in MPSoCs have been burdened with significant performance overheads. In this work, we present a hardware/software methodology, "SHIELD", to detect code injection attacks in MPSoCs. SHIELD instruments the software programs running on application processors in the MPSoC and also extracts control flow and basic block execution time information for runtime checking. We employ a dedicated security processor (monitor processor) to supervise the application processors on the MPSoC. Custom hardware is designed and used in the monitor and application processors. The monitor processor uses the custom hardware to rapidly analyze information communicated to it from the application processors at runtime. We have implemented SHIELD on a commercial extensible processor (Xtensa LX2) and tested it on a multiprocessor JPEG encoder program. In addition to code injection attacks, the system is also able to detect 83% of bit-flip errors in the control flow instructions. The experiments show that SHIELD produces systems whose runtime is at least 9 times faster than the previous solution. SHIELD incurs a runtime (clock cycles) performance overhead of only 6.6% and an area overhead of 26.9%, when compared to a non-secure system.
Citations: 30
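The SHIELD abstract above mentions checking control flow and basic block execution times at runtime. The sketch below shows one way such a check could look in software, as an assumption-laden illustration only: the abstract does not describe the monitor's actual logic, and the trace format, the `allowed_edges` set, and the `max_cycles` table are all hypothetical.

```python
def check_trace(trace, allowed_edges, max_cycles):
    """Return the index of the first suspicious event, or None if the trace is clean.

    trace: list of (block_id, cycles) pairs reported by an application processor.
    allowed_edges: set of (from_block, to_block) pairs taken from the static control flow.
    max_cycles: dict mapping each known block to its longest expected execution time.
    """
    previous = None
    for index, (block, cycles) in enumerate(trace):
        if block not in max_cycles:
            return index  # block never seen at instrumentation time
        if previous is not None and (previous, block) not in allowed_edges:
            return index  # control transfer not present in the static control flow
        if cycles > max_cycles[block]:
            return index  # block ran longer than expected
        previous = block
    return None


# Hypothetical three-block program: A -> B, then B -> C or B -> A.
ALLOWED_EDGES = {("A", "B"), ("B", "C"), ("B", "A")}
MAX_CYCLES = {"A": 10, "B": 20, "C": 5}

clean_trace = [("A", 8), ("B", 15), ("C", 5)]
bad_jump_trace = [("A", 8), ("C", 3)]
slow_block_trace = [("A", 8), ("B", 45)]
```

Injected code would typically show up either as a transfer that is not in the static control flow or as a block whose timing no longer fits its profile, which is the intuition behind checking both.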
A practical reconfigurable hardware accelerator for boolean satisfiability solvers
Pub Date : 2008-06-08 DOI: 10.1145/1391469.1391669
John D. Davis, Zhangxi Tan, Fang Yu, Lintao Zhang
We present a practical FPGA-based accelerator for solving Boolean Satisfiability problems (SAT). Unlike previous efforts for hardware-accelerated SAT solving, our design focuses on accelerating the most time-consuming part of the SAT solver, Boolean Constraint Propagation (BCP), leaving the choices of heuristics such as branching order, restarting policy, and learning and backtracking to software. Our novel approach uses an application-specific architecture instead of an instance-specific one to avoid time-consuming FPGA synthesis for each SAT instance. By avoiding global signal wires and carefully pipelining the design, our BCP accelerator is able to achieve a much higher clock frequency than that of previous work. In addition, it can load SAT instances in milliseconds, can handle SAT instances with tens of thousands of variables and clauses using a single FPGA, and can easily scale to handle more clauses by using multiple FPGAs. Our evaluation on a cycle-accurate simulator shows that the FPGA co-processor can achieve 3.7-38.6x speedup on BCP compared to state-of-the-art software SAT solvers.
Citations: 51
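Boolean Constraint Propagation, the step the accelerator above targets, is the repeated application of the unit-clause rule. The following is a plain software sketch of that rule, not the paper's hardware design; it uses DIMACS-style integer literals (a positive integer for a variable, a negative integer for its negation), and the function name is hypothetical.

```python
def unit_propagate(clauses, assignment):
    """Apply the unit-clause rule until nothing changes.

    clauses: list of clauses, each a list of nonzero ints (DIMACS-style literals).
    assignment: dict mapping variable number to True/False.
    Returns (ok, new_assignment); ok is False if some clause is falsified.
    """
    assignment = dict(assignment)  # work on a copy
    changed = True
    while changed:
        changed = False
        for clause in clauses:
            unassigned = []
            satisfied = False
            for literal in clause:
                variable = abs(literal)
                if variable in assignment:
                    if assignment[variable] == (literal > 0):
                        satisfied = True
                        break
                else:
                    unassigned.append(literal)
            if satisfied:
                continue
            if not unassigned:
                return False, assignment  # every literal is false: conflict
            if len(unassigned) == 1:
                # Unit clause: the one remaining literal is forced to be true.
                forced = unassigned[0]
                assignment[abs(forced)] = forced > 0
                changed = True
    return True, assignment


# (not x1 or x2) and (not x2 or x3) and (not x3 or not x4), starting from x1 = True.
chain_ok, chain_assignment = unit_propagate([[-1, 2], [-2, 3], [-3, -4]], {1: True})
# (not x1 or x2) and (not x1 or not x2) with x1 = True forces a conflict.
conflict_ok, _ = unit_propagate([[-1, 2], [-1, -2]], {1: True})
```

This naive version rescans every clause on each pass; software solvers usually avoid that with watched-literal schemes, and the abstract's point is that this propagation loop dominates solver run time.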
Automated design of self-adjusting pipelines
Pub Date : 2008-06-08 DOI: 10.1145/1391469.1391523
Jieyi Long, S. Memik
We propose a self-adjusting pipeline structure to enhance chip performance and robustness considering the effects of process variations. We achieve this by introducing delay sensors to monitor internal timing violations within a pipeline stage and variable clock skew buffers to adjust the timing of the pipeline stage based on the feedback from the delay sensors. Furthermore, we formulate the delay sensor insertion and variable clock skew configuration problem as a stochastic mixed-integer programming problem and propose a simulated-annealing based algorithm to solve it. A comparison between the designs with and without the self-adjusting enhancement reveals that we are able to improve the average performance of a batch of chips by 9.5%.
Citations: 4
Run-time instruction set selection in a transmutable embedded processor
Pub Date : 2008-06-08 DOI: 10.1145/1391469.1391486
L. Bauer, M. Shafique, J. Henkel
We present a new concept of an application-specific processor that is capable of transmuting its instruction set according to non-predictive application behavior during run-time. In those scenarios, current (extensible) embedded processors are less efficient since they are not run-time adaptive. We have identified the instruction set selection to be a critical step to perform at run time and hence we focus this paper on that crucial part. Our paradigm conducts as many steps as possible at compile/design time and as few as necessary at run time, with the constraint of providing sufficient flexibility to react to non-predictive application behavior efficiently. We provide an in-depth analysis of our scheme and achieve a speed-up of up to 7.19 times (average: 3.63 times) compared to state-of-the-art adaptive approaches (like [19]). As an application, we have employed a whole H.264 video encoder, though our scheme is in principle applicable to many other embedded applications. Our results are evaluated by an implementation of the instruction set selection for our transmutable processor on an FPGA platform.
Citations: 43