Proceedings. IEEE International Conference on Computer Design: VLSI in Computers and Processors: latest articles

Checking equivalence for circuits containing incompletely specified boxes
Christoph Scholl, B. Becker
We consider the problem of checking whether an implementation which contains parts with incomplete information is equivalent to a given full specification. We study implementations which are not completely specified, but contain boxes which are associated with incompletely specified functions (called Incompletely Specified Boxes or IS-Boxes). After motivating the use of implementations with Incompletely Specified Boxes, we define our notion of equivalence for this kind of implementation and present a method to solve the problem. A series of experimental results demonstrates the effectiveness and feasibility of the methods presented.
DOI: https://doi.org/10.1109/ICCD.2002.1106748
Published: 2002-09-16
Citations: 11
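The abstract above (Scholl and Becker) does not spell out its method, so the following is only a minimal sketch of one simple, conservative check that fits the problem statement: three-valued (0/1/X) simulation in which every output of an incompletely specified box is treated as unknown. If the implementation still produces a definite value that disagrees with the specification, no choice of box contents can repair it. The circuits `spec`, `impl_ok`, and `impl_bad` and the helper `find_definite_error` are hypothetical names invented for this illustration, and the check is sound but incomplete (it can miss errors).

```python
from itertools import product

X = None  # the "unknown" value produced by an incompletely specified box


def t_and(a, b):
    """Three-valued AND: a controlling 0 wins even if the other input is X."""
    if a == 0 or b == 0:
        return 0
    if a == 1 and b == 1:
        return 1
    return X


def t_or(a, b):
    """Three-valued OR: a controlling 1 wins even if the other input is X."""
    if a == 1 or b == 1:
        return 1
    if a == 0 and b == 0:
        return 0
    return X


def spec(a, b, c):
    """Hypothetical full specification: (a AND b) OR c."""
    return (a & b) | c


def impl_ok(a, b, c, box=X):
    """Hypothetical implementation: a box (fed by a, b) ORed with c."""
    return t_or(box, c)


def impl_bad(a, b, c, box=X):
    """Hypothetical faulty implementation: the final gate is AND, not OR."""
    return t_and(box, c)


def find_definite_error(spec_fn, impl_fn, n_inputs):
    """Return an input vector where impl is definite and wrong, else None."""
    for bits in product((0, 1), repeat=n_inputs):
        out = impl_fn(*bits)
        if out is not X and out != spec_fn(*bits):
            return bits
    return None


print(find_definite_error(spec, impl_ok, 3))   # None: no definite mismatch
print(find_definite_error(spec, impl_bad, 3))  # (1, 1, 0)
```

For `impl_bad`, the input `(1, 1, 0)` forces the output to 0 regardless of the box, while the specification requires 1, so the error is independent of how the box is eventually filled in. Exhaustive enumeration is used only to keep the sketch short; it does not scale to real circuits.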
Improving the efficiency of circuit-to-BDD conversion by gate and input ordering
F. Aloul, I. Markov, K. Sakallah
Boolean functions are fundamental to synthesis and verification of digital logic, and compact representations of Boolean functions have great practical significance. Popular representations, such as CNF, DNF, circuits and ROBDDs [4], offer different advantages and are preferred for different tasks. Conversion between those representations is common, especially when one is used to represent the input and another speeds up relevant algorithms. Our work addresses the construction of ROBDDs that represent outputs of a given Boolean circuit, a task that arises in synthesis and verification. Earlier works (Fujita, Fujisawa, and Kawato, 1988; Malik et al., 1988) proposed ordering circuit inputs and gates by graph traversals. We contribute orderings based on circuit partitioning and placement, leveraging the progress in recursive bisection and multi-level min-cut partitioning achieved in the late 1990s. Our empirical results show that the proposed orderings based on circuit partitioning and placement are more successful than straightforward DFS and BFS, as well as related heuristics.
DOI: https://doi.org/10.1109/ICCD.2002.1106749
Published: 2002-09-16
Citations: 9
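To illustrate the "straightforward DFS" baseline that the abstract above (Aloul, Markov, and Sakallah) compares against, here is a minimal sketch of a depth-first input ordering: traverse the circuit from its outputs and record primary inputs in the order they are first reached. The netlist format and the name `dfs_input_order` are assumptions made for this example, and this is not the partitioning-based ordering the paper proposes.

```python
from typing import Dict, List, Set


def dfs_input_order(netlist: Dict[str, List[str]], outputs: List[str]) -> List[str]:
    """Order primary inputs by first visit in a DFS from the outputs.

    `netlist` maps each gate name to the names of its fan-ins.
    Any name that is not a key of `netlist` is treated as a primary input.
    """
    order: List[str] = []
    seen: Set[str] = set()

    def visit(node: str) -> None:
        if node in seen:
            return
        seen.add(node)
        if node not in netlist:      # primary input: append to the ordering
            order.append(node)
            return
        for fanin in netlist[node]:  # gate: descend into fan-ins, left to right
            visit(fanin)

    for out in outputs:
        visit(out)
    return order


# Hypothetical toy circuit: out = g1(a, b) combined with g2(c, d).
toy = {"g1": ["a", "b"], "g2": ["c", "d"], "out": ["g1", "g2"]}
print(dfs_input_order(toy, ["out"]))  # ['a', 'b', 'c', 'd']
```

The resulting list could be handed to a BDD package as its variable order; inputs that feed the same gate end up adjacent, which is the intuition behind traversal-based heuristics.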
Trace Cache performance parameters
A. Hossain, D. Pease, James S. Burns, N. Parveen
The instruction fetch mechanism is a performance bottleneck of a superscalar processor. The fetch performance of the processor can be improved with the aid of an instruction memory structure known as a Trace Cache. This paper presents parameters and analytical expressions that describe the instruction fetch performance of a Trace Cache microarchitecture. The instruction fetch rates predicted by the expressions differ by seven percent from the simulated fetch rates for SPEC2000 benchmark programs. The presented analytical expressions are implemented in a computer program named Tulip. Tulip is used to explore parameters and their influence on fetch performance. Tulip is also used to understand Trace Cache performance tradeoffs.
DOI: https://doi.org/10.1109/ICCD.2002.1106793
Published: 2002-09-16
Citations: 10
A system-level solution to domino synthesis with 2 GHz application
B. Chappell, Xinning Wang, Priyadarsan Patra, Prashant Saxena, J. Vendrell, Satyanarayan Gupta, S. Varadarajan, W. Gomes, S. Hussain, H. Krishnamurthy, M. Venkateshmurthy, S. Jain
The system structure and a taped-out 0.18 µm, 2 GHz product application result are described for a domino synthesis capability that covers all aspects of domino design, from estimation to silicon-ready layout, with custom-class optimization. The described optimization flow, abstraction modes, and key cost factors deliver power-optimized, noise-correct domino performance on complex logic.
DOI: https://doi.org/10.1109/ICCD.2002.1106765
Published: 2002-09-16
Citations: 8
Analysis of blocking dynamic circuits
T. Thorp, D. Liu, P. Trivedi
In order for dynamic circuits to operate correctly, their inputs must be monotonically rising during evaluation. Blocking dynamic circuits satisfy this constraint by delaying evaluation until all inputs have been properly set up relative to the evaluation clock. By viewing dynamic gates as latches, we demonstrate that the optimal delay of a blocking dynamic gate may occur when the setup time is negative. With blocking dynamic circuits, cascading low-skew dynamic gates allows each dynamic gate to tolerate a degraded input level. The larger noise margin provides greater flexibility in the delay vs. noise margin tradeoff (i.e., the circuit robustness vs. speed tradeoff). This paper generalizes blocking dynamic circuits and provides a systematic approach for assigning clock phases, given delay and noise margin constraints. Using this framework, one can analyze any logic network consisting of blocking dynamic circuits.
DOI: https://doi.org/10.1109/ICCD.2002.1106758
Published: 2002-09-16
Citations: 16
Branch behavior of a commercial OLTP workload on Intel IA32 processors
M. Annavaram, T. Diep, John Paul Shen
This paper presents a detailed branch characterization of an Oracle-based commercial on-line transaction processing workload, Oracle Database Benchmark (ODB), running on an IA32 processor. We ran a well-tuned ODB on Simics, a full system simulator, to collect the instruction traces used in this study. We compare the branch behavior of ODB with the branch behaviors of gcc, gzip and mcf from the SPECINT 2000 benchmark suite. Contrary to the popular belief that databases have unpredictable branches, we show that using larger predictors that capture enough branch history information, and using branch prediction schemes that reduce aliasing, conditional branches in ODB are more predictable than in gcc, gzip and mcf. Due to frequent context switching in ODB, a hardware return address stack is ineffective in predicting return addresses for ODB. Based on further analysis, we propose and evaluate an enhanced return address predictor, which reduces return address mispredictions in ODB by 40%.
DOI: https://doi.org/10.1109/ICCD.2002.1106777
Published: 2002-09-16
Citations: 15
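The abstract above (Annavaram, Diep, and Shen) notes that frequent context switching makes a hardware return address stack (RAS) ineffective. A minimal sketch of why, assuming a single shared fixed-depth RAS and a hand-written event trace, is given below. The class `ReturnAddressStack`, the function `count_mispredictions`, and the traces are all hypothetical; this models a plain RAS only, not the enhanced predictor the paper proposes.

```python
from typing import List, Optional, Tuple


class ReturnAddressStack:
    """Fixed-depth stack of predicted return addresses."""

    def __init__(self, depth: int) -> None:
        self.depth = depth
        self.entries: List[int] = []

    def push(self, return_addr: int) -> None:
        if len(self.entries) == self.depth:
            self.entries.pop(0)  # full: drop the oldest entry
        self.entries.append(return_addr)

    def pop(self) -> Optional[int]:
        return self.entries.pop() if self.entries else None


def count_mispredictions(trace: List[Tuple[str, int]], depth: int = 8) -> int:
    """Replay ('call', return_addr) / ('ret', actual_target) events."""
    ras = ReturnAddressStack(depth)
    misses = 0
    for kind, addr in trace:
        if kind == "call":
            ras.push(addr)
        elif kind == "ret" and ras.pop() != addr:
            misses += 1
    return misses


# One thread, properly nested calls: every return is predicted.
single = [("call", 0x10), ("call", 0x20), ("ret", 0x20), ("ret", 0x10)]

# Two threads interleaved by context switches while sharing one RAS:
# thread A calls, thread B calls, then A returns before B does.
interleaved = [("call", 0x10), ("call", 0x90), ("ret", 0x10), ("ret", 0x90)]

print(count_mispredictions(single))       # 0
print(count_mispredictions(interleaved))  # 2
```

In the interleaved trace, thread A's return pops the address pushed by thread B, so both returns are mispredicted even though each thread's own calls and returns are perfectly nested.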
JMA: the Java-multithreading architecture for embedded processors
Panit Watcharawitch, S. Moore
Embedded processors are increasingly deployed in applications requiring high performance with good real-time characteristics whilst being low power. Parallelism has to be extracted in order to improve the performance at an architectural level. Extracting instruction-level parallelism requires extensive speculation, which adds complexity and increases power consumption. Alternatively, parallelism can be provided at the thread level. Many embedded applications can be written in a threaded manner in Java, which can be directly translated to use hardware-level multithreaded operations. This paper presents an architectural study of JMA, a high-performance multithreaded architecture which supports Java-multithreading and real-time scheduling whilst remaining low-power.
DOI: https://doi.org/10.1109/ICCD.2002.1106824
Published: 2002-09-16
Citations: 7
A standard-cell placement tool for designs with high row utilization
Xiaojian Yang, Bo-Kyung Choi, M. Sarrafzadeh
In this paper we study the correlation between wirelength and routability for the standard-cell placement problem, under the modern place-and-route environment. We present a placement tool named Dragon (version 2.1), and show its ability to produce good-quality placement for designs with high row utilization. Compared to an industrial placer and an academic state-of-the-art placer, Dragon can produce placement with better routability and shorter total wirelength. We describe many novel algorithmic and implementation details of this placement tool. Experimental results show that minimizing wirelength improves routability and layout quality.
DOI: https://doi.org/10.1109/ICCD.2002.1106746
Published: 2002-09-16
Citations: 21
Methodologies and tools for pipelined on-chip interconnect
L. Scheffer
As processes shrink, gate delay improves much faster than the delay in long wires. Therefore, the long wires increasingly determine the maximum clock rate, and hence performance, of more and more chips. One solution to this problem is to pipeline the global interconnect, enabling the whole chip to run at the speed of local operations. While known to work well, this optimization is seldom used because of practical difficulties - it is hard to change the RTL, test vectors become invalid, and it's hard to prove correctness of any changes. Here we look at some ways these difficulties could be overcome.
DOI: https://doi.org/10.1109/ICCD.2002.1106763
Published: 2002-09-16
Citations: 43
Power-constrained microprocessor design
H. P. Hofstee
Power dissipation and power density have become first-order design constraints, even for high-performance systems. For future designs it will be the dominant constraint. In this paper we suggest a systematic approach to optimizing a processor design under (only) a power constraint. The approach uses the energy-performance ratio (EPR) of the various design parameters as the key to identifying opportunities for improving energy-efficiency.
DOI: https://doi.org/10.1109/ICCD.2002.1106740
Published: 2002-09-16
Citations: 29