
Latest papers: 2008 IEEE International Conference on Computer Design

Leveraging speculative architectures for run-time program validation
Pub Date : 2008-10-01 DOI: 10.1145/2512456
Juan Carlos Martínez Santos, Yunsi Fei
Malicious attackers can tamper with program execution by exploiting software vulnerabilities. Changing program behavior by compromising control data and decision data has become the most serious threat to computer system security. Although several hardware approaches have been presented to validate program execution, most suffer from large hardware area overhead or poor ambiguity handling. In this paper, we propose a new hardware-based approach that leverages existing speculative architectures for run-time program validation. The on-chip branch target buffer (BTB) is used as a cache of the legitimate control flow transfers stored in a secure memory region. In addition, the BTB is extended to store the correct program path information. At each indirect branch site, the BTB is used to validate the decision history of the conditional branches preceding it, and more information about the future decision path is fetched to monitor the execution path at run-time. The implementation of this approach is transparent to the operating system and programs above it, so it is applicable to legacy code. Because of the good code locality of executable programs and the effectiveness of branch prediction, run-time control flow validations against the secure off-chip memory are infrequent. Our experimental results show a negligible performance penalty and small storage overhead, with reduced ambiguity.
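The caching idea in this abstract can be illustrated with a toy software model. This is only a sketch under stated assumptions, not the authors' hardware design: the class name `ValidatingBTB`, the LRU replacement policy, and the representation of a control flow transfer as a `(branch_pc, target)` pair are all hypothetical, and the decision-history validation described in the abstract is not modeled.

```python
from collections import OrderedDict


class ValidatingBTB:
    """Toy model: a small LRU cache of already-validated (branch_pc, target) pairs."""

    def __init__(self, legit_transfers, capacity=4):
        # Stand-in for the table of legitimate transfers kept in secure memory.
        self.secure_table = set(legit_transfers)
        self.capacity = capacity
        self.entries = OrderedDict()   # cached, validated transfers
        self.secure_lookups = 0        # slow-path validations performed

    def validate(self, branch_pc, target):
        key = (branch_pc, target)
        if key in self.entries:                 # hit: transfer was validated before
            self.entries.move_to_end(key)
            return True
        self.secure_lookups += 1                # miss: consult the secure table
        if key not in self.secure_table:
            return False                        # transfer is not legitimate
        self.entries[key] = True
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)    # evict the least recently used entry
        return True
```

Repeated transfers through the same branch hit the cache, so `secure_lookups` grows only on misses, which is the effect the abstract attributes to code locality.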
Citations: 8
Seamless sequence of software defined radio designs through hardware reconfigurability of FPGAs
Pub Date : 2008-10-01 DOI: 10.1109/ICCD.2008.4751871
A. H. Gholamipour, E. Bozorgzadeh, L. Bao
Software Defined Radio (SDR) base stations can compensate for failures in disaster scenarios by assimilating different communication technologies. FPGAs play an important role in the platform of an SDR base station because of the flexibility and DSP processing power they deliver. The flexibility of FPGAs comes at the high cost of reconfiguration time overhead, which can be a serious deterrent because of the QoS requirements of real-time traffic. In this paper we propose a solution that reduces reconfiguration time overhead at the system level, where we are given the configuration of each wireless system. We then go a step further and integrate our solution into a floorplanner to generate placements for wireless systems that systematically hide or reduce reconfiguration time overhead. Our experiments show the effectiveness of our approach.
Citations: 5
In-field NoC-based SoC testing with distributed test vector storage
Pub Date : 2008-10-01 DOI: 10.1109/ICCD.2008.4751863
J. Lee, R. Mahapatra
The operational lifetimes of SoCs and microprocessors face growing threats from technology scaling and increasing device temperature and power density. In-field (or on-line) testing of NoC-based SoCs is an important technique for ensuring system integrity throughout this potentially shorter lifetime. Whether in-field testing is conducted concurrently with normal applications or executed in isolation, application intrusion must be minimized in order to maintain system availability. Specialized infrastructure IP has been proposed to manage on-line testing by scheduling tests and delivering test vectors to the various cores within the SoC from a centralized location. However, as the number of cores integrated into a single chip continues to increase, issuing test vectors from a centralized location is not a scalable solution. The increased distances that test vectors must travel have become a major concern for on-line testing because of their direct impact on application intrusion in terms of energy consumption, network load, and latency. In this paper, we apply a distributed storage technique to bound and minimize this distance, thereby minimizing network load, energy consumption, and test delivery latency across the entire network. Our experiments show that test delivery latency and energy consumption are reduced by approximately 90% for moderately sized NoCs.
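The distance argument in this abstract can be made concrete with a small sketch. It assumes a square 2D mesh NoC and uses Manhattan distance as the hop count; both are assumptions for illustration, as are the function name `mean_hops` and the storage placements, and the sketch does not reproduce the paper's experiments or its reported figures.

```python
def mean_hops(size, storage_nodes):
    """Mean Manhattan distance from every core of a size x size mesh
    to its nearest test vector storage node."""
    total = 0
    for x in range(size):
        for y in range(size):
            total += min(abs(x - sx) + abs(y - sy) for sx, sy in storage_nodes)
    return total / (size * size)


# Hypothetical placements on a 4 x 4 mesh.
centralized = mean_hops(4, [(0, 0)])
distributed = mean_hops(4, [(0, 0), (0, 3), (3, 0), (3, 3)])
```

In this toy setting the distributed placement shortens the average delivery path, which is the quantity the abstract ties to network load, energy, and latency.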
Citations: 7
Temperature-aware clock tree synthesis considering spatiotemporal hot spot correlations
Pub Date : 2008-10-01 DOI: 10.1109/ICCD.2008.4751848
Chunchen Liu, Junjie Su, Yiyu Shi
Temperature variation in microprocessors is a workload-dependent problem. In such designs, the clock skew should be minimized with respect to temperature variation. Existing work has studied clock tree embedding perturbation considering time-variant temperature variation, but no existing method can reduce skew variation. This paper develops an efficient yet effective method that combines hotspot-avoiding embedding with thermal-aware routing (TMST): hotspot-avoiding embedding keeps the tree topology out of areas with a high probability of high temperature, and thermal-aware routing reduces skew by routing tree paths through areas with smoother temperature. With a thermally tolerable tree structure, our method reduces not only delay skew but also skew variation (skew violation range). Our TMST solution reduces skew variation by 2X compared with the greedy-DME (GDME) method of Edahiro and the existing thermal-aware clock synthesis methods TACO and PECO. When scaling from 100 temperature maps down to 1, TMST also guarantees the smallest wire length overflow. TMST reduces the worst-case skew by up to 4X relative to PECO and 5X relative to TACO.
Citations: 14
Applying speculation techniques to implement functional units
Pub Date : 2008-10-01 DOI: 10.1109/ICCD.2008.4751843
Alberto A. Del Barrio, M. Molina, J. Mendias, Esther Andres Perez, R. Hermida, F. Tirado
This paper justifies the use of carry estimation and prediction to increase the performance of functional units built by replicating full adders while keeping the area penalty low. Adders and multipliers are the most representative modules in this group of functional units. These design techniques allow the implementation of modules with performance improvements ranging from 20% to 50% with an area overhead of only around 5%. These functional units are suitable for asynchronous circuits, but they could also be introduced in synchronous circuits with speculative techniques. The basic idea consists of estimating the carry-out of some parts of the functional unit, allowing every part to operate independently and in parallel. These modules are connected to build bigger ones. Simulation results show that for some applications it is possible to make predictions that are even more accurate than the bit-based estimation. Predictions also have the advantage that they can be introduced in multiplier designs, whereas estimators cannot. These predictions are similar to those used for branch prediction in a processor.
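The basic idea of letting each part of an adder run with a guessed carry-in can be modeled in software. The following is a minimal behavioral sketch, not the circuit from the paper: the function names, the block width, the always-zero default guess, and the `msb_estimator` rule are hypothetical choices made for illustration. A misprediction is resolved here by recomputing the block with the real carry, which stands in for whatever correction mechanism the hardware would use.

```python
def speculative_add(a, b, width=16, block=4, predictor=None):
    """Add two width-bit integers in independent blocks with guessed carry-ins.

    Returns (sum modulo 2**width, number of blocks whose guess was wrong).
    Assumes width is a multiple of block.
    """
    if predictor is None:
        predictor = lambda a_prev, b_prev: 0      # hypothetical: always guess "no carry"
    mask = (1 << block) - 1
    result, carry, mispredictions = 0, 0, 0
    for shift in range(0, width, block):
        a_blk = (a >> shift) & mask
        b_blk = (b >> shift) & mask
        if shift == 0:
            guess = 0                             # lowest block: carry-in is known to be 0
        else:
            prev = shift - block
            guess = predictor((a >> prev) & mask, (b >> prev) & mask)
        total = a_blk + b_blk + guess             # speculative result of this block
        if guess != carry:                        # guess disagrees with the real carry
            mispredictions += 1
            total = a_blk + b_blk + carry         # correction with the real carry-in
        result |= (total & mask) << shift
        carry = total >> block                    # real carry-out, checked by the next block
    return result, mispredictions


def msb_estimator(a_prev, b_prev, block=4):
    """Hypothetical estimator: guess a carry when both top bits of the previous block are 1."""
    top = 1 << (block - 1)
    return 1 if (a_prev & top) and (b_prev & top) else 0
```

The returned sum is always exact; only the misprediction count changes with the guessing rule, which is the quantity a better estimator or predictor would aim to reduce.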
Citations: 10
Systematic design of high-radix Montgomery multipliers for RSA processors
Pub Date : 2008-10-01 DOI: 10.1109/ICCD.2008.4751894
A. Miyamoto, N. Homma, T. Aoki, Akashi Satoh
The present paper proposes a systematic design approach to provide the optimal high-radix Montgomery multipliers for an RSA processor satisfying user requirements. We introduce three multiplier-based architectures using different intermediate-data forms ((i) single form, (ii) semi carry-save form, and (iii) carry-save form) and combine them with a wide variety of arithmetic components. Their radices are also parameterized from 2^8 to 2^64. A total of 202 designs for 1,024-bit RSA processors were obtained for each radix, and were synthesized using a 90-nm CMOS standard cell library. The resulting designs range from the smallest, at 0.9 Kgates with 137.8 ms/RSA, to the fastest, at 1.8 ms/RSA with 74.7 Kgates. In addition, the optimal design to meet the user requirements can be easily obtained from all the combinations. Besides the choice of datapath architecture, arithmetic component, and radix parameters, the proposed systematic approach can also adopt other process technologies.
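For readers unfamiliar with the operation being parameterized, the following sketch shows the textbook digit-serial Montgomery multiplication with radix 2^k. It is a functional model only and says nothing about the paper's datapath architectures or intermediate-data forms; the function name and argument names are assumptions. It requires an odd modulus `n`, operands smaller than `n`, and `n < 2**(k * num_words)`.

```python
def montgomery_multiply(x, y, n, k, num_words):
    """Return x * y * R^-1 mod n, where R = 2**(k * num_words).

    Digit-serial form: one radix-2^k digit of x is consumed per iteration.
    """
    r = 1 << k                                # radix 2^k
    digit_mask = r - 1
    n_prime = (-pow(n, -1, r)) % r            # -n^-1 mod 2^k (needs Python 3.8+)
    z = 0
    for i in range(num_words):
        x_i = (x >> (k * i)) & digit_mask     # i-th radix-2^k digit of x
        z += x_i * y
        q = ((z & digit_mask) * n_prime) & digit_mask   # makes z + q*n divisible by 2^k
        z = (z + q * n) >> k                  # exact division by the radix
    if z >= n:                                # z < 2n here, so one subtraction suffices
        z -= n
    return z
```

A larger radix means fewer loop iterations (`num_words` shrinks as `k` grows) at the cost of wider multiplications per iteration, which is the kind of trade-off a radix parameter exposes.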
Citations: 10
Early stage FPGA interconnect leakage power estimation
Pub Date : 2008-10-01 DOI: 10.1109/ICCD.2008.4751898
Shilpa Bhoj, D. Bhatia
Increasing transistor densities, the rising popularity of mobile applications, and the migration towards eco-friendly computing systems have made power dissipation a key FPGA design issue. To meet stringent budgets, system architects need accurate estimates of power distribution at various design stages. In this work, we make several key contributions to FPGA leakage power estimation. First, we develop an accurate and efficient model to estimate total interconnect leakage power at various design stages prior to routing. Our methods derive leakage power estimates from predicted values of routing congestion and interconnect resource utilization. We then extend the model to accommodate complex segmented routing architectures and low-leakage architectures. Finally, we formulate relations to generate post-placement leakage power estimates for individual routing channels. Our models for overall leakage power estimation achieve average accuracy rates of 93% and 89% for uniform and segmented routing architectures, respectively. Experimental results also establish the accuracy of the channel-level estimation models at 85% and 80% for uniform and segmented routing structures, respectively. Our models and techniques would help designers make informed decisions by providing information on the power consumption of the interconnect fabric well before routing. Additionally, the equations can be used for architectural exploration and embedded in power-aware and thermal-aware CAD tools.
Citations: 3
Router and cell library co-development for improving redundant via insertion at pins
Pub Date : 2008-10-01 DOI: 10.1109/ICCD.2008.4751929
Wei-Chiu Tseng, Yu-Hsing Chen, Rung-Bin Lin
In this paper we propose a synergetic approach that integrates router design and cell library engineering for improving post-routing via1 (via between M1 and M2) doubling rate at pins. We develop a double-via (DV) aware multilevel router to exploit the via1 doubling possibilities provided to the cells in a conventional as well as a DV-driven cell library. Compared to a non-DV-aware router using a conventional cell library, our approach using a DV-driven library can on average raise via1 doubling rate by 34%, raise total via doubling rate by 11%, reduce the total number of vias by 3%, and reduce the total number of via1s by 8%. All this can be achieved without incurring any performance and area penalties.
Citations: 5
Comparative analysis of NBTI effects on low power and high performance flip-flops
Pub Date : 2008-10-01 DOI: 10.1109/ICCD.2008.4751862
K. Ramakrishnan, Xiaoxia Wu, N. Vijaykrishnan, Yuan Xie
Mitigating the circuit aging effect in digital circuits has become a very important concern for current and future technology nodes. Negative Bias Temperature Instability (NBTI) is one of the most important circuit aging mechanisms, which can incur timing errors. Flip-flops play a vital role as storage elements in pipelined architectures and are prone to effects of aging. NBTI increases the transistor threshold voltage, affecting the performance of the chip. In this paper, we study the effects of NBTI on the timing characteristics of different types of low power and high performance flip-flops. Factors such as input data probability and temperature which affect the degradation rate are also analyzed.
Citations: 21
Optimization of Propagate Partial SAD and SAD tree motion estimation hardwired engine for H.264
Pub Date : 2008-10-01 DOI: 10.1109/ICCD.2008.4751881
Zhenyu Liu, S. Goto, T. Ikenaga
Variable block size motion estimation is an efficient approach to reducing temporal redundancy, and it has been adopted by the latest video coding standard, H.264/AVC. The added computational complexity of the variable block size technique makes a hardwired accelerator essential, especially for real-time applications. In this paper, the authors apply architecture-level and circuit-level approaches to improve the performance of the Propagate Partial SAD and SAD Tree hardwired engines, which outperform their counterparts when the impact of supporting the variable block size technique is considered. Experiments demonstrate that, compared with the original architectures, the proposed approaches save 14.7% and 18.0% of the hardware cost for the Propagate Partial SAD and SAD Tree architectures, respectively. With TSMC 0.18 µm 1P6M CMOS technology, the proposed Propagate Partial SAD architecture attains a 231.6 MHz operating frequency at a cost of 84.1 k gates. Correspondingly, the execution speed of the optimized SAD Tree architecture is improved to 204.8 MHz with an 88.5 k gate hardware overhead.
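For readers unfamiliar with the underlying cost function, the sketch below shows a software version of the sum of absolute differences (SAD), a full-search block match, and the way a larger block's SAD can be assembled from 4x4 SADs, which is what makes variable block sizes cheap to support once the small-block SADs exist. It is only an illustration of the arithmetic and does not model the Propagate Partial SAD or SAD Tree hardware; the function names and the frame layout (lists of rows of integer pixel values) are assumptions made for this example.

```python
def sad(cur, ref, cx, cy, rx, ry, w, h):
    """SAD between the w x h block of cur at (cx, cy) and of ref at (rx, ry)."""
    return sum(
        abs(cur[cy + j][cx + i] - ref[ry + j][rx + i])
        for j in range(h)
        for i in range(w)
    )


def best_motion_vector(cur, ref, cx, cy, size, search_range):
    """Full search: return (sad, dx, dy) with the lowest SAD in the window."""
    height, width = len(ref), len(ref[0])
    best = None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            rx, ry = cx + dx, cy + dy
            # Skip candidates that fall outside the reference frame.
            if rx < 0 or ry < 0 or rx + size > width or ry + size > height:
                continue
            cost = sad(cur, ref, cx, cy, rx, ry, size, size)
            if best is None or cost < best[0]:
                best = (cost, dx, dy)
    return best


def merged_8x8_sad(cur, ref, cx, cy, rx, ry):
    """Build an 8x8 SAD by summing the four 4x4 SADs that tile the block."""
    return sum(
        sad(cur, ref, cx + ox, cy + oy, rx + ox, ry + oy, 4, 4)
        for oy in (0, 4)
        for ox in (0, 4)
    )


if __name__ == "__main__":
    n = 12
    # Synthetic reference frame; the current frame is the reference
    # shifted by (dx, dy) = (2, 1), clamped at the frame border.
    ref = [[7 * x + 13 * y for x in range(n)] for y in range(n)]
    cur = [[ref[min(y + 1, n - 1)][min(x + 2, n - 1)] for x in range(n)]
           for y in range(n)]
    print(best_motion_vector(cur, ref, 4, 4, 4, 2))
```

Because the 4x4 SADs are reused rather than recomputed, supporting the larger partitions costs only extra additions in this software view.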
Citations: 13