
2012 IEEE 30th International Conference on Computer Design (ICCD): Latest Papers

Oblivious routing design for mesh networks to achieve a new worst-case throughput bound
Pub Date : 2012-12-13 DOI: 10.1109/ICCD.2012.6378674
Guang Sun, Chia-Wei Chang, Bill Lin, Lieguang Zeng
Half of network capacity is often believed to be the limit of worst-case throughput for mesh networks. However, this paper provides a new worst-case throughput bound, higher than half of network capacity, for odd-radix two-dimensional mesh networks. In addition, we propose a routing algorithm called U2TURN that achieves this worst-case throughput bound for odd-radix meshes. For even-radix meshes, we prove that U2TURN achieves the optimal worst-case throughput, namely half of network capacity. U2TURN considers all routing paths with at most 2 turns and distributes the traffic load uniformly in both the X and Y dimensions. Theoretical analysis and simulation results show that U2TURN outperforms existing routing algorithms in worst-case throughput. Moreover, U2TURN achieves good average throughput at the expense of approximately 1.5× the minimal average hop count. For asymmetric meshes, we further propose an algorithm called U2TURN-A and provide theoretical analysis for different algorithms. Both theoretical analysis and simulation show that U2TURN and U2TURN-A outperform the existing algorithms VAL, DOR and O1TURN in both worst-case and average throughput for asymmetric meshes.
Citations: 2
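The U2TURN abstract above says the algorithm considers all routing paths with at most 2 turns. The sketch below only enumerates such paths between two nodes of a k×k mesh; it does not reproduce the paper's load distribution, which the abstract does not specify. The function names (`two_turn_paths`, `count_turns`) and the construction via an intermediate column or row are assumptions made for illustration.

```python
from typing import List, Tuple

Node = Tuple[int, int]  # (x, y) coordinates in the mesh


def _walk(path: List[Node], axis: int, target: int) -> None:
    """Extend path in place along one axis (0 = X, 1 = Y) until it reaches target."""
    x, y = path[-1]
    cur = x if axis == 0 else y
    step = 1 if target > cur else -1
    while cur != target:
        cur += step
        path.append((cur, y) if axis == 0 else (x, cur))


def count_turns(path: List[Node]) -> int:
    """Count direction changes between consecutive hops."""
    dirs = [(b[0] - a[0], b[1] - a[1]) for a, b in zip(path, path[1:])]
    return sum(1 for d1, d2 in zip(dirs, dirs[1:]) if d1 != d2)


def two_turn_paths(k: int, src: Node, dst: Node) -> List[Tuple[Node, ...]]:
    """Enumerate distinct paths from src to dst with at most 2 turns.

    Illustrative construction (an assumption, not the paper's definition):
    X-Y-X paths through every intermediate column and Y-X-Y paths through
    every intermediate row. Paths that revisit a node (180-degree reversals)
    are discarded.
    """
    found = set()
    for first_axis in (0, 1):
        second_axis = 1 - first_axis
        for mid in range(k):
            path = [src]
            _walk(path, first_axis, mid)
            _walk(path, second_axis, dst[second_axis])
            _walk(path, first_axis, dst[first_axis])
            if len(set(path)) == len(path) and count_turns(path) <= 2:
                found.add(tuple(path))
    return sorted(found)


if __name__ == "__main__":
    for p in two_turn_paths(3, (0, 0), (2, 2)):
        print(count_turns(list(p)), p)
```

Some of the enumerated paths are non-minimal (for example, a detour through another row when source and destination share a row), which is in line with the abstract's remark that average hop count rises above the minimum.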
Modeling economics of LSI design and manufacturing for test design selection
Pub Date : 2012-09-30 DOI: 10.1109/ICCD.2012.6378701
H. Ichihara, N. Shimizu, T. Iwagaki, Tomoo Inoue
Many test designs (DFTs: designs for testability) have been proposed to overcome various issues in LSI testing. In this paper, we propose a cost-benefit model for comparing several test designs in terms of the final profit of logic LSI design and manufacturing. Test designs can affect chip area, testing time, test generation time and fault coverage; in the proposed model, we clarify the relationship among these factors for three major test designs: scan design, built-in self-test (BIST) design and test compression design. The proposed model reveals the final profit for each test design in a given LSI design and manufacturing environment, so that a suitable test design can be chosen in the early stage of the LSI design flow. We show an example application of the proposed model to test design selection in a given environment.
Citations: 0
DuSCA: A multi-channeling strategy for doubling communication capacity in wireless NoC
Pub Date : 2012-09-30 DOI: 10.1109/ICCD.2012.6378620
Yi Wang, Dan Zhao, Jian Li
To bridge the widening gap between computation requirements and communication efficiency faced by many-core chips, the Wireless Network-on-Chip (WiNoC) has been proposed, using ultra-wideband interconnect. While prior research has demonstrated the salient features of WiNoC, such as high per-link data rate, high accumulated bandwidth, high flexibility, low overhead and low power consumption, this research aims to develop a multi-access WiNoC to substantially improve the end-to-end performance of on-chip communication. Enabled by time-hopping PPM multi-channel capability, we propose an efficient multi-channel distribution and arbitration scheme that improves communication concurrency and resolves channel competition among multiple users to achieve the desired network performance. Our simulation studies based on synthetic traffic demonstrate the efficiency, cost effectiveness and scalability of the channel arbitration scheme and the promising network performance of WiNoC.
Citations: 3
A comparative study of wearout mechanisms in state-of-art microprocessors
Pub Date : 2012-09-30 DOI: 10.1109/ICCD.2012.6378651
Chang-Chih Chen, Fahad Ahmed, L. Milor
In this work, we perform a comparative study of the different wearout mechanisms affecting state-of-the-art microprocessor systems. Taking into account detailed thermal and electrical stress profiles, we present a methodology to accurately estimate the lifetime due to each mechanism. Using standard benchmarks, we highlight the lifetime-limiting wearout mechanisms along with the reliability-critical microprocessor functional units.
Citations: 11
Adaptable intrusion detection using partial runtime reconfiguration
Pub Date : 2012-09-30 DOI: 10.1109/ICCD.2012.6378633
M. Rahmatian, H. Kooti, I. Harris, E. Bozorgzadeh
Intrusion detection approaches have been presented which detect anomalous malware behavior at runtime. Most techniques involve software-based analysis which is too slow to support the tight timing constraints often imposed on embedded systems. We propose a hardware-based intrusion detection approach which does not alter the functional performance of the system. When using a real-time operating system, the executing process changes several times each second, requiring fast adaptation on the part of the intrusion detection mechanism. We present a technique to exploit the partial runtime reconfiguration feature present on many modern field programmable gate arrays (FPGAs) to adapt intrusion detection to a new process at each context switch. The use of runtime reconfiguration enables the flexibility of software-based approaches with the performance benefits of hardware-based approaches.
Citations: 1
ECC string: Flexible ECC management for low-cost error protection of L2 caches
Pub Date : 2012-09-30 DOI: 10.1109/ICCD.2012.6378699
Jeongkyu Hong, Soontae Kim
The conventional error-correcting code (ECC) scheme for caches is based on a fixed mapping between cache words and ECC check bits and on a fixed ECC word granularity, which leads to inefficient use of ECC check bits. In contrast, we propose to use the ECC check bits flexibly for low-cost error protection of L2 caches. Our ECC scheme works at word level, while the conventional ECC scheme works at cache line or set level; our scheme protects only dirty words. In addition, our scheme uses variable ECC word granularities: dirty words that are unlikely to be modified further are protected together with a larger ECC word granularity. Our scheme reduces DRAM and data bus energy overheads by 28% and 45% on average, respectively, with the same area overhead as the previously proposed competitive scheme.
Citations: 1
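The ECC string abstract above notes that protecting words together at a larger ECC word granularity uses check bits more efficiently. The sketch below shows why, assuming a standard Hamming SEC-DED code; the abstract does not say which code the paper uses, so that choice and the function names are assumptions.

```python
def secded_check_bits(data_bits: int) -> int:
    """Check bits needed by a Hamming SEC-DED code over data_bits data bits.

    Single-error correction needs the smallest r with 2**r >= data_bits + r + 1;
    one extra overall parity bit adds double-error detection.
    """
    if data_bits < 1:
        raise ValueError("data_bits must be positive")
    r = 1
    while 2 ** r < data_bits + r + 1:
        r += 1
    return r + 1


def overhead(data_bits: int) -> float:
    """Fraction of extra storage spent on check bits."""
    return secded_check_bits(data_bits) / data_bits


if __name__ == "__main__":
    for width in (32, 64, 128, 256):
        print(width, secded_check_bits(width), f"{overhead(width):.1%}")
```

Under this assumption, the number of check bits grows only logarithmically with the protected width, so grouping several stable dirty words under one code word lowers the per-word cost.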
A retrospective look at xpipes: The exciting ride from a design experience to a design platform for nanoscale networks-on-chip
Pub Date : 2012-09-30 DOI: 10.1109/ICCD.2012.6378614
D. Bertozzi, L. Benini
This paper gives a retrospective look at the xpipes framework and documents its evolution from a promising network-on-chip (NoC) design experience to a comprehensive design platform for the next generation of nanoscale NoCs. Since the early days of xpipes, its cross-layer approach to NoC design has contributed significantly to bridging the gap between the NoC concept and an industry-relevant interconnect technology.
Citations: 1
Exposing vulnerabilities of untrusted computing platforms
Pub Date : 2012-09-30 DOI: 10.1109/ICCD.2012.6378629
Yier Jin, M. Maniatakos, Y. Makris
This work seeks to expose the vulnerability of untrusted computing platforms used in critical systems to hardware Trojans and combined hardware/software attacks. As part of our entry in the Cyber Security Awareness Week (CSAW) Embedded System Challenge hosted by NYU-Poly in 2011, we developed and presented 10 such processor-level hardware Trojans. These are split into five categories with various impacts, such as altering instruction memory, modifying the communication channel, stealing user information, changing interrupt handler location and RC-5 encryption algorithm checking of a medium-complexity microprocessor (8051). Our work serves as a good starting point for researchers to develop Trojan detection and prevention methodologies for modern processors and to ensure the trustworthiness of computing platforms.
Citations: 26
Designing pipelined delay lines with dynamically-adaptive granularity for low-energy applications
Pub Date : 2012-09-30 DOI: 10.1109/ICCD.2012.6378660
Christos Vezyrtzis, Y. Tsividis, S. Nowick
A calibrated delay line is a key component in many modern digital systems. Traditionally, these lines are designed as real-time pipelines with static granularity, fine enough to handle a worst-case input rate. However, due to their rigid structure, they have sub-optimal energy for low- and varying-rate input streams. We introduce a complete methodology for designing reconfigurable delay lines which dynamically adapt granularity to traffic, on the fly, without stalling or disturbing normal operation. These lines have two modes: coarse-grain and fine-grain. During sparser traffic, the system is reconfigured to coarse-grain mode, thereby reducing total energy, and it reverts to fine-grain mode during denser traffic. In each case, overall delay is preserved. This strategy is especially beneficial for applications where input traffic is highly varied. The particular focus of this paper is on one promising domain, continuous-time digital signal processors (CT DSPs), a new class of processors targeting low-energy applications. The proposed system includes two lightweight asynchronous control blocks: a digital controller to continuously monitor input traffic, and a micropipeline to dynamically reconfigure the entire delay line. With a complete implementation in a 0.13 µm IBM CMOS technology, post-layout simulations demonstrate an average overall dynamic power reduction of up to 45.5% compared to a non-adaptive design, with only minimal area overhead. The design methodology is modular, supporting extensions to multiple configuration modes to provide even greater power reduction for a variety of input traffic. While results are presented for CT DSPs, significant benefits are also expected in many other domains where delay lines are used.
Citations: 7
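The delay-line abstract above describes switching between a fine-grain and a coarse-grain mode according to traffic while preserving the overall delay. The toy model below illustrates that trade-off under simplifying assumptions that are not taken from the paper: each cell holds at most one sample, so a cell delay must not exceed the smallest spacing between samples, and the mode rule, function names and numbers are all hypothetical.

```python
from typing import Sequence


def stages_traversed(total_delay_ns: int, cell_delay_ns: int) -> int:
    """Number of cells a sample passes through for a fixed overall delay."""
    if cell_delay_ns <= 0 or total_delay_ns % cell_delay_ns:
        raise ValueError("total delay must be a positive multiple of the cell delay")
    return total_delay_ns // cell_delay_ns


def choose_mode(recent_gaps_ns: Sequence[int], coarse_cell_ns: int) -> str:
    """Pick a granularity from recently observed inter-sample gaps.

    Hypothetical rule: coarse mode is used only when every recent gap is at
    least one coarse cell long; with no observations, stay in fine mode.
    """
    if recent_gaps_ns and min(recent_gaps_ns) >= coarse_cell_ns:
        return "coarse"
    return "fine"


if __name__ == "__main__":
    total, fine, coarse = 1000, 10, 40  # illustrative values in nanoseconds
    print("fine stages:", stages_traversed(total, fine))
    print("coarse stages:", stages_traversed(total, coarse))
    print(choose_mode([50, 80, 120], coarse))  # sparse traffic
    print(choose_mode([50, 10, 120], coarse))  # one dense burst
```

In this model the overall delay is identical in both modes, while a sample in coarse mode passes through fewer cells, which is the intuition behind the energy saving the abstract reports.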
Fast development of hardware-based run-time monitors through architecture framework and high-level synthesis
Pub Date : 2012-09-30 DOI: 10.1109/ICCD.2012.6378669
Mohamed Ismail, G. Suh
Recent work has shown that hardware-based run-time monitoring techniques can significantly enhance the security and reliability of computing systems with minimal performance and energy overheads. However, the cost and time of implementing such a hardware-based mechanism present a major challenge in deploying run-time monitoring techniques in real systems. This paper addresses this design complexity problem through a common architecture framework and high-level synthesis. Similar to customizable processors such as Tensilica Xtensa, where designers only need to write a small piece of code that describes a custom instruction, our framework enables designers to specify only the monitoring operations. The framework provides common functions such as collecting a trace of execution, maintaining meta-data, and interfacing with software. To further reduce the design complexity, we also explore using a high-level synthesis tool (Cadence C-to-Silicon) so that hardware monitors can be described in a high-level language (SystemC) instead of in an RTL language such as Verilog or VHDL. To evaluate our approach, we implemented a set of monitors including soft-error checking, uninitialized memory checking, dynamic information flow tracking, and array boundary checking in our framework. Our results suggest that our monitor framework can greatly reduce the amount of code that needs to be specified for each extension and that high-level synthesis can achieve area, performance, and power consumption comparable to handwritten RTL.
Citations: 3
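One of the monitors listed in the abstract above is uninitialized memory checking, built on an execution trace plus per-location meta-data. The following is a software-only model of that idea, not the paper's hardware or SystemC code; the trace format and the function name are assumptions for illustration.

```python
from typing import Iterable, List, Tuple

Event = Tuple[str, int]  # (operation, address), operation is "store" or "load"


def check_uninitialized_reads(trace: Iterable[Event]) -> List[int]:
    """Return the trace indices of loads from addresses never stored to.

    The set of initialized addresses plays the role of the monitor's
    meta-data: a store marks an address, a load checks the mark.
    """
    initialized = set()
    violations = []
    for index, (op, addr) in enumerate(trace):
        if op == "store":
            initialized.add(addr)
        elif op == "load":
            if addr not in initialized:
                violations.append(index)
        else:
            raise ValueError(f"unknown operation: {op!r}")
    return violations


if __name__ == "__main__":
    trace = [("store", 0x10), ("load", 0x10), ("load", 0x14)]
    print(check_uninitialized_reads(trace))
```

The monitoring logic itself is only a few lines; collecting the trace and storing the meta-data are the shared parts that the abstract assigns to the common framework.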