Pub Date: 2012-12-13 DOI: 10.1109/ICCD.2012.6378674
Guang Sun, Chia-Wei Chang, Bill Lin, Lieguang Zeng
Half of network capacity is often believed to be the limit of worst-case throughput for mesh networks. However, this paper provides a new worst-case throughput bound, higher than half of network capacity, for odd-radix two-dimensional mesh networks. In addition, we propose a routing algorithm called U2TURN that achieves this worst-case throughput bound for odd-radix meshes. For even-radix meshes, we prove that U2TURN achieves the optimal worst-case throughput, namely half of network capacity. U2TURN considers all routing paths with at most 2 turns and distributes the traffic load uniformly in both the X and Y dimensions. Theoretical analysis and simulation results show that U2TURN outperforms existing routing algorithms in worst-case throughput. Moreover, U2TURN achieves good average throughput at the expense of approximately 1.5× the minimal average hop count. For asymmetric meshes, we further propose an algorithm called “U2TURN-A” and provide theoretical analysis for different algorithms. Both theoretical analysis and simulation show that U2TURN and U2TURN-A outperform the existing algorithms VAL, DOR and O1TURN in both worst-case and average throughput for asymmetric meshes.
Title: Oblivious routing design for mesh networks to achieve a new worst-case throughput bound
Venue: 2012 IEEE 30th International Conference on Computer Design (ICCD)
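The U2TURN abstract above says the algorithm considers all routing paths with at most 2 turns. As a rough illustration of what that path set looks like, the sketch below enumerates such paths on a k×k mesh by routing X-Y-X through every intermediate column and Y-X-Y through every intermediate row. This is only an assumption-laden sketch: the function names (`straight`, `count_turns`, `two_turn_paths`) are hypothetical, paths that revisit a node are discarded, and the sketch does not model how U2TURN actually weights or distributes traffic across these paths, which the abstract does not specify.

```python
from typing import List, Set, Tuple

Node = Tuple[int, int]


def straight(a: Node, b: Node) -> List[Node]:
    """Nodes visited when moving in a straight line from a to b (a excluded)."""
    if a == b:
        return []
    (ax, ay), (bx, by) = a, b
    if ax == bx:  # move along Y
        step = 1 if by > ay else -1
        return [(ax, y) for y in range(ay + step, by + step, step)]
    if ay == by:  # move along X
        step = 1 if bx > ax else -1
        return [(x, ay) for x in range(ax + step, bx + step, step)]
    raise ValueError("a and b must share a row or a column")


def count_turns(path: Tuple[Node, ...]) -> int:
    """Number of direction changes along a path of adjacent nodes."""
    turns = 0
    for p, q, r in zip(path, path[1:], path[2:]):
        if (q[0] - p[0], q[1] - p[1]) != (r[0] - q[0], r[1] - q[1]):
            turns += 1
    return turns


def two_turn_paths(src: Node, dst: Node, k: int) -> Set[Tuple[Node, ...]]:
    """All simple paths from src to dst with at most 2 turns on a k x k mesh."""
    (sx, sy), (dx, dy) = src, dst
    candidates = []
    for c in range(k):  # X-Y-X through intermediate column c
        candidates.append([src, (c, sy), (c, dy), dst])
    for r in range(k):  # Y-X-Y through intermediate row r
        candidates.append([src, (sx, r), (dx, r), dst])

    paths: Set[Tuple[Node, ...]] = set()
    for waypoints in candidates:
        path = [src]
        for a, b in zip(waypoints, waypoints[1:]):
            path += straight(a, b)
        if len(set(path)) == len(path):  # drop paths that double back on themselves
            paths.add(tuple(path))
    return paths


if __name__ == "__main__":
    for p in sorted(two_turn_paths((0, 0), (2, 0), 3), key=len):
        print(len(p) - 1, "hops,", count_turns(p), "turns:", p)
```

For a source and destination in the same row, the enumeration includes the direct path plus longer detours through other rows, which shows why admitting 2-turn paths raises the average hop count above the minimal one.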
Pub Date: 2012-09-30 DOI: 10.1109/ICCD.2012.6378701
H. Ichihara, N. Shimizu, T. Iwagaki, Tomoo Inoue
Many test designs (or DFTs: designs-for-testability) have been proposed to overcome various issues around LSI testing. In this paper, we propose a cost and benefit model for comparing several test designs in terms of the final profit of logic LSI design and manufacturing. Test designs can affect chip area, testing time, test generation time and fault coverage; in the proposed model, we clarify the relationship among these factors for three major test designs: scan design, built-in self-test (BIST) design and test compression design. The proposed model reveals the final profit for each test design in a given LSI design and manufacturing environment, so that it can identify a suitable test design at an early stage of the LSI design flow. We show an example application of the proposed model to test design selection in a given environment.
Title: Modeling economics of LSI design and manufacturing for test design selection
Venue: 2012 IEEE 30th International Conference on Computer Design (ICCD)
Pub Date: 2012-09-30 DOI: 10.1109/ICCD.2012.6378620
Yi Wang, Dan Zhao, Jian Li
To bridge the widening gap between computation requirements and communication efficiency faced by many-core chips, the Wireless Network-on-Chip (WiNoC) has been proposed, using ultra-wideband interconnect. While prior research has demonstrated the salient features of WiNoC, namely high per-link data rate, high accumulated bandwidth, high flexibility, low overhead and low power consumption, this research aims to develop a multi-access WiNoC to substantially improve the end-to-end performance of on-chip communication. Enabled by time-hopping PPM multi-channel capability, we propose an efficient multi-channel distribution and arbitration scheme for improving communication concurrency and resolving channel competition among multiple users to achieve the desired network performance. Our simulation studies based on synthetic traffic demonstrate the efficiency, cost effectiveness and scalability of the channel arbitration scheme and the promising network performance of WiNoC.
Title: DuSCA: A multi-channeling strategy for doubling communication capacity in wireless NoC
Venue: 2012 IEEE 30th International Conference on Computer Design (ICCD)
Pub Date: 2012-09-30 DOI: 10.1109/ICCD.2012.6378651
Chang-Chih Chen, Fahad Ahmed, L. Milor
In this work, we perform a comparative study of different wearout mechanisms affecting state-of-the-art microprocessor systems. Taking into account detailed thermal and electrical stress profiles, we present a methodology to accurately estimate the lifetime due to each mechanism. The lifetime-limiting wearout mechanisms are highlighted using standard benchmarks, along with the reliability-critical microprocessor functional units.
Title: A comparative study of wearout mechanisms in state-of-art microprocessors
Venue: 2012 IEEE 30th International Conference on Computer Design (ICCD)
Pub Date: 2012-09-30 DOI: 10.1109/ICCD.2012.6378633
M. Rahmatian, H. Kooti, I. Harris, E. Bozorgzadeh
Intrusion detection approaches have been presented which detect anomalous malware behavior at runtime. Most techniques involve software-based analysis, which is too slow to support the tight timing constraints often imposed on embedded systems. We propose a hardware-based intrusion detection approach which does not alter the functional performance of the system. When using a real-time operating system, the executing process changes several times each second, requiring fast adaptation on the part of the intrusion detection mechanism. We present a technique that exploits the partial runtime reconfiguration feature present on many modern field programmable gate arrays (FPGAs) to adapt intrusion detection to a new process at each context switch. The use of runtime reconfiguration combines the flexibility of software-based approaches with the performance benefits of hardware-based approaches.
Title: Adaptable intrusion detection using partial runtime reconfiguration
Venue: 2012 IEEE 30th International Conference on Computer Design (ICCD)
Pub Date: 2012-09-30 DOI: 10.1109/ICCD.2012.6378699
Jeongkyu Hong, Soontae Kim
The conventional error-correcting code (ECC) scheme for caches is based on a fixed mapping between cache words and ECC check bits and a fixed ECC word granularity, which leads to inefficient usage of ECC check bits. In contrast, we propose to use the ECC check bits flexibly for low-cost error protection of L2 caches. Our ECC scheme works at word level, while the conventional ECC scheme works at cache line or set level; our scheme protects only dirty words. In addition, our scheme utilizes variable ECC word granularities; dirty words that are unlikely to be modified further are protected together with a larger ECC word granularity. Our scheme reduces DRAM and data bus energy overheads by 28% and 45% on average, respectively, with the same area overhead as the previously proposed competitive scheme.
Title: ECC string: Flexible ECC management for low-cost error protection of L2 caches
Venue: 2012 IEEE 30th International Conference on Computer Design (ICCD)
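The ECC string abstract above notes that words can be protected together at a larger ECC word granularity. One way to see why a larger granularity can use check bits more efficiently is to count the check bits a code needs per protected word. The sketch below assumes a standard Hamming-based SEC-DED code, which is an assumption made here for illustration; the abstract does not state which code the paper uses, and the function name `secded_check_bits` is hypothetical.

```python
def secded_check_bits(data_bits: int) -> int:
    """Check bits for a Hamming SEC-DED code over `data_bits` data bits.

    A single-error-correcting Hamming code needs the smallest r with
    2**r >= data_bits + r + 1; one extra overall parity bit adds
    double-error detection. (Assumed code family, not taken from the paper.)
    """
    r = 1
    while 2 ** r < data_bits + r + 1:
        r += 1
    return r + 1


if __name__ == "__main__":
    for k in (32, 64, 128, 256):
        c = secded_check_bits(k)
        print(f"{k:4d} data bits -> {c:2d} check bits ({100 * c / k:.1f}% overhead)")
```

Under this assumption, doubling the protected word size adds only one check bit, so the relative storage overhead roughly halves with each doubling.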
Pub Date: 2012-09-30 DOI: 10.1109/ICCD.2012.6378614
D. Bertozzi, L. Benini
This paper gives a retrospective look at the xpipes framework and documents its evolution from a promising network-on-chip (NoC) design experience to a comprehensive design platform for the next generation of nanoscale NoCs. Since the early days of xpipes, its cross-layer approach to NoC design has contributed significantly to bridging the gap between the NoC concept and an industry-relevant interconnect technology.
Title: A retrospective look at xpipes: The exciting ride from a design experience to a design platform for nanoscale networks-on-chip
Venue: 2012 IEEE 30th International Conference on Computer Design (ICCD)
Pub Date: 2012-09-30 DOI: 10.1109/ICCD.2012.6378629
Yier Jin, M. Maniatakos, Y. Makris
This work seeks to expose the vulnerability of untrusted computing platforms used in critical systems to hardware Trojans and combined hardware/software attacks. As part of our entry in the Cyber Security Awareness Week (CSAW) Embedded System Challenge hosted by NYU-Poly in 2011, we developed and presented 10 such processor-level hardware Trojans. These are split into five categories with various impacts on a medium-complexity microprocessor (8051), such as altering instruction memory, modifying the communication channel, stealing user information, changing the interrupt handler location and RC-5 encryption algorithm checking. Our work serves as a good starting point for researchers to develop Trojan detection and prevention methodologies for modern processors and to ensure the trustworthiness of computing platforms.
Title: Exposing vulnerabilities of untrusted computing platforms
Venue: 2012 IEEE 30th International Conference on Computer Design (ICCD)
Pub Date: 2012-09-30 DOI: 10.1109/ICCD.2012.6378660
Christos Vezyrtzis, Y. Tsividis, S. Nowick
A calibrated delay line is a key component in many modern digital systems. Traditionally, these lines are designed as real-time pipelines with static granularity, fine enough to handle a worst-case input rate. However, due to their rigid structure, they have sub-optimal energy for low- and varying-rate input streams. We introduce a complete methodology for designing reconfigurable delay lines which dynamically adapt granularity to traffic, on the fly, without stalling or disturbing normal operation. These lines have two modes: coarse-grain and fine-grain. During sparser traffic, the system is reconfigured to coarse-grain mode, thereby reducing total energy, and it reverts to fine-grain mode during denser traffic. In each case, overall delay is preserved. This strategy is especially beneficial for applications where input traffic is highly varied. The particular focus of this paper is on one promising domain, continuous-time digital signal processors (CT DSPs), a new class of processors targeting low-energy applications. The proposed system includes two lightweight asynchronous control blocks: a digital controller to continuously monitor input traffic, and a micropipeline to dynamically reconfigure the entire delay line. With a complete implementation in a 0.13 µm IBM CMOS technology, post-layout simulations demonstrate an average overall dynamic power reduction of up to 45.5% compared to a non-adaptive design, with only minimal area overhead. The design methodology is modular, supporting extensions to multiple configuration modes to provide even greater power reduction for a variety of input traffic. While results are presented for CT DSPs, significant benefits are also expected in many other domains where delay lines are used.
Title: Designing pipelined delay lines with dynamically-adaptive granularity for low-energy applications
Venue: 2012 IEEE 30th International Conference on Computer Design (ICCD)
Pub Date: 2012-09-30 DOI: 10.1109/ICCD.2012.6378669
Mohamed Ismail, G. Suh
Recent work has shown that hardware-based run-time monitoring techniques can significantly enhance the security and reliability of computing systems with minimal performance and energy overheads. However, the cost and time of implementing such a hardware-based mechanism present a major challenge in deploying run-time monitoring techniques in real systems. This paper addresses this design complexity problem through a common architecture framework and high-level synthesis. Similar to customizable processors such as Tensilica Xtensa, where designers only need to write a small piece of code that describes a custom instruction, our framework enables designers to specify only the monitoring operations. The framework provides common functions such as collecting a trace of execution, maintaining meta-data, and interfacing with software. To further reduce the design complexity, we also explore using a high-level synthesis tool (Cadence C-to-Silicon) so that hardware monitors can be described in a high-level language (SystemC) instead of an RTL language such as Verilog or VHDL. To evaluate our approach, we implemented a set of monitors including soft-error checking, uninitialized memory checking, dynamic information flow tracking, and array boundary checking in our framework. Our results suggest that our monitor framework can greatly reduce the amount of code that needs to be specified for each extension, and that high-level synthesis can achieve area, performance, and power consumption comparable to handwritten RTL.
Title: Fast development of hardware-based run-time monitors through architecture framework and high-level synthesis
Venue: 2012 IEEE 30th International Conference on Computer Design (ICCD)