Over the past 40 years, the semiconductor industry has exponentially driven down cost per function, following the oft-stated Moore's Law. Scaling is becoming increasingly difficult as we move into the 32 nm and beyond process nodes, due to both physics and economics. A lower-cost alternative method of scaling is becoming more available in the form of vertical chip integration. Many manufacturers now offer a range of package-level integration solutions, from traditional planar approaches to commonly used die stacking and the recently introduced die-level 3-D integration. With the introduction of 3-D integration, designers and system integrators can now consider physical design optimizations that include functional stacking, through-silicon interconnect to reduce power and signal latency, and optimized manufacturing cost. To enable design teams to take advantage of the benefits available with this technology, new capabilities must be developed to support the design and implementation process. This support must start at the architectural level, where issues of robustness, reliability, testability, and power must be thoroughly studied, and continue through manufacturing, packaging, and final test development. In this presentation we explore how existing design technology and methods can be practically evolved to support the powerful scaling capabilities inherent in 3-D integration technology. Specifically, we cover architectural design-space exploration, functional partitioning, physical planning, and timing/SI/thermal/yield analysis for 3-D structures.
"3-D semiconductors: More from Moore," T. Vucurevich. 2008 45th ACM/IEEE Design Automation Conference, June 8, 2008. doi:10.1145/1391469.1391640
Once a design is both retimed and sequentially optimized, sequential equivalence verification becomes very hard, since retiming breaks the equivalence of the retimed sub-blocks even though overall design equivalence is preserved. This paper presents a novel compositional algorithm to verify the sequential equivalence of large designs that are not only retimed but also optimized sequentially and combinationally. Using a new notion of conditional equivalence in the presence of retiming, the proposed compositional algorithm performs hierarchical verification by checking whether each sub-block is conditionally equivalent, and then checking whether the conditions are justified on the parent block by temporal equivalence. This is the first compositional algorithm that handles both retiming and sequential optimizations hierarchically. The proposed approach is completely automatic and orthogonal to any existing sequential equivalence checker. Experimental results show that the proposed algorithm can handle large industrial designs that cannot be verified by existing sequential equivalence checking methods.
"Compositional verification of retiming and sequential optimizations," In-Ho Moon. 2008 45th ACM/IEEE Design Automation Conference, June 8, 2008. doi:10.1145/1391469.1391506
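The obstacle the paper tackles can be made concrete with a toy example: retiming moves logic across a register, so the registers of the two designs no longer correspond bit-for-bit, yet the input/output behavior is preserved. The bounded brute-force comparison below makes this visible. It is not the paper's compositional algorithm; the one-register circuits and function names are invented for illustration:

```python
from itertools import product

def run_original(seq, r0=0):
    """Original design: register r delays the input x; output y = NOT r."""
    r, outs = r0, []
    for x in seq:
        outs.append(1 - r)   # combinational output from current state
        r = x                # register update
    return outs

def run_retimed(seq, r0=1):
    """Retimed design: the NOT gate is moved across the register,
    so r now stores NOT x and the output is r directly."""
    r, outs = r0, []
    for x in seq:
        outs.append(r)
        r = 1 - x
    return outs

def sequentially_equivalent(k=4):
    """Brute-force I/O equivalence over all input sequences of length k."""
    return all(run_original(list(s)) == run_retimed(list(s))
               for s in product((0, 1), repeat=k))
```

Note that the two designs hold different register values at every cycle (0 vs. 1 initially), so register-by-register matching fails even though I/O equivalence holds. The enumeration is exponential in sequence length, which is exactly why compositional, condition-based methods are needed for large designs.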
We propose the first method for designing N-variant sequential circuits. The flexibility provided by the N variants enables a number of important tasks, including IP protection, IP metering, security, design optimization, self-adaptation, and fault tolerance. The method is based on extending the finite state machine (FSM) of the design to include multiple variants of the same design specification. The state transitions are managed by added signals that may come from various triggers depending on the target application. We devise an algorithm for implementing the N-variant IC design and discuss the manipulations of the added signals needed to facilitate the various tasks. The key advantage of integrating the heterogeneity into the functional specification of the design is that the variants can be configured during or after manufacturing, but removal, extraction, or deletion of the variants is not viable. Experimental results on benchmark circuits demonstrate that the method can be implemented automatically and efficiently. Because it is lightweight, N-variant design is particularly well suited to securing embedded systems. As a proof of concept, we implement the N-variant method for content protection in portable media players, e.g., the iPod. We discuss how the N-variant design methodology readily enables new digital rights management methods.
"N-variant IC design: Methodology and applications," Y. Alkabani, F. Koushanfar. 2008 45th ACM/IEEE Design Automation Conference, June 8, 2008. doi:10.1145/1391469.1391606
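To illustrate the core idea of extending an FSM with multiple variants of one specification, here is a hedged sketch: two transition tables with permuted state encodings, selected by an added variant signal, both implement the same mod-3 counter. The tables and names are invented for illustration; the paper's construction operates on real circuit FSMs and manufacturing-time configuration:

```python
# Two variants of a mod-3 counter. Variant 1 permutes the internal state
# encoding (states 1 and 2 swapped), so the implementations differ while
# the decoded specification output is identical.
VARIANTS = {
    0: {"next": {0: 1, 1: 2, 2: 0}, "decode": {0: 0, 1: 1, 2: 2}},
    1: {"next": {0: 2, 2: 1, 1: 0}, "decode": {0: 0, 2: 1, 1: 2}},
}

def run(variant, ticks):
    """Run the selected variant from reset; return the decoded outputs."""
    v = VARIANTS[variant]
    s, outs = 0, []
    for _ in range(ticks):
        outs.append(v["decode"][s])   # specification-level output
        s = v["next"][s]              # variant-specific transition
    return outs
```

Because the variant choice lives in the state encoding itself, an attacker cannot simply strip it out; selecting a variant amounts to driving the added select signal, which is the kind of post-manufacturing configuration hook the paper exploits.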
Recent literature has reported that Dynamic Power Management (DPM) may decrease reliability in real-time embedded systems, and ever-shrinking device sizes contribute further to this problem. In this paper, we present a reliability-aware power management algorithm that aims at reducing energy consumption while preserving overall system reliability. The idea behind the proposed scheme is to utilize dynamic slack to scale down processes while ensuring that overall system reliability does not degrade drastically. The proposed algorithm employs a proportional feedback controller to track the overall miss ratio of a task set and provide an additional level of fault tolerance on demand. It was tested with both real-world and synthetic task sets, and simulation results are presented for both fixed- and dynamic-priority scheduling policies.
"Feedback-controlled reliability-aware power management for real-time embedded systems," R. Sridharan, Nikhil Gupta, R. Mahapatra. 2008 45th ACM/IEEE Design Automation Conference, June 8, 2008. doi:10.1145/1391469.1391517
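The proportional feedback element can be sketched in a few lines: the controller compares the observed deadline miss ratio against a setpoint and scales processor speed accordingly, clamped between a base (energy-saving) speed and full speed. The gain, setpoint, and base speed below are illustrative assumptions, not values from the paper:

```python
def p_controller_speed(miss_ratio, target=0.05, kp=2.0, base=0.6):
    """Proportional controller: raise the normalized processor speed
    when the observed miss ratio exceeds the target, so slack is only
    reclaimed for energy savings while deadlines are being met."""
    error = miss_ratio - target
    speed = base + kp * error
    return min(1.0, max(base, speed))   # clamp to [base, full speed]
```

At or below the target miss ratio the controller stays at the base speed (maximum slack reclamation); a burst of misses drives the speed toward 1.0, trading energy for timeliness and, per the paper's argument, for the fault-tolerance margin that higher speed buys back.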
The creation of an FPGA requires extensive transistor-level design, both for the final design and during architecture exploration, when many different logic and routing architectures are considered. For such explorations, it is not feasible to spend significant amounts of time on transistor-level design. This paper presents an automated transistor sizing tool for FPGA architecture exploration that uses a two-phase approach: a coarse, rapid phase with simple modeling, followed by refinement with much more accurate models. The output of the system is a design optimized toward a specific area-delay criterion. We compare the quality of our results to prior manual and partially automated approaches. Our tool has also been used to produce hundreds of candidate architectures, which we are releasing to support future high-quality explorations.
"Automated transistor sizing for FPGA architecture exploration," Ian Kuon, Jonathan Rose. 2008 45th ACM/IEEE Design Automation Conference, June 8, 2008. doi:10.1145/1391469.1391671
Motivated by the excessively high benchmarking effort caused by a rapidly expanding design space, and by prevailing practices based on ad hoc and subjective schemes, this paper seeks to improve simulation efficiency with a novel methodology that combines two statistical analyses and one quantitative heuristic to subset a given benchmark suite based on the targeted processor configuration and the desired variance coverage. We demonstrate the usage and effectiveness of the proposed technique through a thorough case study on the ImplantBench suite, evaluating high-, mid-, and low-end machine configurations modeled after three commercial embedded processors.
"Improve simulation efficiency using statistical benchmark subsetting - An ImplantBench case study," Zhanpeng Jin, A. Cheng. 2008 45th ACM/IEEE Design Automation Conference, June 8, 2008. doi:10.1145/1391469.1391713
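One plausible way to subset a suite so the survivors still span the characteristic space is greedy farthest-point selection over per-benchmark feature vectors. This is only a stand-in for the paper's two statistical analyses and quantitative heuristic; the benchmark names and profiles below are invented:

```python
import math

def select_subset(profiles, k):
    """Greedy farthest-point subsetting: starting from one seed, repeatedly
    keep the benchmark farthest (in characteristic space) from everything
    already chosen, so near-duplicate workloads are dropped first."""
    names = list(profiles)
    chosen = [names[0]]
    while len(chosen) < k:
        best = max(
            (n for n in names if n not in chosen),
            key=lambda n: min(math.dist(profiles[n], profiles[c])
                              for c in chosen),
        )
        chosen.append(best)
    return chosen
```

With two near-identical profiles and one distant one, a size-2 subset keeps one representative of each behavior rather than both duplicates, which is the variance-coverage intuition behind subsetting.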
We propose an effective multiple defect diagnosis methodology that does not depend on failing pattern characteristics. The methodology consists of a conservative defect site identification and elimination algorithm, and an innovative path-based defect site elimination technique. The search space of the diagnosis method does not grow exponentially with the number of defects in the circuit under diagnosis. Simulation experiments show that this method can effectively diagnose circuits that are affected by 10 or more faults that include multiple stuck-at, bridge and transistor stuck-open faults.
"Multiple defect diagnosis using no assumptions on failing pattern characteristics," Xiaochun Yu, R. D. Blanton. 2008 45th ACM/IEEE Design Automation Conference, June 8, 2008. doi:10.1145/1391469.1391567
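The flavor of conservative candidate elimination can be shown on a toy gate-level example: simulate each stuck-at candidate against the observed responses and keep only those that explain every failing pattern. This single-defect sketch is far simpler than the paper's multiple-defect, path-based technique; the circuit and fault list are invented:

```python
from itertools import product

def circuit(a, b, c, fault=None):
    """Toy circuit y = (a AND b) OR c, with an optional stuck-at fault
    injected on one input line."""
    lines = {"a": a, "b": b, "c": c}
    if fault:
        name, val = fault
        lines[name] = val
    return (lines["a"] & lines["b"]) | lines["c"]

def diagnose(observed):
    """Conservative elimination: keep a stuck-at candidate only if it
    reproduces the observed value on every failing pattern."""
    candidates = [(n, v) for n in "abc" for v in (0, 1)]
    kept = []
    for f in candidates:
        explains = [circuit(*p, fault=f) == y
                    for p, y in observed.items()
                    if circuit(*p) != y]          # failing patterns only
        if explains and all(explains):
            kept.append(f)
    return kept
```

Injecting c stuck-at-1 and observing the chip's responses on all eight patterns, elimination discards every other candidate and retains only the true fault; in the multiple-defect setting the paper targets, elimination must be more conservative still, since no single candidate explains all failures.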
Melanie Elm, H. Wunderlich, M. Imhof, Christian G. Zoellin, J. Leenstra, Nicolas Mäding
An effective technique to save power during scan-based test is to switch off unused scan chains. The results obtained with this method depend strongly on the mapping of scan flip-flops into scan chains, which determines how many chains can be deactivated per pattern. In this paper, a new method to cluster flip-flops into scan chains is presented that minimizes power consumption during test. It does not depend on a test set and can consequently improve the performance of any test power reduction technique. The approach does not specify any ordering inside the chains and fits seamlessly into any standard tool for scan chain integration. Applying known test power reduction techniques to the optimized scan chain configurations shows significant improvements for large industrial circuits.
"Scan chain clustering for test power reduction," Melanie Elm, H. Wunderlich, M. Imhof, Christian G. Zoellin, J. Leenstra, Nicolas Mäding. 2008 45th ACM/IEEE Design Automation Conference, June 8, 2008. doi:10.1145/1391469.1391680
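A simplified, test-set-driven illustration of the clustering goal (the paper's method itself is notably test-set independent): group flip-flops whose care-bit signatures match across the patterns into the same chains, so that entire chains carry no care bits for many patterns and can be switched off. The flip-flop names and care-bit data are invented:

```python
from collections import defaultdict

def cluster_chains(care, chain_len):
    """Group flip-flops with identical care-bit signatures across the
    test set into the same chains, so whole chains idle per pattern."""
    groups = defaultdict(list)
    for ff, signature in care.items():
        groups[tuple(signature)].append(ff)
    chains = []
    for _, ffs in sorted(groups.items()):
        for i in range(0, len(ffs), chain_len):   # split oversized groups
            chains.append(ffs[i:i + chain_len])
    return chains

def idle_chains(chains, care, pattern):
    """Chains with no care bit in this pattern can be clock-gated."""
    return sum(1 for ch in chains
               if not any(care[ff][pattern] for ff in ch))
```

Scattering the same four flip-flops randomly across two chains would leave both chains active on every pattern; the signature-based grouping lets one of the two chains be deactivated for each pattern, which is the effect the clustering is after.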
Using software-controlled scratch-pad memory (SPM) in systems-on-chip has the potential to reduce power consumption by using design-time application knowledge to reduce memory accesses and processor stalls. This paper presents a fully automatic application analysis and transformation tool that selects data structures for transfer to the SPM and schedules data transfers between background memory and the SPM (prefetching) to achieve both high performance and low power consumption. A case study applying this tool to an MPEG-4 video encoder shows an overall power reduction of 25%, a 40% power reduction in the memories alone, and a 40% reduction in processor cycles compared to an optimized hardware-cache-based solution.
"An automatic Scratch Pad Memory management tool and MPEG-4 encoder case study," R. Baert, E. D. Greef, E. Brockmeyer. 2008 45th ACM/IEEE Design Automation Conference, June 8, 2008. doi:10.1145/1391469.1391520
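A minimal sketch of the selection step, assuming a greedy accesses-per-byte heuristic; the actual tool's analysis and prefetch scheduling are considerably richer, and the data-structure profiles below are invented:

```python
def select_for_spm(data_structs, capacity):
    """Greedy SPM allocation: place the data structures with the highest
    accesses-per-byte density first, until the scratch-pad is full.
    Each entry is {"name": str, "size": bytes, "accesses": count}."""
    ranked = sorted(data_structs,
                    key=lambda d: d["accesses"] / d["size"],
                    reverse=True)
    placed, used = [], 0
    for d in ranked:
        if used + d["size"] <= capacity:   # skip anything that won't fit
            placed.append(d["name"])
            used += d["size"]
    return placed
```

With a small, hot motion-vector buffer and a large frame buffer competing for a 68-byte scratch-pad, density ranking places both and leaves the rarely touched temporary in background memory, capturing the intuition of moving the most frequently accessed bytes on-chip.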
Yashuai Lü, Li Shen, Libo Huang, Zhiying Wang, Nong Xiao
Compared with single-issue general-purpose processors (GPPs), extensible multi-issue/VLIW processors can exploit instruction-level parallelism, making them more suitable for computation-intensive tasks. Moreover, they offer the ability to customize computation accelerators for an application domain. In this paper, we present an automated methodology that customizes computation accelerators for multi-issue/VLIW extensible processors, together with several techniques to optimize the design of an accelerator.
"Customizing computation accelerators for extensible multi-issue processors with effective optimization techniques," Yashuai Lü, Li Shen, Libo Huang, Zhiying Wang, Nong Xiao. 2008 45th ACM/IEEE Design Automation Conference, June 8, 2008. doi:10.1145/1391469.1391519