We present novel and efficient methods for on-line testing in FPGAs. The testing approach uses a ROving TEster (ROTE), which has provable diagnosabilities and is also faster than prior FPGA testing methods. We present 1- and 2-diagnosable built-in self-tester (BISTer) designs that make up the ROTE, and that avoid expensive adaptive diagnosis. To the best of our knowledge, this is the first time that a BISTer design with diagnosability greater than one has been developed for FPGAs. We also develop functional testing methods that test PLBs in only two circuit functions that will be mapped to them (as opposed to testing PLBs in all their operational modes) as the ROTE moves across a functioning FPGA. Simulation results show that our 1-diagnosable BISTer and our functional testing technique leads to significantly more accurate (98% fault coverage at a fault/defect density of 10%) and faster test-and-diagnosis of FPGAs than achieved by previous work. The fault coverage of ROTE is also expected to be high at fault/defect densities of up to 25% using our 1-diagnosable BISTer and up to 33% using our 2-diagnosable BISTer. Our methods should thus prove useful for testing current very deep submicron FPGAs as well as future nano-CMOS and molecular nanotechnology FPGAs in which defect densities are expected to be in the 10% range.
{"title":"Efficient on-line testing of FPGAs with provable diagnosabilities","authors":"V. Verma, S. Dutt, Vishal Suthar","doi":"10.1145/996566.996705","DOIUrl":"https://doi.org/10.1145/996566.996705","url":null,"abstract":"We present novel and efficient methods for on-line testing in FPGAs. The testing approach uses a ROving TEster (ROTE), which has provable diagnosabilities and is also faster than prior FPGA testing methods. We present 1- and 2-diagnosable built-in self-tester (BISTer) designs that make up the ROTE, and that avoid expensive adaptive diagnosis. To the best of our knowledge, this is the first time that a BISTer design with diagnosability greater than one has been developed for FPGAs. We also develop functional testing methods that test PLBs in only two circuit functions that will be mapped to them (as opposed to testing PLBs in all their operational modes) as the ROTE moves across a functioning FPGA. Simulation results show that our 1-diagnosable BISTer and our functional testing technique leads to significantly more accurate (98% fault coverage at a fault/defect density of 10%) and faster test-and-diagnosis of FPGAs than achieved by previous work. The fault coverage of ROTE is also expected to be high at fault/defect densities of up to 25% using our 1-diagnosable BISTer and up to 33% using our 2-diagnosable BISTer. Our methods should thus prove useful for testing current very deep submicron FPGAs as well as future nano-CMOS and molecular nanotechnology FPGAs in which defect densities are expected to be in the 10% range.","PeriodicalId":115059,"journal":{"name":"Proceedings. 41st Design Automation Conference, 2004.","volume":"189 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-06-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124242498","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
With deep-sub-micron (DSM) technology, statistical timing analysis becomes increasingly crucial to characterize signal transmission over global interconnect wires. In this paper, a novel statistical timing analysis approach has been developed to analyze the behavior of two important pipelined architectures for multiple clock-cycle global interconnect, namely, the flip-flop inserted global wire and the latch inserted global wire. We present analytical formula that is based on parameters obtained using Monte Carlo simulation. These results enable a global interconnect designer to explore design trade-offs between clock frequency and probability of bit-error during data transmission.
{"title":"Statistical timing analysis in sequential circuit for on-chip global interconnect pipelining","authors":"Lizheng Zhang, Y. Hu, C. C. Chen","doi":"10.1145/996566.996806","DOIUrl":"https://doi.org/10.1145/996566.996806","url":null,"abstract":"With deep-sub-micron (DSM) technology, statistical timing analysis becomes increasingly crucial to characterize signal transmission over global interconnect wires. In this paper, a novel statistical timing analysis approach has been developed to analyze the behavior of two important pipelined architectures for multiple clock-cycle global interconnect, namely, the flip-flop inserted global wire and the latch inserted global wire. We present analytical formula that is based on parameters obtained using Monte Carlo simulation. These results enable a global interconnect designer to explore design trade-offs between clock frequency and probability of bit-error during data transmission.","PeriodicalId":115059,"journal":{"name":"Proceedings. 41st Design Automation Conference, 2004.","volume":"22 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-06-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131589794","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Fast and accurate estimation is critical for exploration of any design space in general. As we move to higher levels of abstraction, estimation of complete system designs at each level of abstraction is needed. Estimation should provide a variety of useful metrics relevant to design tasks in different domains and at each stage in the design process.In this paper, we present such a system-level estimation approach based on a novel combination of dynamic profiling and static retargeting. Co-estimation of complete system implementations is fast while accurately reflecting even dynamic effects. Furthermore, retargetable profiling is supported at multiple levels of abstraction, providing multiple design quality metrics at each level. Experimental results show the applicability of the approach for efficient design space exploration.
{"title":"Retargetable profiling for rapid, early system-level design space exploration","authors":"Lukai Cai, A. Gerstlauer, D. Gajski","doi":"10.1145/996566.996651","DOIUrl":"https://doi.org/10.1145/996566.996651","url":null,"abstract":"Fast and accurate estimation is critical for exploration of any design space in general. As we move to higher levels of abstraction, estimation of complete system designs at each level of abstraction is needed. Estimation should provide a variety of useful metrics relevant to design tasks in different domains and at each stage in the design process.In this paper, we present such a system-level estimation approach based on a novel combination of dynamic profiling and static retargeting. Co-estimation of complete system implementations is fast while accurately reflecting even dynamic effects. Furthermore, retargetable profiling is supported at multiple levels of abstraction, providing multiple design quality metrics at each level. Experimental results show the applicability of the approach for efficient design space exploration.","PeriodicalId":115059,"journal":{"name":"Proceedings. 41st Design Automation Conference, 2004.","volume":"34 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-06-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131726530","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
In this paper, we address the problem of signal pruning in static timing analysis (STA). Traditionally, signals are propagated through the circuit and are pruned, such that only the signal with the latest arrival time at each node is propagated forward. This signal pruning is a key to the linear run time of STA. However, it was previously observed that a signal with the latest arrival time may not be the most critical signal, as an earlier signal with a larger transition time can result in a longer delay in the down-stream logic. Hence, arrival time based pruning can result in an optimistic delay, incorrect critical paths, and discontinuities of the delay during circuit optimization. Although algorithms were proposed to remedy this issue, they rely on propagation of multiple signals and have an exponential worst-case complexity. In this paper, we propose a new timing analysis algorithm, which uses a two pass traversal of the circuit. In the initial backward traversal, we construct delay tables which record the required time at a node as a function of the transition time at that node. This is followed by a forward traversal where signals are pruned not based on arrival times but based on slack. The proposed algorithm corrects the accuracy problems of the arrival time based pruning while at the same time maintaining the linear run time of STA. We implemented our algorithm and demonstrated its accuracy and efficiency.
{"title":"Static timing analysis using backward signal propagation","authors":"Dongwook Lee, V. Zolotov, D. Blaauw","doi":"10.1145/996566.996747","DOIUrl":"https://doi.org/10.1145/996566.996747","url":null,"abstract":"In this paper, we address the problem of signal pruning in static timing analysis (STA). Traditionally, signals are propagated through the circuit and are pruned, such that only the signal with the latest arrival time at each node is propagated forward. This signal pruning is a key to the linear run time of STA. However, it was previously observed that a signal with the latest arrival time may not be the most critical signal, as an earlier signal with a larger transition time can result in a longer delay in the down-stream logic. Hence, arrival time based pruning can result in an optimistic delay, incorrect critical paths, and discontinuities of the delay during circuit optimization. Although algorithms were proposed to remedy this issue, they rely on propagation of multiple signals and have an exponential worst-case complexity. In this paper, we propose a new timing analysis algorithm, which uses a two pass traversal of the circuit. In the initial backward traversal, we construct delay tables which record the required time at a node as a function of the transition time at that node. This is followed by a forward traversal where signals are pruned not based on arrival times but based on slack. The proposed algorithm corrects the accuracy problems of the arrival time based pruning while at the same time maintaining the linear run time of STA. We implemented our algorithm and demonstrated its accuracy and efficiency.","PeriodicalId":115059,"journal":{"name":"Proceedings. 41st Design Automation Conference, 2004.","volume":"97 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-06-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121572979","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Zhong Xiu, James D. Z. Ma, Suzanne M. Fowler, Rob A. Rutenbar
Grid-warping is a new placement algorithm based on a strikingly simple idea: rather than move the gates to optimize their location, we elastically deform a model of the 2-D chip surface on which the gates have been roughly placed, "stretching" it until the gates arrange themselves to our liking. Put simply: we move the grid, not the gates. Deforming the elastic grid is a surprisingly simple, low-dimensional nonlinear optimization, and augments a traditional quadratic formulation. A preliminary implementation, WARP1, is already competitive with most recently published placers, e.g., placements that average 4% better wirelength, 40% faster than GORDIAN-L-DOMINO.
{"title":"Large-scale placement by grid-warping","authors":"Zhong Xiu, James D. Z. Ma, Suzanne M. Fowler, Rob A. Rutenbar","doi":"10.1145/996566.996669","DOIUrl":"https://doi.org/10.1145/996566.996669","url":null,"abstract":"Grid-warping is a new placement algorithm based on a strikingly simple idea: rather than move the gates to optimize their location, we elastically deform a model of the 2-D chip surface on which the gates have been roughly placed, \"stretching\" it until the gates arrange themselves to our liking. Put simply: we move the grid, not the gates. Deforming the elastic grid is a surprisingly simple, low-dimensional nonlinear optimization, and augments a traditional quadratic formulation. A preliminary implementation, WARP1, is already competitive with most recently published placers, e.g., placements that average 4% better wirelength, 40% faster than GORDIAN-L-DOMINO.","PeriodicalId":115059,"journal":{"name":"Proceedings. 41st Design Automation Conference, 2004.","volume":"162 ","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-06-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114059288","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Gang Zhang, E. A. Dengi, R. Rohrer, Rob A. Rutenbar, L. Carley
An electrical and physical synthesis flow for high-speed analog and radio-frequency circuits is presented in this paper. Novel techniques aiming at fast parasitic closure are employed throughout the flow. Parasitic corners generated based on the earlier placement statistics are included for circuit resizing to enable parasitic robust designs. A performance-driven placement with simultaneous fast incremental global routing is proposed to achieve accurate parasitic estimation. Device tuning is utilized during layout to compensate for layout induced performance degradations. This methodology allows sophisticated macromodels of performances versus device variables and parasitics to be used during layout synthesis to make it truly performance-driven. Experimental results of a 4GHz LNA and a mixer demonstrate fast parasitic closure with this methodology.
{"title":"A synthesis flow toward fast parasitic closure for radio-frequency integrated circuits","authors":"Gang Zhang, E. A. Dengi, R. Rohrer, Rob A. Rutenbar, L. Carley","doi":"10.1145/996566.996612","DOIUrl":"https://doi.org/10.1145/996566.996612","url":null,"abstract":"An electrical and physical synthesis flow for high-speed analog and radio-frequency circuits is presented in this paper. Novel techniques aiming at fast parasitic closure are employed throughout the flow. Parasitic corners generated based on the earlier placement statistics are included for circuit resizing to enable parasitic robust designs. A performance-driven placement with simultaneous fast incremental global routing is proposed to achieve accurate parasitic estimation. Device tuning is utilized during layout to compensate for layout induced performance degradations. This methodology allows sophisticated macromodels of performances versus device variables and parasitics to be used during layout synthesis to make it truly performance-driven. Experimental results of a 4GHz LNA and a mixer demonstrate fast parasitic closure with this methodology.","PeriodicalId":115059,"journal":{"name":"Proceedings. 41st Design Automation Conference, 2004.","volume":"184 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-06-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114837716","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
P. Biswas, Vinay Choudhary, K. Atasu, L. Pozzi, P. Ienne, N. Dutt
Automatic generation of Instruction Set Extensions (ISEs), to be executed on a custom processing unit or a coprocessor is an important step towards processor customization. A typical goal of a manual designer is to combine a large number of atomic instructions into an ISE satisfying microarchitectural constraints. However, memory operations pose a challenge for previous ISE approaches by limiting the size of the resulting instruction. In this paper, we introduce memory elements into custom units which result in ISEs closer to those sought after by the designers. We consider two kinds of memory elements for mapping to the specialized hardware: small hardware tables and architecturally-visible state registers. We devised a genetic algorithm to specifically exploit opportunities of introducing memory elements during ISE generation. Finally, we demonstrate the effectiveness of our approach by a detailed study of the variation in performance, area and energy in the presence of the generated ISEs, on a number of MediaBench, EEMBC and cryptographic applications. With the introduction of memory, the average speedup varied from 2.7X to 5X depending on the architectural configuration with a nominal area overhead. Moreover, we obtained an average energy reduction of 26% with respect to a 32-KB cache.
{"title":"Introduction of local memory elements in instruction set extensions","authors":"P. Biswas, Vinay Choudhary, K. Atasu, L. Pozzi, P. Ienne, N. Dutt","doi":"10.1145/996566.996765","DOIUrl":"https://doi.org/10.1145/996566.996765","url":null,"abstract":"Automatic generation of Instruction Set Extensions (ISEs), to be executed on a custom processing unit or a coprocessor is an important step towards processor customization. A typical goal of a manual designer is to combine a large number of atomic instructions into an ISE satisfying microarchitectural constraints. However, memory operations pose a challenge for previous ISE approaches by limiting the size of the resulting instruction. In this paper, we introduce memory elements into custom units which result in ISEs closer to those sought after by the designers. We consider two kinds of memory elements for mapping to the specialized hardware: small hardware tables and architecturally-visible state registers. We devised a genetic algorithm to specifically exploit opportunities of introducing memory elements during ISE generation. Finally, we demonstrate the effectiveness of our approach by a detailed study of the variation in performance, area and energy in the presence of the generated ISEs, on a number of MediaBench, EEMBC and cryptographic applications. With the introduction of memory, the average speedup varied from 2.7X to 5X depending on the architectural configuration with a nominal area overhead. Moreover, we obtained an average energy reduction of 26% with respect to a 32-KB cache.","PeriodicalId":115059,"journal":{"name":"Proceedings. 41st Design Automation Conference, 2004.","volume":"31 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-06-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115049096","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Sungju Park, Sangwook Cho, Seiyang Yang, M. Ciesielski
In order to improve the testabilities and power consumption, a new state assignment technique based on m-block partition is introduced in this paper. The length and number of feedback cycles are reduced with minimal switching activity on the state variables. Experiment shows significant improvement in power dissipation and testabilities for benchmark circuits.
{"title":"A now state assignment technique for testing and low power","authors":"Sungju Park, Sangwook Cho, Seiyang Yang, M. Ciesielski","doi":"10.1145/996566.996707","DOIUrl":"https://doi.org/10.1145/996566.996707","url":null,"abstract":"In order to improve the testabilities and power consumption, a new state assignment technique based on m-block partition is introduced in this paper. The length and number of feedback cycles are reduced with minimal switching activity on the state variables. Experiment shows significant improvement in power dissipation and testabilities for benchmark circuits.","PeriodicalId":115059,"journal":{"name":"Proceedings. 41st Design Automation Conference, 2004.","volume":"12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-06-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115641815","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
There is an increased dominance of intra-die process variations, creating a need for an accurate and fast staristical riming analysis. Most of the recent proposed approaches assume a Single Input Switching model. Our experiments show that SIS underestimates the mean delay of a stage by upto 20% and overestimates the standard deviation upto 26%. We also show fhar Multiple Input Switching has a greater impacr on sratisrical timing, rhan regular static timing analysis. Hence, we propose a modeling technique for gate delay variability, considering MIS. Our model can he efficiently incorporated into most of the statistical riming annlysis frameworks. On average over all rest cases, our approach underesrimares mean delay of a sfage by 0.01 % and overestimates the standard deviation by only 2%. hence increosing the robustness to process variations. Our modeling technique is independent of the deterministic MIS model, and we show that its sensitivity to variations in the MIS model is small.
{"title":"Statistical gate delay model considering multiple input switching","authors":"A. Agarwal, F. Dartu, D. Blaauw","doi":"10.1145/996566.996746","DOIUrl":"https://doi.org/10.1145/996566.996746","url":null,"abstract":"There is an increased dominance of intra-die process variations, creating a need for an accurate and fast staristical riming analysis. Most of the recent proposed approaches assume a Single Input Switching model. Our experiments show that SIS underestimates the mean delay of a stage by upto 20% and overestimates the standard deviation upto 26%. We also show fhar Multiple Input Switching has a greater impacr on sratisrical timing, rhan regular static timing analysis. Hence, we propose a modeling technique for gate delay variability, considering MIS. Our model can he efficiently incorporated into most of the statistical riming annlysis frameworks. On average over all rest cases, our approach underesrimares mean delay of a sfage by 0.01 % and overestimates the standard deviation by only 2%. hence increosing the robustness to process variations. Our modeling technique is independent of the deterministic MIS model, and we show that its sensitivity to variations in the MIS model is small.","PeriodicalId":115059,"journal":{"name":"Proceedings. 41st Design Automation Conference, 2004.","volume":"15 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-06-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125044333","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Presented is a design paradigm, pioneered at the University of Illinois in 1997, for reliable and energy-efficient system-on-a-chip (SOC) in nanometer process technologies. These technologies are Characterized by non-idealities such as coupling, leakage, soft errors, and process variations, which contribute to a reliability problem. Increasing complexity of systems-on-a-chip (SOC) leads to a related power problem. The proposed paradigm provides solutions to both problems by viewing SOCs as communication networks, and employs ideas from error-control coding, communications, and information theory in order to achieve the dual goals ofreliability and energy-efficiency.
{"title":"A communication-theoretic design paradigm for reliable SOCs","authors":"Naresh R Shanbhag","doi":"10.1145/996566.996589","DOIUrl":"https://doi.org/10.1145/996566.996589","url":null,"abstract":"Presented is a design paradigm, pioneered at the University of Illinois in 1997, for reliable and energy-efficient system-on-a-chip (SOC) in nanometer process technologies. These technologies are Characterized by non-idealities such as coupling, leakage, soft errors, and process variations, which contribute to a reliability problem. Increasing complexity of systems-on-a-chip (SOC) leads to a related power problem. The proposed paradigm provides solutions to both problems by viewing SOCs as communication networks, and employs ideas from error-control coding, communications, and information theory in order to achieve the dual goals ofreliability and energy-efficiency.","PeriodicalId":115059,"journal":{"name":"Proceedings. 41st Design Automation Conference, 2004.","volume":"23 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-06-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129880268","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}