M. Hariyama, Weisheng Chong, S. Ogata, M. Kameyama
Dynamically-programmable gate arrays (DPGAs) promise lower-cost implementations than conventional FPGAs since they efficiently reuse limited hardware resources in time. One of typical DPGA architectures is a multi-context one. Multi-context FPGAs (MC-FPGAs) have multiple memory bits per configuration bit forming configuration planes for fast switching between contexts. The additional memory planes cause significant overhead in area and power consumption. To overcome the overhead, a fine-grained reconfigurable architecture called reconfigurable context memory (RCM) is presented based on the fact that there are redundancy and regularity in configuration bits between different contexts. A floating-MOS functional pass-gate, where storage and switch functions are merged, is used to construct the RCM area-efficiently.
{"title":"Novel switch block architecture using non-volatile functional pass-gate for multi-context FPGAs","authors":"M. Hariyama, Weisheng Chong, S. Ogata, M. Kameyama","doi":"10.1109/ISVLSI.2005.52","DOIUrl":"https://doi.org/10.1109/ISVLSI.2005.52","url":null,"abstract":"Dynamically-programmable gate arrays (DPGAs) promise lower-cost implementations than conventional FPGAs since they efficiently reuse limited hardware resources in time. One of typical DPGA architectures is a multi-context one. Multi-context FPGAs (MC-FPGAs) have multiple memory bits per configuration bit forming configuration planes for fast switching between contexts. The additional memory planes cause significant overhead in area and power consumption. To overcome the overhead, a fine-grained reconfigurable architecture called reconfigurable context memory (RCM) is presented based on the fact that there are redundancy and regularity in configuration bits between different contexts. A floating-MOS functional pass-gate, where storage and switch functions are merged, is used to construct the RCM area-efficiently.","PeriodicalId":158790,"journal":{"name":"IEEE Computer Society Annual Symposium on VLSI: New Frontiers in VLSI Design (ISVLSI'05)","volume":"81 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2005-05-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114313906","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A. Maili, C. Steger, R. Weiss, Rob Quigley, D. Dalton
This paper presents a hardware acceleration system based on a gate-level accelerator and an on-chip microprocessor enabling co-simulation of C-models with gate-level modules on the accelerator. This solution tackles the communication bottleneck that occurs when using hardware accelerators or emulators to speed up simulation. We analyze this bottleneck for the APPLES gate-level hardware accelerator and present the speedup that can be achieved by a prototype of the PowerPC-APPLES accelerator implemented on a Virtex2Pro FPGA on a PCI card.
{"title":"Reducing the communication bottleneck via on-chip cosimulation of gate-level HDL and C-models on a hardware accelerator","authors":"A. Maili, C. Steger, R. Weiss, Rob Quigley, D. Dalton","doi":"10.1109/ISVLSI.2005.61","DOIUrl":"https://doi.org/10.1109/ISVLSI.2005.61","url":null,"abstract":"This paper presents a hardware acceleration system based on a gate-level accelerator and an on-chip microprocessor enabling co-simulation of C-models with gate-level modules on the accelerator. This solution tackles the communication bottleneck that occurs when using hardware accelerators or emulators to speed up simulation. We analyze this bottleneck for the APPLES gate-level hardware accelerator and present the speedup that can be achieved by a prototype of the PowerPC-APPLES accelerator implemented on a Virtex2Pro FPGA on a PCI card.","PeriodicalId":158790,"journal":{"name":"IEEE Computer Society Annual Symposium on VLSI: New Frontiers in VLSI Design (ISVLSI'05)","volume":"8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2005-05-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116802432","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
During fault diagnosis, the existence of equivalent faults, or faults that are not distinguished by the test set applied to the circuit, can create ambiguity as to the location of a defect. This happens if the circuit-under-test produces a response that matches the circuit response in the presence of two faults in different locations of the circuit. Equivalence between faults of different models, or a test set that does not distinguish two such faults, can increase the ambiguity as to the defect location as well as its type. We refer to this phenomenon as fault model aliasing. We study the extent to which fault model aliasing can be expected to occur under various test sets. We also describe a test generation procedure that can reduce it.
{"title":"Fault diagnosis and fault model aliasing","authors":"I. Pomeranz, S. Venkataraman, S. Reddy","doi":"10.1109/ISVLSI.2005.34","DOIUrl":"https://doi.org/10.1109/ISVLSI.2005.34","url":null,"abstract":"During fault diagnosis, the existence of equivalent faults, or faults that are not distinguished by the test set applied to the circuit, can create ambiguity as to the location of a defect. This happens if the circuit-under-test produces a response that matches the circuit response in the presence of two faults in different locations of the circuit. Equivalence between faults of different models, or a test set that does not distinguish two such faults, can increase the ambiguity as to the defect location as well as its type. We refer to this phenomenon as fault model aliasing. We study the extent to which fault model aliasing can be expected to occur under various test sets. We also describe a test generation procedure that can reduce it.","PeriodicalId":158790,"journal":{"name":"IEEE Computer Society Annual Symposium on VLSI: New Frontiers in VLSI Design (ISVLSI'05)","volume":"107 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2005-05-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115082711","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
This paper investigates the problem of repeater insertion for global interconnects under given timing constraints. A novel technique is proposed to aggressively reduce the positive timing slack for maximal repeater reduction while maintaining timing closure. Our scheme is both fast and effective due to the judicious combination of accurate SPICE simulations and the simple Elmore delay model. In comparison with other repeater insertion solvers, our scheme achieves up to 41% reduction in total repeater width in comparable runtimes.
{"title":"RITC: repeater insertion with timing target compensation","authors":"Yuantao Peng, Xun Liu","doi":"10.1109/ISVLSI.2005.65","DOIUrl":"https://doi.org/10.1109/ISVLSI.2005.65","url":null,"abstract":"This paper investigates the problem of repeater insertion for global interconnects under given timing constraints. A novel technique is proposed to aggressively reduce the positive timing slack for maximal repeater reduction while maintaining timing closure. Our scheme is both fast and effective due to the judicious combination of accurate SPICE simulations and the simple Elmore delay model. In comparison with other repeater insertion solvers, our scheme achieves up to 41% reduction in total repeater width in comparable runtimes.","PeriodicalId":158790,"journal":{"name":"IEEE Computer Society Annual Symposium on VLSI: New Frontiers in VLSI Design (ISVLSI'05)","volume":"50 5 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2005-05-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115552408","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Sheng Sun, Yi Han, Xinyu Guo, Kian Haur Chong, L. McMurchie, C. Sechen
We present a fast 64b adder based on output prediction logic (OPL) that has a measured worst-case delay of 409ps, equivalent to 4.7 FO4 inverter delays for the TSMC 0.18/spl mu/m process that was used for fabrication. This normalized delay is 1.45X faster than the fastest previously reported 64b adder. The adder uses a modified radix-3 Kogge-Stone architecture and has 5 logic levels.
{"title":"409ps 4.7 FO4 64b adder based on output prediction logic in 0.18um CMOS","authors":"Sheng Sun, Yi Han, Xinyu Guo, Kian Haur Chong, L. McMurchie, C. Sechen","doi":"10.1109/ISVLSI.2005.2","DOIUrl":"https://doi.org/10.1109/ISVLSI.2005.2","url":null,"abstract":"We present a fast 64b adder based on output prediction logic (OPL) that has a measured worst-case delay of 409ps, equivalent to 4.7 FO4 inverter delays for the TSMC 0.18/spl mu/m process that was used for fabrication. This normalized delay is 1.45X faster than the fastest previously reported 64b adder. The adder uses a modified radix-3 Kogge-Stone architecture and has 5 logic levels.","PeriodicalId":158790,"journal":{"name":"IEEE Computer Society Annual Symposium on VLSI: New Frontiers in VLSI Design (ISVLSI'05)","volume":"88 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2005-05-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129190674","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pins serve as both the logical and physical interface between two levels in a hierarchical flow. Pin assignment is the placement of pins on the boundary of a chip or macro. Proper pin placement has a large influence on wire length. Experiments indicate a spread in wire length up to over 20%. To address the pin assignment problem, a modification to the well-known and widely used quadratic placement is introduced. This modification allows for the integration between pin assignment and global placement. Wire length within macros is minimized, while top-level considerations such as the relative position of macro and clusters of cells are taken into account in the form of a side assignment. As indicated by experimental results, integration is promising. More research is necessary to fully exploit the ideas in this paper, and assess the practical impact of the approach.
{"title":"Towards integration of quadratic placement and pin assignment","authors":"J. Westra, P. Groeneveld","doi":"10.1109/ISVLSI.2005.73","DOIUrl":"https://doi.org/10.1109/ISVLSI.2005.73","url":null,"abstract":"Pins serve as both the logical and physical interface between two levels in a hierarchical flow. Pin assignment is the placement of pins on the boundary of a chip or macro. Proper pin placement has a large influence on wire length. Experiments indicate a spread in wire length up to over 20%. To address the pin assignment problem, a modification to the well-known and widely used quadratic placement is introduced. This modification allows for the integration between pin assignment and global placement. Wire length within macros is minimized, while top-level considerations such as the relative position of macro and clusters of cells are taken into account in the form of a side assignment. As indicated by experimental results, integration is promising. More research is necessary to fully exploit the ideas in this paper, and assess the practical impact of the approach.","PeriodicalId":158790,"journal":{"name":"IEEE Computer Society Annual Symposium on VLSI: New Frontiers in VLSI Design (ISVLSI'05)","volume":"12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2005-05-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128569273","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
This paper presents a progressive match filling (PMF) technique to reduce the peak current and power dissipation during the fast capture cycle in broadside delay fault testing. The proposed method fills the unspecified values (X) in the generated initialization vector such that the resulting launch vector at a minimal Hamming distance from the initialization vector. The proposed method does not require any hardware modification and can be used to obtain any test sets that require two pattern tests. Experimental results show that the proposed method reduces the peak current and power dissipation during the fast capture cycle by 40.59% on average and up to 54.17% for large ISC AS 89 circuits.
提出了一种递进匹配填充(PMF)技术,以降低宽边延迟故障测试中快速捕获周期的峰值电流和功耗。所提出的方法填充生成的初始化向量中的未指定值(X),从而使生成的发射向量与初始化向量具有最小的汉明距离。所提出的方法不需要任何硬件修改,并且可以用于获得需要两次模式测试的任何测试集。实验结果表明,该方法可将快速捕获周期的峰值电流和功耗平均降低40.59%,对于大型ISC AS 89电路,可降低54.17%。
{"title":"On reducing peak current and power during test","authors":"Wei Li, S. Reddy, I. Pomeranz","doi":"10.1109/ISVLSI.2005.53","DOIUrl":"https://doi.org/10.1109/ISVLSI.2005.53","url":null,"abstract":"This paper presents a progressive match filling (PMF) technique to reduce the peak current and power dissipation during the fast capture cycle in broadside delay fault testing. The proposed method fills the unspecified values (X) in the generated initialization vector such that the resulting launch vector at a minimal Hamming distance from the initialization vector. The proposed method does not require any hardware modification and can be used to obtain any test sets that require two pattern tests. Experimental results show that the proposed method reduces the peak current and power dissipation during the fast capture cycle by 40.59% on average and up to 54.17% for large ISC AS 89 circuits.","PeriodicalId":158790,"journal":{"name":"IEEE Computer Society Annual Symposium on VLSI: New Frontiers in VLSI Design (ISVLSI'05)","volume":"274 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2005-05-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116555328","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
In this paper, we present the design and evaluation of a two-phase resonant clock generation and distribution system with layout-extracted inductor parameters in a 0.13/spl mu/m copper process. The design includes a programmable replenishing clock generator and tunable capacitors that enable the exploration of skew, jitter, and clock amplitude. Our simulation results show that worst-case skew is within 8.5% of clock period in the range of 790MHz to 1.22GHz under a variety of load imbalance conditions. Furthermore, energy dissipation is at least 60% lower than conventional square waveform distribution.
{"title":"Two-phase resonant clock distribution","authors":"Juang-Ying Chueh, M. Papaefthymiou, C. Ziesler","doi":"10.1109/ISVLSI.2005.74","DOIUrl":"https://doi.org/10.1109/ISVLSI.2005.74","url":null,"abstract":"In this paper, we present the design and evaluation of a two-phase resonant clock generation and distribution system with layout-extracted inductor parameters in a 0.13/spl mu/m copper process. The design includes a programmable replenishing clock generator and tunable capacitors that enable the exploration of skew, jitter, and clock amplitude. Our simulation results show that worst-case skew is within 8.5% of clock period in the range of 790MHz to 1.22GHz under a variety of load imbalance conditions. Furthermore, energy dissipation is at least 60% lower than conventional square waveform distribution.","PeriodicalId":158790,"journal":{"name":"IEEE Computer Society Annual Symposium on VLSI: New Frontiers in VLSI Design (ISVLSI'05)","volume":"75 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2005-05-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123805010","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A high performance digital architecture for computing 2D convolution utilizing the quadrant symmetry of the kernels is proposed in this paper. Pixels in the four quadrants of the kernel region with respect to an image pixel are considered simultaneously for computing the partial products of the convolution sum. A novel data handling strategy to identify the pixels to be fed to different processing elements helps reducing the data storage requirements in the circuitry. The new design results in 75% reduction in multipliers and 50% reduction in adders when compared with the conventional systolic architecture. The proposed architecture design is capable of performing convolution operations with 14/spl times/14 kernel at a rate of 57 1024/spl times/1024 frames per second in a Xilinx 's Virtex 2v2000ff896-4 FPGA.
{"title":"An efficient VLSI architecture for 2-D convolution with quadrant symmetric kernels","authors":"Ming Z. Zhang, H. T. Ngo, A. Livingston, V. Asari","doi":"10.1109/ISVLSI.2005.15","DOIUrl":"https://doi.org/10.1109/ISVLSI.2005.15","url":null,"abstract":"A high performance digital architecture for computing 2D convolution utilizing the quadrant symmetry of the kernels is proposed in this paper. Pixels in the four quadrants of the kernel region with respect to an image pixel are considered simultaneously for computing the partial products of the convolution sum. A novel data handling strategy to identify the pixels to be fed to different processing elements helps reducing the data storage requirements in the circuitry. The new design results in 75% reduction in multipliers and 50% reduction in adders when compared with the conventional systolic architecture. The proposed architecture design is capable of performing convolution operations with 14/spl times/14 kernel at a rate of 57 1024/spl times/1024 frames per second in a Xilinx 's Virtex 2v2000ff896-4 FPGA.","PeriodicalId":158790,"journal":{"name":"IEEE Computer Society Annual Symposium on VLSI: New Frontiers in VLSI Design (ISVLSI'05)","volume":"110 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2005-05-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126694799","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A. Natarajan, V. Shankar, A. Maheshwari, W. Burleson
In this paper, solutions to memory design issues in nanometer CMOS are presented. First, a comparative study between various sense-amplifiers is presented in 70nm CMOS technology. Impact of process variation is studied on the performance of these sense-amplifiers. An improved bit-line leakage compensation scheme is proposed to ensure proper sensing in presence of leakage. Performance benefit of up to 68% can be obtained using this technique.
{"title":"Sensing design issues in deep submicron CMOS SRAMs","authors":"A. Natarajan, V. Shankar, A. Maheshwari, W. Burleson","doi":"10.1109/ISVLSI.2005.67","DOIUrl":"https://doi.org/10.1109/ISVLSI.2005.67","url":null,"abstract":"In this paper, solutions to memory design issues in nanometer CMOS are presented. First, a comparative study between various sense-amplifiers is presented in 70nm CMOS technology. Impact of process variation is studied on the performance of these sense-amplifiers. An improved bit-line leakage compensation scheme is proposed to ensure proper sensing in presence of leakage. Performance benefit of up to 68% can be obtained using this technique.","PeriodicalId":158790,"journal":{"name":"IEEE Computer Society Annual Symposium on VLSI: New Frontiers in VLSI Design (ISVLSI'05)","volume":"25 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2005-05-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126085706","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}