Pub Date: 2008-09-03 | DOI: 10.1109/MWSCAS.2008.4616834
K. Yelamarthi, C.-i.H. Chen
The complexity of timing optimization has been increasing rapidly as CMOS device sizes shrink, because of the growing number of channel-connected transistors in a path and the rising magnitude of process variations. These challenges can be addressed by implementing designs with an optimal balance between static and dynamic circuits. This paper presents a process-variation-aware path oriented in time (POINT) optimization flow for mixed-static-dynamic CMOS logic designs, in which a design is partitioned into static and dynamic circuits based on its timing-critical paths. Applied to a 64-bit adder and ISCAS benchmark circuits, the POINT optimization flow achieved an average delay improvement of 44% and an average 37% reduction in delay uncertainty due to process variations, compared with a state-of-the-art commercial optimization tool.
Title: A Path Oriented In Time optimization flow for mixed-static-dynamic CMOS logic | Venue: 2008 51st Midwest Symposium on Circuits and Systems
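The abstract does not spell out how POINT decides which paths become dynamic logic. As a rough illustration only, the sketch below uses a generic static timing analysis pass: it computes arrival and required times over a gate-level DAG and marks gates whose slack falls within a chosen margin as candidates for dynamic logic. The netlist, the delays, and the `partition_by_slack` function are hypothetical, and process variation is not modelled.

```python
from collections import defaultdict
from graphlib import TopologicalSorter  # Python 3.9+


def partition_by_slack(delays, edges, slack_margin):
    """Label each gate 'dynamic' if its slack is within slack_margin of zero
    (i.e. it lies on or near a critical path), otherwise 'static'.

    delays: dict mapping gate name -> propagation delay (hypothetical units)
    edges:  list of (driver, sink) pairs forming an acyclic netlist
    """
    preds, succs = defaultdict(list), defaultdict(list)
    for driver, sink in edges:
        preds[sink].append(driver)
        succs[driver].append(sink)
    order = list(TopologicalSorter({g: preds[g] for g in delays}).static_order())

    # Forward pass: latest arrival time at each gate output.
    arrival = {}
    for g in order:
        arrival[g] = max((arrival[p] for p in preds[g]), default=0.0) + delays[g]
    critical_delay = max(arrival.values())

    # Backward pass: latest output time that still meets the critical delay.
    required = {}
    for g in reversed(order):
        required[g] = min((required[s] - delays[s] for s in succs[g]),
                          default=critical_delay)

    return {g: "dynamic" if required[g] - arrival[g] <= slack_margin else "static"
            for g in order}


# Hypothetical 4-gate netlist: the b -> c -> d path is critical.
delays = {"a": 1.0, "b": 3.0, "c": 1.0, "d": 2.0}
edges = [("a", "c"), ("b", "c"), ("c", "d")]
print(partition_by_slack(delays, edges, slack_margin=0.5))
```

Only the off-critical gate `a` (slack 2.0) stays static here; widening the margin pulls more near-critical gates into the dynamic partition.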
Pub Date: 2008-09-03 | DOI: 10.1109/MWSCAS.2008.4616762
A. Gothandaraman, G. D. Peterson, R. Hinde, R. Harrison
The ground-state properties of atomic and molecular clusters can be obtained using Quantum Monte Carlo (QMC) simulations. We propose a reconfigurable hardware architecture using Field-Programmable Gate Arrays (FPGAs) to implement the kernels of the QMC application. To achieve higher clock rates, we experiment with different numbers of pipeline stages for each component of the design and develop a deeply pipelined architecture that delivers the best clock rate while making only modest use of embedded memory and multiplier resources, leaving room for additional functions in a future implementation. We discuss the details of the pipelined architecture and the design decisions made while developing a general framework that can be used to obtain the potential energy of atomic or molecular clusters and that can be extended to compute other useful properties.
Title: Design decisions in the pipelined architecture for Quantum Monte Carlo simulations | Venue: 2008 51st Midwest Symposium on Circuits and Systems
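The abstract does not name the interatomic potential, so this sketch assumes a Lennard-Jones pair potential (with hypothetical `epsilon` and `sigma` parameters) purely to show the structure of a potential-energy kernel. The three chained generators loosely mirror hardware pipeline stages (pair distance, potential evaluation, accumulation); on an FPGA each stage would instead accept a new atom pair every clock cycle.

```python
import itertools
import math


def pair_distances_sq(positions):
    """Stage 1: stream the squared distance of every atom pair."""
    for (x1, y1, z1), (x2, y2, z2) in itertools.combinations(positions, 2):
        yield (x1 - x2) ** 2 + (y1 - y2) ** 2 + (z1 - z2) ** 2


def lennard_jones(r_sq_stream, epsilon=1.0, sigma=1.0):
    """Stage 2: evaluate an assumed Lennard-Jones pair potential per pair."""
    for r_sq in r_sq_stream:
        s6 = (sigma * sigma / r_sq) ** 3  # (sigma / r)^6
        yield 4.0 * epsilon * (s6 * s6 - s6)


def potential_energy(positions, epsilon=1.0, sigma=1.0):
    """Stage 3: accumulate pair energies into the cluster's total potential."""
    return math.fsum(lennard_jones(pair_distances_sq(positions), epsilon, sigma))


# Two atoms at the Lennard-Jones minimum distance 2^(1/6) * sigma.
dimer = [(0.0, 0.0, 0.0), (2 ** (1 / 6), 0.0, 0.0)]
print(potential_energy(dimer))  # close to -epsilon
```

Working on squared distances avoids a square root per pair, the kind of operation-count saving that matters when each stage maps to scarce multiplier resources.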
Pub Date: 2008-09-03 | DOI: 10.1109/MWSCAS.2008.4616728
A. E. Zadeh
This paper presents switched-capacitor bandpass filters used within the sense system of implantable medical devices such as cardiac pacemakers and defibrillators. This work examines methods, including filter architecture, discrete-time transformation, and operational amplifier (opamp) topology, for reducing current consumption and lowering supply voltages so that switched-capacitor filters reach nanowatt-level power consumption. The implemented fourth-order intra-cardiac signal bandpass filter is built in a standard analog CMOS process and consumes 210 nW with a 1024 Hz system clock.
Title: Nano-power switched-capacitor bandpass filters for medical implantable pacemakers and defibrillators | Venue: 2008 51st Midwest Symposium on Circuits and Systems
Pub Date: 2008-09-03 | DOI: 10.1109/MWSCAS.2008.4616843
M. Sharawi
The design and characterization of high-speed digital buses and interconnects are an essential part of the computer hardware development process. Signal integrity (SI) testing and verification check signal levels, waveform shapes, and timing against specifications. In this work, we present a full SI characterization and modelling of a peripheral component interconnect (PCI) bus and a PCI-X (PCI-eXtended) bus running at 66 MHz and 133 MHz, respectively. Laboratory measurements confirm compliance with the specified timing and signal levels.
Title: Signal integrity characterization and modelling of a PCI/PCI-x 66/133 MHz bus | Venue: 2008 51st Midwest Symposium on Circuits and Systems
Pub Date: 2008-09-03 | DOI: 10.1109/MWSCAS.2008.4616922
M. Pude, P. R. Mukund, P. Singh, J. Burleson
The use of positive feedback to counter intrinsic gain degradation in scaled technologies is discussed. Criteria for increasing gain while keeping the system stable are derived both from traditional feedback theory and from a modified amplifier model. The amplifier model, an attempt to standardize positive feedback analysis for generic amplifiers, includes non-idealities that traditional feedback theory omits, such as finite input impedance and non-zero output impedance. Both treatments show that as amplifier open-loop gain decreases, positive feedback can more easily be applied to increase that gain, at the cost of a slightly more than one-to-one tradeoff with amplifier bandwidth. The analysis shows that positive feedback is most useful in high-bandwidth single-stage amplifiers, where gain is at a minimum. Applied to a differential stage in 65 nm technology, it increases the gain from 12.61 dB to 27.25 dB.
Title: Using positive feedback to overcome $g_m r_o$ limitations in scaled CMOS amplifier design | Venue: 2008 51st Midwest Symposium on Circuits and Systems
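A minimal sketch, assuming the idealized single-loop model $A_{cl} = A/(1 - \beta A)$ rather than the paper's modified amplifier model: it back-calculates the positive-feedback factor $\beta$ that would lift the reported 12.61 dB open-loop gain to 27.25 dB and confirms that the loop gain $\beta A$ stays below 1, the stability condition in this simple model. The function names are hypothetical.

```python
import math


def db_to_vv(gain_db):
    """Convert a voltage gain in dB to V/V."""
    return 10 ** (gain_db / 20)


def feedback_factor_for(open_loop_db, target_db):
    """Positive-feedback factor beta such that A / (1 - beta*A) equals the target.

    Idealized model: single loop, no loading, frequency-independent beta.
    """
    if target_db <= open_loop_db:
        raise ValueError("positive feedback is only needed to raise the gain")
    a = db_to_vv(open_loop_db)
    a_cl = db_to_vv(target_db)
    beta = (1 - a / a_cl) / a
    loop_gain = beta * a  # lies in (0, 1) whenever a_cl > a, so the loop is stable
    return beta, loop_gain


beta, loop_gain = feedback_factor_for(12.61, 27.25)
print(f"beta = {beta:.3f}, loop gain = {loop_gain:.3f}, "
      f"gain boost = bandwidth shrink = {1 / (1 - loop_gain):.2f}x")
```

For a single-pole amplifier this ideal model trades bandwidth for gain exactly one-to-one (the pole moves to $\omega_p(1 - \beta A)$); the abstract's more detailed treatment finds the tradeoff slightly worse than that.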
Pub Date: 2008-09-03 | DOI: 10.1109/MWSCAS.2008.4616744
A. Nieuwoudt, J. Kawa, Y. Massoud
In this paper, we investigate the timing implications of dummy fill for large-scale designs implemented in a 65 nm process technology. For each design, we apply both rule-based and model-based metal fill generation techniques and model the resulting incremental path-wise delay increases and the level of interconnect planarization. The results indicate that fill metal can significantly increase both the average delay and individual path delays. We also find that model-based fill generation methods yield significantly smaller incremental delay increases and better interconnect planarization than rule-based methods. This study provides the first comprehensive investigation of the delay and planarization implications of rule-based and model-based fill generation for large-scale designs in nanoscale process technology.
Title: Timing implications of fill metal generation methods for system-level nano-scale designs | Venue: 2008 51st Midwest Symposium on Circuits and Systems
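As a hedged illustration of the rule-based approach the abstract compares against, the sketch below tops each layout window up toward a minimum metal density, looking only at that window, and reports the density spread across windows as a crude planarization proxy. The density target, per-window cap, and function names are hypothetical, not the 65 nm rules used in the study. The timing cost the paper measures comes from the extra coupling capacitance that fill adds to nearby wires, which this sketch does not model; model-based fill instead uses a model of the chemical-mechanical polishing (CMP) process to decide where fill is needed.

```python
def rule_based_fill(density, min_density=0.3, max_fill_per_window=0.25):
    """Return the fill-metal fraction to add to each layout window.

    density: 2-D list of existing metal density (0..1) per window.
    Each window is topped up toward min_density on its own; neighbouring
    windows and nearby timing-critical nets are ignored (hypothetical rule).
    """
    return [[min(max(min_density - d, 0.0), max_fill_per_window) for d in row]
            for row in density]


def density_range(density, fill=None):
    """Max minus min density across windows: a crude planarity proxy."""
    flat = [d + (fill[i][j] if fill else 0.0)
            for i, row in enumerate(density)
            for j, d in enumerate(row)]
    return max(flat) - min(flat)


layout = [[0.10, 0.50],
          [0.25, 0.60]]
fill = rule_based_fill(layout)
print(density_range(layout), density_range(layout, fill))
```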
Pub Date: 2008-09-03 | DOI: 10.1109/MWSCAS.2008.4616914
A. Hussein, H. Saleh, B. Mohammad, E. John
Active power, area, architecture, and timing constraints are the major factors in choosing an SRAM-based memory organization in contemporary submicron SoCs. In this paper, we add the effect of SRAM organization on leakage power as another major factor to consider when selecting a cache organization. Leakage power becomes important in sub-100 nm process technologies, especially for SRAM-based memory, because a large fraction of the circuitry is idle rather than active at any given time. We present the relationship between SRAM organization and leakage power at the 32 nm, 45 nm, 65 nm, 90 nm, 130 nm, and 180 nm process nodes using the Predictive Technology Models (PTM). SPICE simulation results of leakage power versus SRAM organization for a 1-kbit SRAM design are presented in detail.
Title: Optimum organization of SRAM-based memory for leakage power reduction | Venue: 2008 51st Midwest Symposium on Circuits and Systems
Pub Date: 2008-09-03 | DOI: 10.1109/MWSCAS.2008.4616799
D. R. Blum, J. Delgado-Frías
Traditional radiation-hardened SRAM designs that tolerate a single disruption are vulnerable to failure when particle strikes induce multiple node disruptions. Such events become likely when devices with small feature sizes operate in highly radioactive environments. This paper analyzes the effectiveness of hardened-by-design techniques intended to mitigate multiple node disruptions in 90 nm CMOS. The results indicate that acceptable tolerance to multiple node disruptions in 90 nm can be achieved through a unique combination of hardened memory and layout design techniques with moderate, calculable levels of layout interleaving.
Title: Multiple node upset mitigation in TPDICE-based pipeline memory structures | Venue: 2008 51st Midwest Symposium on Circuits and Systems
Pub Date: 2008-09-03 | DOI: 10.1109/MWSCAS.2008.4616918
K. Gbolagade, S. Cotofana
This paper investigates the conversion of 3-moduli Residue Number System (RNS) operands to decimal. First, we assume a general moduli set $\{m_i\}_{i=1}^{3}$ with dynamic range $M = \prod_{i=1}^{3} m_i$ and introduce a modified Chinese Remainder Theorem (CRT) that requires mod-$m_3$ instead of mod-$M$ calculations. Subsequently, we further simplify the conversion process by focusing on the moduli set $\{2n+2, 2n+1, 2n\}$, in which $2n+2$ and $2n$ share a common factor of 2. We formally introduce a CRT-based approach for this case, which requires converting the set $\{2n+2, 2n+1, 2n\}$ into a set of relatively prime moduli, i.e., $\{m_1/2, m_2, m_3\}$ when $n$ is even ($n \ge 2$) and $\{m_1, m_2, m_3/2\}$ when $n$ is odd ($n \ge 3$). We demonstrate that this conversion is straightforward and does not require the computation of any multiplicative inverses. Finally, we further simplify the 3-moduli CRT for the specific case of the $\{2n+2, 2n+1, 2n\}$ moduli set. For this case, the proposed CRT requires 4 additions and 4 multiplications, with all operations performed mod-$m_3$ when $n$ is even and mod-$m_3/2$ when $n$ is odd. This outperforms state-of-the-art converters in the number of required operations, and because the numbers involved in the calculations are smaller, it results in less complex adders and multipliers.
Title: Residue Number System operands to decimal conversion for 3-moduli sets | Venue: 2008 51st Midwest Symposium on Circuits and Systems
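For comparison, the sketch below implements the textbook CRT reverse conversion for the $\{2n+2, 2n+1, 2n\}$ set after applying the parity rule from the abstract to obtain pairwise-coprime moduli. It still performs the final mod-$M$ reduction and computes multiplicative inverses with Python's three-argument `pow` (Python 3.8+), which is exactly the overhead the paper's modified CRT is designed to avoid, so treat it as a reference model for checking a converter, not as the proposed method. The function names are hypothetical.

```python
from math import prod


def coprime_moduli(n):
    """Pairwise-coprime replacement for the moduli set {2n+2, 2n+1, 2n}.

    2n+2 and 2n share the factor 2, so one of them is halved: m1/2 when n is
    even (n >= 2) and m3/2 when n is odd (n >= 3).
    """
    m1, m2, m3 = 2 * n + 2, 2 * n + 1, 2 * n
    if n % 2 == 0 and n >= 2:
        return (m1 // 2, m2, m3)
    if n % 2 == 1 and n >= 3:
        return (m1, m2, m3 // 2)
    raise ValueError("n must be even and >= 2, or odd and >= 3")


def rns_to_decimal(residues, n):
    """Textbook CRT: recover X in [0, M) from its residues mod (2n+2, 2n+1, 2n)."""
    moduli = coprime_moduli(n)
    # Each reduced modulus divides the original one, so its residue is just
    # the original residue reduced once more.
    reduced = [x % m for x, m in zip(residues, moduli)]
    big_m = prod(moduli)  # equals lcm(2n+2, 2n+1, 2n)
    total = 0
    for r, m in zip(reduced, moduli):
        m_hat = big_m // m
        total += r * m_hat * pow(m_hat, -1, m)  # multiplicative inverse mod m
    return total % big_m


n = 4  # moduli {10, 9, 8}, reduced to {5, 9, 8}, M = 360
x = 271
print(rns_to_decimal((x % 10, x % 9, x % 8), n))  # prints 271
```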
Pub Date: 2008-09-03 | DOI: 10.1109/MWSCAS.2008.4616961
I. D. Castellanos, J. Stine
Interest in decimal arithmetic is growing considerably because of its relevance to financial and commercial applications. Because of its complexity, previous work on decimal multiplication focused on sequential implementations; however, recent studies have proposed parallel multipliers to improve performance. This paper clarifies recent techniques for partial product generation and presents implementation results and a comparison of the available partial product generation architectures. Unlike previous work, which proposed partial product generation designs only on paper, this research implements and extends each proposed architecture and addresses its use within decimal architectures.
Title: Decimal partial product generation architectures | Venue: 2008 51st Midwest Symposium on Circuits and Systems
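The abstract does not list the partial product generation architectures it compares, so the following is only an illustrative sketch of one common idea: precompute a small set of multiplicand multiples (assumed here to be X, 2X, 4X, and 5X) and form each multiplier digit's partial product as the sum of at most two of them, shifted by that digit's decimal weight. It uses Python integers rather than BCD digit vectors, and `DIGIT_TO_MULTIPLES` and `decimal_partial_products` are hypothetical names.

```python
# Multiplier digit -> pair of precomputed multiples whose sum is digit * X.
DIGIT_TO_MULTIPLES = {
    0: (0, 0), 1: (1, 0), 2: (2, 0), 3: (2, 1), 4: (4, 0),
    5: (5, 0), 6: (5, 1), 7: (5, 2), 8: (4, 4), 9: (5, 4),
}


def decimal_partial_products(x, y):
    """Return (first, second) partial products for each decimal digit of y >= 0,
    least significant digit first; all of them sum to x * y."""
    multiples = {0: 0, 1: x}
    multiples[2] = multiples[1] + multiples[1]  # 2X by doubling
    multiples[4] = multiples[2] + multiples[2]  # 4X by doubling again
    multiples[5] = multiples[4] + multiples[1]  # 5X = 4X + X
    partials = []
    for position, digit_char in enumerate(reversed(str(y))):
        a, b = DIGIT_TO_MULTIPLES[int(digit_char)]
        weight = 10 ** position                 # decimal shift for this digit
        partials.append((multiples[a] * weight, multiples[b] * weight))
    return partials


pps = decimal_partial_products(123, 4096)
print(pps, sum(a + b for a, b in pps))
```

Precomputing fewer multiples simplifies the generation step at the cost of producing two partial products per multiplier digit, which the subsequent reduction tree must then add.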