Pub Date: 1997-06-12 DOI: 10.1109/VLSIC.1997.623818
Takashima, Kunishima, Noguchi, Takagi
A new chain ferroelectric random access memory, the chain FRAM, is proposed. Each memory cell is a parallel connection of one transistor and one ferroelectric capacitor, and each memory cell block consists of several memory cells connected in series plus a block-selecting transistor. This configuration realizes random access and the smallest memory cell reported so far using planar transistors, with a size of 4F². The chip size of the proposed chain FRAM can be reduced to 63% of that of a conventional FRAM when 16 cells are connected in series. Both the fast nondriven half-Vdd cell-plate scheme and the driven cell-plate scheme are applicable to the chain FRAM without polarization switching during the standby cycle, thanks to the short-circuited ferroelectric capacitors. The result is a fast access time of 45 ns and a cycle time of 70 ns without refresh operation.
{"title":"High-density chain ferroelectric random-access memory (CFRAM)","authors":"Takashima, Kunishima, Noguchi, Takagi","doi":"10.1109/VLSIC.1997.623818","DOIUrl":"https://doi.org/10.1109/VLSIC.1997.623818","url":null,"abstract":"A new chain ferroelectric random access memory—a chain FRAM—has been proposed. A memory cell consists of parallel connection of one transistor and one ferroelectric ca- pacitor, and one memory cell block consists of plural memory cells connected in series and a block selecting transistor. This configuration realizes the smallest 4 size memory cell using the planar transistor so far reported, and random access. The chip size of the proposed chain FRAM can be reduced to 63% of that of the conventional FRAM when 16 cells are connected in series. The fast nondriven half- cell-plate scheme, as well as the driven cell-plate scheme, are applicable to the chain FRAM without polarization switching during the standby cycle thanks to short-circuiting ferroelectric capacitors. It results in fast access time of 45 ns and cycle time of 70 ns without refresh operation.","PeriodicalId":175678,"journal":{"name":"Symposium 1997 on VLSI Circuits","volume":"39 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1997-06-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131131429","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
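The chain cell block described in this abstract can be pictured with a small Python sketch. It is only an illustration under stated assumptions: the switches are ideal, the function names are hypothetical, and the convention that a cell is accessed by turning its own transistor off while all others stay on is an assumption that the abstract implies through "short-circuiting ferroelectric capacitors" but does not spell out.

```python
from typing import List


def standby_gates(num_cells: int) -> List[bool]:
    """Standby: every cell transistor is on, so every capacitor is short-circuited."""
    return [True] * num_cells


def access_gates(num_cells: int, selected: int) -> List[bool]:
    """Access (assumed convention): only the selected cell's transistor is turned off."""
    gates = [True] * num_cells
    gates[selected] = False
    return gates


def capacitors_seeing_voltage(gates: List[bool], block_select_on: bool) -> List[int]:
    """Indices of capacitors that are not shorted while the block is connected.

    A capacitor in parallel with an on transistor sees 0 V (ideal switch), so
    only cells whose transistor is off can see the applied voltage.
    """
    if not block_select_on:
        return []
    return [i for i, on in enumerate(gates) if not on]


if __name__ == "__main__":
    cells = 16  # 16 cells in series, the case quoted in the abstract
    print(capacitors_seeing_voltage(standby_gates(cells), block_select_on=False))
    print(capacitors_seeing_voltage(access_gates(cells, 5), block_select_on=True))
```

In this model, standby leaves no capacitor exposed to a voltage, which is consistent with the abstract's claim that no polarization switching occurs during the standby cycle.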
Pub Date: 1997-06-12 DOI: 10.1109/VLSIC.1997.623819
C. P. Yue, S. Wong
This paper presents a patterned ground shield inserted between an on-chip spiral inductor and the silicon substrate. The patterned ground shield can be realized in standard silicon technologies without additional processing steps. The impacts of shield resistance and pattern on inductance, parasitic resistances and capacitances, and quality factor are studied extensively. Experimental results show that a polysilicon patterned ground shield achieves the most improvement. At 1-2 GHz, the addition of the shield increases the inductor quality factor by up to 33% and reduces the substrate coupling between two adjacent inductors by as much as 25 dB. We also demonstrate that the quality factor of a 2-GHz LC tank can be nearly doubled with a shielded inductor. In this paper, we present a patterned ground shield, which is compatible with standard silicon technologies, to reduce the unwanted substrate effects. To provide some background, Section II presents a discussion of the fundamental definitions of an inductor Q and an LC tank Q. Next, a physical model for spiral inductors on silicon is described. The magnetic energy storage and loss mechanisms in an on-chip inductor are discussed. Based on this insight, it is shown that energy loss can be reduced by shielding the electric field of the inductor from the silicon substrate. Then, the drawbacks of a solid ground shield are analyzed. This leads to the design of a patterned ground shield. Design guidelines for parameters such as shield pattern and resistance are given. In Section III, the experiment design, on-wafer testing technique, and parasitic extraction procedure are presented. Experimental results are then reported to study the effects of shield resistance and pattern on inductance, parasitic resistances and capacitances, and inductor Q. Next, the improvement in Q of a 2-GHz LC tank using a shielded inductor is illustrated. A study of the noise coupling between two adjacent inductors and of the efficiency of the ground shield for isolation is also presented. Lastly, Section IV gives some conclusions.
{"title":"On-chip Spiral Inductors With Patterned Ground Shields For Si-based RF IC's","authors":"C. P. Yue, S. Wong","doi":"10.1109/VLSIC.1997.623819","DOIUrl":"https://doi.org/10.1109/VLSIC.1997.623819","url":null,"abstract":"This paper presents a patterned ground shield in- serted between an on-chip spiral inductor and silicon substrate. The patterned ground shield can be realized in standard silicon technologies without additional processing steps. The impacts of shield resistance and pattern on inductance, parasitic resistances and capacitances, and quality factor are studied extensively. Experimental results show that a polysilicon patterned ground shield achieves the most improvement. At 1-2 GHz, the addition of the shield increases the inductor quality factor up to 33% and reduces the substrate coupling between two adjacent inductors by as much as 25 dB. We also demonstrate that the quality factor of a 2-GHz tank can be nearly doubled with a shielded inductor. In this paper, we present a patterned ground shield, which is compatible with standard silicon technologies, to reduce the unwanted substrate effects. To provide some background, Section II presents a discussion on the fundamental definitions of an inductor and an tank . Next, a physical model for spiral inductors on silicon is described. The magnetic energy storage and loss mechanisms in an on-chip inductor are discussed. Based on this insight, it is shown that energy loss can be reduced by shielding the electric field of the inductor from the silicon substrate. Then, the drawbacks of a solid ground shield are analyzed. This leads to the design of a patterned ground shield. Design guidelines for parameters such as shield pattern and resistance are given. In Section III, experiment design, on-wafer testing technique, and parasitic extraction procedure are presented. 
Experimental results are then reported to study the effects of shield resistance and pattern on inductance, parasitic resistances and capacitances, and inductor . Next, the improvement in of a 2-GHz tank using a shielded inductor is illustrated. A study of the noise coupling between two adjacent inductors and the efficiency of the ground shield for isolation are also presented. Lastly, Section IV gives some conclusions.","PeriodicalId":175678,"journal":{"name":"Symposium 1997 on VLSI Circuits","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1997-06-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123184411","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
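For readers unfamiliar with the two quality factors this abstract refers to, the block below sketches the usual energy-based definitions. These are the standard textbook forms, given here as an assumption about what Section II covers rather than as a reproduction of the paper's own equations; all symbol names are introduced only for illustration, and the bandwidth relation for the tank is the usual high-Q approximation.

```latex
\begin{align*}
Q_{\text{inductor}} &= 2\pi \cdot
  \frac{E_{\text{magnetic}}^{\text{peak}} - E_{\text{electric}}^{\text{peak}}}
       {E_{\text{loss per cycle}}} \\[4pt]
Q_{\text{tank}} &= 2\pi \cdot
  \frac{E_{\text{stored}}}{E_{\text{loss per cycle}}}
  \approx \frac{\omega_0}{\Delta\omega_{3\,\text{dB}}}
\end{align*}
```

Under these definitions, electric energy coupled into the substrate lowers the inductor Q both by subtracting from the numerator and by adding to the loss, which is why shielding the electric field from the substrate can help.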
Pub Date: 1997-06-12 DOI: 10.1109/VLSIC.1997.623786
Wada, Ueda, Hirota, Hirano, Mashiko, Hamano
Introduction: SOI devices operate faster and consume less power than bulk devices because of their small junction capacitances. However, this advantage shrinks as fan-out grows or metal wiring gets longer, because the SOI structure has less effect on gate capacitance and wiring capacitance. The body-bias controlled gate [1] shown in Fig. 1(a) can drive large load capacitances, but it cannot operate at a supply voltage above the junction built-in voltage ( O N ). The body-bias controlled gate with reverse-biased diodes [2] shown in Fig. 1(b) can operate stably with a supply voltage above the built-in voltage. However, its operation becomes unstable at high frequency, because the excess charge in the body cannot be discharged quickly while the diodes are reverse-biased. In this paper we propose active body-bias SOI-CMOS driver circuits that can operate at high speed and at a supply voltage higher than the built-in voltage. Like the active pull-down scheme in ECL circuits [3], this driver circuit enhances its driving capability during the transition period. Circuit simulations indicate that the proposed circuit operates 55% faster than the bulk driver circuit at 0.8 V and 60 pF.

Circuit Description: Fig. 2 shows the proposed driver circuits. The output inverter gate, one PMOS and one NMOS, is replaced with the six transistors indicated in the broken-line box. The type-A circuit shown in Fig. 2(a) operates as follows. When IN is "L", INB is "H" and OUT is "L". Transistor P1 is on and N1 is off. The feedback signal from OUT turns on P2 and turns off N2. When IN changes from "L" to "H", P1 turns off and N1 turns on. At this time, P3, P2 and N1 are all on, and a small DC current flows through P3, P2 and N1, as shown in Fig. 3, provided that transistor P3 is designed to be small. The body potential of transistor P4 then falls and the threshold voltage of P4 becomes smaller. Consequently, OUT goes to "H" quickly. When OUT becomes "H", P2 turns off and N2 turns on. The DC current path is cut off and the body potential of P4 returns to the supply voltage level. Thus, the extra power consumption in the circuit is minimized. In this driver, the body voltages of the output transistors can be kept below the built-in voltage by optimizing the sizes of transistors N2, N3, P2 and P3. The body regions of the output transistors are charged or discharged through transistors, not through diodes. Therefore, the proposed circuit can operate at high frequency even when the supply voltage exceeds the built-in voltage. In the type-B circuit shown in Fig. 2(b), the feedback signal is replaced by the output signal of the small inverter INV, which lengthens the period during which the threshold voltage of P4 or N4 is low.

Circuit Simulation: Table 1 shows the device parameters of the SOI and bulk devices used for the circuit simulations. Fig. 4 shows the operating waveforms of the bulk, the conventional SOI and the type-A SOI circuits at 1.0 V a
{"title":"Active Body-bias SOI-CMOS Driver Circuits","authors":"Wada, Ueda, Hirota, Hirano, Mashiko, Hamano","doi":"10.1109/VLSIC.1997.623786","DOIUrl":"https://doi.org/10.1109/VLSIC.1997.623786","url":null,"abstract":"Introduction SO1 devices operate faster and consume less power than bulk ones due to their small junction capacitances. However, this advantage decreases as the fan-outs get larger or metalwiring gets longer, because SO1 structures have lesser effects on gate capacitance or wiring capacitance. The body-bias controlled gate [l] shown in Fig.1 (a) can drive large load capacitances. However, this device cannot have the supply voltage of over the junction built-in voltage ( O N ) . The body-bias controlled gate with reverse-biased diodes [2] shown in Fig.1 (b) can operate stably with the supply voltage above the built-in voltage. However, the device operation becomes unstable at high-frequency because the excessive charges in the body cannot be discharged quickly as the diodes are reversely biased. We propose in this paper active body-bias SOI-CMOS driver circuits that can operate at high-speed and the supply voltage higher than the built-in voltage. This driver circuit, like the active pull-down scheme in ECL circuits [3], enhances its driving capability during the transition period. The circuit simulations indicated that the proposed circuit operates 55% faster than the bulk driver circuit at 0.8V and 60pF. Circuit Description Fig.2 shows the driver circuits we propose. The output inverter gate, one PMOS and one NMOS, is replaced with six transistors indicated in the broken-line box. The operation of the type-A circuit shown in Fig.2 (a) is as follows. When IN is \"L\", INB is \"H\" and OUT is \"L\". Transistor P1 is on and N1 is off. The feedback signal from OUT turns on P2 and turns off N2. When IN changes from \"L\" to \"H\", P1 turns off and N1 turns on. 
At this time, P3, P2 and N1 are all on and a DC current flows slightly through P3, P2 and N1, as shown in Fig.3, if we design the transistor P3 to be small. Then the body potential of the transistor P4 falls and the threshold voltage of P4 becomes smaller. Consequently OUT goes to \"H\" quickly. When OUT becomes \"H\", P2 turns off and N2 turns on. The DC current path is cut off and the body potential of P4 becomes the supply voltage level. Thus, the extra power consumptions are minimized in the circuit. In this driver, the body voltages of the output transistors can be adjusted below the built-in voltage if we optimize the transistor sizes of N2, N3, P2 and P3. The body regions of the output transistors are charged or discharged through the transistors, not through diodes. Therefore, the proposed circuit can operate at high-frequency even when the supply voltage exceeds the built-in voltage. In the type-B circuit shown in Fig.2 (b), the feedback signal is replaced by the output signal of the small-size inverter INV to obtain the longer period of the low threshold voltage of P4 or N4. Circuit Simulation Table 1 shows the device parameters of the SO1 and the bulk devices used for the circuit simulations. Fig.4 shows the operating waveforms of the bulk, the conventional SO1 and the type-A SO1 circuit at 1.OV a","PeriodicalId":175678,"journal":{"name":"Symposium 1997 on VLSI Circuits","volume":"43 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1997-06-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124450483","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 1997-06-12 DOI: 10.1109/VLSIC.1997.623820
Janssens, Steyaert, Miyakawa
A full-CMOS broadband low-noise amplifier for use in a GSM receiver front-end is presented. The circuit employs a topology that does not require any tuning or trimming. The amplifier provides a forward gain of 24 dB between 0.9 GHz and 1 GHz, while the -3 dB band ranges from 420 MHz to 1.2 GHz. The noise figure is less than 2.7 dB over the whole 300 MHz frequency band between 0.8 GHz and 1.1 GHz. Reverse isolation is better than 43.5 dB over the full measuring range. The prototype has been implemented in a 0.4 µm CMOS technology and consumes 35 mW from a 2.7 V supply.
{"title":"A 2.7 Volt CMOS Broadband Low Noise Amplifier","authors":"Janssens, Steyaert, Miyakawa","doi":"10.1109/VLSIC.1997.623820","DOIUrl":"https://doi.org/10.1109/VLSIC.1997.623820","url":null,"abstract":"A full-CMOS broadband low-noise amplifier for application in a GSM receiver front-end is presented. The circuit employs a topology which does not require any tuning or trimming. The amplifier provides a forward gain of 24 dB between 0.9 GHz and 1 GHz, while the -3dB band ranges from 420 MHz to 1.2 GHz. The noise figure is less than 2.7 dB in the whole 300 MHz frequency band between 0.8 GHz and 1.1 GHz. Reverse isolation is better than 43.5 dB over the full measuring range. The prototype has been implemented in a 0 . 4 ~ CMOS technology and consumes 35 mW from a 2.7 Volt supply.","PeriodicalId":175678,"journal":{"name":"Symposium 1997 on VLSI Circuits","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1997-06-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122436120","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
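The decibel and power figures quoted for this amplifier can be turned into linear quantities with a few lines of Python. This is only a convenience sketch: the helper names are hypothetical, and treating the 24 dB forward gain as a power gain is an assumption, since the abstract does not say how the gain is defined.

```python
def db_to_power_ratio(db: float) -> float:
    """Convert decibels to a power ratio: ratio = 10^(dB/10)."""
    return 10.0 ** (db / 10.0)


def db_to_voltage_ratio(db: float) -> float:
    """Convert decibels to a voltage (amplitude) ratio: ratio = 10^(dB/20)."""
    return 10.0 ** (db / 20.0)


def supply_current_amps(power_watts: float, supply_volts: float) -> float:
    """Average supply current drawn from a DC supply: I = P / V."""
    return power_watts / supply_volts


if __name__ == "__main__":
    gain_db = 24.0          # forward gain from the abstract
    noise_figure_db = 2.7   # upper bound on noise figure from the abstract
    power_w = 35e-3         # 35 mW
    supply_v = 2.7          # 2.7 V supply

    print(f"power gain ratio:   {db_to_power_ratio(gain_db):.1f}")
    print(f"voltage gain ratio: {db_to_voltage_ratio(gain_db):.2f}")
    print(f"noise factor:       {db_to_power_ratio(noise_figure_db):.3f}")
    print(f"supply current:     {supply_current_amps(power_w, supply_v) * 1e3:.2f} mA")
```

The noise factor is the linear form of the noise figure, and the supply current follows directly from the stated power and supply voltage.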
Pub Date: 1997-06-12 DOI: 10.1109/VLSIC.1997.623793
Reohr, Navarro, Chan, Mayo, Curran, Krumm, M M Pelella, Lu, Bakhru, Kowalczyk, Rawlins, Carey, Wu
Introduction: The design of fast and robust Cache hit logic was one of the fundamental hurdles overcome to achieve the reported 350 MHz for an S/390 Microprocessor in a O.&n Leff technology¹. The 4-way set associative Cache mechanism is shown in Figure 1. It consists of the Cache memory, which holds a subset of the machine instructions and data; the Directory memory, which holds a subset of absolute addresses (main memory addresses) corresponding to the Cache's instruction and data entries; the absolute Translation Lookaside Buffer (TLB), which holds recently translated absolute address entries; and the logical TLB, which holds the virtual addresses (internally generated machine addresses) corresponding to the absolute TLB's address entries. The hit logic, which links all these memories together, resolves whether the Cache memory holds the instructions or data requested by the processor each cycle. If the Cache holds the correct entry, the hit logic selects the appropriate entry from the 4 possible sets simultaneously read out of the Cache. While most of the processor logic was implemented in static CMOS, performance requirements dictated that the hit logic employ precharge techniques to achieve a single-cycle directory lookup and set selection. This in turn caused a number of circuit issues to surface, all related to the proper interfacing of static and precharged logic families. Flexible timing control was incorporated around these interfaces, where races exist, to obtain functional hardware at all process and test corners.

Timing Diagram: Figure 2 shows the timing diagram for the logic and memory circuits. The falling edge of the Global Clock triggers both the capture and launch of signals through the logic registers, begins the precharge of the hit logic (Hit L. precharge), and starts the internal decode of addresses in the memory macros. About halfway through the cycle, the TLB and Directory memories send their outputs to the comparator logic, which in turn drives the ANDing and Continuation logic. The hit signal produced by the sum of those actions finally selects one of four possible cache entries, which is driven out to the Cache Output Register. All memory accesses and logic evaluations happen in a single processor cycle. Precharge occurs when circuits are not evaluating. Each functional block in the memory circuits generates its own precharge using a self-resetting scheme² (SRCMOS). The Directory and TLB memories produce a wide enough output pulse, approximately a third of the cycle, to guarantee sufficient overlap between their signals so that the bit "ANDing" done in the XOR circuits, a part of the Directory Comparator, functions reliably. Once the hit logic topples, data stay latched until the next cycle's precharge. The single-phase clocking scheme described is prone to short-path timing problems introduced through the precharge of the hit logic. Clock skew may cause the hit logic to precharge before a latch capturing the hit logic's state has time to cl
Clock skew may cause the hit logic to precharge before a latch capturing the hit logic's state has time to close. Padding prevents this from occurring.

Strobed Compare Equal Circuit: Figure 4 shows the Directory Comparator circuit, which consists of a bit-wise XOR and a strobed "ANDing" plane. The main complication of this comparator is that it needs a timing circuit that asserts the strobe only after the XOR circuits have had a chance to detect a difference between the TLB and Directory signals; a miscompare on a single bit triggers a NOR transistor pull-down. A race condition exists between data arrival and strobe assertion. Activating the strobe too early causes the comparator to fail functionally, with the signature of a constantly high compare-equal output, while activating it too late adds dead time to the circuit delay. Note also that all precharged circuits (Figures 4 and 5) use PFETs PN1, PN2 and PN3 to manage charge sharing and leakage on the dynamic nodes, and skewed static inverters (not shown) to reject coupling noise introduced on long signal wires. Figure 3 shows e-beam results for the strobed compare equal circuit, obtained from a test site with observable circuit nodes and precise strobe timing control. The first waveform shows a circuit performance of 200 ps from the rise of the pulsed input to the rise of the compare-equal output. To achieve this performance, the second waveform shows that Strobe fires before node dyn1 is pulled down. The third waveform shows how an overly aggressive Strobe setting produces a noise glitch in the miscompare case, which indicates that dyn2 is partially discharged through transistors N1, N2 and NSTROBE. The half-latch, transistor PN1, restores node dyn2 high once transistor N1 shuts off. Such an aggressive strobe setting is not recommended in a real design, because long signal wires of differing lengths, power-supply bounce, transistor mistracking and coupling noise can introduce mistracking between the strobe path and the data path.
{"title":"Precharged Cache Hit Logic With Flexible Timing Control","authors":"Reohr, Navarro, Chan, Mayo, Curran, Krumm, M M Pelella, Lu, Bakhru, Kowalczyk, Rawlins, Carey, Wu","doi":"10.1109/VLSIC.1997.623793","DOIUrl":"https://doi.org/10.1109/VLSIC.1997.623793","url":null,"abstract":"Introduction: The design of fast and robust Cache hit logic was one of the fundamental hurdles overcome to achieve the reported 35OMHz for a S/390 Microprocessor in a O.&n Leff technology’. The 4 way set associative Cache mechanism is shown in Figure 1. It consists of the Cache memory’which holds a subset of the machine instructions and data, the Director), memory which holds a subset of absolute addresses (main memory addresses) corresponding to the Cache’s instructions and data entries, the absolute Translation Lookaside Bufer (TLB) which holds recently translated absolute address entries, and the logical TLB which holds the virtual addresses (internally generated machine addresses) corresponding to absolute TLB’s address entries. The hit logic, which links all these memories together, is used to resolve whether the Cache memory holds the instructions or data requested by the processor each cycle. If the Cache holds the correct entry, the hit logic selects the appropriate entry from 4 possible sets simultaneously read out of the Cache. While most of the processor logic was implemented in static CMOS, performance requirements dictated that the hit logic employ precharge techniques to achieve a single cycle directory lookup and set selection. This in turn caused a number of circuit issues to surface which relate to the proper interfacing of static and precharged logic families. Flexible timing control was incorporated around these interfaces, where races exist, to obtain functional hardware at all process and test comers. Timing Diagram: Figure 2 shows the timing diagram for the logic and memory circuits. 
The down going edge of the Global Clock triggers both the capture and launch of signals through the logic registers, begins the precharge of the hit logic (Hit L. precharge), and starts the internal decode of addresses in the memory macros. About half way through the cycle, the 7ZB andDirectory memories send their outputs to the conipamtor logic which in turn drives the Anding arid Coritiriuation logic. The hit signal produced by the sum of those actions finally selects one of four possible cache entries driven out to the Cache Output Register. All memory accesses and logic evaluations happen in a single processor cycle. Recharge occurs when circuits are not evaluating. Each functional block in the memory circuits generates its own precharge using a self-resetting scheme2 (SRCMOS). Direcfoq) and T U memories produce a wide enough output pulse, approximately a third of the cycle, to guarantee sufficient overlap between their signals such that bit “Anding” done in XOR circuits, a part of the Directory Comparator, functions reliably. Once the hit logic topples, data stay latched until the next cycle precharge. The single phase clocking scheme described is prone to short path timing problems introduced through the precharge of the hit logic. Clock skew may cause the hit logic to precharge before a latch capturing the hit logic’s state has time to cl","PeriodicalId":175678,"journal":{"name":"Symposium 1997 on VLSI Circuits","volume":"25 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1997-06-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124846672","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
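The functional role of the hit logic in this abstract (compare the translated absolute address against the Directory entry of each of the 4 sets, then select the matching Cache entry) can be sketched in Python. This is a behavioral illustration only: it says nothing about the precharged circuits or strobe timing, and all function and variable names are hypothetical.

```python
from typing import List, Optional

NUM_WAYS = 4  # the abstract describes a 4-way set associative cache


def compare_equal(tlb_address: int, directory_address: int) -> bool:
    """Bit-wise XOR, then require that no bit differs (the job of the ANDing plane)."""
    return (tlb_address ^ directory_address) == 0


def select_cache_entry(tlb_address: int,
                       directory_row: List[int],
                       cache_row: List[str]) -> Optional[str]:
    """Return the cache entry whose directory address matches, or None on a miss.

    directory_row and cache_row hold the 4 entries read out in parallel
    for one congruence class.
    """
    if len(directory_row) != NUM_WAYS or len(cache_row) != NUM_WAYS:
        raise ValueError("expected one entry per way")
    hits = [compare_equal(tlb_address, d) for d in directory_row]
    for way, hit in enumerate(hits):
        if hit:
            return cache_row[way]
    return None


if __name__ == "__main__":
    directory = [0x0001, 0x1A2B, 0x7777, 0x1234]
    cache = ["entry-a", "entry-b", "entry-c", "entry-d"]
    print(select_cache_entry(0x1A2B, directory, cache))  # a hit in way 1
    print(select_cache_entry(0xFFFF, directory, cache))  # a miss
```

In hardware all four comparisons happen in parallel within one cycle; the loop here is just the simplest way to express the same selection in software.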
Pub Date: 1997-06-12 DOI: 10.1109/VLSIC.1997.623777
J. Farrell, T. Fischer
The logic and circuits are presented for a 20-entry instruction queue which scoreboards 80 registers and issues four instructions per cycle in a 600-MHz microprocessor. The request logic and arbiter circuits that control integer execution are described in addition to a novel compaction scheme that maintains temporal order in the queue. The issue logic data path is implemented in 141,000 transistors, occupying 10 mm² in a 0.35-µm CMOS process.
{"title":"Issue Logic For A 600 MHz Out-of-order Execution","authors":"J. Farrell, T. Fischer","doi":"10.1109/VLSIC.1997.623777","DOIUrl":"https://doi.org/10.1109/VLSIC.1997.623777","url":null,"abstract":"The logic and circuits are presented for a 20-entry instruction queue which scoreboards 80 registers and issues four instructions per cycle in a 600-MHz microprocessor. The request logic and arbiter circuits that control integer execution are described in addition to a novel compaction scheme that maintains temporal order in the queue. The issue logic data path is implemented in 141000 transistors, occupying 10 mm/sup 2/ in a 0.35-/spl mu/m CMOS process.","PeriodicalId":175678,"journal":{"name":"Symposium 1997 on VLSI Circuits","volume":"7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1997-06-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128596206","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
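The interplay of scoreboard, request logic, arbitration and compaction mentioned in this abstract can be illustrated with a small behavioral model in Python. It is a sketch under assumptions, not the paper's design: the oldest-first arbitration policy, the `Instr` record and every function name are hypothetical, and only the sizes (20 entries, 80 registers, four issues per cycle) come from the abstract.

```python
from dataclasses import dataclass
from typing import List, Tuple

QUEUE_ENTRIES = 20   # instruction queue depth from the abstract
NUM_REGISTERS = 80   # registers tracked by the scoreboard
ISSUE_WIDTH = 4      # instructions issued per cycle


@dataclass
class Instr:
    name: str
    srcs: Tuple[int, ...]  # source register numbers
    dest: int              # destination register number


def issue_cycle(queue: List[Instr],
                ready: List[bool]) -> Tuple[List[Instr], List[Instr]]:
    """Run one issue cycle; return (issued, compacted_queue).

    queue is ordered oldest first. An entry requests issue when the
    scoreboard marks all of its source registers ready. The arbiter here
    (an assumption) grants the oldest requesters, up to ISSUE_WIDTH.
    Compaction drops the issued entries and keeps the rest in order.
    """
    if len(queue) > QUEUE_ENTRIES or len(ready) != NUM_REGISTERS:
        raise ValueError("queue or scoreboard size mismatch")
    issued: List[Instr] = []
    remaining: List[Instr] = []
    for ins in queue:
        requesting = all(ready[r] for r in ins.srcs)
        if requesting and len(issued) < ISSUE_WIDTH:
            issued.append(ins)
        else:
            remaining.append(ins)  # order preserved, so age order survives
    return issued, remaining


if __name__ == "__main__":
    ready = [True] * NUM_REGISTERS
    ready[5] = False  # register 5 is still being produced
    queue = [
        Instr("i0", (1, 2), 10),
        Instr("i1", (5,), 11),   # blocked on register 5
        Instr("i2", (3,), 12),
        Instr("i3", (), 13),
        Instr("i4", (4,), 14),
        Instr("i5", (6,), 15),   # ready, but the issue width is exhausted
    ]
    issued, queue = issue_cycle(queue, ready)
    print([i.name for i in issued])
    print([i.name for i in queue])
```

Because the surviving entries keep their relative order, the position of an entry in the list continues to encode its age, which is the property the abstract attributes to the compaction scheme.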
Pub Date: 1997-06-12 DOI: 10.1109/VLSIC.1997.623825
Kawaguchi, Sakurai
A reduced clock-swing flip-flop (RCSFF) is proposed, composed of a reduced-swing clock driver and a special flip-flop that embodies a leak-current cutoff mechanism. The RCSFF can reduce the clock system power of a VLSI to one-third of that with the conventional flip-flop. This power improvement is achieved by reducing the clock swing down to 1 V. The area and the delay of the RCSFF can also be reduced by about 20% compared to the conventional flip-flop. The RCSFF can also reduce the delay of a long interconnect to one-half.
{"title":"A Reduced Clock-swing Flip-flop (RCSFF) For 63% Clock Power Reduction","authors":"Kawaguchi, Sakurai","doi":"10.1109/VLSIC.1997.623825","DOIUrl":"https://doi.org/10.1109/VLSIC.1997.623825","url":null,"abstract":"A reduced clock-swing flip-flop (RCSFF) is proposed, which is composed of a reduced swing clock driver and a special flip-flop which embodies the leak current cutoff mechanism. The RCSFF can reduce the clock system power of a VLSI down to one-third compared to the conventional flip-flop. This power improvement is achieved through the reduced clock swing down to 1 V. The area and the delay of the RCSFF can also be reduced by a factor of about 20% compared to the conventional flip- flop. The RCSFF can also reduce the delay of a long interconnect to one-half.","PeriodicalId":175678,"journal":{"name":"Symposium 1997 on VLSI Circuits","volume":"177 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1997-06-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121729589","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
We proposed and tested a low-power SRAM using a charge transfer (CT) pre-sense amplifier and a bus signal encoding scheme. The CT amplifier compensates the Vth mismatch between the paired MOS transistors, and the encoded bus signal reduces the number of wires being switched. Both are dynamically controlled by a low-power DLL that we proposed. The fabricated SRAM worked at 50 MHz with a power dissipation of 5 mW at a 1 V power supply.

Introduction: Conventional SRAM latch amplifiers, whose inputs are reset to VDD/2, have a delay proportional to 1/(VDD/2 − Vth), and thus suffer severely from VDD reduction in terms of increased sensing delay. Furthermore, the Vth and Id variations in the paired MOS transistors cannot be neglected in sense amplifier design in the deep-submicron technology era, because of increased process-induced variations in electrical characteristics and the reduced bit-line swing caused by a small cell current. The following simulations and results are based on our 0.35-µm multi-Vth CMOS technology.

A Charge-Transfer Pre-Amplifier: The CT amplifier has a pronounced sensitivity and consumes little power. In a balanced use [1], it compensates the Vth mismatch of the two CT MOS transistors, and its incomplete precharge operation gives a switching time of a few ns (Fig. 1(a)). We modified the CT-gate pulse to disconnect the bit-lines before the latch operation, reducing the bit-line swings to decrease the bit-line charging power and increase the latch speed (Fig. 1(b)). The bit-line precharge is performed using a 1.5 V precharge source, because a low-voltage SRAM needs the bit-line potentials to be kept near VDD (= 1 V) for cell stability. The gm improvement in a MOS technology with Tox = 5.5 nm, the pulling up of the bit-lines to 1 V, and the differential signal scheme in SRAMs reduced the precharge period to 2.5 ns, which is fast enough for 50 MHz operation. Fig. 3(a) shows the simulated waveforms of the CT amplifier, whose circuit implementation is illustrated in the upper half of Fig. 4. The CT amplifier is operated following the sequence (1) through (6). First, (1) precharge the bit-lines to 1 V, (2) further precharge the bit-lines towards the 1.5 V − Vth level through the nMOS CT gates, and (3) turn off the pull-up PMOS at the CT drains and activate a word-line. This makes BL1 become lower than /BL1 due to a cell current, and B1 lowers faster than /B1, because Vgs − Vth of the CT on the BL side is larger. (4)-(5) activate the nMOS cross-coupled sense latch, and finally (6) precharge the bit-lines, B1 and /B1, for the next cycle. At the beginning of (3), the incomplete bit-line charging is enough to overcome a mismatch or an offset of the sense latch as large as 0.1 V. Since the nMOS dynamic latch is activated at Vgs = VDD in period (4), its delay is half that of a half-VDD-precharge CMOS latch. The proposed CT amplifier consumes less current, 12.6 µA with a 1.5 V power supply at 50 MHz. With the penalty of the clocking powers, they consume 580 µW at 50 MHz
{"title":"A Charge Transfer Amplifier And An Encoded Bus Architecture For Low Power SRAM","authors":"Kawashima, Mori, Sasagawa, Hamaminato, Wakayama, Sukegawa, Fukushi","doi":"10.1109/VLSIC.1997.623815","DOIUrl":"https://doi.org/10.1109/VLSIC.1997.623815","url":null,"abstract":"We proposed and tested a low power SRAM using a charge transfer (CT) pre-sense amplifier and a bus signal encoding scheme. The CT amplifier compensates the Vth mismatch between the paired MOS transistors, and the encoded bus signal reduces the number of wires being switched. Both are dynamically controlled by a low power DLL we proposed. The fabricated SRAM worked at 50 MHz with a power dissipation of 5 mW at a 1 V power supply. Introduction: Conventional SRAM latch amplifiers, whose inputs are reset to VDD/2, have a delay proportional to 1/(VDD/2 - Vth) and thus suffer severely from VDD reduction in terms of increased sensing delay. Furthermore, the Vth and Id variations in the paired MOS transistors cannot be neglected in sense amplifier design in the deep submicron era, because of increased process-induced variations in electrical characteristics and the reduced bit-line swing due to a small cell current. The following simulations and results are based on our 0.35-µm multi-Vth CMOS technology. A Charge-Transfer Pre-Amplifier: The CT amplifier has high sensitivity and consumes little power. In balanced use [1], it compensates the Vth mismatch of the two CT MOS transistors, and its incomplete precharge operation gives a switching time of a few ns (Fig. 1(a)). We modified the CT-gate pulse to disconnect the bit-lines before the latch operation, reducing the bit-line swings to decrease the bit-line charging power and increase the latch speed (Fig. 1(b)). The bit-line precharge is performed using a 1.5 V precharge source, because a low voltage SRAM needs its bit-line potentials kept near VDD (= 1 V) for cell stability. The gm improvement in a MOS technology with Tox = 5.5 nm, the pulling up of the bit-lines to 1 V, and the differential signal scheme in SRAMs reduced the precharge period to 2.5 ns, which is fast enough for 50 MHz operation. Fig. 3(a) shows the simulated waveforms of the CT amplifier, whose circuit implementation is illustrated in the upper half of Fig. 4. The CT amplifier is operated in the sequence (1) through (6). First, (1) precharge the bit-lines to 1 V, (2) further precharge the bit-lines towards the 1.5 V - Vth level through the nMOS CT gates, and (3) turn off the pull-up pMOS at the CT drains and activate a word-line. The cell current then pulls BL1 lower than /BL1, and B1 falls faster than /B1 because Vgs - Vth of the CT on the BL side is larger. Steps (4)-(5) activate the nMOS cross-coupled sense latch, and finally (6) precharges the bit-lines, B1 and /B1, for the next cycle. At the beginning of (3), the incomplete bit-line charging is enough to overcome a mismatch or an offset of the sense latch as large as 0.1 V. Since the nMOS dynamic latch is activated at Vgs = VDD in period (4), its delay is half that of a half-VDD-precharge CMOS latch. The proposed CT amplifier draws little current: 12.6 µA from a 1.5 V power supply at 50 MHz. Including the clocking power penalty, they consume 580 µW at 50 MHz.","PeriodicalId":175678,"journal":{"name":"Symposium 1997 on VLSI Circuits","volume":"530 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1997-06-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134282192","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
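The sensing-delay relation quoted in this abstract, a delay proportional to 1/(VDD/2 - Vth) for latch inputs reset to VDD/2, can be illustrated with a short Python sketch. The threshold voltage `VTH`, the supply values, and the helper name `relative_latch_delay` are illustrative assumptions, not figures from the paper. The results are in arbitrary units and only show the trend.

```python
def relative_latch_delay(vdd: float, vth: float, gate_fraction: float = 0.5) -> float:
    """First-order relative sensing delay, 1 / (gate_fraction * VDD - Vth).

    gate_fraction = 0.5 models latch inputs reset to VDD/2 (the conventional
    case in the abstract); gate_fraction = 1.0 models activation at Vgs = VDD.
    Units are arbitrary: only ratios between calls are meaningful.
    """
    overdrive = gate_fraction * vdd - vth
    if overdrive <= 0:
        raise ValueError("gate overdrive is not positive; the model does not apply")
    return 1.0 / overdrive


VTH = 0.2  # hypothetical threshold voltage in volts (assumption, not from the paper)

for vdd in (3.3, 2.0, 1.5, 1.0):
    half = relative_latch_delay(vdd, VTH)       # inputs reset to VDD/2
    full = relative_latch_delay(vdd, VTH, 1.0)  # activated at Vgs = VDD
    print(f"VDD = {vdd:.1f} V: half-VDD delay = {half:.2f}, full-VDD delay = {full:.2f}")
```

Under this simple model the half-VDD delay grows sharply as VDD approaches 2·Vth, which is the sensitivity to VDD reduction that the abstract describes. The factor-of-two speed-up reported for the nMOS dynamic latch is the authors' result and is not derived from this sketch.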
Pub Date : 1997-06-12 DOI: 10.1109/VLSIC.1997.623821
Otaka, Yamaji, Fujimoto, Tanimoto
A direct conversion receiver (DCR) is a candidate for realizing a small, low-cost wireless terminal because IF filters can be eliminated. However, the DCR suffers from dc offset and 2nd order distortion generated mainly in the direct conversion mixer. A dc offset canceler has been proposed to compensate for this unavoidable dc offset [1], but the offset must be within 10 mV at the output of the mixer, considering the hardware complexity of the dc offset canceler. This paper presents an approach to substantially reduce dc offset and 2nd order distortion in the mixer for the DCR.
{"title":"A Very Low Offset 1.9-GHz Si Mixer For Direct Conversion Receivers","authors":"Otaka, Yamaji, Fujimoto, Tanimoto","doi":"10.1109/VLSIC.1997.623821","DOIUrl":"https://doi.org/10.1109/VLSIC.1997.623821","url":null,"abstract":"A direct conversion receiver (DCR) is a candidate for realizing a small, low-cost wireless terminal because IF filters can be eliminated. However, the DCR suffers from dc offset and 2nd order distortion generated mainly in the direct conversion mixer. A dc offset canceler has been proposed to compensate for this unavoidable dc offset [1], but the offset must be within 10 mV at the output of the mixer, considering the hardware complexity of the dc offset canceler. This paper presents an approach to substantially reduce dc offset and 2nd order distortion in the mixer for the DCR.","PeriodicalId":175678,"journal":{"name":"Symposium 1997 on VLSI Circuits","volume":"98 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1997-06-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123197940","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
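One way to see why 2nd order distortion in the mixer matters for a direct conversion receiver is that squaring a sinusoid produces a dc term: for y = a2·x² with x = A·cos(ωt), the time average is a2·A²/2, which lands at dc alongside the down-converted signal. The Python sketch below checks this numerically. The coefficient `a2`, the amplitude, and the function name are hypothetical values chosen for illustration. The sketch shows the general mechanism and does not model the circuit proposed in the paper.

```python
import math


def dc_from_second_order(a2: float, amplitude: float, samples: int = 1000) -> float:
    """Average of a2 * x^2 over one full period of x = amplitude * cos(theta)."""
    total = 0.0
    for n in range(samples):
        x = amplitude * math.cos(2.0 * math.pi * n / samples)
        total += a2 * x * x  # 2nd order term of a memoryless nonlinearity
    return total / samples


a2 = 0.1         # hypothetical 2nd order coefficient, in 1/V
amplitude = 0.2  # hypothetical input amplitude, in V

numeric = dc_from_second_order(a2, amplitude)
analytic = a2 * amplitude ** 2 / 2.0
print(f"numeric dc term = {numeric * 1e3:.3f} mV, analytic a2*A^2/2 = {analytic * 1e3:.3f} mV")
```

Because the dc term scales with the square of the input amplitude, it changes with the strength of the input, so a fixed correction cannot fully remove it.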
We propose an all-digital, multi-phase delay locked loop (DLL) for internal timing generation in embedded DRAMs. The timing generation is achieved by combining the DLL with a command decoder and a register-controlled multi-phase clock counter. The DLL has four-phase (π/2 step) and six-phase (π/3 step) output modes, and employs coarse and fine delay lines to minimize the delay line area while keeping the skew resolution down to a value obtainable by all-digital delay elements. Our DLL operates over a clock range of 125 to 400 MHz with a skew adjustment error of ±60 ps.
{"title":"All-digital Multi-phase Delay Locked Loop For Internal Timing Generation In Embedded And/or High-speed DRAMs","authors":"Gotoh, Wakayama, Saito, Ogawa, Tamura, Okajima, Taguchi","doi":"10.1109/VLSIC.1997.623830","DOIUrl":"https://doi.org/10.1109/VLSIC.1997.623830","url":null,"abstract":"We propose an all-digital, multi-phase delay locked loop (DLL) for internal timing generation in embedded DRAMs. The timing generation is achieved by combining the DLL with a command decoder and a register-controlled multi-phase clock counter. The DLL has four-phase (π/2 step) and six-phase (π/3 step) output modes, and employs coarse and fine delay lines to minimize the delay line area while keeping the skew resolution down to a value obtainable by all-digital delay elements. Our DLL operates over a clock range of 125 to 400 MHz with a skew adjustment error of ±60 ps.","PeriodicalId":175678,"journal":{"name":"Symposium 1997 on VLSI Circuits","volume":"9 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1997-06-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125227679","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
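To give a feel for the timing steps involved, the four-phase and six-phase output modes divide one clock period into quarters or sixths. The Python sketch below computes the ideal delay of each output phase at the two ends of the 125 to 400 MHz range quoted in the abstract. The function name and the assumption of evenly spaced, ideal taps are illustrative. The sketch does not model the coarse and fine delay lines or the locking behavior of the actual DLL.

```python
from typing import List


def phase_tap_delays_ps(clock_mhz: float, phases: int) -> List[float]:
    """Ideal delay of each output phase, in picoseconds, within one clock period."""
    if clock_mhz <= 0 or phases <= 0:
        raise ValueError("clock frequency and phase count must be positive")
    period_ps = 1e6 / clock_mhz  # period in ps for a frequency given in MHz
    return [k * period_ps / phases for k in range(phases)]


for clock_mhz in (125.0, 400.0):  # ends of the clock range quoted in the abstract
    for phases in (4, 6):         # four-phase and six-phase output modes
        taps = phase_tap_delays_ps(clock_mhz, phases)
        print(f"{clock_mhz:.0f} MHz, {phases} phases:", [round(t, 1) for t in taps])
```

At 400 MHz the period is 2500 ps, so adjacent four-phase outputs are 625 ps apart. That spacing is the scale against which a skew error of a few tens of picoseconds should be judged.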