Symposium 1997 on VLSI Circuits: Latest Papers, Page 3

High-density chain ferroelectric random-access memory (CFRAM)

Symposium 1997 on VLSI Circuits

Pub Date : 1997-06-12 DOI: 10.1109/VLSIC.1997.623818

Takashima, Kunishima, Noguchi, Takagi

A new chain ferroelectric random access memory, the chain FRAM, is proposed. A memory cell consists of one transistor and one ferroelectric capacitor connected in parallel, and one memory cell block consists of several memory cells connected in series plus a block-selecting transistor. This configuration realizes random access and the smallest memory cell reported so far using a planar transistor, 4F² in size. The chip size of the proposed chain FRAM can be reduced to 63% of that of the conventional FRAM when 16 cells are connected in series. Both the fast nondriven half-VDD cell-plate scheme and the driven cell-plate scheme are applicable to the chain FRAM without polarization switching during the standby cycle, because the ferroelectric capacitors are short-circuited. The result is a fast access time of 45 ns and a cycle time of 70 ns without refresh operation.
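The cell arrangement described above can be illustrated with a small behavioral sketch. It is not taken from the paper: the class name `ChainBlock`, the helper `biased_capacitors`, and the word-line convention (all cell transistors on in standby, only the selected cell's transistor off during access) are assumptions chosen to match the abstract's statement that the capacitors are short-circuited in standby.

```python
from dataclasses import dataclass


@dataclass
class ChainBlock:
    """One chain block: n_cells cells in series plus a block-select transistor."""
    n_cells: int = 16  # the abstract's example of 16 cells in series

    def standby(self):
        # Assumed standby state: every word line high, so every cell transistor
        # conducts and short-circuits its own capacitor; block select is off.
        return [True] * self.n_cells, False

    def select(self, k):
        # Assumed access state: only cell k's transistor is switched off,
        # and the block-select transistor connects the chain to the bit line.
        if not 0 <= k < self.n_cells:
            raise IndexError("cell index out of range")
        return [i != k for i in range(self.n_cells)], True


def biased_capacitors(word_lines, block_select):
    """Return indices of capacitors that are not shorted and can see a voltage."""
    if not block_select:
        return []
    return [i for i, transistor_on in enumerate(word_lines) if not transistor_on]


block = ChainBlock()
print(biased_capacitors(*block.standby()))  # standby: no capacitor is exposed
print(biased_capacitors(*block.select(5)))  # access: only the selected cell
```

In this toy model the conducting transistors of the unselected cells act as a pass chain, which is one way to read how a series-connected block can still offer random access.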

Citations: 9

On-chip Spiral Inductors With Patterned Ground Shields For Si-based RF IC's

Symposium 1997 on VLSI Circuits

Pub Date : 1997-06-12 DOI: 10.1109/VLSIC.1997.623819

C. P. Yue, S. Wong

This paper presents a patterned ground shield inserted between an on-chip spiral inductor and the silicon substrate. The patterned ground shield can be realized in standard silicon technologies without additional processing steps. The impacts of shield resistance and pattern on inductance, parasitic resistances and capacitances, and quality factor are studied extensively. Experimental results show that a polysilicon patterned ground shield achieves the most improvement. At 1-2 GHz, the addition of the shield increases the inductor quality factor by up to 33% and reduces the substrate coupling between two adjacent inductors by as much as 25 dB. We also demonstrate that the quality factor of a 2-GHz LC tank can be nearly doubled with a shielded inductor.

In this paper, we present a patterned ground shield, which is compatible with standard silicon technologies, to reduce the unwanted substrate effects. To provide some background, Section II presents a discussion on the fundamental definitions of an inductor Q and an LC tank Q. Next, a physical model for spiral inductors on silicon is described. The magnetic energy storage and loss mechanisms in an on-chip inductor are discussed. Based on this insight, it is shown that energy loss can be reduced by shielding the electric field of the inductor from the silicon substrate. Then, the drawbacks of a solid ground shield are analyzed. This leads to the design of a patterned ground shield. Design guidelines for parameters such as shield pattern and resistance are given. In Section III, the experiment design, on-wafer testing technique, and parasitic extraction procedure are presented. Experimental results are then reported to study the effects of shield resistance and pattern on inductance, parasitic resistances and capacitances, and inductor Q. Next, the improvement in the Q of a 2-GHz LC tank using a shielded inductor is illustrated. A study of the noise coupling between two adjacent inductors and the efficiency of the ground shield for isolation is also presented. Lastly, Section IV gives some conclusions.

Citations: 1294

Active Body-bias SOI-CMOS Driver Circuits

Symposium 1997 on VLSI Circuits

Pub Date : 1997-06-12 DOI: 10.1109/VLSIC.1997.623786

Wada, Ueda, Hirota, Hirano, Mashiko, Hamano

Introduction: SOI devices operate faster and consume less power than bulk ones because of their small junction capacitances. However, this advantage decreases as fan-outs get larger or metal wiring gets longer, because the SOI structure has less effect on gate capacitance or wiring capacitance. The body-bias controlled gate [1] shown in Fig. 1(a) can drive large load capacitances. However, this device cannot use a supply voltage above the junction built-in voltage. The body-bias controlled gate with reverse-biased diodes [2] shown in Fig. 1(b) can operate stably with a supply voltage above the built-in voltage. However, its operation becomes unstable at high frequency, because the excess charge in the body cannot be discharged quickly while the diodes are reverse biased. In this paper we propose active body-bias SOI-CMOS driver circuits that can operate at high speed and at a supply voltage higher than the built-in voltage. This driver circuit, like the active pull-down scheme in ECL circuits [3], enhances its driving capability during the transition period. Circuit simulations indicate that the proposed circuit operates 55% faster than the bulk driver circuit at 0.8 V and 60 pF.

Circuit Description: Fig. 2 shows the driver circuits we propose. The output inverter gate, one PMOS and one NMOS, is replaced with the six transistors indicated in the broken-line box. The type-A circuit shown in Fig. 2(a) operates as follows. When IN is "L", INB is "H" and OUT is "L". Transistor P1 is on and N1 is off. The feedback signal from OUT turns on P2 and turns off N2. When IN changes from "L" to "H", P1 turns off and N1 turns on. At this time P3, P2 and N1 are all on, and a small DC current flows through P3, P2 and N1, as shown in Fig. 3, provided transistor P3 is designed to be small. The body potential of transistor P4 then falls and the threshold voltage of P4 becomes smaller. Consequently OUT goes to "H" quickly. When OUT becomes "H", P2 turns off and N2 turns on. The DC current path is cut off and the body potential of P4 returns to the supply voltage level. Thus the extra power consumption of the circuit is minimized. In this driver, the body voltages of the output transistors can be kept below the built-in voltage by optimizing the sizes of transistors N2, N3, P2 and P3. The body regions of the output transistors are charged or discharged through transistors, not through diodes. Therefore, the proposed circuit can operate at high frequency even when the supply voltage exceeds the built-in voltage. In the type-B circuit shown in Fig. 2(b), the feedback signal is replaced by the output signal of the small inverter INV, which lengthens the period during which P4 or N4 has a low threshold voltage.

Circuit Simulation: Table 1 shows the device parameters of the SOI and bulk devices used for the circuit simulations. Fig. 4 shows the operating waveforms of the bulk, the conventional SOI and the type-A SOI circuits at 1.0 V and 60 pF. The operating frequency is 100 MHz. The body potential of the output transistor P4 or N4 changes synchronously with the INB signal, so the transistors can drive the load capacitance quickly. Once OUT goes to "H" or "L", the body potential returns to its original state, the ground or supply voltage level. Fig. 5 shows the dependence of delay time and power consumption on supply voltage at 60 pF. At 1 V, the type-A SOI circuit operates 23% faster than the conventional SOI circuit and 37% faster than the bulk circuit. The extra power dissipation of the type-A SOI circuit is 2.4% relative to the conventional SOI circuit. Compared with the bulk circuit, the type-A SOI circuit reduces power consumption by 4.0%. The advantage of the proposed circuit over the conventional SOI circuit grows as the supply voltage decreases: the type-A SOI circuit is 20% faster than the conventional SOI circuit at 1.2 V and 37% faster at 0.8 V. Fig. 6 shows the dependence of delay time and power consumption on load capacitance at 1 V. The advantage of the proposed circuit over the conventional SOI circuit also grows as the load capacitance increases: the type-A SOI circuit is 24% faster than the conventional SOI circuit at 20 pF and 28% faster at 100 pF. Compared with the bulk circuit, the type-A SOI circuit is 40% faster at 20 pF and 36% faster at 100 pF. As Figs. 5 and 6 show, the type-B SOI circuit operates faster than the type-A SOI circuit. At 0.8 V and 60 pF, the type-B SOI circuit is 12% faster than the type-A SOI circuit and, under the same conditions, 55% faster than the bulk circuit.

Conclusion: This paper describes active body-bias SOI-CMOS driver circuits that operate at high speed with a supply voltage above the built-in voltage. When the output inverter of a driver circuit is replaced with the proposed circuit, the driver outperforms the bulk circuit in speed even with large load capacitances.

Citations: 10

Precharged Cache Hit Logic With Flexible Timing Control

Symposium 1997 on VLSI Circuits

Pub Date : 1997-06-12 DOI: 10.1109/VLSIC.1997.623793

Reohr, Navarro, Chan, Mayo, Curran, Krumm, M M Pelella, Lu, Bakhru, Kowalczyk, Rawlins, Carey, Wu

Introduction: The design of fast and robust Cache hit logic was one of the fundamental hurdles overcome to achieve the reported 350 MHz for an S/390 microprocessor in a 0.2 µm Leff technology. The 4-way set-associative Cache mechanism is shown in Figure 1. It consists of the Cache memory, which holds a subset of the machine instructions and data; the Directory memory, which holds a subset of absolute addresses (main memory addresses) corresponding to the Cache's instruction and data entries; the absolute Translation Lookaside Buffer (TLB), which holds recently translated absolute address entries; and the logical TLB, which holds the virtual addresses (internally generated machine addresses) corresponding to the absolute TLB's address entries. The hit logic, which links all these memories together, resolves each cycle whether the Cache memory holds the instructions or data requested by the processor. If the Cache holds the correct entry, the hit logic selects the appropriate entry from the 4 possible sets read out of the Cache simultaneously. While most of the processor logic was implemented in static CMOS, performance requirements dictated that the hit logic employ precharge techniques to achieve a single-cycle directory lookup and set selection. This in turn raised a number of circuit issues related to the proper interfacing of static and precharged logic families. Flexible timing control was incorporated around these interfaces, where races exist, to obtain functional hardware at all process and test corners.

Timing Diagram: Figure 2 shows the timing diagram for the logic and memory circuits. The falling edge of the Global Clock triggers both the capture and launch of signals through the logic registers, begins the precharge of the hit logic (Hit L. precharge), and starts the internal decode of addresses in the memory macros. About halfway through the cycle, the TLB and Directory memories send their outputs to the comparator logic, which in turn drives the ANDing and Continuation logic. The hit signal produced by the sum of those actions finally selects one of four possible cache entries to be driven out to the Cache Output Register. All memory accesses and logic evaluations happen in a single processor cycle. Precharge occurs when circuits are not evaluating. Each functional block in the memory circuits generates its own precharge using a self-resetting scheme [2] (SRCMOS). The Directory and TLB memories produce a wide enough output pulse, approximately a third of the cycle, to guarantee sufficient overlap between their signals so that the bit "ANDing" done in the XOR circuits, a part of the Directory Comparator, functions reliably. Once the hit logic topples, data stay latched until the next cycle's precharge. The single-phase clocking scheme described is prone to short-path timing problems introduced through the precharge of the hit logic. Clock skew may cause the hit logic to precharge before a latch capturing the hit logic's state has time to close. Padding prevents this from occurring.

Strobed Compare-Equal Circuit: Figure 4 shows the Directory Comparator circuit, which consists of a bit-wise XOR and a strobed "ANDing" plane. The main complication of this comparator is that it requires a timing circuit that asserts the strobe only after the XOR circuits have had a chance to detect differences between the TLB and Directory signals; if any bit miscompares, a transistor of the NOR pull-down fires. A race condition exists between data arrival and strobe assertion. Activating the strobe too early causes the comparator to fail functionally, with the signature of a constant high compare-equal, while activating it too late adds dead time to the circuit delay. Note also that all precharged circuits (Figs. 4 and 5) use pFETs PN1, PN2 and PN3 to manage charge sharing and leakage on dynamic nodes, and skewed static inverters (not shown) to reject coupled noise introduced on long signal lines. Figure 3 shows e-beam results for the strobed compare-equal circuit, obtained from a test site with observable circuit nodes and precise strobe timing control. The first waveform shows the 200 ps circuit performance from the rising edge of the pulsed input to the rising edge of the compare-equal output. To achieve this performance, the second waveform shows that Strobe fired before node dyn1 was pulled down. The third waveform shows how an overly aggressive Strobe setting produces a noise glitch: for the miscompare case, dyn2 partially discharges through transistors N1, N2 and NSTROBE. The half latch, transistor PN1, restores node dyn2 high once transistor N1 turns off. Such an aggressive strobe setting is not recommended in an actual design, because long signal wires of differing lengths, power supply bounce, transistor mistracking and coupled noise may introduce mistracking between the strobe path and the data path.
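The lookup that the hit logic performs can be sketched functionally as follows. This is only a software analogy under stated assumptions: the function names `compare_equal` and `cache_lookup`, the integer addresses, and the example values are hypothetical, and the sketch ignores the precharge, strobe timing and race conditions that are the actual subject of the paper.

```python
WAYS = 4  # the abstract describes a 4-way set-associative cache


def compare_equal(tlb_addr, directory_addr):
    """Bit-wise XOR followed by a wide NOR: equal only when no bit differs."""
    return (tlb_addr ^ directory_addr) == 0


def cache_lookup(tlb_addr, directory_row, cache_row):
    """Select the cache entry whose directory address matches the TLB output.

    directory_row: the WAYS absolute addresses read from the Directory.
    cache_row:     the WAYS entries read out of the Cache in parallel.
    Returns the selected entry, or None on a miss.
    """
    if len(directory_row) != WAYS or len(cache_row) != WAYS:
        raise ValueError("expected one item per way")
    hits = [compare_equal(tlb_addr, tag) for tag in directory_row]
    if sum(hits) != 1:  # miss (or an inconsistent directory)
        return None
    return cache_row[hits.index(True)]


directory_row = [0x0001, 0x1A2B, 0x7777, 0x0F0F]  # hypothetical addresses
cache_row = ["way0", "way1", "way2", "way3"]
print(cache_lookup(0x1A2B, directory_row, cache_row))
print(cache_lookup(0x9999, directory_row, cache_row))
```

All four entries are read before the comparison resolves, which mirrors the description that the hit signal selects among four sets read out of the Cache simultaneously.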

Citations: 3

A 2.7 Volt CMOS Broadband Low Noise Amplifier

Symposium 1997 on VLSI Circuits

Pub Date : 1997-06-12 DOI: 10.1109/VLSIC.1997.623820

Janssens, Steyaert, Miyakawa

A full-CMOS broadband low-noise amplifier for application in a GSM receiver front-end is presented. The circuit employs a topology that does not require any tuning or trimming. The amplifier provides a forward gain of 24 dB between 0.9 GHz and 1 GHz, while the -3 dB band ranges from 420 MHz to 1.2 GHz. The noise figure is less than 2.7 dB over the whole 300 MHz frequency band between 0.8 GHz and 1.1 GHz. Reverse isolation is better than 43.5 dB over the full measuring range. The prototype has been implemented in a 0.4 µm CMOS technology and consumes 35 mW from a 2.7 V supply.

Citations: 36

A Reduced Clock-swing Flip-flop (RCSFF) For 63% Clock Power Reduction

Symposium 1997 on VLSI Circuits

Pub Date : 1997-06-12 DOI: 10.1109/VLSIC.1997.623825

Kawaguchi, Sakurai

A reduced clock-swing flip-flop (RCSFF) is proposed, which is composed of a reduced-swing clock driver and a special flip-flop that embodies a leak-current cutoff mechanism. The RCSFF can reduce the clock system power of a VLSI to one-third of that of the conventional flip-flop. This power improvement is achieved by reducing the clock swing to 1 V. The area and the delay of the RCSFF can also be reduced by about 20% compared to the conventional flip-flop. The RCSFF can also reduce the delay of a long interconnect to one-half.
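As a rough plausibility check of the one-third figure, clock-network power can be estimated from the swing alone. The sketch below is a first-order model, not the paper's analysis: it assumes equal clock frequency and capacitance in both cases, and the 3.3 V supply value is an assumption that the abstract does not state. The function name `clock_power_ratio` is hypothetical.

```python
def clock_power_ratio(v_swing, v_dd, separate_supply=False):
    """Ratio of reduced-swing to full-swing clock power at equal f and C.

    Full swing:                        P = f * C * v_dd**2
    Reduced swing, charge from v_dd:   P = f * C * v_swing * v_dd
    Reduced swing, dedicated supply:   P = f * C * v_swing**2
    """
    if not 0 < v_swing <= v_dd:
        raise ValueError("expected 0 < v_swing <= v_dd")
    if separate_supply:
        return (v_swing / v_dd) ** 2
    return v_swing / v_dd


V_DD = 3.3     # volts; assumed for illustration, not given in the abstract
V_SWING = 1.0  # volts; the reduced clock swing quoted in the abstract

print(f"charge drawn from V_DD: {clock_power_ratio(V_SWING, V_DD):.2f}")
print(f"dedicated low supply:   {clock_power_ratio(V_SWING, V_DD, True):.2f}")
```

Under the assumed supply, the linear model lands near one-third. The flip-flop's internal power and the driver overhead are outside this estimate, so it should not be read as a reproduction of the reported 63% reduction.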

Citations: 52

A Charge Transfer Amplifier And An Encoded Bus Architecture For Low Power SRAM

Symposium 1997 on VLSI Circuits

Pub Date : 1997-06-12 DOI: 10.1109/VLSIC.1997.623815

Kawashima, Mori, Sasagawa, Hamaminato, Wakayama, Sukegawa, Fukushi

We proposed and tested a low-power SRAM using a charge-transfer (CT) pre-sense amplifier and a bus signal encoding scheme. The CT amplifier compensates for the Vth mismatch between the paired MOS transistors, and the encoded bus signal reduces the number of wires being switched. Both are dynamically controlled by a low-power DLL that we propose. The fabricated SRAM operated at 50 MHz with a power dissipation of 5 mW from a 1 V supply.

Introduction: Conventional SRAM latch amplifiers, whose inputs are reset to VDD/2, have a delay proportional to 1/(VDD/2 − Vth), so their sensing delay increases severely as VDD is reduced. Furthermore, the Vth and Id variations in the paired MOS transistors cannot be neglected in sense amplifier design in the deep-submicron era, because process-induced variations in electrical characteristics have increased and the bit-line swing is reduced by the small cell current. The following simulations and results are based on our 0.35 µm multi-Vth CMOS technology.

A Charge-Transfer Pre-Amplifier: The CT amplifier has pronounced sensitivity and consumes little power. In balanced use [1], it compensates for the Vth mismatch of the two CT MOS transistors, and its incomplete precharge operation gives a switching time of a few ns (Fig. 1(a)). We modified the CT-gate pulse to disconnect the bit-lines before the latch operation, reducing the bit-line swings to decrease the bit-line charging power and increase the latch speed (Fig. 1(b)). The bit-line precharge uses a 1.5 V precharge source, because a low-voltage SRAM needs the bit-line potentials to be kept near VDD (= 1 V) for cell stability. The gm improvement in a MOS technology with Tox = 5.5 nm, the pulling up of the bit-lines to 1 V, and the differential signal scheme in SRAMs reduced the precharge period to 2.5 ns, which is fast enough for 50 MHz operation. Fig. 3(a) shows the simulated waveforms of the CT amplifier, whose circuit implementation is illustrated in the upper half of Fig. 4. The CT amplifier operates in the sequence (1) through (6): (1) precharge the bit-lines to 1 V; (2) further precharge the bit-lines towards the 1.5 V − Vth level through the nMOS CT gates; (3) turn off the pull-up PMOS at the CT drains and activate a word-line. Because of the cell current, BL1 becomes lower than /BL1, and B1 falls faster than /B1 because Vgs − Vth of the CT transistor on the BL side is larger. (4)-(5) Activate the nMOS cross-coupled sense latch; and finally (6) precharge the bit-lines, B1 and /B1, for the next cycle. At the beginning of (3), the incomplete bit-line charging is enough to overcome a mismatch or offset of the sense latch as large as 0.1 V. Since the nMOS dynamic latch is activated at Vgs = VDD in period (4), its delay is half that of a half-VDD-precharge CMOS latch. The proposed CT amplifier draws little current: 12.6 µA from the 1.5 V supply at 50 MHz. Including the clocking power penalty, they consume 580 µW at 50 MHz [...]

Fig. 5 shows a micrograph of the chip. The cell array consists of eight blocks; within a block, 16 bit-line pairs are directly connected to 16 CT amplifiers. A boosted pulsed word-line scheme and the MT-CMOS [3] power-down technique are also used.

Conclusion: The combination of the proposed CT amplifier, encoded bus and DLL circuit meets the requirements of medium-speed, low-voltage, low-power SRAMs in deep-submicron technology.
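The motivation given in the introduction, that a latch amplifier reset to VDD/2 slows sharply at low supply voltage, follows from reading the quoted proportionality as delay ∝ 1/(VDD/2 − Vth). A minimal numeric sketch is shown below; the threshold voltage of 0.3 V and the 3.3 V reference supply are assumptions for illustration only, and `relative_sense_delay` is a hypothetical helper name.

```python
def relative_sense_delay(v_dd, v_th):
    """Delay in arbitrary units for a latch whose inputs are reset to v_dd / 2."""
    overdrive = v_dd / 2 - v_th
    if overdrive <= 0:
        raise ValueError("inputs at v_dd / 2 do not exceed the threshold voltage")
    return 1.0 / overdrive


V_TH = 0.3  # volts; assumed, the abstract does not give a value

# Slowdown when the supply drops from an assumed 3.3 V to the paper's 1 V.
slowdown = relative_sense_delay(1.0, V_TH) / relative_sense_delay(3.3, V_TH)
print(f"sensing delay grows by a factor of about {slowdown:.2f}")
```

With these assumed numbers the overdrive shrinks from 1.35 V to 0.2 V, which is why a scheme that activates the latch at Vgs = VDD, as the CT amplifier does, is attractive at a 1 V supply.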

Citations: 1
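
The SRAM abstract above states that a latch whose inputs are reset to VDD/2 has a delay proportional to 1/(VDD/2 − Vth). A minimal sketch of that scaling follows. The threshold voltage of 0.3 V, the 3.3 V reference supply, and the function name are assumptions for illustration and are not values from the paper.

```python
def relative_latch_delay(vdd, vth, ref_vdd):
    """Delay of a half-VDD-precharged latch relative to its delay at ref_vdd.

    Uses only the proportionality quoted in the abstract:
    delay ~ 1 / (VDD/2 - Vth).
    """
    overdrive = vdd / 2 - vth
    ref_overdrive = ref_vdd / 2 - vth
    if overdrive <= 0 or ref_overdrive <= 0:
        raise ValueError("VDD/2 must exceed Vth for the latch to switch")
    return ref_overdrive / overdrive


VTH = 0.3       # assumed threshold voltage in volts (hypothetical)
REF_VDD = 3.3   # assumed reference supply in volts (hypothetical)

for vdd in (3.3, 2.5, 1.5, 1.0):
    factor = relative_latch_delay(vdd, VTH, REF_VDD)
    print(f"VDD = {vdd:.1f} V -> relative delay {factor:.2f}x")
```

Under these assumed numbers the delay factor grows steeply as VDD/2 approaches Vth, which is the motivation the abstract gives for moving away from the half-VDD-precharged latch.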

Issue Logic For A 600 MHz Out-of-order Execution

Symposium 1997 on VLSI Circuits

Pub Date : 1997-06-12 DOI: 10.1109/VLSIC.1997.623777

J. Farrell, T. Fischer

The logic and circuits are presented for a 20-entry instruction queue which scoreboards 80 registers and issues four instructions per cycle in a 600-MHz microprocessor. The request logic and arbiter circuits that control integer execution are described in addition to a novel compaction scheme that maintains temporal order in the queue. The issue logic data path is implemented in 141,000 transistors, occupying 10 mm² in a 0.35-µm CMOS process.

Citations: 81
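
The issue-logic abstract above describes a 20-entry queue that scoreboards 80 registers, issues four instructions per cycle, and compacts itself while keeping temporal order. The following is a behavioral sketch of that idea only, not the authors' circuit. The class and method names are hypothetical, and it assumes registers are already renamed so that every destination is a fresh register that never appears among the same instruction's sources.

```python
from dataclasses import dataclass

QUEUE_ENTRIES = 20   # queue depth quoted in the abstract
NUM_REGISTERS = 80   # scoreboarded registers quoted in the abstract
ISSUE_WIDTH = 4      # instructions issued per cycle quoted in the abstract


@dataclass
class Instr:
    name: str
    srcs: tuple   # source register numbers
    dest: int     # destination register number (assumed freshly renamed)


class IssueQueue:
    def __init__(self):
        self.entries = []                     # index 0 holds the oldest instruction
        self.ready = [True] * NUM_REGISTERS   # scoreboard: True once a value is available

    def enqueue(self, instr):
        if len(self.entries) >= QUEUE_ENTRIES:
            return False                      # queue full, the caller must stall
        self.ready[instr.dest] = False        # result is pending until writeback
        self.entries.append(instr)
        return True

    def issue(self):
        """Pick up to ISSUE_WIDTH ready instructions, oldest first, then compact."""
        picked = []
        for instr in self.entries:
            if len(picked) == ISSUE_WIDTH:
                break
            if all(self.ready[r] for r in instr.srcs):
                picked.append(instr)
        # Compaction: drop issued entries; survivors keep their relative age order.
        picked_ids = {id(p) for p in picked}
        self.entries = [i for i in self.entries if id(i) not in picked_ids]
        return picked

    def writeback(self, instr):
        self.ready[instr.dest] = True         # wakes up dependent instructions


queue = IssueQueue()
a = Instr("a", (1, 2), 10)
b = Instr("b", (10,), 11)   # depends on the result of a
c = Instr("c", (3,), 12)
for instr in (a, b, c):
    queue.enqueue(instr)

first = queue.issue()       # a and c are ready; b waits for register 10
queue.writeback(a)
second = queue.issue()      # b can issue now
print([i.name for i in first], [i.name for i in second])
```

Keeping the list ordered by age means that scanning from the front gives oldest-first priority without storing explicit age tags, which is the role the abstract assigns to compaction.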

All-digital Multi-phase Delay Locked Loop For Internal Timing Generation In Embedded And/or High-speed DRAMs

Symposium 1997 on VLSI Circuits

Pub Date : 1997-06-12 DOI: 10.1109/VLSIC.1997.623830

Gotoh, Wakayama, Saito, Ogawa, Tamura, Okajima, Taguchi

We propose an all-digital, multi-phase delay locked loop (DLL) for internal timing generation in embedded DRAMs. The timing generation is achieved by combining the DLL with a command decoder and a register-controlled multi-phase clock counter. The DLL has four-phase (π/2 step) and six-phase (π/3 step) output modes, and employs coarse and fine delay lines to minimize the delay line area while keeping the skew resolution down to a value obtainable by all-digital delay elements. Our DLL operates over a clock range of 125 to 400 MHz with a skew adjustment error of ±60 ps.

Citations: 3
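
The DLL abstract above mentions four-phase and six-phase output modes over a 125 to 400 MHz clock range, and a coarse plus fine delay line arrangement that saves area. The sketch below works through that arithmetic. The step sizes of 50 ps and 400 ps, the assumption that the line must span one full period of the slowest clock, and both function names are hypothetical and are not taken from the paper.

```python
import math


def phase_offsets_ps(clock_mhz, phases):
    """Delay of each output phase relative to phase 0, in picoseconds."""
    period_ps = 1e6 / clock_mhz
    return [k * period_ps / phases for k in range(phases)]


def delay_elements(range_ps, fine_ps, coarse_ps=None):
    """Number of delay elements needed to span range_ps at fine_ps resolution."""
    if coarse_ps is None:
        return math.ceil(range_ps / fine_ps)   # one long line of fine elements
    # Coarse elements cover the range; fine elements cover one coarse step.
    return math.ceil(range_ps / coarse_ps) + math.ceil(coarse_ps / fine_ps)


FINE_PS = 50      # assumed fine step (hypothetical)
COARSE_PS = 400   # assumed coarse step (hypothetical)
slowest_period_ps = 1e6 / 125   # 8000 ps at the 125 MHz end of the range

print(phase_offsets_ps(400, 4))
print(delay_elements(slowest_period_ps, FINE_PS))
print(delay_elements(slowest_period_ps, FINE_PS, COARSE_PS))
```

With these assumed step sizes, a single fine line needs 160 elements while the coarse plus fine split needs 28, which illustrates why the two-level arrangement reduces delay line area at the same resolution.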

A 75mW 128 MHz DS-CDMA Baseband Correlator For High-speed Wireless Applications

Symposium 1997 on VLSI Circuits

Pub Date : 1997-06-12 DOI: 10.1109/VLSIC.1997.623835

Onodera, Gray

A DS-CDMA demodulator uses analog sampled-data signal processing to achieve a 75-mW power dissipation and a 128-MS/s processing rate in a 1.2-µm double-metal double-poly CMOS process. To demodulate the signal, a low-power passive correlation technique is introduced that eliminates the integrating opamp with its associated power and settling time overhead. In a prototype demodulator, six 64-chip correlators recover the 2-Mb/s data stream from the doubly modulated (pseudorandom noise (PN) and Walsh) quadrature input signal. An on-chip 10-b pipelined ADC sampling at 8 MS/s follows the analog correlation to permit digital implementation of the acquisition and tracking algorithms.

Citations: 7
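
The DS-CDMA abstract above describes 64-chip correlators that recover a 2-Mb/s stream from a signal doubly modulated by PN and Walsh codes at a 128-MS/s rate. The sketch below models only the multiply-and-accumulate arithmetic of such a correlator in software; it does not model the passive analog sampled-data circuit. The random stand-in PN sequence, the Walsh index 5, the noise level, and all function names are assumptions for illustration.

```python
import random

CHIPS_PER_BIT = 64   # correlator length quoted in the abstract
CHIP_RATE = 128e6    # processing rate quoted in the abstract, samples per second


def walsh_row(index, length):
    """Row `index` of a Sylvester-type Hadamard matrix, as +1/-1 chips."""
    return [-1 if bin(index & k).count("1") % 2 else 1 for k in range(length)]


def spread(bits, code):
    """Replace every +1/-1 data bit with the code scaled by that bit."""
    return [b * c for b in bits for c in code]


def correlate(samples, code):
    """Multiply and accumulate over each code period, then take the sign."""
    n = len(code)
    decisions = []
    for start in range(0, len(samples), n):
        acc = sum(s * c for s, c in zip(samples[start:start + n], code))
        decisions.append(1 if acc >= 0 else -1)
    return decisions


rng = random.Random(0)
pn = [rng.choice((-1, 1)) for _ in range(CHIPS_PER_BIT)]          # stand-in PN sequence
code = [p * w for p, w in zip(pn, walsh_row(5, CHIPS_PER_BIT))]   # PN times Walsh

bits = [1, -1, -1, 1]
noisy = [x + rng.gauss(0, 0.5) for x in spread(bits, code)]       # assumed noise level
recovered = correlate(noisy, code)

print(recovered)
print(CHIP_RATE / CHIPS_PER_BIT)   # data rate in bits per second
```

Dividing the 128-MS/s chip rate by the 64-chip code length gives the 2-Mb/s data rate quoted in the abstract.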