Pub Date : 1994-06-09 DOI: 10.1109/VLSIC.1994.586235
F. Herrmann, C. Sodini
A 256-element associative processing chip is designed for pixel-parallel image processing and machine vision applications. A five-transistor three-state dynamic memory cell is used, and each processing element has 64 trits of memory. Other processing element components include a function generator, an activity register, and connections to a reconfigurable mesh network and a response resolution subsystem. These are implemented with compact circuits designed within memory pitch constraints. The chip was fabricated in a double-poly CCD-CMOS process and characterized as fully functional. A sample image processing application is demonstrated on a four-chip prototype system.
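The idea of an associative search over trit-valued memory can be sketched as follows. This is only an illustration: the abstract states that each processing element (PE) holds 64 trits in three-state cells, but treating the third state as a "don't care" (`X`) is an assumption made here, and the function names are hypothetical.

```python
# Figures taken from the abstract.
TRITS_PER_PE = 64
PES_PER_CHIP = 256


def matches(stored: str, key: str) -> bool:
    """Return True if every stored trit is 'X' (assumed don't-care) or equals the key bit."""
    if len(stored) != len(key):
        raise ValueError("stored word and key must have the same length")
    return all(s == "X" or s == k for s, k in zip(stored, key))


def search(pe_words: list[str], key: str) -> list[int]:
    """Compare one key against every PE word and return the indices that respond."""
    return [i for i, word in enumerate(pe_words) if matches(word, key)]


def trits_per_chip() -> int:
    """Total trit storage on one chip: PEs per chip times trits per PE."""
    return TRITS_PER_PE * PES_PER_CHIP
```

In hardware all PEs would evaluate the comparison at once; the list comprehension only models the result of that parallel compare.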
{"title":"A 256-element Associative Parallel Processor","authors":"F. Herrmann, C. Sodini","doi":"10.1109/VLSIC.1994.586235","DOIUrl":"https://doi.org/10.1109/VLSIC.1994.586235","url":null,"abstract":"A 256-element associative processing chip is designed for pixel-parallel image processing and machine vision applications. A five-transistor three-state dynamic memory cell is used, and each processing element has 64 trits of memory. Other processing element components include a function generator, an activity register, and connections to a reconfigurable mesh network and a response resolution subsystem. These are implemented with compact circuits designed within memory pitch constraints. The chip was fabricated in a double-poly CCD-CMOS process and characterized as fully functional. A sample image processing application is demonstrated on a four-chip prototype system.","PeriodicalId":350730,"journal":{"name":"Proceedings of 1994 IEEE Symposium on VLSI Circuits","volume":"52 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1994-06-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126504143","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 1994-06-09 DOI: 10.1109/VLSIC.1994.586203
Sungjoon Kim, Kyeongho Lee, D. Jeong, Yunho Choi
This paper describes a new skew-insensitive I/O scheme for high-bandwidth processor-memory communication, which alleviates the interchip skew problem. High-speed transmission at up to 770 Mbaud was obtained with multiphase clocks generated by a phase-locked loop circuit. Interchip skew is adjusted by a receiver based on a dual-loop delay-locked loop. The chip is fabricated in a 0.9 µm CMOS process.
{"title":"A Pseudo-Synchronous Skew-Insensitive I/O Scheme for High Bandwidth Memories","authors":"Sungjoon Kim, Kyeongho Lee, D. Jeong, Yunho Choi","doi":"10.1109/VLSIC.1994.586203","DOIUrl":"https://doi.org/10.1109/VLSIC.1994.586203","url":null,"abstract":"This paper describes a new skew-insensitive I/O scheme for high-bandwidth processor-memory communication, which alleviates the interchip skew problem. High-speed transmission at up to 770 Mbaud was obtained with multiphase clocks generated by a phase-locked loop circuit. Interchip skew is adjusted by a receiver based on a dual-loop delay-locked loop. The chip is fabricated in a 0.9 µm CMOS process.","PeriodicalId":350730,"journal":{"name":"Proceedings of 1994 IEEE Symposium on VLSI Circuits","volume":"3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1994-06-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123951191","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 1994-06-09 DOI: 10.1109/VLSIC.1994.586225
A. Fujiwara, H. Kikukawa, K. Matsuyama, M. Agata, S. Iwanari, M. Fukumoto, T. Yamada, S. Okada, T. Fujita
{"title":"A 200MHz 16Mbit Synchronous DRAM with Block Access Mode","authors":"A. Fujiwara, H. Kikukawa, K. Matsuyama, M. Agata, S. Iwanari, M. Fukumoto, T. Yamada, S. Okada, T. Fujita","doi":"10.1109/VLSIC.1994.586225","DOIUrl":"https://doi.org/10.1109/VLSIC.1994.586225","url":null,"abstract":"","PeriodicalId":350730,"journal":{"name":"Proceedings of 1994 IEEE Symposium on VLSI Circuits","volume":"13 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1994-06-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115356200","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 1994-06-09 DOI: 10.1109/VLSIC.1994.586211
M. P. Marks
The application trends and key design tradeoffs involved in the design of high-performance microprocessors are discussed. In particular, the growing importance of improved human interface technology and the need for high-performance, highly connected systems, combined with the need to keep driving costs lower, make design choices very difficult. Technical innovations in high-bandwidth processor-memory interfaces, low-cost multiprocessors, and software compatibility are needed in order to continue to move forward.
{"title":"Future Directions In Microprocessor Technology","authors":"M. P. Marks","doi":"10.1109/VLSIC.1994.586211","DOIUrl":"https://doi.org/10.1109/VLSIC.1994.586211","url":null,"abstract":"The application trends and key design tradeoffs involved in the design of high-performance microprocessors are discussed. In particular, the growing importance of improved human interface technology and the need for high-performance, highly connected systems, combined with the need to keep driving costs lower, make design choices very difficult. Technical innovations in high-bandwidth processor-memory interfaces, low-cost multiprocessors, and software compatibility are needed in order to continue to move forward.","PeriodicalId":350730,"journal":{"name":"Proceedings of 1994 IEEE Symposium on VLSI Circuits","volume":"97 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1994-06-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127104560","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 1994-06-09 DOI: 10.1109/VLSIC.1994.586213
D.J. Lee, R. Cernea, M. Mofidi, S. Mehrotra, E. Y. Chang, W.Y. Chien, L. Goh, J.H. Yuan, A. Mihnea, G. Samachisa, Y. Fong, D. Guterman, R. D. Norman
High density FLASH EEPROM for solid state disk applications requires minimization of die area while maintaining the flexibility and controllability needed for low cost storage systems. This 18Mb serial FLASH EEPROM utilizes standard 512B sectoring for erase and high parallelism for program and read operations. Erase sector grouping reduces the erase selection circuits by a factor of four over previous designs, and a 256-bit programming chunk size increases the program data rate by a factor of four, while a shared data latch architecture maintains a similar cell size versus sense area pitch compared to previous designs [1]. In addition, a serial chip selection scheme which requires minimal die area enables multiple chip operations to be easily performed. The die is fabricated using a triple poly, single metal, twin well 0.5 µm CMOS process with a memory cell size of 2.1 µm² and a die size of 396 mils × 290 mils (74 mm²). The split gate, buried n+ source/drain, virtual ground Flash EEPROM memory uses channel hot electron injection for programming and inter-poly dielectric tunneling for erase (see figure 1 for cell operation voltages).
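The figures quoted in the abstract can be cross-checked with a few lines of arithmetic. The sketch below converts the die dimensions from mils to square millimetres and counts how many 256-bit programming chunks fit in one 512-byte sector; the function names are illustrative only.

```python
MM_PER_MIL = 0.0254  # 1 mil = 0.001 inch = 0.0254 mm

SECTOR_BYTES = 512   # erase sector size from the abstract
CHUNK_BITS = 256     # programming chunk size from the abstract


def die_area_mm2(width_mils: float, height_mils: float) -> float:
    """Convert die dimensions in mils to an area in square millimetres."""
    return (width_mils * MM_PER_MIL) * (height_mils * MM_PER_MIL)


def chunks_per_sector() -> int:
    """Number of 256-bit programming chunks needed to fill one 512-byte sector."""
    return (SECTOR_BYTES * 8) // CHUNK_BITS
```

For the stated 396 mils × 290 mils die, `die_area_mm2(396, 290)` comes out at roughly 74 mm², in agreement with the abstract.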
{"title":"An 18Mb Serial Flash EEPROM for Solid-State Disk Applications","authors":"D.J. Lee, R. Cernea, M. Mofidi, S. Mehrotra, E. Y. Chang, W.Y. Chien, L. Goh, J.H. Yuan, A. Mihnea, G. Samachisa, Y. Fong, D. Guterman, R. D. Norman","doi":"10.1109/VLSIC.1994.586213","DOIUrl":"https://doi.org/10.1109/VLSIC.1994.586213","url":null,"abstract":"High density FLASH EEPROM for solid state disk applications requires minimization of die area while maintaining the flexibility and controllability needed for low cost storage systems. This 18Mb serial FLASH EEPROM utilizes standard 512B sectoring for erase and high parallelism for program and read operations. Erase sector grouping reduces the erase selection circuits by a factor of four over previous designs, and a 256-bit programming chunk size increases the program data rate by a factor of four, while a shared data latch architecture maintains a similar cell size versus sense area pitch compared to previous designs [1]. In addition, a serial chip selection scheme which requires minimal die area enables multiple chip operations to be easily performed. The die is fabricated using a triple poly, single metal, twin well 0.5 µm CMOS process with a memory cell size of 2.1 µm² and a die size of 396 mils × 290 mils (74 mm²). The split gate, buried n+ source/drain, virtual ground Flash EEPROM memory uses channel hot electron injection for programming and inter-poly dielectric tunneling for erase (see figure 1 for cell operation voltages).","PeriodicalId":350730,"journal":{"name":"Proceedings of 1994 IEEE Symposium on VLSI Circuits","volume":"30 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1994-06-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125859472","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 1994-06-09 DOI: 10.1109/VLSIC.1994.586253
P. Weger, W. Simburger, H. Knapp, T. Leslie, N. Rohringer, J. Popp, G. Schultes, A. Scholtz, L. Treitinger
For the first time, a fully functional single-chip "Direct Conversion Transceiver" for future mobile communications is presented. This single-chip transceiver contains all RF functions except the antenna. It has been realised in a silicon bipolar 0.8 µm / 25 GHz technology and operates up to 1.5 GHz. This enables, for the first time, a two-chip "communicator" handheld (RF transceiver chip + baseband processor chip).
{"title":"Completely Integrated 1.5 GHz Direct Conversion Transceiver","authors":"P. Weger, W. Simburger, H. Knapp, T. Leslie, N. Rohringer, J. Popp, G. Schultes, A. Scholtz, L. Treitinger","doi":"10.1109/VLSIC.1994.586253","DOIUrl":"https://doi.org/10.1109/VLSIC.1994.586253","url":null,"abstract":"For the first time, a fully functional single-chip \"Direct Conversion Transceiver\" for future mobile communications is presented. This single-chip transceiver contains all RF functions except the antenna. It has been realised in a silicon bipolar 0.8 µm / 25 GHz technology and operates up to 1.5 GHz. This enables, for the first time, a two-chip \"communicator\" handheld (RF transceiver chip + baseband processor chip).","PeriodicalId":350730,"journal":{"name":"Proceedings of 1994 IEEE Symposium on VLSI Circuits","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1994-06-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128756686","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 1994-06-09 DOI: 10.1109/VLSIC.1994.586201
M. Izzard, C.G. Thisell, H. Mader, M. Hedberg, P. Fung, H. Chang, B. Larsson, V. Gopinathan, D. Scott
A novel comparison of analog and digital design techniques is presented for a high performance clock synchronization scheme. The digital-controlled synchronizer was fabricated using a 0.6µm BiCMOS technology. A new and emitter logic gate enabled fully symmetric differential ECL logic operations down to 2.5V without forward biasing of the switching collector-base junctions. The analog solution consumes 1/4 of the power of the digital.
The clock synchronizer
The clock synchronizer is a data retiming system using a control loop but no oscillator. A reference clock is rotated, by means of a phase rotator, to match the phase of the input data. The phase rotator is a circuit that can produce an arbitrary phase from an input which is a quadrature signal pair. Refer to fig. 1. The phase selection is in response to a pair of weighting factor signals (analog) and a pair of quadrant pointer bits (digital). The core of the rotator is a weighted summer, which interpolates the input phases to produce the output (see for example [1]). The pointer bits can be used to change the sign of the quadrature feed clocks as a method of reaching all quadrants. If the clocks are square wave, bandwidth limiting is required on either the input or the output of the rotator. The weighting factor can be generated by closed loop control. A phase detector is used to provide a command signal to a filter. The filter output is the weighting factor pair. The filter can be digital or analog.
... quadrant pointer bits are decoded to provide a sign bit; this maps the linear integrator output onto the phasor diagram. The single integrator output is converted into complementary currents; the result of this is a non-linear phase characteristic (unless it is predistorted) and a non-uniform signal amplitude from the phase rotator.
The generic systems above can both be made to align a half-baud clock to the data if the rotated clock is split into a quadrature pair and if the special phase detector shown in fig. 4 is used. It is based on two sampling type detectors [2]. The quadrature clock is used as a marker to differentiate between the positive and negative edges of the main clock. The sample of the quadrature clock, S, becomes the sign of the sample of the main clock, P. The sign bit divides the phasor diagram into two hemispheres. After lock, the data ...
The novel building block for the MUX: latch and XOR gate that make up the phase detector is shown in fig. 5. It is a differential ECL design that avoids stacking BJTs and hence is capable of operation down to 2.5V, without forward biasing the switching BJTs; this is not possible with conventional ECL where BJTs must be stacked for some functions. It allows completely symmetrical XOR realization, which is required in the phase detector; see fig. 6. An AND/OR gate is also realizable. The structure is implemented with a metal-programmable cell layout.
Measured silicon results are available for the DCS and simulation results are available for the ACS and D
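The weighted-summer phase rotator described above can be modelled numerically. The sketch below is an idealized, sinusoidal model with hypothetical function names: it sums a quadrature pair with two weights and reports the amplitude and phase of the result. It also shows why splitting a single control value into complementary weights gives a non-linear phase characteristic and a non-uniform amplitude, as the text notes.

```python
import math


def rotator_output(w_i: float, w_q: float, sign_i: int = 1, sign_q: int = 1):
    """Amplitude and phase (degrees) of sign_i*w_i*cos(wt) + sign_q*w_q*sin(wt).

    The sign arguments stand in for the quadrant pointer bits, which flip the
    polarity of the quadrature feed clocks to reach all four quadrants.
    """
    a = sign_i * w_i
    b = sign_q * w_q
    return math.hypot(a, b), math.degrees(math.atan2(b, a))


def complementary_weights(x: float):
    """Split one control value x in [0, 1] into a complementary pair (1 - x, x)."""
    return 1.0 - x, x


def ideal_weights(phase_deg: float):
    """Weights that give unit amplitude and an exactly linear phase (predistorted case)."""
    theta = math.radians(phase_deg)
    return math.cos(theta), math.sin(theta)
```

With complementary weights, a control value of 0.25 yields a phase of about 18.4 degrees rather than 22.5 degrees, and the amplitude drops to about 0.71 at mid-quadrant, which is the non-uniformity that predistortion would correct in this simplified model.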
{"title":"Analog Versus Digital Control of a Clock Synchronizer for 3 Gb/s Data with 3.0V Differential ECL","authors":"M. Izzard, C.G. Thisell, H. Mader, M. Hedberg, P. Fung, H. Chang, B. Larsson, V. Gopinathan, D. Scott","doi":"10.1109/VLSIC.1994.586201","DOIUrl":"https://doi.org/10.1109/VLSIC.1994.586201","url":null,"abstract":"A novel comparison of analog and digital design techniques is presented for a high performance clock synchronization scheme. The digital-controlled synchronizer was fabricated using a 0.6µm BiCMOS technology. A new and emitter logic gate enabled fully symmetric differential ECL logic operations down to 2.5V without forward biasing of the switching collector-base junctions. The analog solution consumes 1/4 of the power of the digital.\nThe clock synchronizer\nThe clock synchronizer is a data retiming system using a control loop but no oscillator. A reference clock is rotated, by means of a phase rotator, to match the phase of the input data. The phase rotator is a circuit that can produce an arbitrary phase from an input which is a quadrature signal pair. Refer to fig. 1. The phase selection is in response to a pair of weighting factor signals (analog) and a pair of quadrant pointer bits (digital). The core of the rotator is a weighted summer, which interpolates the input phases to produce the output (see for example [1]). The pointer bits can be used to change the sign of the quadrature feed clocks as a method of reaching all quadrants. If the clocks are square wave, bandwidth limiting is required on either the input or the output of the rotator. The weighting factor can be generated by closed loop control. A phase detector is used to provide a command signal to a filter. The filter output is the weighting factor pair. The filter can be digital or analog.\n... quadrant pointer bits are decoded to provide a sign bit; this maps the linear integrator output onto the phasor diagram. The single integrator output is converted into complementary currents; the result of this is a non-linear phase characteristic (unless it is predistorted) and a non-uniform signal amplitude from the phase rotator.\nThe generic systems above can both be made to align a half-baud clock to the data if the rotated clock is split into a quadrature pair and if the special phase detector shown in fig. 4 is used. It is based on two sampling type detectors [2]. The quadrature clock is used as a marker to differentiate between the positive and negative edges of the main clock. The sample of the quadrature clock, S, becomes the sign of the sample of the main clock, P. The sign bit divides the phasor diagram into two hemispheres. After lock, the data ...\nThe novel building block for the MUX: latch and XOR gate that make up the phase detector is shown in fig. 5. It is a differential ECL design that avoids stacking BJTs and hence is capable of operation down to 2.5V, without forward biasing the switching BJTs; this is not possible with conventional ECL where BJTs must be stacked for some functions. It allows completely symmetrical XOR realization, which is required in the phase detector; see fig. 6. An AND/OR gate is also realizable. The structure is implemented with a metal-programmable cell layout.\nMeasured silicon results are available for the DCS and simulation results are available for the ACS and D","PeriodicalId":350730,"journal":{"name":"Proceedings of 1994 IEEE Symposium on VLSI Circuits","volume":"22 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1994-06-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115630510","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 1994-06-09 DOI: 10.1109/VLSIC.1994.586172
H. Yamauchi, H. Akamatsu, T. Fujita
An ultra-high data rate of 25Gb/s and beyond is one of the most important design requirements in realizing future ULSI's for super high-definition (HD) moving pictures and graphics applications in consumer electronics. The most effective means of achieving such a data rate is to employ a large number of buses interconnecting the embedded memory, the graphics controller, etc., on a ULSI system chip. For example, even at an operating frequency of 50MHz, more than 512 parallel buses are required. However, a drastic increase in power dissipation is inevitable due to the increased bus-capacitance. Even if the bus-swing is suppressed to less than 1V [1], there still remains a bus-power dissipation of far above 500mW, which is intolerable for battery operation, as shown in Fig. 1. Hence, this paper proposes a complete Charge-Recycling Bus (CRB) architecture that can reduce the bus-power dissipation to less than 15% of that of the conventional suppressed bus-swing scheme while realizing a data rate of 25Gb/s. As shown in Fig. 2, even when using the suppressed bus-swing (Vcc/k) scheme of Conv.Bq (k=4) coupled with a down converter, the power cannot be reduced adequately for battery operation. This is because there still remains a contribution from the total bus-capacitance nCd, where n and Cd are the total number of bus pairs and the capacitance of an individual bus, respectively. The key concept of the CRB architecture is virtual stacking of the individual bus-capacitances Cd into a series configuration to reduce not only the bus-swing but also the total equivalent bus-capacitance. When practical data-path layouts in ULSI's are considered, data buses are classified into a local bus (path width = 2 bit) and a global bus (path width ≥ 4 bit) depending on the hierarchy of the data-path. Hence, we propose the Local and Global CRB schemes for the local and global buses, respectively.
The Local CRB scheme is a virtual connection of two bus-capacitances Cd and a smaller dummy capacitance Cs in series, while the Global CRB scheme is a virtual stacking of numerous bus-capacitances Cd in series. For the i-th complementary bus pair in the two CRB schemes, the input INi and the equalization signal EQ establish not only the output signals, Di and XDi, but also the bus-level signals at nodes H and L, as follows: 1) in the former half of the clock cycle, the EQ signal, synchronized with the system clock, equalizes Di and XDi of the complementary bus pair; 2) in the latter half of the clock cycle, the input signal INi switches the higher (or lower) level of the bus-output pair Di and XDi to the node H (or L) according to the truth table in Fig. 3. In addition, the clocked alternating dummy Cs pair inserted between two pairs of complementary buses is an important element, especially when the number of buses running in parallel is small, as in the case of the Local CRB scheme, where the clocked signal CLH, at half the system clock frequency, alternates the connections of the A and B nodes of the dummy capacitance pair Cs to the nodes H and L, as shown in Fig. 3. Connecting the node L of the (i-1)-th bus (or capacitance) pair to the node H of the i-th bus (or capacitance) pair realizes the series connection of multiple bus-capacitances Cd and the dummy capacitance Cs, as shown in Fig. 3. For example, in the Local CRB scheme, when IN1 = "1" and CLH = "0", D1 and XD1 are connected to Vcc and the B node of Cs, respectively. On the other hand, in the Global CRB scheme, when INi-1 = "1", INi = "0", and INi+1 = "0", Di and XDi are connected to XDi+1 of the lower adjacent bus pair and XDi-1 of the upper adjacent bus pair, respectively.
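Two pieces of arithmetic behind the abstract can be sketched briefly: the number of parallel buses needed for a target data rate, and the equivalent capacitance of bus capacitances placed in series. This is a simplified illustration under stated assumptions (one bit per bus per clock cycle, ideal capacitors, hypothetical function names), not a model of the actual CRB circuit or its reported power figures.

```python
import math


def buses_needed(data_rate_bps: float, clock_hz: float) -> int:
    """Parallel buses required, assuming each bus carries one bit per clock cycle."""
    return math.ceil(data_rate_bps / clock_hz)


def series_capacitance(caps: list[float]) -> float:
    """Equivalent capacitance of ideal capacitors connected in series."""
    return 1.0 / sum(1.0 / c for c in caps)


def per_bus_swing(vcc: float, n_stacked: int) -> float:
    """Voltage across each of n equal capacitors stacked in series across Vcc."""
    return vcc / n_stacked
```

At 25 Gb/s and 50 MHz this gives 500 buses, the same order as the 512-plus parallel buses quoted in the abstract. Stacking n equal capacitances Cd in series reduces the equivalent capacitance to Cd/n and the swing on each to Vcc/n, which is the intuition behind reducing both the bus-swing and the total equivalent bus-capacitance.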
{"title":"A Low Power Complete Charge-Recycling Bus Architecture for Ultra-High Data Rate ULSI's","authors":"H. Yamauchi, H. Akamatsu, T. Fujita","doi":"10.1109/VLSIC.1994.586172","DOIUrl":"https://doi.org/10.1109/VLSIC.1994.586172","url":null,"abstract":"An ultra-high data rate of 25Gb/s and beyond is one of the most important design requirements in realizing future ULSI's for super high-definition (HD) moving pictures and graphics applications in consumer electronics. The most effective means of achieving such a data rate is to employ a large number of buses interconnecting the embedded memory, the graphics controller, etc., on a ULSI system chip. For example, even at an operating frequency of 50MHz, more than 512 parallel buses are required. However, a drastic increase in power dissipation is inevitable due to the increased bus-capacitance. Even if the bus-swing is suppressed to less than 1V [1], there still remains a bus-power dissipation of far above 500mW, which is intolerable for battery operation, as shown in Fig. 1. Hence, this paper proposes a complete Charge-Recycling Bus (CRB) architecture that can reduce the bus-power dissipation to less than 15% of that of the conventional suppressed bus-swing scheme while realizing a data rate of 25Gb/s. As shown in Fig. 2, even when using the suppressed bus-swing (Vcc/k) scheme of Conv.Bq (k=4) coupled with a down converter, the power cannot be reduced adequately for battery operation. This is because there still remains a contribution from the total bus-capacitance nCd, where n and Cd are the total number of bus pairs and the capacitance of an individual bus, respectively. The key concept of the CRB architecture is virtual stacking of the individual bus-capacitances Cd into a series configuration to reduce not only the bus-swing but also the total equivalent bus-capacitance. When practical data-path layouts in ULSI's are considered, data buses are classified into a local bus (path width = 2 bit) and a global bus (path width ≥ 4 bit) depending on the hierarchy of the data-path. Hence, we propose the Local and Global CRB schemes for the local and global buses, respectively.\nThe Local CRB scheme is a virtual connection of two bus-capacitances Cd and a smaller dummy capacitance Cs in series, while the Global CRB scheme is a virtual stacking of numerous bus-capacitances Cd in series. For the i-th complementary bus pair in the two CRB schemes, the input INi and the equalization signal EQ establish not only the output signals, Di and XDi, but also the bus-level signals at nodes H and L, as follows: 1) in the former half of the clock cycle, the EQ signal, synchronized with the system clock, equalizes Di and XDi of the complementary bus pair; 2) in the latter half of the clock cycle, the input signal INi switches the higher (or lower) level of the bus-output pair Di and XDi to the node H (or L) according to the truth table in Fig. 3. In addition, the clocked alternating dummy Cs pair inserted between two pairs of complementary buses is an important element, especially when the number of buses running in parallel is small, as in the case of the Local CRB scheme, where the clocked signal CLH, at half the system clock frequency, alternates the connections of the A and E)","PeriodicalId":350730,"journal":{"name":"Proceedings of 1994 IEEE Symposium on VLSI Circuits","volume":"29 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1994-06-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127784334","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 1994-06-09 DOI: 10.1109/VLSIC.1994.586210
T. Sudo
Multichip modules (MCM's) have been actively developed in recent years. They are expected to provide high-performance systems by packing bare chips at a high density. In particular, a thin-film interconnect substrate that can accommodate higher wiring capacity in a few layers is a new option for coping with high pin count and fine pad pitch VLSI's. MCM's require various kinds of technologies including the fabrication processes of interconnect substrates, chip connection methods, electrical design, thermal management, known good die (KGD), and so on. The state of the art of MCM technologies is reviewed and future directions are discussed.
{"title":"Present and Future Directions for Multichip Module Technologies","authors":"T. Sudo","doi":"10.1109/VLSIC.1994.586210","DOIUrl":"https://doi.org/10.1109/VLSIC.1994.586210","url":null,"abstract":"Multichip modules (MCM's) have been actively developed in recent years. They are expected to provide high-performance systems by packing bare chips at a high density. In particular, a thin-film interconnect substrate that can accommodate higher wiring capacity in a few layers is a new option for coping with high pin count and fine pad pitch VLSI's. MCM's require various kinds of technologies including the fabrication processes of interconnect substrates, chip connection methods, electrical design, thermal management, known good die (KGD), and so on. The state of the art of MCM technologies is reviewed and future directions are discussed.","PeriodicalId":350730,"journal":{"name":"Proceedings of 1994 IEEE Symposium on VLSI Circuits","volume":"4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1994-06-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132538118","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 1994-06-09 DOI: 10.1109/VLSIC.1994.586236
Feng Chen, B. Leung
A second-order sigma-delta modulator with a 3-b internal quantizer employing the individual level averaging technique has been designed and implemented in a 1.2 µm CMOS technology. Testing results show no observable harmonic distortion components above the noise floor. A peak S/(N+D) ratio of 91 dB and a dynamic range of 96 dB have been achieved at a clock rate of 2.56 MHz for a 20 kHz baseband. No tone is observed in the baseband as the amplitude of a 10 kHz input sine wave is reduced from -0.5 dB to -107 dB below the voltage reference. The active area of the prototype chip is 3.1 mm² and it dissipates 67.5 mW of power from a 5 V supply.
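The reported figures can be related to two standard quantities: the oversampling ratio and the effective number of bits. The sketch below assumes the 2.56 MHz clock rate is the modulator sampling rate and that the 20 kHz baseband is the signal bandwidth; the function names are illustrative.

```python
def oversampling_ratio(f_sample_hz: float, bandwidth_hz: float) -> float:
    """Oversampling ratio: sampling rate divided by the Nyquist rate of the band."""
    return f_sample_hz / (2.0 * bandwidth_hz)


def effective_bits(sndr_db: float) -> float:
    """Effective number of bits from S/(N+D), using SNDR = 6.02 * ENOB + 1.76 dB."""
    return (sndr_db - 1.76) / 6.02
```

Under these assumptions the oversampling ratio is 64, and the 91 dB peak S/(N+D) corresponds to roughly 14.8 effective bits.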
{"title":"A High Resolution Multibit Sigma-delta Modulator With Individual Level Averaging","authors":"Feng Chen, B. Leung","doi":"10.1109/VLSIC.1994.586236","DOIUrl":"https://doi.org/10.1109/VLSIC.1994.586236","url":null,"abstract":"A second-order sigma-delta modulator with a 3-b internal quantizer employing the individual level averaging technique has been designed and implemented in a 1.2 µm CMOS technology. Testing results show no observable harmonic distortion components above the noise floor. A peak S/(N+D) ratio of 91 dB and a dynamic range of 96 dB have been achieved at a clock rate of 2.56 MHz for a 20 kHz baseband. No tone is observed in the baseband as the amplitude of a 10 kHz input sine wave is reduced from -0.5 dB to -107 dB below the voltage reference. The active area of the prototype chip is 3.1 mm² and it dissipates 67.5 mW of power from a 5 V supply.","PeriodicalId":350730,"journal":{"name":"Proceedings of 1994 IEEE Symposium on VLSI Circuits","volume":"30 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1994-06-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128585819","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}