Pub Date : 1996-10-30 DOI: 10.1109/VLSISP.1996.558360
O. Sentieys, D. Chillet, J. Diguet, J. Philippe
High-level synthesis research has produced many tools for designing the processing unit of an application. The emergence of new communication services has led to significant growth in the amount of data to be processed in VLSI chips. This calls for the synthesis of a memory architecture that satisfies all the application constraints. To obtain this organization, the first step is to select memories from a component library. This paper formulates the selection problem as the minimization of a function under constraints. Our approach takes place after the processing unit synthesis, and our methodology can be applied to FPGA chips.
Title: Memory module selection for high level synthesis. Journal: VLSI Signal Processing, IX.
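The abstract above frames memory selection as minimizing a function under constraints. The following is a minimal sketch of that idea only, not the authors' formulation: the component library, the cost function (total area), and the two constraints (capacity and access time) are all hypothetical, and the search is a plain exhaustive enumeration.

```python
from itertools import product

# Hypothetical component library: (name, words, access_time_ns, area_mm2).
# None of these values come from the paper.
LIBRARY = [
    ("ram_small", 256, 10, 1.0),
    ("ram_medium", 1024, 15, 3.0),
    ("ram_large", 4096, 25, 9.0),
]


def select_memories(required_words, max_access_ns, max_modules=3):
    """Pick module counts that minimize total area under two constraints.

    Constraints (assumed for illustration): total capacity must reach
    required_words, and every chosen module must meet max_access_ns.
    Returns (area, {name: count}) or None if no combination is feasible.
    """
    best = None
    for counts in product(range(max_modules + 1), repeat=len(LIBRARY)):
        chosen = [(m, c) for m, c in zip(LIBRARY, counts) if c > 0]
        if not chosen:
            continue
        words = sum(m[1] * c for m, c in chosen)
        slowest = max(m[2] for m, _ in chosen)
        if words < required_words or slowest > max_access_ns:
            continue  # violates a constraint
        area = sum(m[3] * c for m, c in chosen)
        if best is None or area < best[0]:
            best = (area, {m[0]: c for m, c in chosen})
    return best


print(select_memories(required_words=2000, max_access_ns=15))
```

Exhaustive search is only workable for a toy library like this one; a realistic library would need an integer programming solver or a heuristic.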
Pub Date : 1996-10-30 DOI: 10.1109/VLSISP.1996.558378
Hangu Yeo, Y. Hu
In this paper, architectures that support block-based real-time motion estimation of video signals using various search methods are presented. The design effort focuses on the processor-level design with a new matching criterion. With the new binary-level matching criterion, which performs a bit-wise comparison instead of the conventional eight-bit addition/subtraction, we achieve a simple processor-level design with fewer input/output lines and lower power consumption.
Title: A novel architecture and processor-level design based on a new matching criterion for video compression. Journal: VLSI Signal Processing, IX.
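To illustrate the contrast the abstract draws between a bit-wise comparison and eight-bit arithmetic, here is a rough sketch. The abstract does not say how pixels are reduced to one bit, so thresholding at the block mean is an assumption, and all function names are hypothetical.

```python
def binarize(block):
    """Reduce an 8-bit block (list of rows) to 1-bit pixels.

    Thresholding at the block mean is an assumed rule, not the paper's.
    """
    pixels = [p for row in block for p in row]
    mean = sum(pixels) / len(pixels)
    return [1 if p >= mean else 0 for p in pixels]


def binary_mismatch(block_a, block_b):
    """Count differing bits: one XOR per pixel, no 8-bit subtraction."""
    return sum(a ^ b for a, b in zip(binarize(block_a), binarize(block_b)))


def sad(block_a, block_b):
    """Conventional sum of absolute differences, shown for comparison."""
    return sum(
        abs(a - b)
        for row_a, row_b in zip(block_a, block_b)
        for a, b in zip(row_a, row_b)
    )


def best_match(current, candidates):
    """Index of the candidate block with the fewest mismatching bits."""
    return min(
        range(len(candidates)),
        key=lambda i: binary_mismatch(current, candidates[i]),
    )
```

In hardware, the XOR-and-count datapath needs one bit per pixel at its inputs rather than eight, which is consistent with the reduction in input/output lines that the abstract reports.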
Pub Date : 1996-10-30 DOI: 10.1109/VLSISP.1996.558305
G. Fischer, N.K. Modadugu
This paper describes the design of a monolithic direct digital frequency synthesizer. The circuit produces a 12-bit output sine wave with a frequency resolution of 32 bits. The core of the 1.2 µm CMOS implementation consists of approximately 6,000 transistors and occupies an area no larger than 1.5 mm². The circuit is aimed at a maximum tuning range of 100 MHz, or equivalently, a clock rate of 200 MHz. This upper value yields a minimum frequency increment of 0.023 Hz. The system exhibits a total latency of 14 clock periods.
Title: Design of a compact direct digital frequency synthesizer with 12 bit amplitude and 32 bit frequency resolution. Journal: VLSI Signal Processing, IX.
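A direct digital frequency synthesizer is commonly built from a phase accumulator that addresses a sine lookup table. The sketch below shows that textbook structure with the 32-bit tuning word and 12-bit output from the abstract. The table address width (`LUT_BITS`) is an assumption, and the paper's compact circuit presumably differs from a plain table lookup.

```python
import math

ACC_BITS = 32   # tuning-word width, from the abstract
OUT_BITS = 12   # output amplitude width, from the abstract
LUT_BITS = 10   # assumed table address width; not given in the abstract

# Signed sine table scaled to the 12-bit output range.
SINE_LUT = [
    round((2 ** (OUT_BITS - 1) - 1) * math.sin(2 * math.pi * i / 2 ** LUT_BITS))
    for i in range(2 ** LUT_BITS)
]


def tuning_word(f_out, f_clk):
    """Phase increment per clock for the requested output frequency."""
    return round(f_out * 2 ** ACC_BITS / f_clk) % 2 ** ACC_BITS


def dds_samples(word, count):
    """Run the phase accumulator and look up one sample per clock."""
    phase = 0
    samples = []
    for _ in range(count):
        # The top LUT_BITS of the accumulator address the sine table.
        samples.append(SINE_LUT[phase >> (ACC_BITS - LUT_BITS)])
        phase = (phase + word) & (2 ** ACC_BITS - 1)
    return samples


# A quarter of the clock rate advances the phase by 90 degrees per sample.
print(dds_samples(tuning_word(50e6, 200e6), 4))
```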
Pub Date : 1996-10-30 DOI: 10.1109/VLSISP.1996.558381
C. Lemonds
Digital signal processing (DSP) chips are of prime interest in the rapidly growing wireless communications market. Although the demand for higher performance continues to escalate, power consumption is also a concern. The most direct way to lower the power is to lower the supply voltage. Scaling technologies into the sub-micron domain has also led to the scaling of the supply voltage due to excessively high electric fields. Lowering the supply voltage by itself lowers power consumption but degrades performance drastically. To improve performance while lowering the supply voltage, it is necessary to scale the threshold voltage along with the supply voltage. This paper focuses on a 16 by 16 array multiplier that operates with a one volt power supply at a clock frequency of 500 MHz. The multiplier is implemented in dual-rail domino logic using a 0.25 µm multi-threshold CMOS process and has four cycles of latency.
Title: A 500 MHz, one volt, 16 by 16 bit multiplier for DSP cores. Journal: VLSI Signal Processing, IX.
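The trade-off described in the abstract can be made concrete with two standard first-order models: dynamic power proportional to C·V²·f, and the alpha-power delay model, where gate delay grows as V/(V − Vt)^α. Neither model is taken from the paper. The capacitance, the 3.3 V reference supply, and the threshold voltages below are illustrative assumptions.

```python
def dynamic_power(c_eff, vdd, freq, activity=1.0):
    """First-order switching power: activity * C * Vdd^2 * f."""
    return activity * c_eff * vdd ** 2 * freq


def relative_delay(vdd, vt, alpha=2.0):
    """Alpha-power-law gate delay, up to a constant factor.

    alpha is about 2 for long-channel devices and closer to 1 under
    velocity saturation; 2.0 is used here only as a default.
    """
    return vdd / (vdd - vt) ** alpha


# Dropping the supply from an assumed 3.3 V to 1 V at the same frequency
# cuts switching power by about 3.3^2, roughly a factor of 11.
ratio = dynamic_power(1e-9, 1.0, 500e6) / dynamic_power(1e-9, 3.3, 500e6)
print(f"power ratio: {ratio:.3f}")

# At 1 V, an assumed 0.7 V threshold leaves little gate overdrive, so
# delay rises sharply; an assumed 0.2 V threshold restores most of the speed.
print(f"delay, Vt=0.7 V: {relative_delay(1.0, 0.7):.2f}")
print(f"delay, Vt=0.2 V: {relative_delay(1.0, 0.2):.2f}")
```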
Pub Date : 1996-10-30 DOI: 10.1109/VLSISP.1996.558281
V. Jain, L. Lin
Implementation of systolic arrays has been hindered in the past by a lack of building blocks, or cells. This paper presents a high-speed floating-point DSP coprocessor cell for rapid computation of nonlinear functions. Several nonlinear functions are typically needed in systolic arrays for signal and image processing algorithms, while development costs and interconnection considerations warrant the use of only a few types of cells. With our approach, all of the nonlinear functions needed can be incorporated on a single cell. Furthermore, a new result is produced every two clock cycles in pipelined mode. The underlying principle that makes the combined goals of high speed and multi-functionality possible is significance-based second-order interpolation of very small ROM tables. A 32-bit two-cycle chip for computing the square root, fabricated in 2.0 micron CMOS technology, is presented. As an application example, a parallel architecture for CT image reconstruction for a fan-beam CT system is briefly discussed.
Title: Floating-point nonlinear DSP coprocessor cell-two cycle chip. Journal: VLSI Signal Processing, IX.
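The abstract's key idea is second-order interpolation of a very small table. The sketch below shows plain second-order (Newton forward-difference) interpolation of a 17-entry square-root table, with the exponent handled separately. It does not model the "significance-based" refinement, whose details the abstract does not give, and the table size is an assumption.

```python
import math

TABLE_SIZE = 16                      # assumed; the abstract only says "very small"
LO, HI = 1.0, 4.0                    # mantissa range after exponent adjustment
STEP = (HI - LO) / TABLE_SIZE
ROM = [math.sqrt(LO + i * STEP) for i in range(TABLE_SIZE + 1)]


def sqrt_mantissa(m):
    """Second-order interpolation of the table for m in [1, 4)."""
    i = min(int((m - LO) / STEP), TABLE_SIZE - 2)   # keep i + 2 in range
    t = (m - (LO + i * STEP)) / STEP
    d1 = ROM[i + 1] - ROM[i]                        # first difference
    d2 = ROM[i + 2] - 2 * ROM[i + 1] + ROM[i]       # second difference
    return ROM[i] + t * d1 + t * (t - 1) / 2 * d2


def approx_sqrt(x):
    """Approximate sqrt(x) for x > 0 by splitting mantissa and exponent."""
    mant, exp = math.frexp(x)        # x = mant * 2**exp with 0.5 <= mant < 1
    mant, exp = mant * 2, exp - 1    # now 1 <= mant < 2
    if exp % 2:                      # make the exponent even
        mant, exp = mant * 2, exp - 1
    return math.ldexp(sqrt_mantissa(mant), exp // 2)


for x in (0.3, 2.0, 10.0):
    print(x, approx_sqrt(x), math.sqrt(x))
```

With these 17 entries the relative error stays below about 2e-4, which gives a sense of why a second-order scheme can manage with a small ROM.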
Pub Date : 1996-10-30 DOI: 10.1109/VLSISP.1996.558275
P. Carvey
Wireless, low-power technology for interconnecting personal electronic accessories (PEAs) and a wearable basestation worn by a user is presented. This technology enables applications such as personal inertial navigation, medical monitoring, sports training, and virtual reality. PEAs require small size and weight, protocol robustness, and ultra-low power consumption. Exploiting the short interconnect distance and designing an air interface protocol specifically for low power consumption allows the analog/RF section power consumption to be reduced to under five nanojoules per bit. DSP control of the LO, filters, PLL, power management, TDMA event control, FEC encoding and decoding, matched filters, and transducer presents architectural challenges in achieving matching power consumption.
Title: Technology for the wireless interconnection of wearable personal electronic accessories. Journal: VLSI Signal Processing, IX.
Pub Date : 1996-10-30 DOI: 10.1109/VLSISP.1996.558373
Yung-Pin Lee, Liang-Gee Chen, Mei-Juan Chen, Chung-Wei Ku
Among the various transform techniques for image compression, the discrete cosine transform (DCT) is the most popular and effective in practical applications because it gives almost optimal performance and can be implemented at an acceptable cost. We describe a novel 8×8 2-D DCT/IDCT architecture based on the direct 2-D approach and the rotation technique. The computational complexity is reduced by taking advantage of a special attribute of complex numbers. Unlike other direct approaches, the proposed architecture is regular and hence suitable for VLSI implementation.
Title: A new design and implementation of 8×8 2-D DCT/IDCT. Journal: VLSI Signal Processing, IX.
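For reference, the transform that such an architecture computes can be written as a matrix product. The sketch below is the textbook orthonormal 8×8 2-D DCT-II and its inverse in row-column form. It is deliberately not the direct 2-D or rotation-based method of the paper, only a functional model that a hardware design could be checked against.

```python
import math

N = 8


def dct_matrix():
    """Orthonormal DCT-II basis: C[k][n] = a(k) * cos((2n + 1) k pi / 2N)."""
    return [
        [
            (math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N))
            * math.cos((2 * n + 1) * k * math.pi / (2 * N))
            for n in range(N)
        ]
        for k in range(N)
    ]


def matmul(a, b):
    """Plain N x N matrix product."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(N)) for j in range(N)]
        for i in range(N)
    ]


def transpose(a):
    return [list(row) for row in zip(*a)]


C = dct_matrix()
CT = transpose(C)


def dct2(block):
    """Forward 2-D DCT: Y = C X C^T."""
    return matmul(matmul(C, block), CT)


def idct2(coeffs):
    """Inverse 2-D DCT: X = C^T Y C."""
    return matmul(matmul(CT, coeffs), C)
```

A constant block transforms to a single nonzero DC coefficient, and `idct2(dct2(block))` recovers the input up to floating-point rounding.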
Pub Date : 1996-10-30 DOI: 10.1109/VLSISP.1996.558359
Shan-Hsi Huang, J. Rabaey
This paper proposes a framework aimed at the optimization of speed, area, or power consumption of custom ASIC DSP designs through algorithmic transformations. This framework systematically selects and orders transformations for optimization. The methodology behind the framework combines bottleneck analysis (why the transformations should be applied), transformation ordering (the order in which the transformations are applied), algorithm partitioning (which parts of an algorithm should be transformed), transformation analysis/selection (which transformations to apply), and transformation execution (how to apply the selected transformations). Assisted by this framework, designers can easily and quickly exploit a variety of optimizing transformations to explore the algorithmic design space to reach better designs.
Title: An integrated framework for optimizing transformations. Journal: VLSI Signal Processing, IX.
Pub Date : 1996-10-30 DOI: 10.1109/VLSISP.1996.558365
Mohammad Javad Omidi, P. Gulak, S. Pasupathy
New parallel structures are proposed for joint data and channel estimation over frequency-selective Rayleigh fading channels. Maximum likelihood sequence estimation (MLSE) is implemented using the per-survivor processing (PSP) method. The Kalman filter and the recursive least squares (RLS) algorithm are considered as estimation methods. A square-root implementation of the Kalman filter is discussed. The algorithm used for the measurement update in the Kalman filter yields a significant simplification when it is used to realize the RLS algorithm. Two parallel and pipelined architectures are introduced for the RLS algorithm, and an overall architecture is proposed to implement the MLSE receiver, combining the Viterbi decoder and the channel estimator.
Title: Parallel structures for joint channel estimation and data detection over fading channels. Journal: VLSI Signal Processing, IX.
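As background for the RLS estimator mentioned in the abstract, here is a minimal sketch of the conventional exponentially weighted RLS recursion applied to a static FIR channel with known input symbols. It is not the square-root form, the per-survivor variant, or either parallel architecture from the paper. The forgetting factor and initialization constant are illustrative choices.

```python
def rls_estimate(inputs, outputs, taps, forgetting=0.99, delta=100.0):
    """Estimate FIR channel taps with conventional RLS.

    inputs  : known transmitted symbols
    outputs : received samples, outputs[n] = sum_k h[k] * inputs[n - k]
    Returns the estimated tap vector w.
    """
    w = [0.0] * taps
    # Inverse correlation matrix, initialized to delta * I.
    P = [[delta if i == j else 0.0 for j in range(taps)] for i in range(taps)]
    for n in range(taps - 1, len(inputs)):
        u = [inputs[n - k] for k in range(taps)]          # regressor
        Pu = [sum(P[i][j] * u[j] for j in range(taps)) for i in range(taps)]
        denom = forgetting + sum(u[i] * Pu[i] for i in range(taps))
        gain = [x / denom for x in Pu]                    # gain vector
        err = outputs[n] - sum(w[i] * u[i] for i in range(taps))
        w = [w[i] + gain[i] * err for i in range(taps)]
        # P <- (P - gain * u^T P) / lambda, using the symmetry of P.
        P = [
            [(P[i][j] - gain[i] * Pu[j]) / forgetting for j in range(taps)]
            for i in range(taps)
        ]
    return w
```

In a PSP receiver, each survivor path of the Viterbi decoder would carry its own copy of `w` and `P`, updated with that path's tentative symbol decisions. This per-path duplication is what makes parallel architectures attractive.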
Pub Date : 1996-10-30 DOI: 10.1109/VLSISP.1996.558346
A. Wang, K. Yao, R. E. Hudson, D. Korompis, F. Lorenzelli, S. Soli, S. Gao
For various audio, teleconference, hearing aid, and voice recognition applications, a microphone array is known to be an effective way to enhance the SNR in noisy environments, resulting in significant improvement of speech intelligibility or recognition. We propose a novel electronically steerable microphone array based on the maximum energy (ME) concentration criterion to form a focused beam toward the desired speech source, attenuating background noise and rejecting discrete spatial interferers. The design and implementation of a prototype DSP-based microphone array system are described. Details of the microphone measurement, calibration, and optimization needed to achieve a high-performance microphone array are discussed. Computer-simulated and measured array performance results are presented.
Title: Calibration, optimization, and DSP implementation of microphone array for speech processing. Journal: VLSI Signal Processing, IX.