Pub Date : 1996-10-30 DOI: 10.1109/VLSISP.1996.558372
Seehyun Kim, Wonyong Sung
The two dimensional discrete cosine transform (DCT) has been used widely for various image and video processing standards. Efficient implementation of the algorithm requires fixed-point arithmetic, which may result in a noticeable mismatch between the encoder and the decoder. The finite wordlength effects of a distributed arithmetic based 8×8 2D-IDCT (inverse discrete cosine transform) are analytically modeled. In order to accurately model the implementation hardware, the ensemble average of integer domain fixed-point errors after rounding is evaluated not only by calculating the mean and the variance but by considering the statistical distribution as well. Based on the error model, a set of optimum wordlengths conforming to the IEEE specifications is determined. There is a close agreement between the model and the bit-accurate simulation results.
Title: Fixed-point error analysis and wordlength optimization of a distributed arithmetic based 8×8 2D-IDCT architecture
Venue: VLSI Signal Processing, IX
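As a rough illustration of why the abstract stresses the error distribution and not only the mean and variance, the sketch below enumerates the rounding error exactly when a value that already lies on a finer fixed-point grid is rounded to fewer fractional bits. This is a generic experiment, not the authors' error model; the function names and the 8-bit to 4-bit wordlengths are assumptions chosen purely for illustration.

```python
from fractions import Fraction
import math


def round_half_up(x: Fraction, frac_bits: int) -> Fraction:
    """Round x to frac_bits fractional bits, ties rounded upward."""
    scale = 1 << frac_bits
    return Fraction(math.floor(x * scale + Fraction(1, 2)), scale)


def rounding_error_stats(in_bits: int, out_bits: int):
    """Exact mean and variance of the rounding error over every input
    in [0, 1) that is representable with in_bits fractional bits."""
    n = 1 << in_bits
    errors = [
        round_half_up(Fraction(k, n), out_bits) - Fraction(k, n)
        for k in range(n)
    ]
    mean = sum(errors) / n
    variance = sum((e - mean) ** 2 for e in errors) / n
    return mean, variance


if __name__ == "__main__":
    mean, variance = rounding_error_stats(in_bits=8, out_bits=4)
    q = Fraction(1, 1 << 4)  # output quantization step
    print("exact mean error     :", mean)        # 1/512
    print("exact error variance :", variance)    # 85/262144
    print("continuous model var :", q * q / 12)  # 1/3072
```

For these assumed wordlengths the exact mean error is 2^-9 instead of the zero predicted by a continuous uniform model, and the exact variance is slightly smaller than q^2/12. Discrepancies of this kind are one reason a discrete (integer domain) treatment can matter when rounding stages are cascaded.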
Pub Date : 1996-10-30 DOI: 10.1109/VLSISP.1996.558299
W. Drescher, G.P. Fettweis
Finite field arithmetic plays an important role in coding theory, cryptography and their applications. Several hardware solutions using finite field arithmetic have already been developed but none of them are user programmable. This is probably one reason why BCH codes are not commonly used in mobile communication applications even though these codes have very desirable properties regarding burst error correction. This article presents architectures for multiplication in GF(2^m) applicable to digital signal processors. First a method is proposed to build an array of gates for hardware multiplication in GF(2^m). Then an approach is shown that combines the hardware of a typical standard binary arithmetic multiplier with a GF(2^m) multiplier. Using this approach saves a considerable number of gates and decreases the bus load while increasing the latency of the standard binary multiplier unit only marginally. Finally, a solution of a combined 17×17 integer/GF(2^m) (m ≤ 8) multiplier is presented and discussed.
Title: VLSI architectures for multiplication in GF(2^m) for application tailored digital signal processors
Venue: VLSI Signal Processing, IX
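For readers unfamiliar with the operation, multiplication in GF(2^m) can be sketched in software as a shift-and-XOR loop with reduction by an irreducible polynomial. The code below is a minimal sketch and not the gate array proposed in the paper; the function name `gf2m_mul` and the default reduction polynomial 0x11B (the one used by AES for m = 8) are assumptions, since the abstract does not name a polynomial.

```python
def gf2m_mul(a: int, b: int, m: int = 8, poly: int = 0x11B) -> int:
    """Multiply two elements of GF(2^m) given in polynomial basis.

    poly must include the x^m term (0x11B = x^8 + x^4 + x^3 + x + 1).
    The default polynomial is an illustrative assumption.
    """
    result = 0
    for _ in range(m):
        if b & 1:
            result ^= a      # addition in GF(2^m) is bitwise XOR
        b >>= 1
        a <<= 1              # multiply a by x
        if a & (1 << m):
            a ^= poly        # reduce modulo the field polynomial
    return result


if __name__ == "__main__":
    print(hex(gf2m_mul(0x57, 0x83)))  # 0xc1
    print(hex(gf2m_mul(0x57, 0x13)))  # 0xfe
```

Each loop iteration uses only AND, XOR and shift operations. Unlike integer multiplication, there is no carry propagation between bit positions.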
Pub Date : 1996-10-30 DOI: 10.1109/VLSISP.1996.558363
M.-D. Doan, M. Glesner
The paper presents an array architecture for rapid prototyping of mechatronic algorithms. The requirements for high throughput of arbitrary irregular real-time algorithms are supported by adopting the data-driven principle, exploiting the implicit fine-grain parallelism, providing a high degree of scalability, and offering large flexibility in system configuration. Interconnection between neighboring processing elements of the array is implemented by a static hardware-controlled network, whereas communication between spatially separated elements is provided by two dynamic global networks. Besides an overview of the architecture design, an algorithm mapping example illustrates implementation of a time-critical mechatronic application using the novel wavefront mapping algorithm.
Title: A parallel architecture for rapid prototyping of mechatronic algorithms by exploiting implicit fine-grain parallelism
Venue: VLSI Signal Processing, IX
Pub Date : 1996-10-30 DOI: 10.1109/VLSISP.1996.558355
S. Bitterlich, H. Meyr
All the commercially available logic-synthesis tools currently use only (non-redundant) binary and two's complement number representations for representing the results of arithmetic operators. We analyze and compare silicon real-estate and throughput of word-parallel arithmetic circuits (add and shift type arithmetic) based on various redundant number representations and compare these results with the automatically optimized two's complement implementations. The literature on redundant number representations typically recommends radix-4 arithmetic for full-custom or a traditional semi-custom design style. We show that the radix-4 implementation is often not optimal for a logic-synthesis based semi-custom design style. Instead, a high-radix or a mixed-radix implementation (which we derive) should be considered.
Title: Logic synthesis of binary, carry-save and mixed-radix arithmetic for digital signal processing
Venue: VLSI Signal Processing, IX
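The carry-save form named in the title is the simplest redundant number representation: a value is held as a pair (sum word, carry word), so operands can be accumulated without propagating carries until the very end. The sketch below models this at the bit level for non-negative integers. It is a generic illustration, not the circuits or the mixed-radix scheme derived in the paper, and the function names are hypothetical.

```python
def carry_save_add(a: int, b: int, c: int):
    """3:2 compressor: reduce three operands to a (sum, carry) pair
    with a + b + c == sum + carry, using only bitwise logic."""
    partial_sum = a ^ b ^ c
    carry = ((a & b) | (a & c) | (b & c)) << 1
    return partial_sum, carry


def csa_sum(operands):
    """Accumulate non-negative integers in carry-save form and
    resolve the redundancy with one carry-propagate add at the end."""
    s, c = 0, 0
    for x in operands:
        s, c = carry_save_add(s, c, x)
    return s + c  # the only carry-propagating addition


if __name__ == "__main__":
    print(carry_save_add(5, 3, 6))      # (0, 14)
    print(csa_sum([13, 7, 22, 5, 9]))   # 56
```

Every bit position of `carry_save_add` is computed independently, which is why the delay of such a stage does not grow with the wordlength.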
Pub Date : 1996-10-30 DOI: 10.1109/VLSISP.1996.558322
R. Saied, C. Chakrabarti
The increasing demand for portable electronics has caused power consumption to be a critical issue in the design process. Reducing the total power consumption in portable systems is important in order to maximize the run time with minimum requirements in size and weight of the batteries. Power consumption in memory-intensive operations can be reduced by minimizing the number of memory accesses. We describe two scheduling schemes under fixed hardware resource constraints which reduce the number of memory accesses by minimizing the number of intermediate variables that need to be stored. While the first scheme achieves this by post order traversal of the DFG, the second scheme achieves this by judiciously delaying the scheduling of some of the nodes. Experimental results show that these schemes require significantly fewer memory accesses compared to existing scheduling schemes.
Title: Scheduling for minimizing the number of memory accesses in low power applications
Venue: VLSI Signal Processing, IX
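To make the idea of post-order traversal concrete, the sketch below orders the operations of a small tree-shaped data flow graph (DFG) in post order and counts how many intermediate results are alive at once, compared with a level-by-level order. It is a toy model under strong assumptions (a tree DFG, one operation per step, no resource constraints) and is not the authors' scheduling scheme; the graph and all function names are hypothetical.

```python
def post_order(dfg, root):
    """Return the nodes of a tree-shaped DFG in post order.
    dfg maps each operation to the list of operations feeding it."""
    order = []

    def visit(node):
        for operand in dfg.get(node, []):
            visit(operand)
        order.append(node)

    visit(root)
    return order


def peak_live_values(dfg, order):
    """Largest number of intermediate results held at the same time
    when the operations execute one per step in the given order."""
    live, peak = set(), 0
    for node in order:
        for operand in dfg.get(node, []):
            live.discard(operand)   # operand is consumed by this node
        live.add(node)              # result must be kept for its consumer
        peak = max(peak, len(live))
    return peak


# out = (m1 + m2) * (m3 + m4); m1..m4 are products of primary inputs.
DFG = {"out": ["s1", "s2"], "s1": ["m1", "m2"], "s2": ["m3", "m4"]}

if __name__ == "__main__":
    depth_first = post_order(DFG, "out")
    level_order = ["m1", "m2", "m3", "m4", "s1", "s2", "out"]
    print(depth_first)                         # ['m1', 'm2', 's1', 'm3', 'm4', 's2', 'out']
    print(peak_live_values(DFG, depth_first))  # 3
    print(peak_live_values(DFG, level_order))  # 4
```

In this example the post order consumes `m1` and `m2` before `m3` and `m4` are produced, so fewer intermediate values would need to be stored at the same time.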
Pub Date : 1996-10-30 DOI: 10.1109/VLSISP.1996.558374
K. Nakamura, M. Kurokawa, A. Hashiguchi, M. Kanou, K. Aoyama, H. Okuda, S. Iwase, T. Yamazaki
This paper describes the special architecture of a linear array DSP and a design methodology for applying it to sampling rate conversion of video signals. This methodology allows us to develop detailed DSP application code for a given sampling conversion rate. Compared to an ASIC implementation of sampling rate conversion, the time required for implementation is drastically reduced. An example of conversion from HDTV to SDTV (wide) is given.
Title: Video DSP architecture and its application design methodology for sampling rate conversion
Venue: VLSI Signal Processing, IX
Pub Date : 1996-10-30 DOI: 10.1109/VLSISP.1996.558364
P. Schaumont, S. Vernalde, M. Engels, I. Bolsens
Traditionally, the digital implementation of modems is restricted to parts operating at the baseband frequency. At higher frequencies, roughly 30 MHz and beyond, analog technologies such as SAW filters provide a better power/performance figure. We show how this barrier can be broken by trading programmability for speed. Using a digital multirate filter structure that offers combined interpolation and frequency shifting, an area- and power-efficient digital upconversion is achieved.
Title: Digital upconversion architecture for quadrature modulators
Venue: VLSI Signal Processing, IX
Pub Date : 1996-10-30 DOI: 10.1109/VLSISP.1996.558272
B. Girod, K. B. Younes, N. Faerber, E. Steinbach
Mobile channels cannot provide guaranteed quality of service parameters. We illustrate this problem with a ray tracing simulation of an indoor DECT channel. Unreliable transmission poses severe problems for the transmission of motion video, using compression schemes such as described in the ITU-T H.263 international standard due to temporal error propagation. We discuss compatible extensions of H.263 that utilize a feedback channel for robust transmission. In conjunction with FEC and ARQ, an intelligent source coder control can provide excellent robustness at bit error rates worse than 10^-2.
Title: Recent advances in mobile video communications
Venue: VLSI Signal Processing, IX
Pub Date : 1996-10-30 DOI: 10.1109/VLSISP.1996.558358
S. Tongsima, C. Chantrapornchai, E. Sha, N. Passos
Computation intensive DSP applications usually require a parallel/pipelined processor in order to achieve specific timing requirements. Data hazards are a major obstacle against the high performance of pipelined systems. This paper presents a novel efficient loop scheduling algorithm that reduces data hazards for those DSP applications. Such an algorithm has been embedded in a tool, called SHARP, which schedules a pipelined data flow graph to multiple pipelined units, while hiding the underlying data hazards and minimizing the execution time. This paper reports significant improvement for some well-known benchmarks, showing the efficiency of the scheduling algorithm and the flexibility of the simulation tool.
Title: SHARP: efficient loop scheduling with data hazard reduction on multiple pipeline DSP systems
Venue: VLSI Signal Processing, IX
Pub Date : 1996-10-30 DOI: 10.1109/VLSISP.1996.558366
T. Kamada, T. Fukuoka, Y. Nakai, Y. Nakakura, K. Ueda, K. Ota, T. Shiomi, Y. Fukumoto
A new channel decoder LSI, which will be used in digital satellite TV broadcasting set-top boxes, has been designed. This LSI's functions include AD/DA conversion, QPSK demodulation, Viterbi decoding, frame synchronization, convolutional deinterleaving, Reed-Solomon (RS) decoding, and descrambling. We use a new method for Viterbi decoding called the tracking survivor state information (TSSI) method, which not only reduces power consumption, but also solves the problem of increasing memory size. To reduce the size of the RS decoder circuit, we used a three-stage pipeline structure and designed a new architecture to realize the Euclid algorithm. This device has been fabricated in a 0.35 μm 3-metal CMOS standard cell-based process and is composed of 670 K transistors. We describe the TSSI method of the Viterbi decoder and the Reed-Solomon decoder's new 3-stage pipeline architecture.
Title: An area effective standard cell based channel decoder LSI for digital satellite TV broadcasting
Venue: VLSI Signal Processing, IX