Pub Date: 1999-08-23; DOI: 10.1109/APASIC.1999.824078
K. Byun, Minsoo Hahn, Kyung-Su Kim
This paper presents an efficient implementation of a 13 kbps QCELP vocoder ASIC that provides the speech compression function used in digital mobile communication. The 13 kbps QCELP algorithm offers better quality than the 8 kbps one, but it requires much more computation. In particular, the pitch and codebook search for speech synthesis dominates the computational load. We propose an optimized convolution routine that exploits the pipeline structure of the DSP. Our DSP, designed specifically for vocoder applications, is a 16-bit fixed-point processor. We adopt a RISC-type instruction set, distributed decoding, alternative program fetch, a dual-bank memory structure, and repeat loops without loss in order to reduce power consumption and achieve fast operation while keeping the chip size small. Developing the DSP and the QCELP assembly code concurrently enabled us to optimize the assembly code more successfully than would be possible with a general-purpose DSP chip.
Title: Implementation of 13 kbps QCELP vocoder ASIC
Venue: AP-ASIC'99. First IEEE Asia Pacific Conference on ASICs (Cat. No.99EX360)
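The convolution at the heart of a codebook search can be sketched as a multiply-accumulate loop over 16-bit fixed-point samples. The sketch below is only an illustration under stated assumptions: the Q15 number format, the function names `saturate16` and `convolve_q15`, and the truncation to the excitation length are all hypothetical choices, not details taken from the paper, and the code does not model the pipelined routine the authors describe.

```python
def saturate16(x):
    """Clamp an integer to the signed 16-bit range."""
    return max(-32768, min(32767, x))


def convolve_q15(excitation, impulse_response):
    """Truncated convolution y[n] = sum_k h[k] * x[n - k] on Q15 integers.

    Assumption: both inputs are lists of signed 16-bit integers in Q15 format.
    """
    out = []
    for n in range(len(excitation)):
        acc = 0  # wide accumulator, as a DSP multiply-accumulate unit would provide
        for k in range(min(n + 1, len(impulse_response))):
            acc += impulse_response[k] * excitation[n - k]
        # Each product is Q30; shift back to Q15 and saturate to 16 bits.
        out.append(saturate16(acc >> 15))
    return out
```

The inner loop is the part a DSP would map onto its multiply-accumulate pipeline; the final shift and saturation keep the result in the 16-bit word length mentioned in the abstract.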
Pub Date: 1999-08-23; DOI: 10.1109/APASIC.1999.824079
Se Ho Park, Dong Hwan Kim, D. Han, Kyu Lee, S. Park, J. Choi
In this paper we propose an implementation method for a single-chip 8192-point complex FFT based on sequential data processing. To reduce the chip area required for sequential processing of 8K complex data, a DRAM-like pipelined commutator architecture is used. The 16-point FFT is the basic building block of the FFT chip, and the 8192-point FFT consists of cascaded blocks with six radix-4 stages and one radix-2 stage. Since each stage must round its result bits while maintaining an adequate S/N ratio, the convergent block floating point (CBFP) algorithm is used for effective internal bit rounding. As a result, the proposed structure reduces chip size by 55% compared with the conventional approach.
Title: Sequential design of a 8192 complex point FFT in OFDM receiver
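The stage decomposition in the abstract can be checked with a few lines of arithmetic, and the idea of sharing one scale factor across a block of values can be sketched alongside it. Note the assumptions: `block_scale` is a generic block floating point illustration with a hypothetical name and a 16-bit word length chosen for the example; it is not the convergent (CBFP) variant used in the paper, whose details the abstract does not give.

```python
# Six radix-4 stages followed by one radix-2 stage, as stated in the abstract.
RADIX_STAGES = [4] * 6 + [2]


def fft_length(stages):
    """Transform length implied by a cascade of radix-r stages."""
    n = 1
    for radix in stages:
        n *= radix
    return n


def block_scale(block, word_bits=16):
    """Shift a non-empty block right by one shared exponent so every value fits.

    Returns the scaled block and the shared shift (the block exponent).
    """
    limit = 1 << (word_bits - 1)
    peak = max(abs(v) for v in block)
    shift = 0
    while (peak >> shift) >= limit:
        shift += 1
    return [v >> shift for v in block], shift
```

Sharing one exponent per block keeps the datapath fixed-point while letting the scale track the data, which is the general motivation for block floating point between FFT stages.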
Pub Date: 1999-08-23; DOI: 10.1109/APASIC.1999.824095
G. Jeong, M. Lee, B. Lee, K. Park
This paper describes the architecture of a reconfigurable shared buffer asynchronous transfer mode (ATM) switch and its VLSI implementation. The single-chip reconfigurable shared buffer ATM switch has a shared buffer built from 4 ns scalable pipelined memory. It overcomes the memory cycle time restriction of shared buffer ATM switches and supports flexible switching performance through the scalability of the embedded buffer. The proposed switch provides port size scalability by making queue address control independent of buffer memory control. The switch size and the buffer size of the proposed ATM switch can be reconfigured without serious circuit redesign. A prototype chip has been designed for a 4×4 ATM switch with a 128-cell shared buffer. It is integrated in 10.6×10.6 mm² using a 0.6 µm twin-well, double-metal, single-poly CMOS technology. The simulated operating frequency is 80 MHz, which supports 640 Mbps per port.
Title: Reconfigurable shared buffer ATM switch
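The figures quoted in the abstract can be related with simple arithmetic. The sketch below assumes that each port transfers a fixed number of bits per clock cycle and that the buffer stores whole 53-byte ATM cells; the abstract states neither the port width nor the storage layout, so the derived values are illustrative only.

```python
CLOCK_HZ = 80_000_000        # simulated operating frequency from the abstract
PORT_RATE_BPS = 640_000_000  # per-port rate from the abstract
PORTS = 4                    # 4x4 prototype switch
BUFFER_CELLS = 128           # shared buffer size in cells
ATM_CELL_BYTES = 53          # standard ATM cell: 5-byte header + 48-byte payload

# Assumption: a port moves the same number of bits on every clock cycle.
bits_per_cycle = PORT_RATE_BPS // CLOCK_HZ

# Cycles needed to move one cell through a port at that width.
cycles_per_cell = ATM_CELL_BYTES * 8 // bits_per_cycle

# Aggregate throughput and raw buffer capacity (assuming whole cells are stored).
aggregate_bps = PORT_RATE_BPS * PORTS
buffer_bytes = BUFFER_CELLS * ATM_CELL_BYTES

print(bits_per_cycle, cycles_per_cell, aggregate_bps, buffer_bytes)
```

Under these assumptions, 640 Mbps at 80 MHz corresponds to 8 bits per cycle per port.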
Pub Date: 1999-08-23; DOI: 10.1109/APASIC.1999.824046
Kyeounsoo Kim, P. Beerel
This paper proposes a high-performance, low-power asynchronous architecture for multiplying a constant matrix by a vector, an operation typically used in discrete cosine transform (DCT) and inverse discrete cosine transform (IDCT) applications. The architecture takes advantage of the statistics of DCT and IDCT data, which suggest that the input data are mostly zero or small in value. It avoids unnecessary arithmetic operations by quickly terminating multiplications by zero, and it significantly reduces power and delay when operating on small-valued data by adaptively controlling effective word lengths using fine-grain bit-partitioning and speculative completion sensing.
Title: A high-performance low-power asynchronous matrix-vector multiplier for discrete cosine transform
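The benefit of skipping zero inputs can be illustrated in software. The sketch below is a software analogue only: the function name `matvec_skip_zeros` and the operation counter are hypothetical, and the code models neither the asynchronous circuit nor the bit-partitioning and completion sensing described in the abstract.

```python
def matvec_skip_zeros(matrix, vector):
    """Multiply a constant matrix by a vector, skipping zero-valued inputs.

    Returns the product and the number of multiply-accumulate operations
    actually performed, to show the work saved on sparse input data.
    """
    n_rows = len(matrix)
    result = [0] * n_rows
    macs = 0
    for j, x in enumerate(vector):
        if x == 0:
            continue  # a zero input contributes nothing to any output row
        for i in range(n_rows):
            result[i] += matrix[i][j] * x
            macs += 1
    return result, macs
```

For example, with the matrix `[[1, 2, 3], [4, 5, 6]]` and the input `[0, 2, 0]`, only two of the six multiply-accumulate operations are performed.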
Pub Date: 1999-08-23; DOI: 10.1109/APASIC.1999.824053
Bo-Sung Kim, Jun-Dong Cho
This paper presents a new VLSI architecture for motion estimation in MPEG-2. Various full-search block matching algorithms (BMA) and systolic-array architectures have previously been proposed for motion estimation. However, these architectures require an inefficiently large number of external memory accesses. Our new architecture reuses data efficiently to reduce external memory accesses and saves computation time by using a parallel algorithm.
Title: VLSI architecture for low power motion estimation using high data access reuse
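Full-search block matching, the baseline the abstract refers to, can be sketched as an exhaustive search over candidate displacements using the sum of absolute differences (SAD). The function names, the SAD cost, and the frame representation as lists of rows are assumptions made for this illustration; the sketch shows the plain algorithm and not the data-reuse architecture proposed in the paper.

```python
def sad(cur, ref, bx, by, dx, dy, n):
    """Sum of absolute differences between an n x n block and a displaced block."""
    total = 0
    for y in range(n):
        for x in range(n):
            total += abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
    return total


def full_search(cur, ref, bx, by, n, search_range):
    """Return (cost, dx, dy) of the best match within +/- search_range pixels."""
    height, width = len(ref), len(ref[0])
    best = None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            # Skip candidates that fall outside the reference frame.
            if by + dy < 0 or by + dy + n > height:
                continue
            if bx + dx < 0 or bx + dx + n > width:
                continue
            cost = sad(cur, ref, bx, by, dx, dy, n)
            if best is None or cost < best[0]:
                best = (cost, dx, dy)
    return best
```

Neighbouring candidates read almost the same reference pixels, which is why reusing fetched data across candidates can cut external memory traffic.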
Pub Date: 1999-08-23; DOI: 10.1109/APASIC.1999.824077
T. Suzuki, S. Tomiyama
In this paper, we use the Fourier series method for filter design, but without window functions. Instead, we apply a genetic algorithm (GA) to the discrete cosine transform (DCT) of a given band-pass characteristic. The method is useful for ASIC equalizer (filter) design because it reduces the number of coefficients. An example shows that the proposed method halves the number of high-order Fourier coefficients needed for a given band-pass characteristic; that is, the coefficients are compressed to 1/2.
Title: Extrapolation for band-pass characteristics by using genetic algorithm on the DCT
Pub Date: 1999-08-23; DOI: 10.1109/APASIC.1999.824068
N. Shimizu, D. Mitake
Variation in memory latency is hard to predict in software, especially on SMP or NUMA systems. As a hardware countermeasure, the multi-threaded processor has been devised. However, it is difficult to improve processor performance for a single program in this way. We have proposed SCALT, which uses a buffer in a software context. To address the latency variation problem, we have proposed an instruction that checks whether data has arrived in the buffer. This paper describes SCALT with its buffer check instruction, and its performance evaluation results, obtained by analyzing an SMP system through event-driven simulation.
Title: Scalable latency tolerant architecture (SCALT) and its evaluation
Pub Date: 1999-08-23; DOI: 10.1109/APASIC.1999.824076
C. Choi, Hwa-Hyun Cho, J. Chae, Jin-Sung Park, Byong-Heon Kwon, Myung-Ryul Choi
We present an image processor for SXGA (super extended graphics array, 1280×1024) and UXGA (ultra XGA, 1600×1200) FPDs (flat panel displays) such as TFT (thin film transistor) LCDs (liquid crystal displays) and PDPs (plasma display panels). The proposed image processor can fill the full screen of an FPD from video sources of lower or higher resolution, such as NTSC, VGA, SVGA, XGA, SXGA, and UXGA, by means of new interpolation and decimation filters. To improve the image quality of an FPD, we also present video processing techniques such as γ (gamma) correction, contrast control, and edge enhancement. We have simulated the proposed interpolation and decimation algorithms and compared their results quantitatively with those of conventional algorithms by calculating the PSNR (peak signal-to-noise ratio). We have also simulated the proposed video processing techniques and compared the results by visual test. Finally, we have designed the proposed image processor in VHDL and verified it by functional and timing simulation.
Title: An image processor for SXGA/UXGA FPD
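The PSNR comparison mentioned in the abstract follows the standard definition, PSNR = 10·log10(peak² / MSE). The sketch below assumes 8-bit grayscale images stored as equal-sized lists of rows; the function name `psnr` and that representation are choices made for this example, and the abstract does not say how the authors computed the metric in practice.

```python
import math


def psnr(reference, test, peak=255):
    """Peak signal-to-noise ratio in dB between two equal-sized 2-D images."""
    squared_error = 0
    count = 0
    for ref_row, test_row in zip(reference, test):
        for r, t in zip(ref_row, test_row):
            squared_error += (r - t) ** 2
            count += 1
    mse = squared_error / count
    if mse == 0:
        return math.inf  # identical images: no error to measure
    return 10 * math.log10(peak ** 2 / mse)
```

A higher PSNR means the scaled image is closer to the reference, which is how interpolation and decimation filters are usually ranked against each other.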
Pub Date: 1999-08-23; DOI: 10.1109/APASIC.1999.824092
Byung-Gu Choi, Yoon-Seok Chang, Dong-Wook Kim
At present, BIST is a major test strategy, offering automatic testing and the possibility of at-speed testing. However, BIST incurs significant hardware overhead and requires impractical test time (test length) when the circuit under test (CUT) has a large number of primary inputs. We propose a new method called input grouping, which helps reduce test length in BIST applications. This method partitions the inputs by considering nodal connectivity with respect to internal nodes. To this end, we propose definitions for test points, conditions for a node to be a test point, and a procedure for finding test points in a given circuit. The test points are used to form a BIST structure that reduces test time. The experimental results show that BIST TPGs based on this method achieve a tremendous reduction in test time compared with pseudorandom patterns for various example circuits.
Title: Input grouping method considering nodal connectivity for BIST test time reduction
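Why partitioning the inputs can shorten a test is easiest to see with exhaustive patterns. The sketch below assumes that each input group can be exercised independently and exhaustively, one group after another; this is a simplification for illustration, not the paper's procedure, and the abstract's own comparison is against pseudorandom patterns. The function names are hypothetical.

```python
def exhaustive_length(n_inputs):
    """Number of patterns needed to apply every combination of n inputs."""
    return 2 ** n_inputs


def grouped_length(group_sizes):
    """Patterns needed when each input group is tested exhaustively in turn.

    Assumption: the groups can be tested independently of one another.
    """
    return sum(2 ** size for size in group_sizes)
```

For a hypothetical 32-input circuit, `exhaustive_length(32)` is 4,294,967,296 patterns, while `grouped_length([8, 8, 8, 8])` is 1,024.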
Pub Date: 1999-08-23; DOI: 10.1109/APASIC.1999.824090
W. Ke, Khoan Truong
This paper describes a design-for-testability (DFT) methodology for an application-oriented, platform-based design environment, which reuses test-ready virtual components (VCs) and integrates them using a set of predefined guidelines and practices. We focus on introducing the concept of the proposed methodology, with examples that demonstrate some of the techniques and issues.
Title: Design with testability for a platform-based SoC design methodology