Pub Date : 1997-07-14 DOI: 10.1109/ASAP.1997.606858
B. Wah, Yi Shang, Zhe Wu
In this paper, we present a new discrete Lagrangian optimization method for designing multiplierless QMF (quadrature mirror filter) filter banks. In multiplierless QMF filter banks, filter coefficients are in powers-of-two (PO2) form, where numbers are represented as sums or differences of powers of two (also called Canonical Signed Digit, or CSD, representation), so that multiplications can be carried out as additions, subtractions and shifts. We formulate the design problem as a nonlinear discrete constrained optimization problem, using the reconstruction error as the objective and the other performance metrics as constraints. One of the major advantages of this formulation is that it allows us to search for designs that improve over the best existing designs with respect to all performance metrics, rather than finding designs that trade one performance metric for another. We show that our design method can find designs that improve over Johnston's benchmark designs using a maximum of three to six ONE bits in each filter coefficient.
Title : Discrete Lagrangian method for optimizing the design of multiplierless QMF filter banks
Venue : Proceedings IEEE International Conference on Application-Specific Systems, Architectures and Processors
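To make the sums-or-differences-of-powers-of-two representation concrete, the following is a minimal sketch of converting an integer coefficient to signed digits and multiplying by it with shifts, additions and subtractions only. The function names `to_csd` and `csd_multiply` are hypothetical and chosen for illustration; the sketch does not reproduce the authors' Lagrangian search, only the coefficient representation the abstract describes.

```python
def to_csd(n: int) -> list[int]:
    """Return signed digits (-1, 0, +1) of n, least significant first.

    The result is the canonical signed-digit form: no two adjacent
    digits are both nonzero.
    """
    digits = []
    while n != 0:
        if n & 1:
            # +1 when n mod 4 == 1, -1 when n mod 4 == 3
            d = 2 - (n & 3)
            n -= d
        else:
            d = 0
        digits.append(d)
        n >>= 1
    return digits


def csd_multiply(x: int, digits: list[int]) -> int:
    """Multiply x by the coefficient encoded in digits using shifts and adds only."""
    acc = 0
    for shift, d in enumerate(digits):
        if d == 1:
            acc += x << shift
        elif d == -1:
            acc -= x << shift
    return acc


def nonzero_digits(digits: list[int]) -> int:
    """Count nonzero digits, i.e. the number of add/subtract terms needed."""
    return sum(1 for d in digits if d != 0)


print(to_csd(7))                    # [-1, 0, 0, 1], i.e. 7 = 8 - 1
print(csd_multiply(5, to_csd(7)))   # 35
print(nonzero_digits(to_csd(7)))    # 2
```

Limiting the number of nonzero digits per coefficient, as in the three-to-six bound quoted in the abstract, directly limits the number of adders each multiplication needs.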
Pub Date : 1997-07-14 DOI: 10.1109/ASAP.1997.606851
H. Dawid, Klaus-Jürgen Koch, J. Stahl
Due to the rapid increase in the system complexity of modern telecommunication products, two main challenges exist for a system design flow meeting the arising demands: 1) provide a platform for fast algorithmic and architectural design exploration and optimization from system to gate level, which guarantees high quality of results (QoR) and enables full and seamless design verification; 2) provide a platform for design reuse. In this paper, we show how a design flow based on fast system simulation, behavioral synthesis and power analysis is used for the commercial implementation of an ADPCM (Adaptive Differential Pulse Code Modulation) codec module in record time, simultaneously meeting all design constraints and creating a versatile system and HDL model ready for design reuse.
Title : ADPCM codec: from system level description to versatile HDL model
Pub Date : 1997-07-14 DOI: 10.1109/ASAP.1997.606852
G. Fettweis
The problem of managing designs of increasing complexity, a consequence of improvements in semiconductor integration density, is an old one, but still current. The new challenge lies in a new level of architecture heterogeneity, e.g. mixing hard-wired digital circuits with software-programmed signal processors on one die. Hence, we are moving up by one level of abstraction, from semi-custom standard cells to semi-custom 'block cells'. This results in a new dimension in the gap between algorithm/system design and architecture/circuit design, which no tools sufficiently address today. This paper presents a method of analyzing the problem by orthogonalizing algorithms into data transfer and data manipulation, and carrying this over to the control and I/O design as well. This approach might be a promising basis for flexibly mapping the algorithms onto future 'block cell' designs, and furthermore for designing new system simulation tools into which other tools can be integrated for a flexible mapping of algorithms onto various hardware architecture domains.
Title : Design methodology for digital signal processing
Pub Date : 1997-07-14 DOI: 10.1109/ASAP.1997.606820
T. Lang, E. Antelo
CORDIC-based algorithms to compute $\cos^{-1}(t)$, $\sin^{-1}(t)$ and $\sqrt{1-t^{2}}$ are proposed. The implementation requires a standard CORDIC module plus a module to compute the direction of rotation, this being the same hardware required for the extended CORDIC vectoring recently proposed by the authors. Although these functions can be obtained as a special case of this extended vectoring, the specific algorithm we propose here presents two significant improvements: (1) it achieves an angle granularity of $2^{-n}$ using the same datapath width as the standard CORDIC algorithm (about $n$ bits, instead of the roughly $2n$ that would be required using the extended vectoring), and (2) no repetitions of iterations are needed. The proposed algorithm is compatible with the extended vectoring and, in contrast with previous implementations, the number of iterations and the delay of each iteration are the same as for the conventional CORDIC algorithm.
Title : CORDIC-based computation of ArcCos and ArcSin
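As background for the "standard CORDIC module" the abstract refers to, here is a minimal floating-point sketch of conventional CORDIC in rotation mode. It is not the arccos/arcsin algorithm proposed in the paper, whose direction-selection module is not described in the abstract; the function name `cordic_rotate` and the default of 32 iterations are assumptions made for illustration.

```python
import math


def cordic_rotate(angle: float, n: int = 32) -> tuple[float, float]:
    """Return approximations of (cos(angle), sin(angle)) using n micro-rotations.

    Valid for |angle| up to roughly 1.74 rad, the convergence range of the
    elementary angle sequence atan(2**-i), i = 0, 1, 2, ...
    """
    # Each micro-rotation scales the vector by sqrt(1 + 2**(-2i));
    # pre-dividing by the total gain makes the final vector unit length.
    gain = 1.0
    for i in range(n):
        gain *= math.sqrt(1.0 + 2.0 ** (-2 * i))

    x, y, z = 1.0 / gain, 0.0, angle
    for i in range(n):
        # Rotate in the direction that drives the residual angle z to zero.
        sigma = 1.0 if z >= 0 else -1.0
        x, y = x - sigma * y * 2.0 ** -i, y + sigma * x * 2.0 ** -i
        z -= sigma * math.atan(2.0 ** -i)
    return x, y


c, s = cordic_rotate(0.5)
print(abs(c - math.cos(0.5)) < 1e-6, abs(s - math.sin(0.5)) < 1e-6)
```

In hardware the multiplications by `2.0 ** -i` become shifts, which is why each iteration costs only shift-and-add operations.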
Pub Date : 1997-07-14 DOI: 10.1109/ASAP.1997.606823
B. Haller, J. Götze, Joseph R. Cavallaro
In this paper we present practical techniques for implementing Givens rotations based on the well-known CORDIC algorithm. Rotations are the basic operation in many high performance adaptive filtering schemes as well as numerous other advanced signal processing algorithms relying on matrix decompositions. To improve the efficiency of these methods, we propose to use "approximate rotations", whereby only a few (i.e. $r \ll b$, where $b$ is the operand word length) elementary angles of the original CORDIC sequence are applied, so as to reduce the total number of required shift-add operations. This seemingly rather ad hoc and heuristic procedure constitutes a representative example of a very useful design concept termed "approximate signal processing", recently introduced and formally presented by S.H. Nawab et al. (1997), concerning the trade-off between system performance and implementation complexity, i.e. between accuracy and resources. This is a subject of increasing importance with respect to the efficient realization of demanding signal processing tasks. We present the application of the described rotation schemes to QRD-RLS filtering in wireless communications, specifically high speed channel equalization and beamforming, i.e. for intersymbol and co-channel/interuser interference suppression, respectively. It is shown via computer simulations that the convergence behavior of the scheme using approximate Givens rotations is insensitive to the value of $r$, and that the misadjustment error decreases as $r$ is increased, opening up possibilities for "incremental refinement" strategies.
Title : Efficient implementation of rotation operations for high performance QRD-RLS filtering
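The accuracy-versus-effort trade-off behind approximate rotations can be illustrated with a simplified sketch: a CORDIC vectoring step that drives the second component of a vector toward zero, truncated to its first r micro-rotations. The abstract does not say which elementary angles the authors select, so taking the first r is an assumption, as is the function name `approx_givens`; gain compensation is omitted for brevity.

```python
import math


def approx_givens(x: float, y: float, r: int) -> tuple[float, float, list[float]]:
    """Apply r CORDIC micro-rotations that drive y toward zero (assumes x > 0).

    Returns the rotated (x, y), still scaled by the CORDIC gain, and the list
    of rotation directions, which would be reused on the rest of the matrix row.
    """
    sigmas = []
    for i in range(r):
        # Rotate clockwise when y is non-negative, counter-clockwise otherwise.
        sigma = -1.0 if y >= 0 else 1.0
        x, y = x - sigma * y * 2.0 ** -i, y + sigma * x * 2.0 ** -i
        sigmas.append(sigma)
    return x, y, sigmas


# Residual |y| relative to the vector length shrinks as r grows.
for r in (2, 4, 8, 16):
    xr, yr, _ = approx_givens(1.0, 0.5, r)
    print(r, abs(yr) / math.hypot(xr, yr))
```

With small r the element is only approximately annihilated, which is the source of the misadjustment error the abstract mentions; increasing r refines the result at the cost of more shift-add steps.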
Pub Date : 1997-07-14 DOI: 10.1109/ASAP.1997.606814
Yuan-Hau Yeh, Chen-Yi Lee
This paper shows how to find an optimized buffer size for VLSI architectures of full-search block matching algorithms. Starting from the DG (dependency graph) analysis, we focus on the problem of reducing the internal buffer size under a minimal I/O bandwidth constraint. As a result, a systematic design procedure for buffer optimization is derived to reduce realization cost.
Title : Buffer size optimization for full-search block matching algorithms
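For readers unfamiliar with the algorithm whose buffering is being optimized, the following is a minimal software sketch of full-search block matching using the sum of absolute differences (SAD) as the matching criterion. The choice of SAD, the function name `full_search` and its parameters are assumptions for illustration; the abstract does not specify them, and the sketch says nothing about the authors' buffer design.

```python
def full_search(cur, ref, bx, by, n, p):
    """Find the displacement (dx, dy) within +/-p that best matches a block.

    cur, ref : 2-D lists of pixel values (current and reference frames)
    bx, by   : top-left corner of the n-by-n block in cur
    Returns (dx, dy, sad) for the candidate with the smallest SAD.
    """
    h, w = len(ref), len(ref[0])
    best = None
    for dy in range(-p, p + 1):
        for dx in range(-p, p + 1):
            x0, y0 = bx + dx, by + dy
            # Skip candidates that fall outside the reference frame.
            if x0 < 0 or y0 < 0 or x0 + n > w or y0 + n > h:
                continue
            sad = sum(
                abs(cur[by + j][bx + i] - ref[y0 + j][x0 + i])
                for j in range(n)
                for i in range(n)
            )
            if best is None or sad < best[2]:
                best = (dx, dy, sad)
    return best


# Reference frame with distinct values; current frame is the reference
# shifted by 2 pixels horizontally and 1 pixel vertically (with wrap-around).
ref = [[10 * y + x for x in range(8)] for y in range(8)]
cur = [[ref[(y + 1) % 8][(x + 2) % 8] for x in range(8)] for y in range(8)]
print(full_search(cur, ref, 2, 2, 2, 2))  # (2, 1, 0)
```

Every candidate position rereads most of the same reference pixels as its neighbours, which is why the trade-off between internal buffer size and I/O bandwidth matters in a hardware realization.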
Pub Date : 1997-07-14 DOI: 10.1109/ASAP.1997.606807
M. Zeller, James C. Phillips, A. Dalke, W. Humphrey, K. Schulten, Thomas S. Huang, V. Pavlovic, Yunxin Zhao, Zion Lo, Stephen M. Chu, Rajeev Sharma
Knowledge of the complex molecular structures of living cells is being accumulated at a tremendous rate. Key technologies enabling this success have been high performance computing and powerful molecular graphics applications, but the technology is beginning to seriously lag behind the challenges posed by the size and number of new structures and by the emerging opportunities in drug design and genetic engineering. A visual computing environment is being developed which permits interactive modeling of biopolymers by linking a 3D molecular graphics program with an efficient molecular dynamics simulation program executed on remote high-performance parallel computers. The system will be ideally suited for distributed computing environments, utilizing both local 3D graphics facilities and the peak capacity of high-performance computers for the purpose of interactive biomolecular modeling. To create an interactive 3D environment, three input methods will be explored: (1) a six-degree-of-freedom "mouse" for controlling the space shared by the model and the user; (2) voice commands monitored through a microphone and recognized by a speech recognition interface; (3) hand gestures, detected through cameras and interpreted using computer vision techniques. Controlling 3D graphics connected to real-time simulations and the use of voice with suitable language semantics, as well as hand gestures, promise great benefits for many types of problem solving environments. Our focus on structural biology takes advantage of existing sophisticated software, provides concrete objectives, defines a well-posed domain of tasks and offers a well-developed vocabulary for spoken communication.
Title : A visual computing environment for very large scale biomolecular modeling
Pub Date : 1997-07-14 DOI: 10.1109/ASAP.1997.606841
C. Ebeling, Darren C. Cronquist, Paul Franklin
Recent trends in the cost and performance of application-specific hardware relative to conventional processors discourage investing much time and energy in special-purpose architectures except for niche applications. These trends, however, may be reversed by the increasing complexity of computer architectures and the advent of configurable computing. Configurable computers have attracted considerable attention recently because they promise to deliver the performance of application-specific hardware along with the flexibility of general-purpose computers. In this paper, we discuss some of the forces driving configurable computing, and we argue that new configurable architectures are needed to realize the enormous potential of configurable computing. In particular, we believe that the commercial FPGAs currently used to construct configurable computers are too fine-grained to achieve good cost-performance on computationally-intensive applications that demand high-performance hardware. We then describe a new architecture called RaPiD (Reconfigurable Pipelined Datapaths), which is optimized for highly repetitive, computationally-intensive tasks. Very deep application-specific computation pipelines can be configured in RaPiD that deliver very high performance for a wide range of applications. RaPiD achieves this using a coarse-grained reconfigurable architecture that mixes the appropriate amount of static configuration with dynamic control.
Title : Configurable computing: the catalyst for high-performance architectures
Pub Date : 1997-07-14 DOI: 10.1109/ASAP.1997.606844
R. Hartenstein, J. Becker, M. Herz, U. Nageldinger
This paper introduces a powerful novel sequencer for controlling computational machines and for structured DMA (direct memory access) applications. It is mainly focused on applications using 2-dimensional memory organization, from which most of the inherent speed-up is obtained. A classification scheme of computational sequencing patterns and storage schemes is derived. In the context of application-specific computing, the paper illustrates the sequencer's usefulness, especially for data sequencing, recalling previously published examples as far as needed for completeness. The paper also discusses how the new sequencer hardware provides substantial speed-up compared to the use of traditional sequencing hardware.
Title : A novel sequencer hardware for application specific computing
Pub Date : 1997-07-14 DOI: 10.1109/ASAP.1997.606822
C. V. Schimpfle, S. Simon, J. Nossek
In this paper, a methodology for reducing the power consumption of shift-and-add operations in general, and of CORDIC stages in particular, is presented. The proposed method exploits the simultaneous carry generation in redundant carry-save and signed-digit structures to predict the minimum hardware effort necessary for shift-and-add operations. Because a carry generated in a certain bit position cannot "ripple" through the adder when a redundant number representation is used, hardware parts can be switched on or off depending on the shift constant. Simulations have shown that shift-dependent hardware utilization in parallel implementations leads to monotonically decreasing power consumption for increasing shift constants. A CORDIC processor element for 16-digit SDNR has been implemented as a layout and simulated with PowerMill in terms of power consumption.
Title : Low power CORDIC implementation using redundant number representation
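The claim that carries cannot ripple in a redundant representation can be illustrated with a minimal carry-save addition sketch on non-negative Python integers. The function name `carry_save_add` is hypothetical, and the sketch shows only the general carry-save principle, not the authors' power-gating scheme or their signed-digit datapath.

```python
def carry_save_add(a: int, b: int, c: int) -> tuple[int, int]:
    """Reduce three non-negative operands to a (sum, carry) pair.

    Every bit position is computed independently by a full adder, so each
    carry moves exactly one position to the left and never propagates further.
    """
    partial_sum = a ^ b ^ c                        # per-bit sum without carries
    carry = ((a & b) | (a & c) | (b & c)) << 1     # per-bit majority, shifted left
    return partial_sum, carry


s, cy = carry_save_add(5, 6, 7)
print(s, cy, s + cy)  # 4 14 18
```

The value is held as the pair `(s, cy)` and only converted to a conventional number, with one carry-propagating addition, when a final result is needed.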