Latest papers: Proceedings IEEE International Conference on Application-Specific Systems, Architectures and Processors

Discrete Lagrangian method for optimizing the design of multiplierless QMF filter banks
B. Wah, Yi Shang, Zhe Wu
In this paper, we present a new discrete Lagrangian optimization method for designing multiplierless QMF (quadrature mirror filter) banks. In multiplierless QMF filter banks, filter coefficients are powers-of-two (PO2), where numbers are represented as sums or differences of powers of two (also called Canonical Signed Digit, or CSD, representation), and multiplications can be carried out as additions, subtractions and shifts. We formulate the design problem as a nonlinear discrete constrained optimization problem, using the reconstruction error as the objective and other performance metrics as constraints. One of the major advantages of this formulation is that it allows us to search for designs that improve over the best existing designs with respect to all performance metrics, rather than finding designs that trade one performance metric for another. We show that our design method can find designs that improve over Johnston's benchmark designs using a maximum of three to six ONE bits in each filter coefficient.
DOI: https://doi.org/10.1109/ASAP.1997.606858 · Published: 1997-07-14
Citations: 17
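The idea of a multiplierless coefficient in the abstract above can be made concrete with a small sketch. It converts an integer coefficient to signed digits in {-1, 0, 1} with no two adjacent nonzero digits, then multiplies by it using only shifts, additions and subtractions. The function names `to_csd` and `shift_add_multiply` and the sample coefficient 119 are assumptions made for illustration; the sketch does not cover the paper's Lagrangian search itself.

```python
def to_csd(n):
    """Return signed digits of integer n, least significant first, each in {-1, 0, 1}."""
    digits = []
    while n != 0:
        if n & 1:
            d = 2 - (n & 3)  # n mod 4 == 1 gives +1, n mod 4 == 3 gives -1
            n -= d
        else:
            d = 0
        digits.append(d)
        n >>= 1
    return digits


def shift_add_multiply(x, digits):
    """Multiply integer x by the encoded constant using only shifts, adds and subtracts."""
    acc = 0
    for shift, d in enumerate(digits):
        if d == 1:
            acc += x << shift
        elif d == -1:
            acc -= x << shift
    return acc


coeff = 119                # hypothetical coefficient; binary 1110111 has six ONE bits
digits = to_csd(coeff)     # signed-digit form is 128 - 8 - 1, i.e. three nonzero digits
print(digits, shift_add_multiply(5, digits))
```

Each nonzero digit costs one adder or subtractor, which is presumably why a bound on the number of ONE bits per coefficient is a natural hardware cost measure.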
ADPCM codec: from system level description to versatile HDL model
H. Dawid, Klaus-Jürgen Koch, J. Stahl
Due to the rapid increase in the system complexity of modern telecommunication products, two main challenges exist for a system design flow meeting the arising demands: 1) provide a platform for fast algorithmic and architectural design exploration and optimization from system to gate level, which guarantees high quality of results (QoR) and enables full and seamless design verification; 2) provide a platform for design reuse. In this paper, we show how a design flow based on fast system simulation, behavioral synthesis and power analysis is used for the commercial implementation of an ADPCM (Adaptive Differential Pulse Code Modulation) codec module in record time, simultaneously meeting all design constraints and creating a versatile system and HDL model ready for design reuse.
DOI: https://doi.org/10.1109/ASAP.1997.606851 · Published: 1997-07-14
Citations: 2
Design methodology for digital signal processing
G. Fettweis
The problem of managing designs of increasing complexity, which results from improvements in semiconductor integration density, is an old one but still current. The new challenge lies in a new level of architecture heterogeneity, e.g. mixing hard-wired digital circuits with software-programmed signal processors on one die. Hence, we are moving up by one level of abstraction, from semi-custom standard cells to semi-custom 'block cells'. This results in a new dimension in the gap between algorithm/system design and architecture/circuit design, which no tools address sufficiently today. This paper presents a method of analyzing the problem by orthogonalizing algorithms into data transfer and data manipulation, and carrying this over to the control and I/O design as well. This approach might be a promising basis for flexibly mapping the algorithms onto future 'block cell' designs, and furthermore for designing new system simulation tools which allow tools to be integrated for a flexible mapping of algorithms onto various hardware architecture domains.
DOI: https://doi.org/10.1109/ASAP.1997.606852 · Published: 1997-07-14
Citations: 4
CORDIC-based computation of ArcCos and ArcSin
T. Lang, E. Antelo
CORDIC-based algorithms to compute $\cos^{-1}(t)$, $\sin^{-1}(t)$ and $\sqrt{1-t^2}$ are proposed. The implementation requires a standard CORDIC module plus a module to compute the direction of rotation, this being the same hardware required for the extended CORDIC vectoring recently proposed by the authors. Although these functions can be obtained as a special case of this extended vectoring, the specific algorithm we propose here presents two significant improvements: (1) it achieves an angle granularity of $2^{-n}$ using the same datapath width as the standard CORDIC algorithm (about $n$ bits, instead of about $2n$, which would be required using the extended vectoring), and (2) no repetitions of iterations are needed. The proposed algorithm is compatible with the extended vectoring and, in contrast with previous implementations, the number of iterations and the delay of each iteration are the same as for the conventional CORDIC algorithm.
DOI: https://doi.org/10.1109/ASAP.1997.606820 · Published: 1997-07-14
Citations: 20
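For orientation, the following floating-point sketch shows conventional CORDIC vectoring and how an arcsine can be read off it as the angle of the vector $(\sqrt{1-t^2}, t)$. This is explicitly not the algorithm proposed in the paper above: it computes the square root up front with `math.sqrt`, which is exactly the kind of extra work a dedicated scheme would avoid. The function names and the default of 32 iterations are assumptions made for illustration.

```python
import math


def cordic_vectoring(x, y, n=32):
    """Drive y to zero with n micro-rotations; return (magnitude, angle) of the input.

    Assumes x >= 0 so that the angle lies within the CORDIC convergence range.
    """
    angle = 0.0
    for i in range(n):
        t = 2.0 ** -i
        d = -1.0 if y > 0 else 1.0           # rotate towards the x-axis
        x, y = x - d * y * t, y + d * x * t  # shift-and-add micro-rotation
        angle -= d * math.atan(t)            # a table lookup in hardware
    gain = math.prod(math.sqrt(1.0 + 4.0 ** -i) for i in range(n))
    return x / gain, angle


def arcsin_via_vectoring(t, n=32):
    """arcsin(t) for -1 <= t <= 1, as the angle of (sqrt(1 - t^2), t)."""
    return cordic_vectoring(math.sqrt(1.0 - t * t), t, n)[1]


def arccos_via_vectoring(t, n=32):
    """arccos(t) obtained from the identity arccos(t) = pi/2 - arcsin(t)."""
    return math.pi / 2 - arcsin_via_vectoring(t, n)


print(arcsin_via_vectoring(0.5), math.asin(0.5))
```

The direction `d` is chosen from the sign of `y` alone, which is the standard vectoring rule; the abstract's separate "module to compute the direction of rotation" replaces this rule in the proposed scheme.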
Efficient implementation of rotation operations for high performance QRD-RLS filtering
B. Haller, J. Götze, Joseph R. Cavallaro
In this paper we present practical techniques for implementing Givens rotations based on the well-known CORDIC algorithm. Rotations are the basic operation in many high performance adaptive filtering schemes as well as numerous other advanced signal processing algorithms relying on matrix decompositions. To improve the efficiency of these methods, we propose to use "approximate rotations", whereby only a few (i.e. $r \ll b$, where $b$ is the operand word length) elementary angles of the original CORDIC sequence are applied, so as to reduce the total number of required shift-add operations. This seemingly rather ad hoc and heuristic procedure constitutes a representative example of a very useful design concept termed "approximate signal processing", recently introduced and formally exposed by S.H. Nawab et al. (1997), concerning the trade-off between system performance and implementation complexity, i.e. between accuracy and resources. This is a subject of increasing importance with respect to the efficient realization of demanding signal processing tasks. We present the application of the described rotation schemes to QRD-RLS filtering in wireless communications, specifically high speed channel equalization and beamforming, i.e. for intersymbol and co-channel/interuser interference suppression, respectively. It is shown via computer simulations that the convergence behavior of the scheme using approximate Givens rotations is insensitive to the value of $r$, and that the misadjustment error decreases as $r$ is increased, opening up possibilities for "incremental refinement" strategies.
DOI: https://doi.org/10.1109/ASAP.1997.606823 · Published: 1997-07-14
Citations: 35
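The notion of an approximate rotation built from only $r$ CORDIC micro-rotations can be sketched as follows. The greedy rule used here (at each step pick the shift that leaves the smallest residual) is an assumption made for illustration and is not taken from the paper; the name `approx_rotate_to_axis` and the default word length `b=16` are likewise hypothetical.

```python
import math


def approx_rotate_to_axis(x, y, r, b=16):
    """Shrink |y| using at most r micro-rotations with shifts in 0..b-1.

    Returns (x, y, gain, used), where gain is the accumulated scale factor
    and used lists the (shift, direction) pairs that were applied.
    The case x == 0 is not handled and leaves the vector unchanged.
    """
    gain, used = 1.0, []
    for _ in range(r):
        if y == 0.0:
            break
        s = 1.0 if x * y > 0 else -1.0
        # Pick the shift whose micro-rotation leaves the smallest |y|.
        i = min(range(b), key=lambda k: abs(y - s * 2.0 ** -k * x))
        t = 2.0 ** -i
        if abs(y - s * t * x) >= abs(y):
            break                            # no available angle improves the result
        x, y = x + s * t * y, y - s * t * x  # one shift-add pair per component
        gain *= math.sqrt(1.0 + t * t)
        used.append((i, int(s)))
    return x, y, gain, used


for r in range(5):
    _, resid, _, _ = approx_rotate_to_axis(3.0, 2.0, r)
    print(r, abs(resid))
```

In a QR-decomposition setting the same `(shift, direction)` pairs would be applied to every remaining column pair of the two rows being rotated, so the cost per element scales with `r` rather than with the word length.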
Buffer size optimization for full-search block matching algorithms
Yuan-Hau Yeh, Chen-Yi Lee
This paper shows how to find an optimized buffer size for VLSI architectures of full-search block matching algorithms. Starting from the DG (dependency graph) analysis, we focus on the problem of reducing the internal buffer size under a minimal I/O bandwidth constraint. As a result, a systematic design procedure for buffer optimization is derived to reduce realization cost.
DOI: https://doi.org/10.1109/ASAP.1997.606814 · Published: 1997-07-14
Citations: 37
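As background for the abstract above, a full-search block matching kernel can be sketched in a few lines. It uses the sum of absolute differences (SAD) as the matching criterion, which is a common choice but an assumption here, as are the function name `full_search` and its parameters. The sketch shows only the algorithm, not the paper's buffer optimization procedure.

```python
def full_search(cur, ref, bx, by, n, p):
    """Best match for the n x n block of cur at (bx, by) within +/-p pixels in ref.

    Frames are lists of rows of integers. Returns (dx, dy, sad) for the
    displacement with the smallest sum of absolute differences.
    """
    h, w = len(ref), len(ref[0])
    best = None
    for dy in range(-p, p + 1):
        for dx in range(-p, p + 1):
            x0, y0 = bx + dx, by + dy
            if x0 < 0 or y0 < 0 or x0 + n > w or y0 + n > h:
                continue  # candidate block falls outside the reference frame
            sad = sum(abs(cur[by + j][bx + i] - ref[y0 + j][x0 + i])
                      for j in range(n) for i in range(n))
            if best is None or sad < best[2]:
                best = (dx, dy, sad)
    return best


# Synthetic frames: cur is ref shifted by (2, 1), so a zero-SAD match exists.
ref = [[(3 * x * x + 5 * y * y + 7 * x * y) % 256 for x in range(16)] for y in range(16)]
cur = [[ref[(y + 1) % 16][(x + 2) % 16] for x in range(16)] for y in range(16)]
print(full_search(cur, ref, 4, 4, 4, 3))
```

Neighbouring candidate blocks overlap in all but one row or column of pixels, so a hardware implementation that re-reads the frame for every candidate wastes I/O bandwidth; this overlap is the data reuse that on-chip buffers are meant to exploit.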
A visual computing environment for very large scale biomolecular modeling
M. Zeller, James C. Phillips, A. Dalke, W. Humphrey, K. Schulten, Thomas S. Huang, V. Pavlovic, Yunxin Zhao, Zion Lo, Stephen M. Chu, Rajeev Sharma
Knowledge of the complex molecular structures of living cells is being accumulated at a tremendous rate. Key technologies enabling this success have been high performance computing and powerful molecular graphics applications, but the technology is beginning to seriously lag behind challenges posed by the size and number of new structures and by the emerging opportunities in drug design and genetic engineering. A visual computing environment is being developed which permits interactive modeling of biopolymers by linking a 3D molecular graphics program with an efficient molecular dynamics simulation program executed on remote high-performance parallel computers. The system will be ideally suited for distributed computing environments, utilizing both local 3D graphics facilities and the peak capacity of high-performance computers for the purpose of interactive biomolecular modeling. To create an interactive 3D environment, three input methods will be explored: (1) a six-degree-of-freedom "mouse" for controlling the space shared by the model and the user; (2) voice commands monitored through a microphone and recognized by a speech recognition interface; (3) hand gestures, detected through cameras and interpreted using computer vision techniques. Controlling 3D graphics connected to real-time simulations and the use of voice with suitable language semantics, as well as hand gestures, promise great benefits for many types of problem-solving environments. Our focus on structural biology takes advantage of existing sophisticated software, provides concrete objectives, defines a well-posed domain of tasks and offers a well-developed vocabulary for spoken communication.
DOI: https://doi.org/10.1109/ASAP.1997.606807 · Published: 1997-07-14
Citations: 20
Configurable computing: the catalyst for high-performance architectures
C. Ebeling, Darren C. Cronquist, Paul Franklin
Recent trends in the cost and performance of application-specific hardware relative to conventional processors discourage investing much time and energy in special-purpose architectures except for niche applications. These trends, however, may be reversed by the increasing complexity of computer architectures and the advent of configurable computing. Configurable computers have attracted considerable attention recently because they promise to deliver the performance of application-specific hardware along with the flexibility of general-purpose computers. In this paper, we discuss some of the forces driving configurable computing, and we argue that new configurable architectures are needed to realize the enormous potential of configurable computing. In particular, we believe that the commercial FPGAs currently used to construct configurable computers are too fine-grained to achieve good cost-performance on computationally-intensive applications that demand high-performance hardware. We then describe a new architecture called RaPiD (Reconfigurable Pipelined Datapaths), which is optimized for highly repetitive, computationally-intensive tasks. Very deep application-specific computation pipelines can be configured in RaPiD that deliver very high performance for a wide range of applications. RaPiD achieves this using a coarse-grained reconfigurable architecture that mixes the appropriate amount of static configuration with dynamic control.
DOI: https://doi.org/10.1109/ASAP.1997.606841 · Published: 1997-07-14
Citations: 42
A novel sequencer hardware for application specific computing
R. Hartenstein, J. Becker, M. Herz, U. Nageldinger
This paper introduces a powerful novel sequencer for controlling computational machines and for structured DMA (direct memory access) applications. It is mainly focused on applications using 2-dimensional memory organization, from which most of the inherent speed-up is obtained. A classification scheme of computational sequencing patterns and storage schemes is derived. In the context of application specific computing, the paper illustrates the sequencer's usefulness, especially for data sequencing, recalling previously published examples as far as needed for completeness. The paper also discusses how the new sequencer hardware provides substantial speed-up compared to the use of traditional sequencing hardware.
DOI: https://doi.org/10.1109/ASAP.1997.606844 · Published: 1997-07-14
Citations: 35
Low power CORDIC implementation using redundant number representation
C. V. Schimpfle, S. Simon, J. Nossek
In this paper a methodology for reducing the power consumption of shift-and-add operations in general, and of CORDIC stages in particular, is presented. The proposed method uses the fact of simultaneous carry generation in redundant carry-save and signed digit structures to predict the minimum necessary hardware effort for shift-and-add operations. Since a carry, once generated in a certain bit position, cannot "ripple" through the adder when a redundant number representation is used, hardware parts can be switched on or off depending on the shift constant. Simulations have shown that shift-dependent hardware utilization of parallel implementations leads to monotonically decreasing power consumption for increasing shift constants. A CORDIC processor element for 16 digit SDNR has been implemented as a layout and simulated with PowerMill in terms of power consumption.
DOI: https://doi.org/10.1109/ASAP.1997.606822 (published 1997-07-14)
Citations: 2
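
The shift-and-add structure that the CORDIC abstract above refers to can be illustrated with a minimal fixed-point rotation sketch. This is a conventional two's-complement integer version written in Python, not the redundant carry-save or signed-digit implementation the paper describes. The names `FRAC_BITS`, `N_STAGES`, and `cordic_rotate`, and the values chosen for them, are assumptions made for illustration only.

```python
import math

FRAC_BITS = 16   # assumed fixed-point fraction width (illustrative)
N_STAGES = 16    # assumed number of CORDIC stages (illustrative)
ONE = 1 << FRAC_BITS

# Elementary rotation angles atan(2^-i), stored as fixed-point integers.
ATAN_TABLE = [round(math.atan(2.0 ** -i) * ONE) for i in range(N_STAGES)]

# Each stage scales the vector by sqrt(1 + 2^-2i); pre-compensate once.
GAIN = 1.0
for i in range(N_STAGES):
    GAIN /= math.sqrt(1.0 + 2.0 ** (-2 * i))


def cordic_rotate(angle):
    """Approximate (cos(angle), sin(angle)) for |angle| below about 1.74 rad.

    Every stage uses only shifts by the stage index i and additions or
    subtractions; i plays the role of the shift constant.
    """
    x = round(GAIN * ONE)   # start on the x axis, pre-scaled by the gain
    y = 0
    z = round(angle * ONE)  # residual angle still to be rotated
    for i in range(N_STAGES):
        if z >= 0:
            # Rotate counter-clockwise by atan(2^-i).
            x, y = x - (y >> i), y + (x >> i)
            z -= ATAN_TABLE[i]
        else:
            # Rotate clockwise by atan(2^-i).
            x, y = x + (y >> i), y - (x >> i)
            z += ATAN_TABLE[i]
    return x / ONE, y / ONE


if __name__ == "__main__":
    c, s = cordic_rotate(math.pi / 6)
    print(c, s)  # approximations of cos(pi/6) and sin(pi/6)
```

In this sketch the shift amount grows by one with each stage, so later stages add operands with fewer significant bits. The shift constant that the abstract ties to hardware utilization corresponds to `i` here; how that maps onto switchable hardware parts is specific to the paper's redundant-arithmetic design and is not modeled.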