
VLSI Signal Processing, IX: Latest Papers

Hardware design of a Hough transform based 2-D motion estimation system
Pub Date : 1996-10-30 DOI: 10.1109/VLSISP.1996.558368
Hsiang-Ling Li, C. Chakrabarti
A novel feature-domain 2D motion estimation system based on the straight-line Hough transform (SLHT) is presented. This system implements the motion estimation technique proposed by Li and Chakrabarti (see Pattern Recognition, vol.29, no.8, 1996). It operates on 256×256-pixel binary images and consists of two main blocks. The first block does the preprocessing work including smoothing the boundary, tracing and integrating the contours, and detecting dominant points. The second block computes the Hough transform on contour segments as well as the rotation and translation parameters. Each of the modules has been implemented (gate level) and simulated using Mentor Graphics tools. The experimental results are presented and compared with the results of the software implementation.
Citations: 1
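The straight-line Hough transform named in the abstract above can be illustrated with a small software sketch. This is a generic textbook formulation, not the authors' gate-level design: each contour point votes for every line `rho = x*cos(theta) + y*sin(theta)` that passes through it, and the most-voted bin identifies the dominant line. The function names, the one-degree angle step, and the unit `rho` bin width are assumptions chosen for illustration.

```python
import math

def hough_lines(points, n_theta=180, rho_step=1.0):
    """Accumulate votes in (theta index, rho bin) space for a set of (x, y) points."""
    votes = {}
    for x, y in points:
        for t in range(n_theta):
            theta = math.pi * t / n_theta
            # Normal form of a line through (x, y) with normal angle theta.
            rho = x * math.cos(theta) + y * math.sin(theta)
            key = (t, round(rho / rho_step))
            votes[key] = votes.get(key, 0) + 1
    return votes

def strongest_line(points, n_theta=180, rho_step=1.0):
    """Return (theta in degrees, rho, vote count) of the most-voted bin."""
    votes = hough_lines(points, n_theta, rho_step)
    (t, r), count = max(votes.items(), key=lambda kv: kv[1])
    return t * 180.0 / n_theta, r * rho_step, count

# Points on the vertical line x = 5 all fall in the bin theta = 0, rho = 5.
vertical = [(5, y) for y in range(100)]
theta_deg, rho, count = strongest_line(vertical)
```

A rotation of the object shifts the peaks along the theta axis, which is one reason the Hough domain is convenient for estimating rotation; how the paper derives the rotation and translation parameters is not detailed in the abstract.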
Model-based architectural design and verification of scalable embedded DSP systems-a RASSP approach
Pub Date : 1996-10-30 DOI: 10.1109/VLSISP.1996.558314
Lan-Rong Dung, V. K. Madisetti, J. Hines
The paper describes how rapid model-year architectural synthesis (e.g., HW/SW codesign) of embedded signal processors can be performed to optimize various cost objective functions using a reuse library of models, followed by simulation-based optimization. Sponsored as part of DARPA's RASSP program, this approach has developed and released a number of interoperable and verified architectural component libraries at the system level (processors, communication protocols, and topologies). These libraries have been used in actual demonstrations of avionics and military systems, such as the MIT Lincoln Laboratory's SAR Benchmark and the F-14 legacy Infrared Search and Track System (IRST), and as part of NASA/JPL's Remote Exploration/Experimentation (REE) program studies. The authors introduce the methodology of conceptual prototyping and establish the requirements and features of the proposed environment. They also illustrate its use on some common applications with relatively sophisticated architectural building blocks, such as the IEEE SCI protocol and Analog Devices' SHARC processor family.
Citations: 3
Design issues for very-long-instruction-word VLSI video signal processors
Pub Date : 1996-10-30 DOI: 10.1109/VLSISP.1996.558307
S. Dutta, A. Wolfe, W. Wolf, K. O'Connor
This paper is a design study of a very long instruction word (VLIW) video signal processor (VSP), concentrating on the VLSI tradeoffs which affect the processor's architecture. VLIW architectures provide high parallelism and excellent high-level language programmability, but require careful attention to VLSI design. Flexible, high-bandwidth interconnect, high-connectivity register files, and fast cycle time are required to achieve real-time video signal processing. The design targets 32-64 operations per cycle at clock rates exceeding 500 MHz. Parameterizable versions of key modules have been designed in a 0.25 μm CMOS process, allowing us to explore the VLIW VSP design space and study the tradeoffs defined by the characteristics of the process.
Citations: 29
Efficient VLSI suited architectures for discrete wavelet transforms
Pub Date : 1996-10-30 DOI: 10.1109/VLSISP.1996.558371
S. Simon, P. Rieder, J. Nossek
A variety of architectures for the discrete wavelet transform (DWT) is examined to derive an efficient VLSI implementation. The comparison leads to a lattice filter structure which uses single steps of the CORDIC algorithm. Due to the modular structure of the proposed architecture, this approach is especially suited for full custom design style using module generators to automate the manual design process.
Citations: 2
A scalable architecture for 2-D discrete wavelet transform
Pub Date : 1996-10-30 DOI: 10.1109/VLSISP.1996.558369
J.C. Limqueco, M. Bayoumi
We propose an efficient and simple systolic-like architecture for VLSI implementation of a 2-D discrete wavelet transform (DWT). The "approximation" and "detail" components of a signal are computed simultaneously in the first octave and alternately in the other octave(s). Each processing element has its own local memory for storing intermediate data, and its routing requirement is minimal, limited only to its neighbors. The proposed architecture uses the same clock frequency for every octave level, has 100% utilization for the j=2 architecture, and has an N²+N cycle period. The architecture is scalable for different filter lengths (divisible by 2) and different octave levels.
Citations: 22
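To make the "approximation" and "detail" components in the abstract above concrete, here is a minimal software sketch of one octave of a separable 2-D DWT. It assumes the length-2 Haar filter with averaging normalization, which is an illustrative choice; the abstract does not say which filters the architecture uses, and nothing here models the systolic hardware. All function names are hypothetical.

```python
def haar_1d(row):
    """One octave of a 1-D Haar DWT: returns (approximation, detail) halves."""
    approx = [(row[i] + row[i + 1]) / 2 for i in range(0, len(row), 2)]
    detail = [(row[i] - row[i + 1]) / 2 for i in range(0, len(row), 2)]
    return approx, detail

def transpose(matrix):
    return [list(col) for col in zip(*matrix)]

def filter_columns(block):
    """Apply haar_1d down every column; returns (low, high) sub-blocks."""
    low, high = [], []
    for col in transpose(block):
        a, d = haar_1d(col)
        low.append(a)
        high.append(d)
    return transpose(low), transpose(high)

def haar_2d(image):
    """One octave of a separable 2-D DWT: filter rows, then columns.

    Returns the four sub-bands (LL, LH, HL, HH); even dimensions are assumed.
    """
    low_rows, high_rows = [], []
    for row in image:
        a, d = haar_1d(row)
        low_rows.append(a)
        high_rows.append(d)
    ll, lh = filter_columns(low_rows)
    hl, hh = filter_columns(high_rows)
    return ll, lh, hl, hh

ll, lh, hl, hh = haar_2d([[1, 2], [3, 4]])
```

Further octaves would repeat `haar_2d` on the LL sub-band, which is why later octaves operate on a quarter of the data each time.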
An object based data cache with conflict free concurrent access as shared memory for a parallel DSP
Pub Date : 1996-10-30 DOI: 10.1109/VLSISP.1996.558278
J. Kneip, P. Pirsch
The paper describes the principle and practical implementation of an object-based cache concept that allows conflict-free regular access to data structures for a cluster of processing units. For access to shared data located in on-chip caches, the cache is based on a virtual, object-bound address space instead of the conventional linear address space. By extending the conventional block-based cache principle to 2-D blocks and using virtual addresses for address arithmetic and hit/miss detection, the time-critical address calculations in the load/store pipeline can be performed fast and at low hardware cost. The transformation to physical addresses is performed during block transfer between internal caches and external system memory, where it is much less time critical and need only be performed once per block. The object-based cache is compiler friendly, fully transparent to the programmer, and allows the hardware-efficient implementation of a shared on-chip memory system for future parallel digital image processors.
Citations: 1
Low-power digital filter implementations using ternary coefficients
Pub Date : 1996-10-30 DOI: 10.1109/VLSISP.1996.558325
R. Hezar, V. K. Madisetti
We propose an efficient design procedure for digital FIR filters whose coefficients are restricted to the ternary set (-1, 0, +1), cascaded by a multiplication-free architecture. A dynamic programming algorithm, minimizing the instantaneous error, is also proposed to assist in the search for the optimal ternary filter coefficient set. Power reductions in a VLSI implementation appear feasible, when compared to other published approaches.
Citations: 5
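The appeal of ternary coefficients in the abstract above is that every tap reduces to an add, a subtract, or nothing at all. The sketch below shows only that property for a direct-form FIR filter; it does not reproduce the authors' cascade structure or their dynamic programming search, and the function name is hypothetical.

```python
def ternary_fir(samples, coeffs):
    """Direct-form FIR filter with taps in {-1, 0, +1}; uses no multiplications."""
    if any(c not in (-1, 0, 1) for c in coeffs):
        raise ValueError("coefficients must be -1, 0 or +1")
    out = []
    for n in range(len(samples)):
        acc = 0
        for k, c in enumerate(coeffs):
            if n - k < 0:
                break  # samples before the start of the input are taken as zero
            if c == 1:
                acc += samples[n - k]
            elif c == -1:
                acc -= samples[n - k]
            # c == 0: the tap contributes nothing and costs nothing
        out.append(acc)
    return out

# y[n] = x[n] - x[n-2]
y = ternary_fir([1, 2, 3, 4], [1, 0, -1])
```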
Real-time software MPEG-1 video decoder design for low-cost, low-power applications
Pub Date : 1996-10-30 DOI: 10.1109/VLSISP.1996.558376
K. Nadehara, H. Stolberg, M. Ikekawa, E. Murata, I. Kuroda
This paper presents a real-time MPEG-1 video decoder implemented in software on a DSP-enhanced, 160-mW, 100-MHz, 32-bit microprocessor. The processor's DSP-oriented instructions improve the performance of generic DSP operations such as the inverse discrete cosine transform, while fast software algorithms that perform parallel operations on packed-pixel data are developed for processes unique to video decoding such as motion compensation. Furthermore, to reduce the clock count as well as the instruction count, load/store scheduling and cache miss reduction are performed. In total, the processor can achieve 30 frames/sec MPEG-1 video decoding at a cost and power dissipation (160 mW) comparable to dedicated LSIs.
Citations: 9
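One well-known example of a "parallel operation on packed-pixel data" is averaging four 8-bit pixels at once inside a 32-bit word, the kind of step that half-pixel motion compensation needs. The sketch below uses the standard bit trick `(a | b) - (((a ^ b) & 0xFEFEFEFE) >> 1)`, which yields the per-byte average rounded up. Whether the paper uses this exact sequence is not stated in the abstract, so treat it as an assumed illustration; the helper names are hypothetical.

```python
MASK32 = 0xFFFFFFFF
CLEAR_LOW_BITS = 0xFEFEFEFE  # drops bit 0 of each byte so the shift cannot leak across bytes

def avg4_round_up(a, b):
    """Per-byte (a + b + 1) >> 1 for four 8-bit pixels packed into 32-bit words."""
    return ((a | b) - (((a ^ b) & CLEAR_LOW_BITS) >> 1)) & MASK32

def pack(pixels):
    """Pack four 8-bit pixels into one 32-bit word, first pixel in the top byte."""
    p0, p1, p2, p3 = pixels
    return (p0 << 24) | (p1 << 16) | (p2 << 8) | p3

def unpack(word):
    """Inverse of pack."""
    return [(word >> shift) & 0xFF for shift in (24, 16, 8, 0)]

averaged = unpack(avg4_round_up(pack([0, 255, 10, 7]), pack([1, 255, 13, 8])))
```

The identity works because `a | b` equals `(a & b) + (a ^ b)`, so subtracting half of `a ^ b` (rounded down) leaves the rounded-up mean, and no byte ever borrows from its neighbor.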
High-radix parallel dividers for VLSI signal processing
Pub Date : 1996-10-30 DOI: 10.1109/VLSISP.1996.558306
T. Aoki, Hiroshi Tokoyo, T. Higuchi
This paper presents a unified approach for designing high-radix dividers for on-line signal and data processing applications. It has long been recognized that the use of higher radices makes possible the reduction of computational steps in the division process. However, most of the conventional high-radix algorithms are not suited for designing high-speed parallel dividers since they require lookup tables for selecting the quotient digits. We present a high-radix divider design that does not assume the use of lookup tables and is applicable to arbitrary radices. By prescaling the operands and converting the representation of each partial remainder into a partially non-redundant representation, the quotient digit can be obtained directly from the integer part of the partial remainder. This paper also discusses the design of a radix-8 fully parallel divider as an example.
Citations: 4
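The prescaling idea in the abstract above can be sketched numerically. If both operands are multiplied by a rough estimate of 1/d, the scaled divisor is close to 1, and each quotient digit can be read off the shifted partial remainder by rounding, with no selection table. The code below is a simplified model under stated assumptions: it uses exact rationals, signed digits obtained with `round`, and a reciprocal estimate computed by ordinary division as a stand-in for hardware prescaling logic. It does not model the paper's partially non-redundant remainder representation, and all names and parameters are hypothetical.

```python
from fractions import Fraction

def prescaled_divide(x, d, radix=8, digits=6, prescale_bits=6):
    """Approximate x / d with `digits` signed radix-`radix` quotient digits.

    Assumes 1/2 <= d < 1 and 0 <= x < d. With the default prescale_bits the
    scaled divisor is within 1/128 of 1, which is tight enough for radix 8.
    """
    x, d = Fraction(x), Fraction(d)
    if not (Fraction(1, 2) <= d < 1 and 0 <= x < d):
        raise ValueError("expects 1/2 <= d < 1 and 0 <= x < d")
    # Low-precision reciprocal estimate (assumed stand-in for the prescaling step).
    scale = Fraction(round(2**prescale_bits / d), 2**prescale_bits)
    remainder = x * scale      # prescaled dividend, the first partial remainder
    divisor = d * scale        # prescaled divisor, close to 1
    quotient = Fraction(0)
    weight = Fraction(1)
    for _ in range(digits):
        shifted = radix * remainder
        digit = round(shifted)               # digit taken directly from the remainder
        remainder = shifted - digit * divisor
        weight /= radix
        quotient += digit * weight
    return quotient

approx = prescaled_divide(Fraction(1, 4), Fraction(3, 4))  # close to 1/3
```

Each iteration retires one radix-8 digit, that is three quotient bits per step, which is the reduction in computational steps that higher radices are valued for.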
LISA-machine description language and generic machine model for HW/SW co-design
Pub Date : 1996-10-30 DOI: 10.1109/VLSISP.1996.558311
V. Zivojnovic, S. Pees, Heinrich Meyr
A machine description language is presented. The language, LISA, and its generic machine model are able to produce bit- and cycle/phase-accurate processor models covering the specific needs of HW/SW codesign, and cosimulation environments. The development of a new language was necessary in order to cover the gap between coarse ISA models used in compilers, and instruction set simulators on the one hand, and detailed models used for hardware design on the other. The main part of the paper is devoted to behavioral pipeline modeling. The pipeline controller of the generic machine model is represented as an ASAP (as soon as possible) sequencer parameterized by precedence and resource constraints of operations of each instruction. The standard pipeline description based on reservation tables and Gantt charts was extended by additional operation descriptors which enable the detection of data and control hazards, and permit modeling of pipeline flushes. Using the newly introduced L-charts we reduced the parameterization of the pipeline controller to a minimum and at the same time covered typical pipeline controls found in state of the art signal processors. As an example, the application of the LISA model on the TI-TMS320C54x signal processor is presented.
Citations: 159