2001 IEEE Workshop on Signal Processing Systems. SiPS 2001. Design and Implementation (Cat. No.01TH8578): Latest Papers

New scalable DCT computation for resource-constrained systems

2001 IEEE Workshop on Signal Processing Systems. SiPS 2001. Design and Implementation (Cat. No.01TH8578)

Pub Date : 2001-09-26 DOI: 10.1109/SIPS.2001.957356

Stephan Mietens, P. H. N. de With, Christian Hentschel

The applicability of MPEG video coding can be improved by scaling the algorithmic complexity and resource usage to the desired application and device. This paper presents a new DCT computation technique in which the quality and the amount of computation achievable within a limited number of operations are maximized. With computing resources halved, an SNR improvement of about 2 to 4 dB was obtained compared with computing the coefficients in diagonal order, which matches the conventional MPEG scanning order.
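The baseline that the abstract compares against can be sketched as follows: under a fixed budget, compute the 8x8 DCT coefficients in diagonal (low-frequency-first) order and leave the rest at zero. This is not the paper's new technique. The budget here counts coefficients rather than operations, and the ordering within each diagonal is an assumption. Because the orthonormal DCT preserves energy, the reconstruction error equals the energy of the skipped coefficients, which is why the order in which coefficients are computed determines the SNR at a given budget.

```python
import math

N = 8  # block size used by MPEG


def _scale(k):
    # Orthonormal DCT-II normalization factor
    return math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)


def _basis(pos, freq):
    return math.cos((2 * pos + 1) * freq * math.pi / (2 * N))


def dct_coefficient(block, u, v):
    """Compute a single 2D DCT-II coefficient F(u, v) of an N x N block."""
    total = sum(block[x][y] * _basis(x, u) * _basis(y, v)
                for x in range(N) for y in range(N))
    return _scale(u) * _scale(v) * total


def diagonal_order():
    """(u, v) pairs grouped by anti-diagonal u + v, lowest frequencies first.
    The ordering inside each diagonal is an assumption (no zigzag reversal)."""
    pairs = [(u, v) for u in range(N) for v in range(N)]
    return sorted(pairs, key=lambda uv: (uv[0] + uv[1], uv[0]))


def budgeted_dct(block, budget):
    """Compute only the first `budget` coefficients in diagonal order; others stay 0."""
    coeffs = [[0.0] * N for _ in range(N)]
    for u, v in diagonal_order()[:budget]:
        coeffs[u][v] = dct_coefficient(block, u, v)
    return coeffs


def idct(coeffs):
    """Inverse orthonormal 2D DCT, returned as out[x][y]."""
    return [[sum(_scale(u) * _scale(v) * coeffs[u][v] * _basis(x, u) * _basis(y, v)
                 for u in range(N) for v in range(N))
             for y in range(N)]
            for x in range(N)]


def snr_db(reference, approx):
    """SNR of an approximation in dB (signal energy over error energy)."""
    signal = sum(p * p for row in reference for p in row)
    noise = sum((a - b) ** 2
                for ra, rb in zip(reference, approx) for a, b in zip(ra, rb))
    return math.inf if noise == 0 else 10.0 * math.log10(signal / noise)
```

Comparing `snr_db(block, idct(budgeted_dct(block, 32)))` with the same measure for another coefficient ordering at the same budget is the kind of comparison the abstract reports.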

Citations: 6

Tracking performance of leakage LMS for chirped signals

2001 IEEE Workshop on Signal Processing Systems. SiPS 2001. Design and Implementation (Cat. No.01TH8578)

Pub Date : 2001-09-26 DOI: 10.1109/SIPS.2001.957335

L. Ting, C. Cowan, Roger Francis Woods, P. R. Cork, C. L. Sprigings

While an LMS filter tracks a non-stationary chirped signal, its initial frequency pass-band persists. This memory of past conditions lets unwanted white noise leak through the residual initial pass-band of the adaptive filter. Adding a leakage term to the LMS algorithm removes this memory effect from the tracking filter and thereby reduces the noise power at the adaptive filter output. Compared with the standard LMS algorithm, the lower noise power yields an improved SNR (signal-to-noise ratio) for a low-SNR chirped signal.
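A minimal sketch of the leaky LMS update, assuming a real-valued FIR adaptive filter and the common form w <- (1 - mu*leak) * w + mu * e * x, where `leak` is the leakage factor. The paper's filter length, step size and leakage value are not given in the abstract, so the values in the usage note are illustrative. The leakage term shrinks any weight that the current input no longer reinforces, which is how it erases the filter's memory of an earlier pass-band.

```python
def leaky_lms(x, d, num_taps, mu, leak):
    """Adapt an FIR filter with the leaky LMS rule.

    Update: w <- (1 - mu * leak) * w + mu * e[n] * x_vec[n]
    With leak = 0 this reduces to the standard LMS algorithm.
    Returns the final weights, the filter outputs, and the errors.
    """
    w = [0.0] * num_taps
    taps = [0.0] * num_taps  # delay line, newest sample first
    outputs, errors = [], []
    shrink = 1.0 - mu * leak  # leakage pulls every weight toward zero
    for xn, dn in zip(x, d):
        taps = [xn] + taps[:-1]
        yn = sum(wi * ti for wi, ti in zip(w, taps))
        en = dn - yn
        w = [shrink * wi + mu * en * ti for wi, ti in zip(w, taps)]
        outputs.append(yn)
        errors.append(en)
    return w, outputs, errors
```

With a white input and a noiseless two-tap target (0.5, 0.25), `leak=0.0` converges to the target taps, while `leak=0.1` settles on smaller-magnitude weights, as expected from the extra shrinkage.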

Citations: 2

Power reduction for ASIPs: a case study

2001 IEEE Workshop on Signal Processing Systems. SiPS 2001. Design and Implementation (Cat. No.01TH8578)

Pub Date : 2001-09-26 DOI: 10.1109/SIPS.2001.957352

T. Glokler, H. Meyr

Application-specific instruction set processors (ASIPs) are an excellent architecture for mixed control- and data-flow-oriented tasks with medium to low data rates and high complexity. Compared with dedicated hardware, their main advantage is the greater flexibility that programmability provides; a drawback of this design style is increased power consumption. This case study focuses on an ASIP design methodology that simultaneously considers the classical parameters of computational performance and area together with energy consumption. Several ASIP power optimizations have been applied and evaluated: clock gating, logic netlist restructuring, ISA optimization, instruction memory power reduction, and the use of a dedicated coprocessor. These optimizations are demonstrated on the ICORE (ISS-core) ASIP for DVB-T acquisition and tracking algorithms. The results indicate a potential energy saving of about one order of magnitude from these optimizations.

Citations: 14

Input sensitive high-level power analysis

2001 IEEE Workshop on Signal Processing Systems. SiPS 2001. Design and Implementation (Cat. No.01TH8578)

Pub Date : 2001-09-26 DOI: 10.1109/SIPS.2001.957341

J. Hezavei, N. Vijaykrishnan, M. J. Irwin, M. Kandemir, D. Duarte

An input-sensitive, table-based power estimation technique is proposed. The technique has been applied to different circuits and validated against circuit-level simulation in a 0.25 µm, 2.5 V CMOS technology. Compared with HSPICE, the proposed scheme achieves an average error of 3.2% while running 27 times faster.
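A minimal sketch of how an input-sensitive, table-based estimator can work, assuming the table is indexed by two input statistics (average signal probability and transition density) and interpolated bilinearly. The choice of statistics, the table axes and the interpolation scheme are illustrative assumptions; the abstract does not say how the paper's tables are organized. The table entries would come from a one-time circuit-level characterization, after which each estimate is a cheap lookup, which is the general reason table-based estimators run much faster than full circuit simulation.

```python
from bisect import bisect_right


def input_statistics(vectors, width):
    """Average signal probability and transition density of a bit-vector trace.

    vectors: successive input words applied to the circuit (ints, `width` bits each).
    """
    ones = sum(bin(v).count("1") for v in vectors)
    toggles = sum(bin(a ^ b).count("1") for a, b in zip(vectors, vectors[1:]))
    probability = ones / (len(vectors) * width)
    density = toggles / ((len(vectors) - 1) * width)
    return probability, density


def _locate(axis, value):
    """Return (i, t) with axis[i] <= value <= axis[i + 1], t the fractional position."""
    value = min(max(value, axis[0]), axis[-1])  # clamp to the characterized range
    i = min(bisect_right(axis, value) - 1, len(axis) - 2)
    t = (value - axis[i]) / (axis[i + 1] - axis[i])
    return i, t


def estimate_power(table, prob_axis, density_axis, probability, density):
    """Bilinearly interpolate a characterized power table.

    table[i][j] holds the power characterized (e.g. by circuit-level simulation)
    at prob_axis[i] and density_axis[j].
    """
    i, s = _locate(prob_axis, probability)
    j, t = _locate(density_axis, density)
    return ((1 - s) * (1 - t) * table[i][j] + (1 - s) * t * table[i][j + 1]
            + s * (1 - t) * table[i + 1][j] + s * t * table[i + 1][j + 1])
```

For a trace, `estimate_power(table, p_axis, d_axis, *input_statistics(trace, width))` returns the interpolated power in whatever unit the table was characterized in.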

Citations: 4

Higher performance and lower power enhancements to VLIW architectures

2001 IEEE Workshop on Signal Processing Systems. SiPS 2001. Design and Implementation (Cat. No.01TH8578)

Pub Date : 2001-09-26 DOI: 10.1109/SIPS.2001.957342

W. Gass

Summary form only given. Enhancements to the C6000 architecture have improved performance, reduced code size, lowered power, and increased compiler efficiency. Benchmarks of DSP kernels and typical DSP applications are used to compare commercially available DSPs in terms of cycle count, power, and compiler efficiency. The C6000 VLIW family is an 8-issue instruction architecture with four execution units for each of its two register banks. The first-generation C62x processor runs at 300 MHz and has 2 multipliers and dual 32-bit read/write ports. The second-generation C64x processor extends performance by raising the clock speed to 600 MHz, adding 2 more multipliers, and widening loads and stores to 64 bits. In addition, the C64x adds SIMD operations to support packed data. The C62x is an excellent compiler target because of its deterministic order and timing of instruction execution, a general-purpose 32-word register file, simple independent instructions, and the absence of special modes or status bits. The C64x improves compiler efficiency by enlarging the register file to 64 words, increasing the number of common instructions that can execute on each unit, and supporting non-aligned loads of packed data. The C64x reduces code size by using non-aligned program memory fetches to decrease the number of NOPs, and by adding complex instructions that combine several RISC instructions into one 32-bit opcode. The C64x reduces power by adding a 2-level on-chip cache, so that most memory accesses hit the smaller first-level cache. In addition, the smaller code size decreases the number of first-level instruction fetches, and the larger register file decreases the number of data memory accesses. The second-generation processor has been optimized for image, graphics, and telecommunication applications.
For 2D algorithms such as 30 correlation, median filtering, motion estimation, and polyphase filtering, the cycle count improvements for the kernels range from 2.3x to 7.6x. For communication algorithms such as Reed-Solomon decoding, Viterbi decoding, and FFT, the cycle count improvements of the kernels range from 2.1x to 3.5x.
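To illustrate what packed-data SIMD operations do, here is a minimal model of a two-lane 16-bit add on a 32-bit word. The helper names `pack2`, `unpack2`, `add2` and `add32` are hypothetical, and the model is not a specification of any C6000 instruction; it only shows why the lanes must be kept independent when two 16-bit values share one register.

```python
MASK16 = 0xFFFF
MASK32 = 0xFFFFFFFF


def pack2(hi, lo):
    """Pack two 16-bit lanes into one 32-bit word (hi lane in bits 31..16)."""
    return ((hi & MASK16) << 16) | (lo & MASK16)


def unpack2(word):
    """Split a 32-bit word into its (hi, lo) 16-bit lanes."""
    return (word >> 16) & MASK16, word & MASK16


def add2(a, b):
    """Lane-wise 16-bit add with wraparound; carries never cross lanes."""
    a_hi, a_lo = unpack2(a)
    b_hi, b_lo = unpack2(b)
    return pack2(a_hi + b_hi, a_lo + b_lo)


def add32(a, b):
    """Ordinary 32-bit add of the same words, for comparison."""
    return (a + b) & MASK32
```

For example, `add2(pack2(1, 0xFFFF), pack2(2, 1))` gives `pack2(3, 0)`: the low lane wraps to 0 and the high lane is unaffected, whereas `add32` on the same words yields `pack2(4, 0)` because the carry spills into the high lane.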

Citations: 3

Index-based RNS DWT architectures for custom IC designs

2001 IEEE Workshop on Signal Processing Systems. SiPS 2001. Design and Implementation (Cat. No.01TH8578)

Pub Date : 2001-09-26 DOI: 10.1109/SIPS.2001.957332

J. Ramírez, P. G. Fernández, U. Meyer-Base, F. Taylor, A. García, A. Lloris

The design of high-performance, high-precision, real-time digital signal processing (DSP) systems, such as those used for wavelet signal processing, is a challenging problem. This paper reports an innovative use of the residue number system (RNS) to implement high-end wavelet filter banks. The proposed system uses an enhanced index transformation defined over Galois fields to efficiently support different wavelet filter instantiations without extra cost or additional lookup tables (LUTs). The designs were compared exhaustively against existing two's complement (2C) designs for different custom IC technologies. These structures have been shown to be well suited to field programmable logic (FPL) as well as to cell-based integrated circuit (CBIC) technologies.
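The index transformation that index-based RNS designs build on can be sketched as follows. For a prime modulus p with generator g, every nonzero residue x has an index k with g^k = x (mod p), so a modular product becomes an addition of indices modulo p - 1 followed by a table lookup. The moduli (7, 11, 13) in the usage note are illustrative assumptions, and the paper's enhanced index transformation and wavelet filter structure are not reproduced here.

```python
from math import prod


def primitive_root(p):
    """Smallest generator of the multiplicative group modulo an odd prime p."""
    order = p - 1
    factors, n, d = set(), order, 2
    while d * d <= n:
        while n % d == 0:
            factors.add(d)
            n //= d
        d += 1
    if n > 1:
        factors.add(n)
    for g in range(2, p):
        if all(pow(g, order // q, p) != 1 for q in factors):
            return g
    raise ValueError("p must be an odd prime")


def build_index_tables(p):
    """Exponent table k -> g^k and index (discrete log) table v -> k."""
    g = primitive_root(p)
    exp_table = [pow(g, k, p) for k in range(p - 1)]
    index_table = {v: k for k, v in enumerate(exp_table)}
    return exp_table, index_table


def index_mul(x, y, p, tables):
    """Multiply modulo p by adding indices modulo p - 1 (no multiplier)."""
    exp_table, index_table = tables
    x, y = x % p, y % p
    if x == 0 or y == 0:
        return 0  # zero has no index and is handled as a special case
    return exp_table[(index_table[x] + index_table[y]) % (p - 1)]


def to_rns(value, moduli):
    return [value % m for m in moduli]


def from_rns(residues, moduli):
    """Chinese remainder theorem reconstruction."""
    big_m = prod(moduli)
    total = 0
    for r, m in zip(residues, moduli):
        partial = big_m // m
        total += r * partial * pow(partial, -1, m)
    return total % big_m


def rns_mul(a, b, moduli, tables):
    """Channel-wise product; each channel works independently of the others."""
    return [index_mul(x, y, m, tables[m]) for x, y, m in zip(a, b, moduli)]
```

With moduli m = (7, 11, 13) the dynamic range is 1001, so `from_rns(rns_mul(to_rns(23, m), to_rns(31, m), m, tables), m)` recovers 23 * 31 = 713, where `tables = {p: build_index_tables(p) for p in m}`. The absence of carries between channels is what makes RNS attractive for high-throughput filter banks.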

Citations: 13

Optimization of emerging H.26L video encoder

2001 IEEE Workshop on Signal Processing Systems. SiPS 2001. Design and Implementation (Cat. No.01TH8578)

Pub Date : 2001-09-26 DOI: 10.1109/SIPS.2001.957368

V. Lappalainen, A. Hallapuro, T. Hamalainen

Two optimized implementations of the emerging ITU-T H.26L video encoder are described. The first, moderately optimized version is implemented in C; the second, highly optimized version uses MMX assembly instructions. Both are compared with a correspondingly optimized H.263/H.263+ implementation, keeping the spatial and temporal video quality fixed while the bit rate and complexity vary. On a 733 MHz Pentium III processor, a real-time encoding speed of 10 fps is achieved for QCIF (quarter common intermediate format) sequences, with a 29% reduction in bit rate compared with H.263+. The complexity of H.26L is about 3.4 times that of H.263+.

Citations: 17

A multi-level block priority based instruction caching scheme for multimedia processors

2001 IEEE Workshop on Signal Processing Systems. SiPS 2001. Design and Implementation (Cat. No.01TH8578)

Pub Date : 2001-09-26 DOI: 10.1109/SIPS.2001.957338

Jiyang Kang, Wonyong Sung

A new instruction caching scheme that uses block priority information is proposed, targeted mainly at embedded multimedia processors. The block priorities are obtained by profiling application programs: programmers analyze the profiling results of multimedia applications and assign priorities so that the caching scheme keeps more important code blocks in the cache longer. In addition to the caching scheme, algorithms for statically determining the priority of each code block are developed, and their performance is evaluated using an H.263 video encoder. Experimental results show that the cache miss ratio can be reduced to nearly half that of conventional least recently used (LRU) replacement, although the improvement depends on the cache size. The effects of cache size, associativity, and line size on the performance of the proposed prioritization methods are also investigated, and the performance gain from using more than two priority levels is discussed.
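A minimal sketch of priority-aware replacement, assuming a fully associative cache in which the victim is the least recently used line among the lines with the lowest priority. The abstract does not spell out the paper's exact replacement rule, multi-level priority handling or set organization, so the class below (`PriorityLRUCache`, a hypothetical name) only illustrates the idea of keeping high-priority blocks resident longer.

```python
from collections import OrderedDict


class PriorityLRUCache:
    """Fully associative cache; victim = LRU line among the lowest-priority lines."""

    def __init__(self, capacity, priority_of):
        self.capacity = capacity
        self.priority_of = priority_of  # block -> int, higher = more important
        self.lines = OrderedDict()      # iteration order: least recently used first
        self.hits = 0
        self.misses = 0

    def access(self, block):
        """Reference one code block; return True on a hit."""
        if block in self.lines:
            self.lines.move_to_end(block)  # mark as most recently used
            self.hits += 1
            return True
        self.misses += 1
        if len(self.lines) >= self.capacity:
            lowest = min(self.priority_of(b) for b in self.lines)
            # The first match in LRU order is the least recently used low-priority line
            victim = next(b for b in self.lines if self.priority_of(b) == lowest)
            del self.lines[victim]
        self.lines[block] = None
        return False
```

On the trace A, B, C, A with capacity 2 and only A marked important, the priority policy keeps A and hits on the final access (3 misses), while plain LRU, obtained by giving every block the same priority, misses all 4 times.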

Citations: 5

An MPEG-4 Twin-VQ based high quality audio codec design

2001 IEEE Workshop on Signal Processing Systems. SiPS 2001. Design and Implementation (Cat. No.01TH8578)

Pub Date : 2001-09-26 DOI: 10.1109/SIPS.2001.957359

Y. Hwang, Nan-Jung Liu, Ming-Chang Tsai

This paper presents a high-quality audio codec design based on the transform-domain weighted interleave vector quantization (Twin-VQ) scheme adopted in the MPEG-4 audio standard. The scheme compresses the data with three novel techniques: (1) flattening the MDCT coefficients by the spectrum of the linear predictive coding (LPC) coefficients; (2) further flattening the MDCT coefficients by the Bark envelope; and (3) weighted interleave vector quantization. This paper examines the design issues involved in implementing an efficient Twin-VQ codec. Fast algorithms are derived for the computationally intensive modules, the design parameters of each module are determined, and the codebooks for weighted interleave vector quantization are constructed. Experimental results show that the designed codec compresses natural audio efficiently and reproduces high-quality output.
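Step (3) can be sketched as follows, assuming the MDCT coefficients have already been flattened by steps (1) and (2), and that per-coefficient weights and a single shared codebook are available. The round-robin interleaving and the weighted squared-error distance are a simplified illustration; the actual codec's codebook structure and distance measure are not reproduced here, and `encode_frame` is a hypothetical helper.

```python
def interleave(coeffs, num_subvectors):
    """Distribute coefficients round-robin: coefficient k goes to subvector k % num_subvectors."""
    return [coeffs[i::num_subvectors] for i in range(num_subvectors)]


def weighted_vq(vector, weights, codebook):
    """Index of the codeword minimizing sum_i w_i * (x_i - c_i)^2."""
    best_index, best_dist = -1, float("inf")
    for index, codeword in enumerate(codebook):
        dist = sum(w * (x - c) ** 2 for x, w, c in zip(vector, weights, codeword))
        if dist < best_dist:
            best_index, best_dist = index, dist
    return best_index


def encode_frame(flattened, weights, codebook, num_subvectors):
    """Interleave flattened MDCT coefficients and their weights, then quantize each subvector."""
    vectors = interleave(flattened, num_subvectors)
    weight_vectors = interleave(weights, num_subvectors)
    return [weighted_vq(v, w, codebook) for v, w in zip(vectors, weight_vectors)]
```

Interleaving spreads neighboring spectral coefficients across different subvectors, so each subvector samples the whole spectrum rather than a single band, and the weights let perceptually important coefficients dominate the codeword choice.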

Citations: 1

200 Mbit/s 4-symbol arithmetic encoder architecture for embedded zero tree-based compression

2001 IEEE Workshop on Signal Processing Systems. SiPS 2001. Design and Implementation (Cat. No.01TH8578)

Pub Date : 2001-09-26 DOI: 10.1109/SIPS.2001.957367

R. Osorio, B. Vanhoof

In state-of-the-art multimedia compression standards, arithmetic coding is widely used as a powerful entropy coding method. The MPEG-4 standard uses a specific 4-symbol, multiple-context arithmetic coder for wavelet-based image compression. We present an architecture that processes close to 1 symbol per cycle and manages multiple contexts in a simple yet cost-efficient manner. Clocked at 100 MHz, the architecture achieves a peak throughput of 200 Mbit/s.
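A minimal sketch of multiple-context arithmetic coding over a 4-symbol alphabet, using exact fractions instead of the finite-precision registers and renormalization that a hardware coder needs. The adaptive count model and the way contexts are supplied are assumptions; this is neither the MPEG-4 probability model nor the paper's architecture. For reference, a 4-ary symbol carries up to 2 bits, so one symbol per cycle at 100 MHz corresponds to 200 Mbit/s, consistent with the quoted peak.

```python
from fractions import Fraction


class AdaptiveModel:
    """Adaptive frequency counts for a 4-symbol alphabet (one model per context)."""

    def __init__(self, alphabet_size=4):
        self.counts = [1] * alphabet_size

    def interval(self, symbol):
        """Sub-interval [lo, hi) of [0, 1) assigned to `symbol`."""
        total = sum(self.counts)
        low = sum(self.counts[:symbol])
        return Fraction(low, total), Fraction(low + self.counts[symbol], total)

    def update(self, symbol):
        self.counts[symbol] += 1


def encode(symbols, contexts, num_contexts):
    """Return a number inside the final interval; contexts[i] selects the model for symbols[i]."""
    models = [AdaptiveModel() for _ in range(num_contexts)]
    low, width = Fraction(0), Fraction(1)
    for s, ctx in zip(symbols, contexts):
        lo, hi = models[ctx].interval(s)
        low += width * lo  # narrow the interval to the symbol's share
        width *= hi - lo
        models[ctx].update(s)
    return low + width / 2


def decode(code, contexts, num_contexts):
    """Invert encode() given the same context sequence."""
    models = [AdaptiveModel() for _ in range(num_contexts)]
    low, width = Fraction(0), Fraction(1)
    out = []
    for ctx in contexts:
        target = (code - low) / width  # position within the current interval
        model = models[ctx]
        for s in range(len(model.counts)):
            lo, hi = model.interval(s)
            if lo <= target < hi:
                break
        out.append(s)
        low += width * lo
        width *= hi - lo
        model.update(s)
    return out
```

Because encoder and decoder update the same per-context models in the same order, `decode(encode(syms, ctxs, 2), ctxs, 2)` reproduces `syms`; separate contexts let each model adapt to its own symbol statistics.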

Citations: 1