Proceedings. IEEE Workshop on Signal Processing Systems (2007-2014): Latest Papers


Memory-Based List Updating for List Sphere Decoders

Proceedings. IEEE Workshop on Signal Processing Systems (2007-2014)

Pub Date : 2007-11-21 DOI: 10.1109/SIPS.2007.4387623

P. Salmela, J. Antikainen, O. Silvén, J. Takala

Symbol detection with the list sphere decoder (LSD) is an emerging technology targeted at multiple-input multiple-output (MIMO) telecommunication systems. The LSD algorithm requires maintaining a list of the candidate symbols with the shortest Euclidean distances to the received symbol. For energy efficiency, a memory-based list is preferred over registers when the list is long. In this paper, two hardware units that reduce the cost of processing such lists are presented. The list is stored as a heap in memory, and the proposed list-updating units are integrated into application-specific processors. With the presented principles, the number of clock cycles per list insertion comes very close to the theoretical lower bound for the heap data structure.
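As a software illustration of why a heap suits this list, the sketch below keeps the K candidates with the smallest distances in a binary max-heap, so each insertion costs one O(log K) sift. It is a behavioral model built on Python's `heapq` (a min-heap, hence the negated distances); the class name `CandidateList` and its methods are hypothetical, and the code does not model the paper's hardware list-updating units.

```python
import heapq
from typing import List, Tuple


class CandidateList:
    """Bounded list of the K candidates with the smallest Euclidean distances.

    Stored as a max-heap on distance (heapq is a min-heap, so distances are
    negated); the root is always the worst candidate currently kept.
    """

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self._heap: List[Tuple[float, int, tuple]] = []
        self._seq = 0  # insertion counter, breaks ties so symbols are never compared

    def insert(self, distance: float, symbol: tuple) -> None:
        entry = (-distance, self._seq, symbol)
        self._seq += 1
        if len(self._heap) < self.capacity:
            heapq.heappush(self._heap, entry)     # list not full: sift-up, O(log K)
        elif distance < -self._heap[0][0]:
            heapq.heapreplace(self._heap, entry)  # evict the worst: sift-down, O(log K)
        # otherwise the candidate is worse than everything kept and is discarded

    def radius(self) -> float:
        """Distance of the worst kept candidate (infinite until the list is full)."""
        return -self._heap[0][0] if len(self._heap) == self.capacity else float("inf")

    def sorted_candidates(self) -> List[Tuple[float, tuple]]:
        return sorted((-neg_d, sym) for neg_d, _, sym in self._heap)


lst = CandidateList(capacity=3)
for d, sym in [(4.0, (1, -1)), (1.5, (1, 1)), (9.0, (-1, -1)), (0.5, (-1, 1)), (2.0, (3, 1))]:
    lst.insert(d, sym)
print(lst.sorted_candidates())  # [(0.5, (-1, 1)), (1.5, (1, 1)), (2.0, (3, 1))]
print(lst.radius())             # 2.0
```

Keeping the worst candidate at the root means the reject test is a single comparison, and `radius()` doubles as the current search radius of the decoder.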

Citations: 5

32-Parallel SAD Tree Hardwired Engine for Variable Block Size Motion Estimation in HDTV1080P Real-Time Encoding Application

Proceedings. IEEE Workshop on Signal Processing Systems (2007-2014)

Pub Date : 2007-11-21 DOI: 10.1109/SIPS.2007.4387630

Zhenyu Liu, Yang Song, Ming Shao, Shen Li, Lingfeng Li, S. Goto, T. Ikenaga

The H.264/AVC coding standard incorporates variable block size (VBS) motion estimation (ME) to improve compression efficiency. For HDTV-1080p applications, the massive computation and huge memory bandwidth caused by the large frame size and the wide search range are two critical impediments to designing a real-time hardwired VBSME engine. In this paper, we present six techniques to circumvent these difficulties. First, inter modes below 8 × 8 are eliminated to reduce hardware cost. Second, a low-pass-filter-based 4:1 down-sampling algorithm reduces the arithmetic computation at each search position by about 75%. Third, a coarse-to-fine search scheme reduces the number of search candidates by 25%-50%. Fourth, the C+ memory organization is adopted to reduce external I/O bandwidth. Fifth, a horizontal zigzag scan mode optimizes the search window memories. Finally, at the circuit level, a 4:2-compressor-based CSA tree, multi-cycle path delays, and a two-stage pipelined SAD tree improve speed and reduce the hardware of each SAD tree. A hardwired integer motion estimation (IME) engine with a 192 × 128 search range for HDTV 1080p at 30 Hz is demonstrated. Implemented in TSMC 0.18 µm 1P6M CMOS technology, it uses 485.7K gates of standard cells and 327.68 Kbit of on-chip memory. The power dissipation is 729 mW at a 200 MHz clock.
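To make the down-sampling saving and the SAD-tree reuse concrete, here is a small sketch, assuming a 2×2 box (mean) filter as the low-pass stage; the paper's actual filter taps are not given in the abstract. Down-sampling a 16×16 block to 8×8 cuts the absolute differences per search position from 256 to 64 (the ~75% figure), and because modes below 8×8 are dropped, the 16×8, 8×16 and 16×16 SADs can be formed by adding four 8×8 SADs. Function names are illustrative.

```python
from typing import Dict, List

Block = List[List[int]]


def sad(cur: Block, ref: Block) -> int:
    """Sum of absolute differences between two equally sized pixel blocks."""
    return sum(abs(c - r) for rc, rr in zip(cur, ref) for c, r in zip(rc, rr))


def downsample_2x2(block: Block) -> Block:
    """4:1 down-sampling with a 2x2 box (mean) low-pass filter and rounding.
    The box filter is an assumption; the paper's filter taps may differ."""
    return [
        [(block[y][x] + block[y][x + 1] + block[y + 1][x] + block[y + 1][x + 1] + 2) >> 2
         for x in range(0, len(block[0]), 2)]
        for y in range(0, len(block), 2)
    ]


def vbs_sads(cur: Block, ref: Block) -> Dict[str, object]:
    """SAD-tree style reuse: the four 8x8 SADs of a 16x16 macroblock are summed
    into the 16x8, 8x16 and 16x16 partition SADs (width x height)."""
    def sub(b: Block, y: int, x: int) -> Block:
        return [row[x:x + 8] for row in b[y:y + 8]]

    s = {(y, x): sad(sub(cur, y, x), sub(ref, y, x)) for y in (0, 8) for x in (0, 8)}
    return {
        "8x8": s,
        "16x8": (s[0, 0] + s[0, 8], s[8, 0] + s[8, 8]),  # top and bottom halves
        "8x16": (s[0, 0] + s[8, 0], s[0, 8] + s[8, 8]),  # left and right halves
        "16x16": sum(s.values()),
    }


cur = [[x + y for x in range(16)] for y in range(16)]
ref = [[x + y + 3 for x in range(16)] for y in range(16)]
print(sad(cur, ref))                                  # 768, from 256 absolute differences
print(sad(downsample_2x2(cur), downsample_2x2(ref)))  # 192, from 64 absolute differences
print(vbs_sads(cur, ref)["16x16"])                    # 768, same as the full SAD
```

In hardware the additions in `vbs_sads` correspond to the upper levels of the SAD tree, which is why all partition sizes can be evaluated in parallel for each candidate.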

Pages: 675-680

Citations: 29

Sphere Decoding for Multiprocessor Architectures

Proceedings. IEEE Workshop on Signal Processing Systems (2007-2014)

Pub Date : 2007-11-21 DOI: 10.1109/SIPS.2007.4387516

Q. Qi, C. Chakrabarti

Motivated by the need for high-throughput sphere decoding in multiple-input multiple-output (MIMO) communication systems, we propose a parallel depth-first sphere decoding (PDSD) algorithm that combines the advantages of parallel processing and rapid search-space reduction. The PDSD algorithm is designed for efficient implementation on programmable multiprocessor platforms. We investigate the trade-off between throughput and computation overhead when the number of processing elements is 2, 4, and 8, for a 4 × 4 16-QAM system across a wide range of SNR conditions. Through simulation, we show that by selecting the appropriate number of processing elements for the given SNR conditions, PDSD can offer significant throughput improvement without incurring substantial computation overhead.
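For readers unfamiliar with depth-first sphere decoding, the sketch below searches min ‖z − Rx‖² over a real-valued alphabet, with R upper triangular (as after a QR decomposition, assumed already done), pruning any branch whose partial distance exceeds the best leaf found so far. The parallel variant simply splits the top-level symbols round-robin among `num_pe` workers and keeps the best result; this generic split is an assumption for illustration and is not claimed to be the paper's PDSD partitioning or radius-sharing scheme. The 4-PAM alphabet corresponds to 16-QAM after real-valued decomposition.

```python
import math
from typing import List, Optional, Sequence, Tuple


def dfs_sphere_decode(R: Sequence[Sequence[float]], z: Sequence[float],
                      alphabet: Sequence[int],
                      roots: Optional[Sequence[int]] = None) -> Tuple[List[int], float]:
    """Depth-first sphere decoder for min ||z - R x||^2, R upper triangular (n x n).

    `roots` optionally restricts the symbols tried at the top tree level
    (the last row of R); split_search uses it to divide work among PEs.
    """
    n = len(z)
    best: list = [math.inf, None]  # [squared radius, best vector so far]
    x = [0] * n

    def search(level: int, partial: float) -> None:
        if level < 0:  # leaf reached with partial < radius: shrink the sphere
            best[0], best[1] = partial, x.copy()
            return
        # remove the contribution of the symbols already fixed at higher levels
        resid = z[level] - sum(R[level][j] * x[j] for j in range(level + 1, n))
        choices = roots if (roots is not None and level == n - 1) else alphabet
        # visit symbols in order of increasing partial distance
        for s in sorted(choices, key=lambda a: abs(resid - R[level][level] * a)):
            d = partial + (resid - R[level][level] * s) ** 2
            if d >= best[0]:
                break  # the remaining symbols are even farther away
            x[level] = s
            search(level - 1, d)

    search(n - 1, 0.0)
    return best[1], best[0]


def split_search(R, z, alphabet: List[int], num_pe: int) -> Tuple[List[int], float]:
    """Round-robin split of the top-level symbols over num_pe processing elements.
    The PEs run one after another here; a real platform would run them concurrently."""
    results = [dfs_sphere_decode(R, z, alphabet, alphabet[p::num_pe])
               for p in range(num_pe) if alphabet[p::num_pe]]
    return min(results, key=lambda r: r[1])


PAM4 = [-3, -1, 1, 3]
R = [[2.0, 0.5, -0.3], [0.0, 1.5, 0.4], [0.0, 0.0, 1.2]]
x_true = [1, -3, 3]
z = [sum(R[i][j] * x_true[j] for j in range(3)) + e for i, e in enumerate([0.1, -0.2, 0.15])]
print(dfs_sphere_decode(R, z, PAM4)[0])        # [1, -3, 3]
print(split_search(R, z, PAM4, num_pe=2)[0])   # [1, -3, 3]
```

Because each worker starts with an infinite radius, splitting trades extra visited nodes for parallelism, which mirrors the throughput-versus-overhead trade-off the abstract studies.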

Citations: 5

Efficient VLSI Design of Modulo 2^n-1 Adder Using Hybrid Carry Selection

Proceedings. IEEE Workshop on Signal Processing Systems (2007-2014)

Pub Date : 2007-11-21 DOI: 10.1109/SIPS.2007.4387534

Su-Hon Lin, M. Sheu, K. Wang, Jun-Jie Zhu, Si-Ying Chen

This study presents a novel Hybrid-Carry-Selection (HCS) approach for deriving an efficient modulo 2^n-1 adder. The resulting adder architecture, built mainly from a modified carry look-ahead adder (MCLA), a carry prediction unit, and a simple multiplexer (MUX), is simple and regular for all values of n. In a VLSI implementation using 180 nm standard-cell technology, the HCS-based modulo 2^n-1 adder outperforms the latest existing solutions in Area × Time (AT). The layout area and clock rate of the HCS-based modulo 2^16-1 adder chip are 25,709 µm² and 518 MHz, respectively.
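The arithmetic that such an adder implements can be checked with a short reference model. Modulo 2^n-1 addition adds the carry-out of the n-bit sum back in at the least significant bit (end-around carry); equivalently, in carry-select form, one precomputes the sums for carry-in 0 and 1 and lets the carry-out drive a 2:1 multiplexer. The sketch below shows both; it is a behavioral model under those standard definitions, not the paper's HCS circuit, and the function names are illustrative.

```python
def add_mod_2n_minus_1(a: int, b: int, n: int) -> int:
    """Reference model: (a + b) mod (2**n - 1) for n-bit operands, via end-around carry.
    The all-ones word is a second encoding of zero and is normalized to 0."""
    mask = (1 << n) - 1
    s = a + b                  # at most n + 1 bits
    s = (s & mask) + (s >> n)  # add the carry-out back in at the LSB
    return 0 if s == mask else s


def add_mod_2n_minus_1_select(a: int, b: int, n: int) -> int:
    """Same result in carry-select form: precompute the sums for carry-in 0 and 1,
    then let the carry-out of a + b drive a 2:1 multiplexer."""
    mask = (1 << n) - 1
    sum_cin0 = (a + b) & mask
    sum_cin1 = (a + b + 1) & mask
    carry_out = (a + b) >> n   # the signal a carry-prediction unit must supply early
    s = sum_cin1 if carry_out else sum_cin0
    return 0 if s == mask else s


print(add_mod_2n_minus_1(0xFFF0, 0x0020, 16))         # 17, since 65552 mod 65535 = 17
print(add_mod_2n_minus_1_select(0xFFF0, 0x0020, 16))  # 17
```

The carry-select form avoids a second carry-propagation pass: the critical path becomes computing the carry-out quickly, which is the role the abstract assigns to the carry prediction unit.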

Citations: 2

Vector Quantization-Block Constrained Trellis Coded Quantization of Speech Line Spectral Frequencies

Proceedings. IEEE Workshop on Signal Processing Systems (2007-2014)

Pub Date : 2007-11-21 DOI: 10.1109/SIPS.2007.4387520

Jungeun Park, Yanghee Won, Sangwon Kang

In this paper, a vector quantization-block constrained trellis coded quantization (VQ-BCTCQ) scheme is presented for quantizing the line spectral frequency (LSF) parameters of a wideband speech codec. The predictive structure and the safety-net concept are combined with VQ-BCTCQ to develop a predictive VQ-BCTCQ. Its performance is compared with that of the linear predictive coding (LPC) vector quantizer used in the AMR-WB codec, and reductions in both spectral distortion (SD) and encoding complexity are demonstrated.

Citations: 0

Design of Low-Power Memory-Efficient Viterbi Decoder

Proceedings. IEEE Workshop on Signal Processing Systems (2007-2014)

Pub Date : 2007-11-21 DOI: 10.1109/SIPS.2007.4387532

Lupin Chen, Jinjin He, Zhongfeng Wang

This paper presents a new low-power, memory-efficient trace-back (TB) scheme for Viterbi decoders (VD) with high constraint lengths. With trace-back modifications and path-merging techniques, up to 50% of the memory read operations in the survivor memory unit (SMU) can be eliminated. The memory size of the SMU can be reduced by 33% and the decoding latency by 14%. Simulation results show that, compared with the conventional TB scheme, the performance loss of this scheme is negligible.
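As a baseline for what the survivor memory and trace-back do, the sketch below is a hard-decision Viterbi decoder for a small rate-1/2 convolutional code (constraint length K = 3, generators 7 and 5 octal). These code parameters are illustrative assumptions chosen for brevity; the paper targets higher constraint lengths. It stores every step's survivor decisions and performs a conventional full-length trace-back from the zero state after tail termination, that is, the conventional scheme the paper improves upon, not the paper's modified trace-back or path merging.

```python
from typing import List, Tuple

K = 3                    # constraint length (illustrative; the paper targets larger K)
G = (0b111, 0b101)       # generator polynomials 7 and 5 (octal), code rate 1/2
N_STATES = 1 << (K - 1)


def parity(x: int) -> int:
    return bin(x).count("1") & 1


def branch(state: int, u: int) -> Tuple[int, Tuple[int, int]]:
    """Next state and the two coded bits when input bit u is shifted in."""
    reg = (u << (K - 1)) | state
    return reg >> 1, (parity(reg & G[0]), parity(reg & G[1]))


def encode(bits: List[int]) -> List[int]:
    state, out = 0, []
    for u in bits + [0] * (K - 1):  # K-1 zero tail bits terminate in state 0
        state, (c0, c1) = branch(state, u)
        out += [c0, c1]
    return out


def viterbi_decode(coded: List[int]) -> List[int]:
    INF = float("inf")
    metric = [0] + [INF] * (N_STATES - 1)  # the encoder starts in state 0
    survivors = []  # survivor memory: per step, (previous state, input bit) per state
    for t in range(0, len(coded), 2):
        rx = coded[t:t + 2]
        new_metric = [INF] * N_STATES
        decision = [(0, 0)] * N_STATES
        for s in range(N_STATES):
            if metric[s] == INF:
                continue
            for u in (0, 1):  # add-compare-select on each outgoing branch
                ns, out = branch(s, u)
                m = metric[s] + (out[0] != rx[0]) + (out[1] != rx[1])  # Hamming distance
                if m < new_metric[ns]:
                    new_metric[ns] = m
                    decision[ns] = (s, u)
        metric = new_metric
        survivors.append(decision)
    # Conventional trace-back: read one survivor entry per step, newest to oldest
    state, bits = 0, []
    for decision in reversed(survivors):
        prev, u = decision[state]
        bits.append(u)
        state = prev
    bits.reverse()
    return bits[:-(K - 1)]  # drop the tail bits


msg = [1, 0, 1, 1, 0, 0, 1, 0]
coded = encode(msg)
coded[5] ^= 1                        # inject a single channel bit error
print(viterbi_decode(coded) == msg)  # True
```

Every trace-back step here is one survivor-memory read, which is why reducing those reads, as the abstract reports, translates directly into power savings. Practical decoders also bound the trace-back depth instead of storing the whole frame as this sketch does.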

Citations: 4

Wavelet Based Lossless Video Compression Using Motion Compensated Temporal Filtering

Proceedings. IEEE Workshop on Signal Processing Systems (2007-2014)

Pub Date : 2007-11-21 DOI: 10.1109/SIPS.2007.4387632

Cheng-Chen Lin, Y. Hwang, Kwan-Hsun Tseng, Shao-Wen Chen

In this paper, we propose a new lossless video coding system that uses Motion Compensated Temporal Filtering (MCTF) and integer wavelet transform techniques to exploit data redundancy in the temporal and spatial domains, respectively. We elaborate on design issues such as the MCTF scheme, filter selection, group-of-pictures (GOP) size, and wavelet coefficient coding, and develop an efficient coding system. Simulation results show that the proposed system using the 1/3 filter has the best coding performance. The bit-rate saving is over 20% compared with lossless still-image coders such as JPEG-LS, and around 10% compared with other wavelet-based lossless video coders.
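Lossless wavelet coding depends on integer-to-integer transforms that invert exactly. As an example of that property, the sketch below implements one level of the reversible LeGall 5/3 lifting transform (the reversible filter of JPEG 2000) with symmetric border extension. This is a swapped-in, well-known integer wavelet chosen for illustration; it is not the "1/3 filter" named in the abstract, and the temporal MCTF stage (lifting along motion trajectories) is not modeled.

```python
from typing import List, Tuple


def lift53_forward(x: List[int]) -> Tuple[List[int], List[int]]:
    """One level of the reversible integer 5/3 lifting transform.
    Returns (low-pass s, high-pass d); uses symmetric extension at the borders."""
    if len(x) < 2 or len(x) % 2:
        raise ValueError("input length must be even and at least 2")
    n = len(x) // 2
    even, odd = x[0::2], x[1::2]
    # Predict: detail = odd sample minus the floored mean of its even neighbours
    d = [odd[i] - (even[i] + even[min(i + 1, n - 1)]) // 2 for i in range(n)]
    # Update: approximation = even sample plus a rounded quarter of adjacent details
    s = [even[i] + (d[max(i - 1, 0)] + d[i] + 2) // 4 for i in range(n)]
    return s, d


def lift53_inverse(s: List[int], d: List[int]) -> List[int]:
    """Undo the lifting steps in reverse order; integer floors cancel exactly."""
    n = len(s)
    even = [s[i] - (d[max(i - 1, 0)] + d[i] + 2) // 4 for i in range(n)]
    odd = [d[i] + (even[i] + even[min(i + 1, n - 1)]) // 2 for i in range(n)]
    x = [0] * (2 * n)
    x[0::2], x[1::2] = even, odd
    return x


x = [10, 12, 15, 9, -3, 7, 100, 101]
s, d = lift53_forward(x)
print(lift53_inverse(s, d) == x)  # True: perfect reconstruction, hence lossless
```

Because each lifting step adds a function of the other channel's samples, the inverse subtracts exactly the same integer quantity, so reconstruction is exact despite the rounding.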

Citations: 1

Reconfigurable Color Doppler DSP Engine for High-Frequency Ultrasonic Imaging Systems

Proceedings. IEEE Workshop on Signal Processing Systems (2007-2014)

Pub Date : 2007-11-21 DOI: 10.1109/SIPS.2007.4387542

T. Yu, Shih-Yu Sun, Chih-Liang Ding, Pai-Chi Li, A. Wu

A single-chip reconfigurable Color Doppler DSP engine is presented. It serves as the computation kernel of a high-frequency ultrasonic imaging system under development. The flexibility of the proposed DSP engine lets users acquire as much information as needed, while its hardware specialization, compared with general-purpose processors, reduces cost and power consumption. The chip is implemented in TSMC 0.18 µm 1P6M CMOS technology. The die size is 2.94 × 2.94 mm², and the power consumption is 184 mW at a frame rate of 50, a frame size of 512 × 256, and a packet size of 8.
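The abstract does not state which velocity estimator the engine uses. As an assumption for illustration, the sketch below implements the widely used lag-one autocorrelation (Kasai) estimator, which turns one packet of slow-time IQ samples at a given depth into a mean Doppler frequency, an axial velocity, and a power value. The 40 MHz center frequency and 10 kHz PRF in the usage lines are hypothetical; c = 1540 m/s is the usual soft-tissue speed of sound. Clutter (wall) filtering, which normally precedes this step, is omitted.

```python
import cmath
import math
from typing import Sequence, Tuple


def autocorr_doppler(iq: Sequence[complex], prf_hz: float, f0_hz: float,
                     c: float = 1540.0) -> Tuple[float, float, float]:
    """Lag-one autocorrelation (Kasai) estimate for one packet of slow-time IQ samples.

    Returns (mean Doppler frequency in Hz, axial velocity in m/s, mean power).
    Assumes a zero beam-to-flow angle.
    """
    r1 = sum(iq[k + 1] * iq[k].conjugate() for k in range(len(iq) - 1))
    power = sum(abs(v) ** 2 for v in iq) / len(iq)
    f_d = cmath.phase(r1) * prf_hz / (2 * math.pi)  # aliases outside +-PRF/2
    velocity = c * f_d / (2 * f0_hz)                # Doppler equation, cos(angle) = 1
    return f_d, velocity, power


PRF, F0 = 10e3, 40e6  # hypothetical pulse repetition and transducer center frequencies
packet = [cmath.exp(2j * math.pi * 1000.0 * k / PRF) for k in range(8)]  # packet size 8
f_d, v, p = autocorr_doppler(packet, PRF, F0)
print(f_d, v, p)  # approximately 1000 Hz, 0.01925 m/s, 1.0
```

With a packet size of 8, each color pixel needs only seven complex multiply-accumulates for R(1) plus one phase computation, which is the kind of regular kernel a dedicated DSP engine can pipeline.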

Citations: 8

A Quality Scalable H.264/AVC Baseline Intra Encoder for High Definition Video Applications

Proceedings. IEEE Workshop on Signal Processing Systems (2007-2014)

Pub Date : 2007-11-21 DOI: 10.1109/SIPS.2007.4387602

Chun-Hao Chang, Jia-Wei Chen, Hsiu-Cheng Chang, Yao-Chang Yang, Jinn-Shyan Wang, Jiun-In Guo

In this paper, we propose a quality-scalable H.264/AVC baseline intra encoder with two hardware-sharing mechanisms and three timing optimization schemes. The hardware-sharing schemes share common terms among the intra prediction modes to reduce hardware cost, and the timing optimization schemes improve data throughput. The design supports clock rates of 26/33/47 MHz for SD and 70/85 MHz for HD720 video at 30 fps, with each clock rate corresponding to a different encoding quality. In a 0.13 µm CMOS technology, the design requires 170K gates and 4.43 KB of internal SRAM at a clock rate of 130 MHz.

Citations: 17

Fast EBCOT Encoder Architecture for JPEG 2000

Proceedings. IEEE Workshop on Signal Processing Systems (2007-2014)

Pub Date : 2007-11-21 DOI: 10.1109/SIPS.2007.4387616

Somya Rathi, Zhongfeng Wang

Embedded Block Coding with Optimized Truncation (EBCOT) is a computationally and hardware-intensive algorithm that consumes more than 50 percent of the processing time of a JPEG 2000 encoding system. In this paper, we present a new algorithm and architecture for the block coder based on the serial mode of JPEG 2000. It processes two bit planes simultaneously while encoding the four bits of a stripe concurrently. The architecture supports the causal mode of the standard. The paper also describes a variant of the pass-switching arithmetic encoder that further reduces the Tier-1 computation time with a minimal increase in hardware. The proposed architecture not only saves 4K bits of memory but also significantly increases throughput, by an estimated 50% or more. In addition, the new architecture reduces memory accesses.
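The "four bits of a stripe" refer to the JPEG 2000 code-block scan order: the block is divided into stripes four rows high, each stripe is scanned column by column from left to right, and each column from top to bottom. The sketch below generates that order and groups one bit plane's magnitude bits into the 4-bit stripe columns a column-parallel coder could handle together. It illustrates only the scan structure; the three coding passes, context modeling, and the paper's two-bit-plane architecture are not modeled, and the function names are hypothetical.

```python
from typing import Iterator, List, Tuple


def stripe_scan(height: int, width: int, stripe_h: int = 4) -> Iterator[Tuple[int, int]]:
    """Yield (row, col) in code-block scan order: stripes of 4 rows,
    column by column within a stripe, top to bottom within each column."""
    for top in range(0, height, stripe_h):
        for col in range(width):
            for row in range(top, min(top + stripe_h, height)):
                yield row, col


def stripe_columns(block: List[List[int]], plane: int) -> List[List[int]]:
    """Group the magnitude bits of one bit plane into stripe columns (up to 4 bits each),
    in the order a column-parallel coder would consume them."""
    h, w = len(block), len(block[0])
    return [[(abs(block[r][col]) >> plane) & 1 for r in range(top, min(top + 4, h))]
            for top in range(0, h, 4) for col in range(w)]


print(list(stripe_scan(8, 2))[:6])  # [(0, 0), (1, 0), (2, 0), (3, 0), (0, 1), (1, 1)]
block = [[2 * r + 1, 2 * r + 2] for r in range(8)]  # rows [1, 2], [3, 4], ..., [15, 16]
print(stripe_columns(block, 3))     # [[0, 0, 0, 0], [0, 0, 0, 1], [1, 1, 1, 1], [1, 1, 1, 0]]
```

Processing a whole stripe column per cycle, rather than one coefficient, is one of the levers the abstract describes for increasing Tier-1 throughput.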

Citations: 4
