A VLSI architecture for image geometrical transformations using an embedded core based processor
Pub Date: 1997-07-14. DOI: 10.1109/ASAP.1997.606815
C. Miro, N. Darbel, R. Pacalet, Valerie Paquet
This paper presents a circuit dedicated to real-time geometrical transforms of pictures. The supported transforms are third-degree polynomials of two variables. The post-processing is performed by a bilinear filter. An embedded DSP core is in charge of high-level, low-rate control tasks, while a set of hard-wired units is in charge of computing-intensive low-level tasks.
{"title":"A VLSI architecture for image geometrical transformations using an embedded core based processor","authors":"C. Miro, N. Darbel, R. Pacalet, Valerie Paquet","doi":"10.1109/ASAP.1997.606815","DOIUrl":"https://doi.org/10.1109/ASAP.1997.606815","url":null,"abstract":"This paper presents a circuit dedicated to real time geometrical transforms of pictures. The supported transforms are third degree polynomials of two variables. The post-processing is performed by a bilinear filter. An embedded DSP core is in charge of high level, low rate, control tasks while a set of hard wired units is in charge of computing intensive low level tasks.","PeriodicalId":368315,"journal":{"name":"Proceedings IEEE International Conference on Application-Specific Systems, Architectures and Processors","volume":"258 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1997-07-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123981804","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A flexible VLSI architecture for variable block size segment matching with luminance correction
Pub Date: 1997-07-14. DOI: 10.1109/ASAP.1997.606853
P. Kuhn, A. Weisgerber, Robert Poppenwimmer, W. Stechele
This paper describes a flexible 25.6 giga-operations-per-second exhaustive-search segment matching VLSI architecture that supports evolving motion estimation algorithms as well as the block matching algorithms of established video coding standards. The architecture is based on a 16×16 processor element (PE) array and a 12 kbyte on-chip search area RAM, and allows concurrent calculation of motion vectors for 32×32, 16×16, 8×8 and 4×4 blocks and partial quadtrees (called segments) for a ±32 pel search range with 100% PE utilization. The architecture supports object-based algorithms by excluding pixels outside of video objects from the segment matching process, as well as advanced algorithms like variable block-size segment matching with luminance correction. A preprocessing unit is included to support half-pel interpolation and pixel decimation. The VLSI has been designed using VHDL synthesis and a 0.5 µm CMOS technology. The chip will have a clock rate of 100 MHz (min.), allowing real-time variable block-size segment matching of 4CIF video (704×576 pel) at 15 fps, or luminance-corrected variable block-size segment matching above CIF (352×288) resolution at 15 fps.
{"title":"A flexible VLSI architecture for variable block size segment matching with luminance correction","authors":"P. Kuhn, A. Weisgerber, Robert Poppenwimmer, W. Stechele","doi":"10.1109/ASAP.1997.606853","DOIUrl":"https://doi.org/10.1109/ASAP.1997.606853","url":null,"abstract":"This paper describes a flexible 25.6 Giga operations per second exhaustive search segment matching VLSI architecture to support evolving motion estimation algorithms as well as block matching algorithms of established video coding standards. The architecture is based on a 16/spl times/16 processor element (PE) array and a 12 kbyte on-chip search area RAM and allows concurrent calculation of motion vectors for 32/spl times/32, 16/spl times/16, 8/spl times/8 and 4/spl times/4 blocks and partial quadtrees (called segments)for a +/-32 pel search range with 100% PE utilization. This architecture supports object based algorithms by excluding pixels outside of video objects from the segment matching process as well as advanced algorithms like variable blocksize segment matching with luminance correction. A preprocessing unit is included to support halfpel interpolation and pixel decimation. The VLSI has been designed using VHDL synthesis and a 0.5 /spl mu/m CMOS technology. The chip will have a clock rate of 100 MHz (min.) allowing realtime variable blocksize segment matching of 4CIF video (704/spl times/576 pel) at 15 fps or luminance corrected variable blocksize segment matching at above CIF (352/spl times/288), 15 fps resolution.","PeriodicalId":368315,"journal":{"name":"Proceedings IEEE International Conference on Application-Specific Systems, Architectures and Processors","volume":"102 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1997-07-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130253403","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
New arithmetic coder/decoder architectures based on pipelining
Pub Date: 1997-07-14. DOI: 10.1109/ASAP.1997.606817
R. Osorio, J. Bruguera
In this paper we present new VLSI architectures for the arithmetic encoding and decoding of multilevel images. In these algorithms, speed is limited by their recursive nature and by the arithmetic and memory access operations, which become especially critical in the case of decoding. In order to reduce the cycle length, we propose interleaving two executions of the algorithm, which alternate in the use of the pipelined hardware with a minimal increase in its cost.
{"title":"New arithmetic coder/decoder architectures based on pipelining","authors":"R. Osorio, J. Bruguera","doi":"10.1109/ASAP.1997.606817","DOIUrl":"https://doi.org/10.1109/ASAP.1997.606817","url":null,"abstract":"In this paper we present new VLSI architectures for the arithmetic encoding and decoding of multilevel images. In these algorithms the speed is limited by their recursive natures and the arithmetic and memory access operations. They become specially critical in the case of decoding. In order to reduce the cycle length we propose working with two executions of the algorithm which alternate in the use of the pipelined hardware with a minimum increase in its cost.","PeriodicalId":368315,"journal":{"name":"Proceedings IEEE International Conference on Application-Specific Systems, Architectures and Processors","volume":"39 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1997-07-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128711331","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
An optimized coefficient update processor for high-throughput adaptive equalizers
Pub Date: 1997-07-14. DOI: 10.1109/ASAP.1997.606857
C. Lutkemeyer, T. Noll
A processor for the adaptation of the coefficients in high-throughput adaptive equalizers is presented. The accumulation operation, the fundamental basis of the adaptation process, is split into two steps: a fine-grain carry-save accumulation with time-sharing factor 2 collects the products of estimated error and input symbols over a block length of 16 input symbols and operates at twice the symbol rate; a master accumulator with time-sharing factor 32 collects the block sums from 16 fine-grain accumulators, multiplies them by the adaptation constant and carries out the final vector-merging operation, saturation, tap leakage and radix-4 Booth recoding. Three steps to reduce the power consumption of the fine-grain accumulators are presented and evaluated for a 14-bit-wide accumulator: the suppression of one of the redundant codes for the value "1" in the carry-save digit alphabet, i.e. (0,1) or (1,0), reduces the power consumption by 5.5%; the redundancy-reduced digit alphabet can be exploited to reduce the transistor count of the following full adder by one third, resulting in a significant input capacitance reduction which increases the maximum clock frequency by nearly 15% and achieves a further power consumption reduction of 2.7%. Finally, an optimized sign-extension logic reduces the capacitive load of the input sign bits by 70%, eliminates six of the full adders in the sign-extension slices and increases the power reduction to 19.2%. The maximum clock frequency of the accumulator could be increased by 18% due to the reduced internal loads.
{"title":"An optimized coefficient update processor for high-throughput adaptive equalizers","authors":"C. Lutkemeyer, T. Noll","doi":"10.1109/ASAP.1997.606857","DOIUrl":"https://doi.org/10.1109/ASAP.1997.606857","url":null,"abstract":"A processor for the adaptation of the coefficients in high throughput adaptive equalizers is presented. The accumulation operation-fundamental basis of the adaptation process-is split into two steps: A fine-grain carry-save accumulation with time sharing factor 2 collects the products of estimated error and input symbols over a block length of 16 input symbols and operates at twice the symbol rate, a master accumulator with time-sharing factor 32 collects the block-sums from 16 fine-grain accumulators, multiplies them with the adaptation constant and carries out the final vector merging operation, saturation, tap leakage and radix-4 Booth recording. Three steps to reduce the power consumption of the fine-grain accumulators is presented and evaluated for a 14-bit-wide accumulator: The suppression of one state of the redundant codes for the value \"1\" in the carry save digit alphabet i.e. (0, 1) or (1,0), reduces the power consumption by 5.5%; The redundancy-reduced digit alphabet can be exploited to reduce the transistor count of the following full adder by one third, resulting in a significant input capacity reduction which increases the maximum clock frequency by nearly 15% and achieves further reduction of power consumption of 2.7%. Finally an optimized sign extension logic reduces the capacitive load of the input sign bits by 70%, eliminates six of the full adders in the sign extension slices and increases the power reduction to 19.2%. The maximum clock frequency of the accumulator could be increased by 18% due to the reduced internal lends.","PeriodicalId":368315,"journal":{"name":"Proceedings IEEE International Conference on Application-Specific Systems, Architectures and Processors","volume":"592 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1997-07-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116309183","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Algorithm and architecture-level design space exploration using hierarchical data flows
Pub Date: 1997-07-14. DOI: 10.1109/ASAP.1997.606833
H. P. Peixoto, M. Jacome
Incorporating algorithm- and architecture-level design space exploration in the early phases of the design process can have a dramatic impact on the area, speed, and power consumption of the resulting systems. This paper proposes a framework for supporting system-level design space exploration and discusses the three fundamental issues involved in supporting such early exploration effectively: the definition of an adequate level of abstraction; the definition of system-level metrics with good fidelity; and the definition of mechanisms for automating the exploration process. The first issue, the definition of an adequate level of abstraction, is then addressed in detail. Specifically, an algorithm-level model, an architecture-level model, and a set of operations on these models are proposed, aiming at efficiently supporting an early, aggressive system-level design space exploration. A discussion of work in progress on the other two topics, metrics and automation, concludes the paper.
{"title":"Algorithm and architecture-level design space exploration using hierarchical data flows","authors":"H. P. Peixoto, M. Jacome","doi":"10.1109/ASAP.1997.606833","DOIUrl":"https://doi.org/10.1109/ASAP.1997.606833","url":null,"abstract":"Incorporating algorithm and architecture level design space exploration in the early phases of the design process can have a dramatic impact on the area, speed, and power consumption of the resulting systems. This paper proposes a framework for supporting system-level design space exploration and discusses the three fundamental issues involved in effectively supporting such an early design space exploration: definition of an adequate level of abstraction; definition of good fidelity system-level metrics; and definition of mechanisms for automating the exploration process. The first issue, the definition of an adequate level of abstraction is then addressed in detail. Specifically, an algorithm-level model, an architecture-level model, and a set of operations on these models, are proposed, aiming at efficiently supporting an early, aggressive system-level design space exploration. A discussion on work in progress in the other two topics, metrics and automation, concludes the paper.","PeriodicalId":368315,"journal":{"name":"Proceedings IEEE International Conference on Application-Specific Systems, Architectures and Processors","volume":"4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1997-07-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116917199","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Tiling with limited resources
Pub Date: 1997-07-14. DOI: 10.1109/ASAP.1997.606829
P. Calland, J. Dongarra, Y. Robert
In the framework of perfect loop nests with uniform dependences, tiling has been extensively studied as a source-to-source program transformation. Little work has been devoted to the mapping and scheduling of the tiles onto physical processors. We present several new results in the context of limited computational resources, assuming that communication and computation can overlap. In particular, under some reasonable assumptions, we derive the optimal mapping and scheduling of tiles to physical processors.
{"title":"Tiling with limited resources","authors":"P. Calland, J. Dongarra, Y. Robert","doi":"10.1109/ASAP.1997.606829","DOIUrl":"https://doi.org/10.1109/ASAP.1997.606829","url":null,"abstract":"In the framework of perfect loop nests with uniform dependences, tiling has been extensively studied as a source-to-source program transformation. Little work has been devoted to the mapping and scheduling of the tiles on to physical processors. We present several new results in the context of limited computational resources, and assuming communication-computation overlap. In particular, under some reasonable assumptions, we derive the optimal mapping and scheduling of tiles to physical processors.","PeriodicalId":368315,"journal":{"name":"Proceedings IEEE International Conference on Application-Specific Systems, Architectures and Processors","volume":"8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1997-07-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132371516","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Fast arithmetic and fault tolerance in the FERMI system
Pub Date: 1997-07-14. DOI: 10.1109/ASAP.1997.606842
L. Breveglieri, L. Dadda, V. Piuri
FERMI is a data acquisition system for calorimetry experiments in high-energy physics at the LHC, CERN. The system contains a large number of acquisition channels, with a precision of 16 bits and a sampling rate of 40 MHz. A large part of the information delivered by the channels is processed locally to reduce the amount of data. This requires clustering several channels by summing them. The paper presents the design of a fast, low-cost adder chip based on column compression techniques for integer addition. Since the system operates in a harsh radiation environment, fault tolerance (namely fault detection) is implemented by means of arithmetic codes.
{"title":"Fast arithmetic and fault tolerance in the FERMI system","authors":"L. Breveglieri, L. Dadda, V. Piuri","doi":"10.1109/ASAP.1997.606842","DOIUrl":"https://doi.org/10.1109/ASAP.1997.606842","url":null,"abstract":"The FERMI is a data acquisition system for calorimetry experiments in high energy physics at the LHC, CERN. The system contains a large number of acquisition channels, with a precision of 16 bits and a sampling rate of 40 MHz. A large part of the information driven by the channels is processed locally, to reduce the amount of data. This requires to cluster several channels by adding them. The paper presents the design of a fast, low cost adder chip, based on the implementation of column compression techniques for the computation of integer addition. Since the system is operating in a radiation-hard environment, fault tolerance (namely fault detection) is implemented by means of arithmetic codes.","PeriodicalId":368315,"journal":{"name":"Proceedings IEEE International Conference on Application-Specific Systems, Architectures and Processors","volume":"14 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1997-07-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132581085","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Array placement for storage size reduction in embedded multimedia systems
Pub Date: 1997-07-14. DOI: 10.1109/ASAP.1997.606813
E. D. Greef, F. Catthoor, H. Man
In this paper we present the second stage of a two-phase strategy for reducing the required background memory sizes for a large class of data-intensive multimedia applications. This strategy is particularly useful in an embedded application context, where memory size and the corresponding power consumption are the main cost factors together with data transfers. Our strategy optimizes the storage order of arrays in memory by trying to improve the reuse of memory locations, both for elements of the same array and for elements of different arrays. Although size reduction is the main objective, an added benefit is a reduced power consumption due to the decreased capacitive load of the memories. The memory size reduction task is part of an overall memory size and power reduction methodology called ATOMIUM, in which other tasks can increase its effectiveness (e.g. loop transformations), but it can also be used on a stand-alone basis. The effectiveness of our approach is demonstrated by experimental results for some real-life multimedia applications, for which a considerable memory size reduction was obtained.
{"title":"Array placement for storage size reduction in embedded multimedia systems","authors":"E. D. Greef, F. Catthoor, H. Man","doi":"10.1109/ASAP.1997.606813","DOIUrl":"https://doi.org/10.1109/ASAP.1997.606813","url":null,"abstract":"In this paper we present the second stage of a two-phase strategy for reducing the required background memory sizes for a large class of data-intensive multimedia applications. This strategy is particularly useful in an embedded application context, where memory size and the corresponding power consumption are the main cost factors together with data transfers. Our strategy optimizes the storage order of arrays in memory by trying to improve the reuse of memory locations, as well for elements of the same array as for elements of different arrays. Although size reduction is the main objective, an added benefit is a reduced power consumption due to the decreased capacitive load of the memories. The memory size reduction task is part of an overall memory size and power reduction methodology called ATOMIUM in which other tasks can increase its effectiveness (e.g. loop, transformations), but it can also be used on a stand-alone base. The effectiveness of our approach is demonstrated by experimental results for some real-life multimedia applications, for which a considerable memory size reduction was obtained.","PeriodicalId":368315,"journal":{"name":"Proceedings IEEE International Conference on Application-Specific Systems, Architectures and Processors","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1997-07-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131716476","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A datapath generator for full-custom macros of iterative logic arrays
Pub Date: 1997-07-14. DOI: 10.1109/ASAP.1997.606849
M. Gansen, F. Richter, O. Weiss, T. Noll
A new flexible datapath generator is presented which allows the automated design of full-custom macros covering dedicated filter structures as well as programmable DSP cores. The underlying concept combines the advantages of full-custom design in power dissipation, silicon area, and throughput rate with a moderate design effort. In addition, the datapath generator can easily be included in existing semi-custom design flows. This enables highly efficient VLSI implementations of optimized full-custom macros (datapaths) embedded into synthesized standard cell designs covering structures that are non-critical in terms of area, power, and throughput (e.g. control paths), using common design flows. In order to demonstrate the generator-assisted design flow, the implementation of a time-shared correlator is presented as an example.
{"title":"A datapath generator for full-custom macros of iterative logic arrays","authors":"M. Gansen, F. Richter, O. Weiss, T. Noll","doi":"10.1109/ASAP.1997.606849","DOIUrl":"https://doi.org/10.1109/ASAP.1997.606849","url":null,"abstract":"A new flexible datapath generator which allows the automated design of full-custom macros covering dedicated filter structures as well as programmable DSP cores is presented. The underlying concept combines the advantages of full-custom designs concerning power dissipation, silicon area, and throughput rate with a moderate design effort. In addition, the datapath generator can be easily included in existing semi-custom design flows. This enables highly efficient VLSI implementations of optimized full-custom macros (datapaths) embedded into synthesized standard cell designs covering uncritical structures in terms of area, power, and throughput (e.g. control paths) using common design flows. In order to demonstrate the datapath generator assisted design flow, the implementation of a time-shared correlator is presented as an example.","PeriodicalId":368315,"journal":{"name":"Proceedings IEEE International Conference on Application-Specific Systems, Architectures and Processors","volume":"99 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1997-07-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131543959","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Processor elements for the standard cell implementation of residue number systems
Pub Date: 1997-07-14. DOI: 10.1109/ASAP.1997.606818
A. Drolshagen, H. Henkelmann, W. Anheier
In this article, processor elements for the effective implementation of standard cell circuits based on residue number systems (RNS) are presented. Two new processors are proposed that help to reduce the hardware requirements of the implementations. Following a new implementation strategy, a comparison with other circuits discussed in the past shows that the new method and cells lead to faster and smaller circuits.
{"title":"Processor elements for the standard cell implementation of residue number systems","authors":"A. Drolshagen, H. Henkelmann, W. Anheier","doi":"10.1109/ASAP.1997.606818","DOIUrl":"https://doi.org/10.1109/ASAP.1997.606818","url":null,"abstract":"In this article processor elements for the effective implementation of standard cell circuits based on residue number systems (RNS) are presented. Two new processors are proposed helping to reduce the hardware requirements of the implementations. Following a new strategy for implementation a comparison between other circuits discussed in past prove the new method and cells to lead to faster and smaller circuits.","PeriodicalId":368315,"journal":{"name":"Proceedings IEEE International Conference on Application-Specific Systems, Architectures and Processors","volume":"66 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1997-07-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125510578","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}