
VLSI Signal Processing, IX: Latest Articles

A chip set for a ray-casting engine
Pub Date : 1996-10-30 DOI: 10.1109/VLSISP.1996.558335
G. Hekstra, E. Deprettere
Rendering artificial scenes is an appealing example of a class of problems that lead to complex, data-dependent algorithms for which efficient software/hardware mapping techniques have to be devised. We present one of the ASICs in our rendering system to illustrate our design methodology in more detail. The first step in the algorithm-architecture design is to reformulate an existing naive algorithm so that, as far as possible, only significant operations are performed. The resulting algorithm has a nested loop structure with non-manifest, data-dependent loop bounds, which renders classical parallelisation techniques useless. The second step is to greatly reduce the overall computation time of the algorithm by reducing the computational complexity of the innermost loop operation. The third and last step is to map this algorithm onto a pipelined architecture, where the pipeline stages (functional units within an ASIC) implement different loop levels. Because of the data-dependent nature of the algorithm, the functional units that implement the parts of the loops are time-varying, both in execution time and in how much data they produce for the following pipeline stages. Since the execution times of the various pipeline stages change, the location of the bottleneck also changes over time. Hence the goal is not to keep all pipeline stages continually busy, but to keep the throughput at the most critical innermost loop operation as high as possible.
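The nested loop structure with data-dependent bounds described in the abstract can be illustrated with a small sketch. This is not the authors' algorithm or architecture: the 2-D row of grid cells, the circle primitives, and the function name `cast_ray` are all assumptions made for illustration. The point is only that neither loop bound is known in advance, and that the innermost intersection test is the operation whose count varies from ray to ray.

```python
import math

def cast_ray(grid_row, y0):
    """Walk one ray along +x through a row of cells (hypothetical scene layout).

    grid_row: list of cells, each a list of circles (cx, cy, r).
    Returns (x of nearest hit in the first cell that is hit, or None,
             number of innermost ray-circle tests performed).
    """
    tests = 0
    for cell in grid_row:            # outer bound: depends on where the ray first hits
        best = None
        for (cx, cy, r) in cell:     # inner bound: depends on how many objects the cell holds
            tests += 1               # innermost operation: one ray-circle intersection test
            dy = y0 - cy
            if abs(dy) <= r:
                x = cx - math.sqrt(r * r - dy * dy)  # entry point of the ray into the circle
                if best is None or x < best:
                    best = x
        if best is not None:
            return best, tests       # early exit: later cells are never visited
    return None, tests

# Three cells: empty, one circle, two circles.
row = [[], [(1.5, 0.0, 0.25)], [(2.5, 0.0, 0.25), (2.6, 0.0, 0.1)]]
hit_a = cast_ray(row, 0.0)   # hits in the second cell after a single test
hit_b = cast_ray(row, 1.0)   # misses everything and tests all three circles
```

Two rays through the same scene perform different amounts of innermost work, which is why a fixed, statically scheduled pipeline cannot keep every stage busy.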
Citations: 2
Scalability of 2-D wavelet transform algorithms: analytical and experimental results on coarse-grained parallel computers
Pub Date : 1996-10-30 DOI: 10.1109/VLSISP.1996.558370
Jamshed N. Patel, Ashfaq A. Khokhar, Leah H. Jamieson
We present analytical and experimental results for the scalability of 2-D discrete wavelet transform (DWT) algorithms on coarse-grained parallel architectures. The principal operation in the 2-D DWT is the filtering operation used to implement the filter banks of the 2-D subband decomposition. We derive analytical results comparing time-domain and frequency-domain parallel algorithms for realizing the filter banks. Experiments on the Intel Paragon validate the analytical results. We demonstrate that there exist combinations of machine size, image size, and wavelet size for which the time-domain algorithms outperform the frequency-domain algorithms, and vice versa.
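To make the time-domain versus frequency-domain distinction concrete, here is a minimal single-processor sketch of one 1-D filtering pass computed both ways. It is an illustration under assumptions, not the paper's parallel algorithms: it uses circular convolution, a plain recursive radix-2 FFT, a two-tap averaging filter, and omits both the downsampling step of a real DWT stage and all interprocessor communication. The function names are hypothetical.

```python
import cmath

def circ_conv_time(x, h):
    """Direct circular convolution: about len(x) * len(h) multiply-adds."""
    n = len(x)
    return [sum(h[k] * x[(i - k) % n] for k in range(len(h))) for i in range(n)]

def fft(a, inverse=False):
    """Recursive radix-2 FFT (unnormalized); len(a) must be a power of two."""
    n = len(a)
    if n == 1:
        return list(a)
    even = fft(a[0::2], inverse)
    odd = fft(a[1::2], inverse)
    sign = 1 if inverse else -1
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out

def circ_conv_freq(x, h):
    """Same filter via the frequency domain: transform, multiply pointwise, invert."""
    n = len(x)
    h_padded = list(h) + [0.0] * (n - len(h))   # zero-pad the filter to the signal length
    X, H = fft(x), fft(h_padded)
    y = fft([a * b for a, b in zip(X, H)], inverse=True)
    return [v.real / n for v in y]              # divide by n to normalize the inverse

signal = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
lowpass = [0.5, 0.5]                            # assumed two-tap averaging filter
y_time = circ_conv_time(signal, lowpass)
y_freq = circ_conv_freq(signal, lowpass)        # agrees with y_time up to rounding
```

The direct form costs grow with the filter length, while the FFT form costs grow with the logarithm of the signal length regardless of filter length, which is one reason the better choice depends on image and wavelet size.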
Citations: 32
New motion estimation using low-resolution quantization for MPEG2 video encoding
Pub Date : 1996-10-30 DOI: 10.1109/VLSISP.1996.558375
Seongsoo Lee, Jeong-Min Kim, S. Chae
We propose a new real-time motion estimation algorithm for MPEG2 video encoding. It reduces the computational cost by using low bit-resolution quantization and a new matching criterion. To maintain performance, we employ a low-resolution search followed by a full-resolution search. Simulation results show that the proposed algorithm requires 1/17.4 of the computational cost of the full search algorithm while keeping the performance degradation below 0.37 dB, for a search range of -32.0 to +31.5 in the CCIR601 image. The architecture of a real-time MPEG2 motion estimator using this algorithm is also explained. It searches two prediction modes concurrently over the -32.0 to +31.5 search range. Its hardware complexity is estimated at about 100,000 gates of random logic and 90 Kbits of SRAM. A VLSI design of the proposed architecture is in progress using a 0.5 μm triple-metal CMOS standard-cell technology.
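The two-stage idea (a cheap low bit-resolution search followed by a full-resolution search) can be sketched as below. The abstract does not specify the matching criterion or how candidates are passed between stages, so everything here is an assumption: the sketch uses a plain sum of absolute differences (SAD), quantizes by dropping low-order bits, works at integer-pel accuracy only, and keeps a fixed number of coarse candidates. The names `sad`, `two_stage_search`, `shift`, and `keep` are hypothetical.

```python
def sad(cur, ref, bx, by, dx, dy, n, shift=0):
    """SAD over an n x n block; shift > 0 drops that many low-order bits per pixel."""
    total = 0
    for y in range(n):
        for x in range(n):
            a = cur[by + y][bx + x] >> shift
            b = ref[by + y + dy][bx + x + dx] >> shift
            total += abs(a - b)
    return total

def two_stage_search(cur, ref, bx, by, n, rng, shift=6, keep=4):
    """Return the (dx, dy) chosen by a coarse search followed by a refinement."""
    h, w = len(ref), len(ref[0])
    # Only displacements whose reference block lies fully inside the frame.
    cands = [(dx, dy)
             for dy in range(-rng, rng + 1)
             for dx in range(-rng, rng + 1)
             if 0 <= bx + dx and bx + dx + n <= w
             and 0 <= by + dy and by + dy + n <= h]
    # Stage 1: rank every candidate using coarsely quantized pixels (narrow arithmetic).
    coarse = sorted(cands, key=lambda v: sad(cur, ref, bx, by, v[0], v[1], n, shift))
    # Stage 2: full-resolution SAD on the best few candidates only.
    return min(coarse[:keep], key=lambda v: sad(cur, ref, bx, by, v[0], v[1], n))

def blank(size):
    return [[0] * size for _ in range(size)]

# A small feature that sits 2 pixels right and 1 pixel down in the reference frame.
ref_frame, cur_frame = blank(16), blank(16)
ref_frame[9][10], ref_frame[9][11], ref_frame[10][10] = 255, 128, 64
cur_frame[8][8], cur_frame[8][9], cur_frame[9][8] = 255, 128, 64
vector = two_stage_search(cur_frame, ref_frame, bx=4, by=4, n=8, rng=3)
```

The saving in this sketch comes from stage 1 operating on 2-bit values for all candidates, while the full 8-bit comparison runs for only `keep` of them.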
Citations: 2
Low power parallel multipliers
Pub Date : 1996-10-30 DOI: 10.1109/VLSISP.1996.558332
Edwin de Angel, Earl E. Swartzlander
This paper presents and compares sign extension techniques used to decrease the switching activity and improve the performance of parallel multipliers. A detailed review of different sign extension schemes is presented and an improved scheme for reducing the power dissipation is proposed. Four parallel CMOS multipliers designed in 0.6 μm technology are used to implement and compare the sign extension schemes.
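As background on what a sign extension scheme does, the following sketch shows one widely used textbook trick: instead of replicating each partial product's sign bit across the upper columns, invert the sign bit and fold all the corrections into a single precomputed constant. This is not claimed to be any specific scheme reviewed in the paper, nor its proposed improved scheme; the function names and the `(bits, shift)` row representation are assumptions for illustration.

```python
def sum_sign_extended(rows, width, out_bits):
    """Reference result: treat each row as a signed width-bit value, shift, and add.

    rows: list of (bits, shift) pairs, where bits is a width-bit two's-complement pattern.
    """
    mask = (1 << out_bits) - 1
    total = 0
    for bits, shift in rows:
        # Interpreting the pattern as signed is equivalent to full sign extension.
        value = bits - (1 << width) if bits >> (width - 1) else bits
        total += value << shift
    return total & mask

def sum_with_constant(rows, width, out_bits):
    """Same sum with no sign-extension bits: flip each sign bit, add one constant."""
    mask = (1 << out_bits) - 1
    total = 0
    const = 0
    for bits, shift in rows:
        total += (bits ^ (1 << (width - 1))) << shift   # sign bit inverted, row not extended
        const -= 1 << (width - 1 + shift)               # correction, known at design time
    return (total + const) & mask

# Three 4-bit rows: -5, +6 shifted by 1, and -1 shifted by 2.
partials = [(0b1011, 0), (0b0110, 1), (0b1111, 2)]
reference = sum_sign_extended(partials, width=4, out_bits=8)
folded = sum_with_constant(partials, width=4, out_bits=8)   # equals reference
```

The identity behind it is that a signed width-bit value equals its pattern with the sign bit inverted, minus 2^(width-1). In hardware this removes the long runs of identical extension bits that would otherwise toggle together whenever a partial product changes sign.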
Citations: 58
ComBox: library-based generation of VHDL modules
Pub Date : 1996-10-30 DOI: 10.1109/VLSISP.1996.558362
M. Vaupel, T. Grotker, H. Meyr
We describe the automated generation of components for high-throughput, data-flow-dominated VLSI systems in digital communications. By means of a hierarchically organized library, both behavioural models with high simulation efficiency and corresponding hardware generators that produce sophisticated VHDL descriptions are made easily accessible to the system designer. The structured approach allows the trade-offs between alternatives to be evaluated at each design step and guarantees a fast and reliable design flow towards hardware. The ComBox design environment enhances reusability and enables rapid implementation of complex systems starting from a system-level description.
Citations: 6