
VLSI Signal Processing, IX: latest papers

Fixed-point error analysis and wordlength optimization of a distributed arithmetic based 8×8 2D-IDCT architecture
Pub Date : 1996-10-30 DOI: 10.1109/VLSISP.1996.558372
Seehyun Kim, Wonyong Sung
The two dimensional discrete cosine transform (DCT) has been used widely for various image and video processing standards. Efficient implementation of the algorithm requires fixed-point arithmetic, which may result in a noticeable mismatch between the encoder and the decoder. The finite wordlength effects of a distributed arithmetic based 8×8 2D-IDCT (inverse discrete cosine transform) are analytically modeled. In order to accurately model the implementation hardware, the ensemble average of integer domain fixed-point errors after rounding is evaluated not only by calculating the mean and the variance but by considering the statistical distribution as well. Based on the error model, a set of optimum wordlengths conforming to the IEEE specifications is determined. There is a close agreement between the model and the bit-accurate simulation results.
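As a rough illustration of the kind of statistics this abstract refers to, the sketch below measures the mean and variance of rounding error for a single quantizer and compares the variance with the textbook uniform-noise value q^2/12. It is not the authors' error model: the 8 fractional bits, the input range, the sample count, and the names `quantize` and `rounding_error_stats` are all assumptions chosen for the example.

```python
import math
import random

def quantize(x, frac_bits):
    """Round x to the nearest multiple of 2**-frac_bits (ties round up)."""
    scale = 1 << frac_bits
    return math.floor(x * scale + 0.5) / scale

def rounding_error_stats(frac_bits, n_samples=100_000, seed=0):
    """Return (mean, variance) of the rounding error over random inputs."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    errors = []
    for _ in range(n_samples):
        x = rng.uniform(-1.0, 1.0)  # assumed input range, for illustration only
        errors.append(quantize(x, frac_bits) - x)
    mean = sum(errors) / n_samples
    var = sum((e - mean) ** 2 for e in errors) / n_samples
    return mean, var

frac_bits = 8               # hypothetical wordlength choice
q = 2.0 ** -frac_bits       # quantization step
mean, var = rounding_error_stats(frac_bits)
print(f"measured mean={mean:.3e}, variance={var:.3e}")
print(f"uniform-noise model variance q^2/12={q * q / 12:.3e}")
```

For continuous inputs the measured variance should sit close to q^2/12. The abstract notes that the paper goes beyond mean and variance and considers the error distribution itself, which this sketch does not attempt.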
Citations: 3
VLSI architectures for multiplication in GF(2^m) for application tailored digital signal processors
Pub Date : 1996-10-30 DOI: 10.1109/VLSISP.1996.558299
W. Drescher, G.P. Fettweis
Finite field arithmetic plays an important role in coding theory, cryptography and their applications. Several hardware solutions using finite field arithmetic have already been developed but none of them are user programmable. This is probably one reason why BCH codes are not commonly used in mobile communication applications even though these codes have very desirable properties regarding burst error correction. This article presents architectures for multiplication in GF(2^m) applicable to digital signal processors. First a method is proposed to build an array of gates for hardware multiplication in GF(2^m). Then an approach is shown that combines the hardware of a typical standard binary arithmetic multiplier with a GF(2^m) multiplier. Using this approach saves a considerable number of gates and decreases the bus load while increasing the latency of the standard binary multiplier unit only marginally. Finally, a solution of a combined 17×17 integer/GF(2^m) (m ≤ 8) multiplier is presented and discussed.
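For readers unfamiliar with GF(2^m) arithmetic, a minimal software sketch of multiplication follows. It uses the standard shift-and-XOR method with reduction by an irreducible polynomial. This is not the gate array or combined multiplier the abstract describes. The function name `gf2m_mul` and the default polynomial 0x11D (x^8 + x^4 + x^3 + x^2 + 1) are assumptions made for the example.

```python
def gf2m_mul(a, b, m=8, poly=0x11D):
    """Multiply a and b in GF(2^m), reducing modulo the polynomial `poly`.

    Elements are integers whose bits are polynomial coefficients.
    `poly` must include the x^m term (bit m set).
    """
    result = 0
    for _ in range(m):
        if b & 1:
            result ^= a      # addition in GF(2^m) is XOR, so no carries
        b >>= 1
        a <<= 1              # multiply a by x
        if a & (1 << m):
            a ^= poly        # reduce when the degree reaches m
    return result

# (x + 1) * (x^2 + x + 1) = x^3 + 1, no reduction needed
print(gf2m_mul(0b011, 0b111))  # prints 9
```

The partial products are the same AND terms an integer multiplier forms; only the summation differs, since XOR replaces carry-propagating addition. That similarity is one plausible reading of why the two multipliers can share hardware, though the abstract does not spell out the mechanism.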
Citations: 14
A parallel architecture for rapid prototyping of mechatronic algorithms by exploiting implicit fine-grain parallelism
Pub Date : 1996-10-30 DOI: 10.1109/VLSISP.1996.558363
M.-D. Doan, M. Glesner
The paper presents an array architecture for rapid prototyping of mechatronic algorithms. The requirements for high throughput of arbitrary irregular real-time algorithms are supported by adopting the data-driven principle, exploiting the implicit fine-grain parallelism, providing a high degree of scalability, and offering large flexibility in system configuration. Interconnection between neighboring processing elements of the array is implemented by a static hardware-controlled network, whereas communication between spatially separated elements is provided by two dynamic global networks. Besides an overview of the architecture design, an algorithm mapping example illustrates the implementation of a time-critical mechatronic application using the novel wavefront mapping algorithm.
Citations: 0
Logic synthesis of binary, carry-save and mixed-radix arithmetic for digital signal processing
Pub Date : 1996-10-30 DOI: 10.1109/VLSISP.1996.558355
S. Bitterlich, H. Meyr
All the commercially available logic-synthesis tools currently use only (non-redundant) binary and two's complement number representations for representing the results of arithmetic operators. We analyze and compare silicon real-estate and throughput of word-parallel arithmetic circuits (add and shift type arithmetic) based on various redundant number representations and compare these results with the automatically optimized two's complement implementations. The literature on redundant number representations typically recommends radix-4 arithmetic for full-custom or a traditional semi-custom design style. We show that the radix-4 implementation is often not optimal for a logic-synthesis based semi-custom design style. Instead, a high-radix or a mixed-radix implementation (which we derive) should be considered.
Citations: 1
Scheduling for minimizing the number of memory accesses in low power applications
Pub Date : 1996-10-30 DOI: 10.1109/VLSISP.1996.558322
R. Saied, C. Chakrabarti
The increasing demand for portable electronics has made power consumption a critical issue in the design process. Reducing the total power consumption in portable systems is important in order to maximize the run time with minimum requirements in size and weight of the batteries. Power consumption in memory-intensive operations can be reduced by minimizing the number of memory accesses. We describe two scheduling schemes under fixed hardware resource constraints which reduce the number of memory accesses by minimizing the number of intermediate variables that need to be stored. While the first scheme achieves this by post-order traversal of the data flow graph (DFG), the second scheme achieves this by judiciously delaying the scheduling of some of the nodes. Experimental results show that these schemes require significantly fewer memory accesses compared to existing scheduling schemes.
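The post-order idea mentioned in this abstract can be sketched on a toy graph. The code below only shows a plain post-order traversal that visits each operand just before the operation consuming it. It is not the paper's scheduler, it ignores hardware resource constraints, and the graph layout, node names, and the function `post_order_schedule` are hypothetical.

```python
def post_order_schedule(dfg, outputs):
    """Return nodes in post-order: every operand appears before its consumer.

    `dfg` maps each operation node to the list of nodes it reads.
    Nodes missing from `dfg` are treated as primary inputs.
    """
    order, seen = [], set()

    def visit(node):
        if node in seen:
            return
        seen.add(node)
        for operand in dfg.get(node, []):
            visit(operand)   # finish all operands first
        order.append(node)   # then emit the node itself

    for out in outputs:
        visit(out)
    return order

# Toy graph for y = (a + b) * (c + d)
dfg = {"t1": ["a", "b"], "t2": ["c", "d"], "y": ["t1", "t2"]}
schedule = post_order_schedule(dfg, ["y"])
print(schedule)  # prints ['a', 'b', 't1', 'c', 'd', 't2', 'y']
```

Because each subtree is completed before the next one starts, values tend to be consumed soon after they are produced, which is the intuition behind keeping fewer intermediate variables in memory.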
Citations: 13
Video DSP architecture and its application design methodology for sampling rate conversion
Pub Date : 1996-10-30 DOI: 10.1109/VLSISP.1996.558374
K. Nakamura, M. Kurokawa, A. Hashiguchi, M. Kanou, K. Aoyama, H. Okuda, S. Iwase, T. Yamazaki
This paper describes the special architecture of a linear array DSP and a design methodology for applications that convert the sampling rate of video signals. This methodology allows us to develop detailed DSP application code for a given sampling rate conversion ratio. Compared to an ASIC implementation of sampling rate conversion, the time required for implementation is drastically reduced. An example of conversion from HDTV to SDTV (wide) is given.
Citations: 3
Digital upconversion architecture for quadrature modulators
Pub Date : 1996-10-30 DOI: 10.1109/VLSISP.1996.558364
P. Schaumont, S. Vernalde, M. Engels, I. Bolsens
Traditionally, the digital implementation of modems is restricted to parts operating at the baseband frequency. At higher frequencies, roughly 30 MHz and beyond, analog technologies such as SAW filters provide a better power/performance figure. We show how this barrier can be broken by trading programmability for speed. Using a digital multirate filter structure that offers combined interpolation and frequency shifting, an area- and power-efficient digital upconversion is achieved.
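To make the frequency-shifting step concrete, here is a small sketch of quadrature mixing, s[n] = I[n]·cos(ωn) − Q[n]·sin(ωn), for the special case of a carrier at one quarter of the sample rate. That carrier choice is an assumption made for the example, not something stated in the abstract. It is convenient because the carrier samples are only 0, +1, and −1, so no multipliers are needed. Interpolation of the baseband signals is assumed to have been done already, and the function name `upconvert_fs4` is hypothetical.

```python
def upconvert_fs4(i_samples, q_samples):
    """Mix baseband I/Q samples up to a carrier at fs/4.

    At fs/4 the carrier sequences repeat every four samples:
    cos -> 1, 0, -1, 0 and sin -> 0, 1, 0, -1.
    """
    cos_seq = (1, 0, -1, 0)
    sin_seq = (0, 1, 0, -1)
    return [
        i * cos_seq[n % 4] - q * sin_seq[n % 4]
        for n, (i, q) in enumerate(zip(i_samples, q_samples))
    ]

print(upconvert_fs4([1, 2, 3, 4], [5, 6, 7, 8]))  # prints [1, -6, -3, 8]
```

Each output sample uses only one of the two inputs, with an optional sign flip, which hints at how a fixed, non-programmable structure can save area and power.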
Citations: 3
Recent advances in mobile video communications
Pub Date : 1996-10-30 DOI: 10.1109/VLSISP.1996.558272
B. Girod, K. B. Younes, N. Faerber, E. Steinbach
Mobile channels cannot provide guaranteed quality of service parameters. We illustrate this problem with a ray tracing simulation of an indoor DECT channel. Because of temporal error propagation, unreliable transmission poses severe problems for motion video compressed with schemes such as the one described in the ITU-T H.263 international standard. We discuss compatible extensions of H.263 that utilize a feedback channel for robust transmission. In conjunction with FEC and ARQ, an intelligent source coder control can provide excellent robustness at bit error rates worse than 10^-2.
Citations: 1
SHARP: efficient loop scheduling with data hazard reduction on multiple pipeline DSP systems
Pub Date : 1996-10-30 DOI: 10.1109/VLSISP.1996.558358
S. Tongsima, C. Chantrapornchai, E. Sha, N. Passos
Computation intensive DSP applications usually require a parallel/pipelined processor in order to achieve specific timing requirements. Data hazards are a major obstacle against the high performance of pipelined systems. This paper presents a novel efficient loop scheduling algorithm that reduces data hazards for those DSP applications. Such an algorithm has been embedded in a tool, called SHARP, which schedules a pipelined data flow graph to multiple pipelined units, while hiding the underlying data hazards and minimizing the execution time. This paper reports significant improvement for some well-known benchmarks, showing the efficiency of the scheduling algorithm and the flexibility of the simulation tool.
Citations: 4
An area effective standard cell based channel decoder LSI for digital satellite TV broadcasting
Pub Date : 1996-10-30 DOI: 10.1109/VLSISP.1996.558366
T. Kamada, T. Fukuoka, Y. Nakai, Y. Nakakura, K. Ueda, K. Ota, T. Shiomi, Y. Fukumoto
A new channel decoder LSI, which will be used in digital satellite TV broadcasting set-top boxes, has been designed. This LSI's functions include AD/DA conversion, QPSK demodulation, Viterbi decoding, frame synchronization, convolutional deinterleaving, Reed-Solomon (RS) decoding, and descrambling. We use a new method for Viterbi decoding called the tracking survivor state information (TSSI) method, which not only reduces power consumption, but also solves the problem of increasing memory size. To reduce the size of the RS decoder circuit, we used a three-stage pipeline structure and designed a new architecture to realize the Euclid algorithm. This device has been fabricated in a 0.35 µm 3-metal CMOS standard cell-based process and is composed of 670 K transistors. We describe the TSSI method of the Viterbi decoder and the Reed-Solomon decoder's new 3-stage pipeline architecture.
Citations: 6