Proceedings. IEEE Workshop on Signal Processing Systems (2007-2014): Latest Articles

Dynamic Channel Flow Control of Networks-on-Chip Systems for High Buffer Efficiency
Pub Date : 2007-11-21 DOI: 10.1109/SIPS.2007.4387597
Sung-Tze Wu, Chih-Hao Chao, I-Chyn Wey, A. Wu
System-on-Chip (SoC) designs are becoming increasingly complex, and communication between processing elements is often limited by wiring problems. Networks-on-Chip (NoC) offer a practical solution. The major components of an NoC are routers, which are dominated by buffer size, and previous mechanisms need large buffers to achieve high performance. In this paper, a dynamic channel flow control mechanism is proposed that shares channel resources globally, which can increase both throughput and channel utilization. An 8 × 8 mesh on-chip network is implemented on a cycle-accurate simulator. The experimental results show that the proposed mechanism can reduce buffer size by 30% compared with virtual channel flow control at the same throughput, and can improve throughput by 20% compared with wormhole flow control.
Pages: 493-498
Citations: 11
Hardware Efficient QR Decomposition for GDFE
Pub Date : 2007-11-21 DOI: 10.1109/SIPS.2007.4387583
Kyung-Ju Cho, Yinan Xu, Jin-Gyun Chung
This paper presents a QR decomposition core for the generalized decision feedback equalizer (GDFE) that exploits Givens rotations. A Givens rotation consists of phase extraction, sine/cosine generation, and angle rotation. By combining a fixed-width modified-Booth multiplier with a two-stage (coarse and fine) method, we design an efficient QR decomposition core. Simulations show that the proposed core is a feasible solution for GDFE.
Pages: 412-417
Citations: 1
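As a software illustration of the Givens-rotation QR decomposition named in the abstract above, here is a minimal floating-point sketch. It is not the paper's hardware core: the fixed-width modified-Booth multiplier and the coarse/fine two-stage method are not modeled, and the function name `givens_qr` is a hypothetical choice. The comments map each step to the three parts the abstract lists (phase extraction, sine/cosine generation, angle rotation).

```python
import math

def givens_qr(a):
    """Return (q, r) with a = q * r, using Givens rotations on a list-of-lists matrix."""
    m, n = len(a), len(a[0])
    r = [[float(v) for v in row] for row in a]
    q = [[1.0 if i == j else 0.0 for j in range(m)] for i in range(m)]
    for j in range(n):
        for i in range(m - 1, j, -1):
            if r[i][j] == 0.0:
                continue  # entry is already zero, no rotation needed
            # Phase extraction: angle of the vector (r[j][j], r[i][j])
            theta = math.atan2(r[i][j], r[j][j])
            # Sine/cosine generation
            c, s = math.cos(theta), math.sin(theta)
            # Angle rotation: rotate rows j and i of R so that r[i][j] becomes 0
            for k in range(n):
                x, y = r[j][k], r[i][k]
                r[j][k] = c * x + s * y
                r[i][k] = -s * x + c * y
            # Accumulate the transposed rotation into columns j and i of Q
            for k in range(m):
                x, y = q[k][j], q[k][i]
                q[k][j] = c * x + s * y
                q[k][i] = -s * x + c * y
    return q, r
```

Calling `givens_qr([[1, 2], [3, 4], [5, 6]])` returns a 3 × 3 orthogonal `q` and a 3 × 2 upper-triangular `r` whose product reproduces the input up to rounding.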
A Partial Self-Reconfigurable Adaptive FIR Filter System
Pub Date : 2007-11-21 DOI: 10.1109/SIPS.2007.4387545
Chang-Seok Choi, Hanho Lee
This paper presents a self-reconfigurable adaptive FIR filter system design using dynamic partial reconfiguration, which offers flexibility, power efficiency, and a configuration-time advantage by allowing adaptive FIR filter modules to be inserted or removed dynamically. The self-reconfigurable adaptive FIR filter is responsible for the realization and autonomous adaptation of FIR filters, and runs low-pass, band-pass, and high-pass filter algorithms at various frequencies for noise removal. The proposed stand-alone self-reconfigurable system, built on a Xilinx Virtex4 FPGA and Compact-Flash memory, shows improved configuration time and flexibility from the use of dynamic partial reconfiguration.
Pages: 204-209
Citations: 8
Coefficient Conversion for Transform Domain VC-1 TO H.264 Transcoding
Pub Date : 2007-11-21 DOI: 10.1109/SIPS.2007.4387573
Maria Pantoja, N. Ling, Weijia Shang
This paper discusses the problem of transcoding between VC-1 and H.264 video standards. VC-1 uses an adaptive block size integer transform, which is different from the 4×4 integer transform used by H.264. We propose an algorithm to transcode the transform coefficients from VC-1 to those for H.264, which is a fundamental step for transform domain transcoding. The paper also presents a fast computation version of the algorithm. The implementation of the proposed algorithm shows that the quality of the video remains roughly the same while the complexity is greatly reduced when compared with the reference full cascade pixel domain transcoder.
Pages: 363-367
Citations: 9
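For readers unfamiliar with the 4×4 integer transform that the abstract above attributes to H.264, the following sketch applies the standard H.264 forward core transform, Y = C X Cᵀ, to one block. It does not reproduce the paper's VC-1 to H.264 coefficient-conversion algorithm, and the VC-1 side is omitted entirely. The helper names (`matmul`, `transpose`, `h264_core_transform`) are hypothetical, and the scaling that H.264 folds into quantization is left out.

```python
# H.264 4x4 forward core transform matrix (post-scaling is folded into quantization)
CF = [
    [1, 1, 1, 1],
    [2, 1, -1, -2],
    [1, -1, -1, 1],
    [1, -2, 2, -1],
]

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(a):
    """Return the transpose of a matrix given as a list of rows."""
    return [list(col) for col in zip(*a)]

def h264_core_transform(block):
    """Apply Y = CF * X * CF^T to a 4x4 block of integer residuals."""
    return matmul(matmul(CF, block), transpose(CF))
```

For a flat block in which every sample equals 3, only the DC coefficient is non-zero and equals 16 × 3 = 48, because every row of `CF` except the first sums to zero.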
On The Complexity of Joint Demodulation and Convolutional Decoding
Pub Date : 2007-11-21 DOI: 10.1109/SIPS.2007.4387629
Dimitris Gkrimpas, Vassilis Paliouras
This paper investigates the combined computational complexity of demodulation and decoding of QAM signals. Four combinations of demodulation and decoding techniques are compared in terms of bit error rate (BER) vs. signal-to-noise ratio (SNR) behavior, finite word length effects, and hardware complexity. It is found that joint demodulation and decoding using a high-radix trellis can be more efficient for higher orders of modulation, while a decoding strategy that produces soft values, followed by a Viterbi decoder, is more efficient for lower modulation orders. Complexity formulas that take into account word lengths and modulation order are introduced.
Pages: 669-674
Citations: 1
Novel Complete Passive Equivalent Circuit Model of the Practical 4-OTA-Based Floating Inductor
Pub Date : 2007-11-21 DOI: 10.1109/SIPS.2007.4387589
R. Banchuin, B. Chipipop, B. Sirinaovakul
In this research, the practical 4-OTA-based floating inductor based upon the often-cited monolithic CMOS technology is studied, and a complete passive equivalent circuit model is proposed that accounts for both parasitic elements and finite open-loop bandwidth. The accuracy of the proposed model is also evaluated, and the model is found to be highly accurate, with a very small average error. A further study that includes the mismatches among OTAs, in order to obtain the most accurate results, is also proposed. Even so, the proposed passive equivalent circuit model is found to be a convenient tool for designing signal processing circuits that require CMOS-OTA-based floating inductors, owing to its very small average error and to the nature of monolithic CMOS technology, which allows the mismatches among OTAs to be excluded.
Pages: 447-451
Citations: 1
Iterative Joint Source Channel Decoding of Error Correction Arithmetic Codes
Pub Date : 2007-11-21 DOI: 10.1109/SIPS.2007.4387570
Junqing Liu, Tianhao Li
Binary arithmetic codes with forbidden symbols (error correction arithmetic codes, ECAC) can be modeled as finite state machines and treated as variable-length trellis codes. In this paper, a novel iterative joint source-channel decoding algorithm is proposed for decoding trellis-based error correction arithmetic codes. Unlike conventional iterative decoding algorithms, the proposed algorithm needs no additional check codes such as CRC during encoding; instead, it uses Monte Carlo methods to detect error bits directly. Furthermore, the outer error detector not only detects the error bits but also provides the probability of the error location to the inner error corrector, which accelerates the decoding process. Experimental results show that the proposed algorithm significantly improves the symbol error rate over some conventional decoding algorithms, at an acceptable increase in computational complexity.
Pages: 346-350
Citations: 0
Montgomery Modular Multiplication Algorithm on Multi-Core Systems
Pub Date : 2007-11-21 DOI: 10.1109/SIPS.2007.4387555
Junfeng Fan, K. Sakiyama, I. Verbauwhede
In this paper, we investigate efficient software implementations of the Montgomery modular multiplication algorithm on a multi-core system. A HW/SW co-design technique is used to find an efficient system architecture and instruction scheduling method. We first implement the Montgomery modular multiplication on a multi-core system with general-purpose cores. We then speed it up by adopting the Multiply-Accumulate (MAC) operation in each core. As a result, the performance can be improved by a factor of 1.53 and 2.15 when 256-bit and 1024-bit Montgomery modular multiplication is performed, respectively.
Pages: 261-266
Citations: 37
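The Montgomery modular multiplication that the abstract above builds on can be sketched in a few lines. This is a single-threaded, big-integer illustration of the textbook reduction step only; it does not model the paper's multi-core partitioning, instruction scheduling, or MAC unit. The function names (`montgomery_setup`, `mont_mul`, `mod_mul`) are hypothetical, the modulus `n` is assumed odd with `n < 2**k`, and `pow(n, -1, r)` requires Python 3.8 or later.

```python
def montgomery_setup(n, k):
    """Precompute constants for an odd modulus n < 2**k, with R = 2**k."""
    r = 1 << k
    n_prime = (-pow(n, -1, r)) % r   # n * n_prime == -1 (mod R)
    r2 = (r * r) % n                 # R^2 mod n, used to enter Montgomery form
    return n_prime, r2

def mont_mul(a, b, n, k, n_prime):
    """Return a * b * R^-1 mod n for 0 <= a, b < n, without dividing by n."""
    mask = (1 << k) - 1
    t = a * b
    m = ((t & mask) * n_prime) & mask   # chosen so that t + m*n is divisible by R
    u = (t + m * n) >> k                # exact division by R; u < 2n
    return u - n if u >= n else u       # single conditional subtraction

def mod_mul(a, b, n, k):
    """Compute a * b mod n via Montgomery form (illustrative wrapper)."""
    n_prime, r2 = montgomery_setup(n, k)
    a_bar = mont_mul(a, r2, n, k, n_prime)         # a * R mod n
    b_bar = mont_mul(b, r2, n, k, n_prime)         # b * R mod n
    c_bar = mont_mul(a_bar, b_bar, n, k, n_prime)  # a * b * R mod n
    return mont_mul(c_bar, 1, n, k, n_prime)       # leave Montgomery form
```

In practice the conversion into and out of Montgomery form is paid once per exponentiation, so the benefit comes from long chains of `mont_mul` calls rather than from a single product as in `mod_mul`.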
Multi-Dimensional Parallel Rank Order Filtering
Pub Date : 2007-11-21 DOI: 10.1109/SIPS.2007.4387622
M. V. D. Horst, R. H. Mak
We present a method to design multi-dimensional rank order filters. Our designs are more efficient than existing ones from the literature, e.g. reducing the number of operations required by a 2-dimensional 7 × 7 median filter by 66%. This efficiency is maintained regardless of the amount of parallelism, so the throughput of our designs scales linearly with the amount of hardware. To accomplish this, we introduce a framework in the form of a generator graph. This graph allows us to formalize our methods and formulate an algorithm that produces efficient designs by reusing common sub-expressions. Like other rank order filters, our designs are based on sorting networks composed from Batcher's merging networks. However, we introduce an additional optimization that increases the savings obtained by pruning sorting networks. Our design method is independent of the implementation method, and the resulting designs can be implemented both as a VLSI circuit and as a program for an SIMD processor.
Pages: 627-632
Citations: 3
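To make the sorting-network idea in the abstract above concrete, the sketch below generates Batcher's odd-even merge sort network and uses it to take the median of a 3 × 3 window. It is only a baseline: the paper's generator graph, sub-expression reuse, and network pruning are not implemented, so every comparator is executed even when it cannot affect the median. Padding the nine samples to sixteen inputs with `+inf` and the names `apply_network` and `median9` are assumptions made for this illustration.

```python
def oddeven_merge(lo, hi, r):
    """Yield comparators merging the two sorted halves of indices lo..hi at stride r."""
    step = r * 2
    if step < hi - lo:
        yield from oddeven_merge(lo, hi, step)
        yield from oddeven_merge(lo + r, hi, step)
        yield from ((i, i + r) for i in range(lo + r, hi - r, step))
    else:
        yield (lo, lo + r)

def oddeven_merge_sort(lo, hi):
    """Yield comparators sorting indices lo..hi inclusive (length must be a power of two)."""
    if hi - lo >= 1:
        mid = lo + (hi - lo) // 2
        yield from oddeven_merge_sort(lo, mid)
        yield from oddeven_merge_sort(mid + 1, hi)
        yield from oddeven_merge(lo, hi, 1)

def apply_network(values, network):
    """Run a list of compare-exchange pairs (i, j) with i < j over a copy of values."""
    v = list(values)
    for i, j in network:
        if v[i] > v[j]:
            v[i], v[j] = v[j], v[i]
    return v

NETWORK_16 = list(oddeven_merge_sort(0, 15))

def median9(window):
    """Median of 9 samples: pad to 16 inputs with +inf, sort, take rank 4."""
    padded = list(window) + [float("inf")] * 7
    return apply_network(padded, NETWORK_16)[4]
```

Because the comparator sequence is fixed and independent of the data, the same list can drive a hardware pipeline or SIMD min/max instructions, which is what makes pruning unused comparators worthwhile.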
Adaptive Techniques for a Fast Frequency Domain Motion Estimation
Pub Date : 2007-11-21 DOI: 10.1109/SIPS.2007.4387567
Y. Ismail, M. Elgamel, M. Bayoumi
Dynamic Block Size Motion Estimation (DBS-ME) and smart Dynamic Early Search Termination (DEST) techniques are proposed and implemented in this paper. The two techniques are combined and applied to the conventional phase correlation technique. Their performance, visual quality, and complexity are compared with those of the original phase correlation motion estimation (PC-ME) and Full Search Block Matching (FSBM) techniques. The proposed techniques increase encoding quality while decreasing the computational complexity of the ME process. Results show that approximately 100% of the stationary blocks identified by the FSBM algorithm are detected correctly, which reduces computation compared with the original FS and PC techniques. The DBS-ME technique also greatly decreases the computation required for the ME process by reducing the required padding to one or two pixels for both the current and the reference blocks. In addition, the motion field of the proposed algorithm has much lower entropy than that of PC-ME, which means a further reduction in the transmitted bit rate.
Pages: 331-336
Citations: 10
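The conventional phase correlation technique that the abstract above starts from can be shown in one dimension. The sketch below estimates a circular shift between two signals from the phase of their cross-power spectrum. It is not the paper's method: DBS-ME and DEST are not implemented, real motion estimation works on 2-D blocks, and a naive O(n²) DFT stands in for an FFT to keep the example dependency-free. The names `dft` and `phase_correlation_shift` are hypothetical.

```python
import cmath

def dft(x, inverse=False):
    """Naive discrete Fourier transform (or its inverse) of a sequence."""
    n = len(x)
    sign = 1 if inverse else -1
    out = [sum(x[t] * cmath.exp(sign * 2j * cmath.pi * k * t / n) for t in range(n))
           for k in range(n)]
    return [v / n for v in out] if inverse else out

def phase_correlation_shift(ref, cur):
    """Estimate the circular shift d such that cur[t] == ref[(t - d) % n]."""
    f_ref, f_cur = dft(ref), dft(cur)
    cross = []
    for a, b in zip(f_cur, f_ref):
        p = a * b.conjugate()                          # cross-power spectrum
        mag = abs(p)
        cross.append(p / mag if mag > 1e-12 else 0j)   # keep only the phase
    corr = [v.real for v in dft(cross, inverse=True)]  # correlation surface
    return max(range(len(corr)), key=corr.__getitem__)  # index of the peak
```

The peak of the correlation surface sits at the displacement between the two signals; a peak at index 0 corresponds to a stationary block, which is the case an early-termination test would look for.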