
Latest publications from the 2017 IEEE International Workshop on Signal Processing Systems (SiPS)

Task-based execution of synchronous dataflow graphs for scalable multicore computing
Pub Date : 2017-10-01 DOI: 10.1109/SiPS.2017.8110023
Georgios Georgakarakos, Sudeep Kanur, J. Lilius, K. Desnos
Dataflow models of computation have early on been acknowledged as an attractive methodology to describe parallel algorithms, hence they have become highly relevant for programming in the current multicore processor era. While several frameworks provide tools to create dataflow descriptions of algorithms, generating parallel code for programmable processors is still sub-optimal due to the scheduling overheads and the semantics gap when expressing parallelism with conventional programming languages featuring threads. In this paper we propose an optimization of the parallel code generation process by combining dataflow and task programming models. We develop a task-based code generator for PREESM, a dataflow-based prototyping framework, in order to deploy algorithms described as synchronous dataflow graphs on multicore platforms. Experimental performance comparison of our task generated code against typical thread-based code shows that our approach removes significant scheduling and synchronization overheads while maintaining similar (and occasionally improving) application throughput.
Citations: 4
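The abstract above argues that expressing an SDF graph as dependent tasks, rather than one thread per actor, removes scheduling and synchronization overhead. As a minimal illustration (not PREESM's generated code — the three-actor chain and its firing rates are hypothetical), futures can encode the dataflow edges so each actor fires as soon as its producer finishes, with no global barrier:

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical 3-actor SDF chain: src -> filt -> sink.
# Each actor fires once per graph iteration; task dependencies
# (futures) replace a fixed thread-per-actor schedule.

def src():
    return list(range(8))            # produce 8 tokens

def filt(tokens):
    return [2 * t for t in tokens]   # transform each token

def sink(tokens):
    return sum(tokens)               # consume tokens, emit result

def run_iteration(pool):
    # Each submitted task blocks only on its own producer's future,
    # so actors overlap as soon as their inputs are ready.
    f_src = pool.submit(src)
    f_filt = pool.submit(lambda: filt(f_src.result()))
    f_sink = pool.submit(lambda: sink(f_filt.result()))
    return f_sink.result()

with ThreadPoolExecutor(max_workers=3) as pool:
    result = run_iteration(pool)
```

In a thread-per-actor design, the same chain would need explicit FIFOs and condition variables between actors; here the runtime's task scheduler handles that bookkeeping.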
Customizing fixed-point and floating-point arithmetic — A case study in K-means clustering
Pub Date : 2017-10-01 DOI: 10.1109/SiPS.2017.8109980
Benjamin Barrois, O. Sentieys
This paper presents a comparison between custom fixed-point (FxP) and floating-point (FlP) arithmetic, applied to the bidimensional K-means clustering algorithm. After a discussion of the K-means clustering algorithm and arithmetic characteristics, hardware implementations of FxP and FlP arithmetic operators are compared in terms of area, delay and energy, for different bitwidths, using the ApxPerf2.0 framework. Finally, both are compared in the context of K-means clustering. The direct comparison shows the large difference between 8-to-16-bit FxP and FlP operators, FlP adders consuming 5–12× more energy than FxP adders, and multipliers 2–10× more. However, when applied to the K-means clustering algorithm, the gap between FxP and FlP tightens. Indeed, the accuracy improvements brought by FlP make the computation more accurate and lead to an accuracy equivalent to FxP with fewer iterations of the algorithm, proportionally reducing the global energy spent. The 8-bit version of the algorithm becomes more profitable using FlP, which is 80% more accurate with only 1.6× more energy. This paper finally discusses the stake of custom FlP for low-energy general-purpose computation, thanks to its ease of use, supported by an energy overhead lower than what could have been expected.
Citations: 15
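The core idea above — that modest-bitwidth arithmetic can leave the K-means assignment step unchanged — can be sketched with a toy quantizer. This is not the ApxPerf2.0 flow; the Q-format (6 fractional bits) and the 1-D data are illustrative assumptions:

```python
def to_fixed(x, frac_bits):
    """Quantize x to a fixed-point grid with frac_bits fractional bits."""
    scale = 1 << frac_bits
    return round(x * scale) / scale

# Toy 1-D K-means assignment step: label each sample with its
# nearest centroid, in float and in a hypothetical Q1.6-like format.
centroids = [0.25, 0.75]
samples = [0.1, 0.3, 0.6, 0.9]

def assign(xs, cs, frac_bits=None):
    q = (lambda v: to_fixed(v, frac_bits)) if frac_bits is not None else (lambda v: v)
    return [min(range(len(cs)), key=lambda i: abs(q(x) - q(cs[i]))) for x in xs]

float_labels = assign(samples, centroids)
fxp_labels = assign(samples, centroids, frac_bits=6)
```

For this toy data the fixed-point labels match the floating-point ones, which mirrors the paper's observation that the FxP/FlP gap tightens at the algorithm level even when the operators differ sharply in energy.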
Obtaining an optimal set of head-related transfer functions with a small amount of measurements
Pub Date : 2017-10-01 DOI: 10.1109/SiPS.2017.8110008
Mikko Parviainen, Pasi Pertilä
This article presents a method to obtain personalized Head-Related Transfer Functions (HRTFs) for creating virtual soundscapes based on a small number of measurements. The best matching set of HRTFs is selected among the entries of publicly available databases. The proposed method is evaluated using a listening test in which subjects assess audio samples created with the best matching set of HRTFs against a randomly chosen set of HRTFs from the same location. The listening test indicates that subjects prefer the proposed method over a random set of HRTFs.
Citations: 1
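The selection step described above — pick the database HRTF set that best matches a few personal measurements — reduces to a nearest-neighbor search once each candidate set is summarized by a feature vector. The database entries, feature values, and Euclidean criterion below are all illustrative assumptions, not the paper's actual matching metric:

```python
import math

# Hypothetical database: each candidate HRTF set reduced to a short
# feature vector (e.g., magnitude responses at a few probe directions).
database = {
    "subject_A": [1.0, 0.8, 0.5],
    "subject_B": [0.9, 0.7, 0.6],
    "subject_C": [0.2, 0.4, 0.9],
}

def best_match(measured, db):
    """Return the database entry minimizing Euclidean distance to the measurement."""
    def dist(name):
        return math.sqrt(sum((m - e) ** 2 for m, e in zip(measured, db[name])))
    return min(db, key=dist)

# A user's small set of measurements at the same probe directions.
choice = best_match([0.98, 0.78, 0.52], database)
```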
CRN-based design methodology for synchronous sequential logic
Pub Date : 2017-10-01 DOI: 10.1109/SiPS.2017.8109979
Zhiwei Zhong, Lulu Ge, Ziyuan Shen, X. You, Chuan Zhang
With the aid of a storage-release mechanism named key-keysmith, an implementation approach based on chemical reaction networks (CRNs) for synchronous sequential logic is proposed. This design approach, which stores logic information in keysmith and releases it through key, primarily focuses on the underlying state transitions behind the required logic rather than the electronic circuit representation. Therefore, it can be uniformly and easily employed to implement any synchronous sequential logic with molecular reactions. Theoretical analysis and numerical simulations have demonstrated the robustness and universality of the proposed approach.
Citations: 3
Processing LSTM in memory using hybrid network expansion model
Pub Date : 2017-10-01 DOI: 10.1109/SiPS.2017.8110011
Yu Gong, Tingting Xu, Bo Liu, Wei-qi Ge, Jinjiang Yang, Jun Yang, Longxing Shi
With the rapidly increasing applications of deep learning, LSTM-RNNs are widely used. Meanwhile, complex data dependences and intensive computation limit the performance of accelerators. In this paper, we first propose a hybrid network expansion model to exploit fine-grained data parallelism. Based on the model, we implement a Reconfigurable Processing Unit (RPU) using Processing In Memory (PIM) units. Our work shows that the gates and cells in an LSTM can be partitioned into fundamental operations and then recombined and mapped onto heterogeneous computing components. The experimental results show that, implemented in a 45 nm CMOS process, the proposed RPU, with a size of 1.51 mm² and power of 413 mW, achieves 309 GOPS/W in power efficiency, 1.7× better than a state-of-the-art reconfigurable architecture.
Citations: 1
Successive cancellation decoder for very long polar codes
Pub Date : 2017-10-01 DOI: 10.1109/SiPS.2017.8110022
B. Gal, Camille Leroux, C. Jégo
Polar codes are a family of error correcting codes that achieve the symmetric capacity of memoryless channels when the code length N tends to infinity. However, moderate code lengths are required in most wireless digital applications to limit the decoding latency. In some other applications, such as optical communications or quantum key distribution, the latency introduced by very long codes is not an issue. The main challenge is to design codes with the best error correction capability, a tractable complexity and a high throughput. In such a context, SC decoding is an interesting solution because its performance improves with N while the computational complexity scales almost linearly. In this paper, we propose to improve the scalability of SC decoders through four architectural optimizations. The resulting SC decoder is implemented on an FPGA device and compares favorably with state-of-the-art scalable SC decoders. Moreover, a 2^22 polar code SC decoder is implemented on a Stratix-5 FPGA. This code length is twice as large as the ones achieved in previous works. To the best of our knowledge, this is the first architecture for which an N = 4 million bits polar code can actually be decoded on a reconfigurable circuit.
Citations: 2
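The SC decoding that the abstract builds on is a recursive procedure over the polarization tree: a check-node update `f` for the left subtree, a variable-node update `g` conditioned on the left partial sums for the right subtree, and hard decisions at the leaves (forced to 0 for frozen positions). A minimal LLR-domain sketch for length 4 — the frozen set and min-sum approximation are illustrative, not the paper's architecture:

```python
import math

def f(a, b):
    """Check-node update (min-sum approximation of the boxplus operator)."""
    return math.copysign(min(abs(a), abs(b)), a * b)

def g(a, b, u):
    """Variable-node update given the left partial sum bit u."""
    return b + (1 - 2 * u) * a

def sc_decode(llrs, frozen):
    """Return (decided bits u, re-encoded partial sums x) for one subtree."""
    n = len(llrs)
    if n == 1:
        u = 0 if (frozen[0] or llrs[0] >= 0) else 1
        return [u], [u]
    half = n // 2
    # Left subtree sees f(llr_i, llr_{i+half}).
    u_left, x_left = sc_decode(
        [f(llrs[i], llrs[i + half]) for i in range(half)], frozen[:half])
    # Right subtree sees g(...), conditioned on the left partial sums.
    u_right, x_right = sc_decode(
        [g(llrs[i], llrs[i + half], x_left[i]) for i in range(half)],
        frozen[half:])
    x = [x_left[i] ^ x_right[i] for i in range(half)] + x_right
    return u_left + u_right, x

# All-zero codeword over a benign channel: positive LLRs everywhere.
# Assumed frozen set {0, 1} for a rate-1/2 length-4 toy code.
u_hat, _ = sc_decode([2.0, 1.5, 3.0, 2.5], [True, True, False, False])
```

The sequential left-then-right dependence visible in the recursion is exactly what makes SC decoders of length 2^22 a scheduling and memory challenge in hardware.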
Statistical analysis of Post-HEVC encoded videos
Pub Date : 2017-10-01 DOI: 10.1109/SiPS.2017.8110020
A. Jallouli, Fatma Belghith, M. A. B. Ayed, W. Hamidouche, J. Nezan, N. Masmoudi
Post-HEVC is the emerging video coding standard beyond the High Efficiency Video Coding (HEVC) standard. It is more complex in its transformation and prediction steps, but it offers the opportunity of 3D and 360° video coding and compression. This paper presents different statistical analyses of Post-HEVC encoded videos, in particular an analysis of 1D and 2D transformation types and of intra and inter prediction types for several test videos of different classes and resolutions. The analyses are carried out at the decoder level, where the coding decision has already been taken by the encoder. Results show that the choice of transformation (type and size) and of prediction type (intra or inter) depends on the nature of the video: motion and texture. This work can be considered a milestone toward proposing intelligent algorithms based on video characteristics to perform fast decisions in the Post-HEVC encoding process.
Citations: 5
Low complexity hardware accelerator for nD FastICA based on coordinate rotation
Pub Date : 2017-10-01 DOI: 10.1109/SiPS.2017.8110000
Swati Bhardwaj, Shashank Raghuraman, A. Acharyya
This paper proposes a low-complexity hardware-accelerator algorithmic modification of the n-dimensional (nD) FastICA methodology based on the Coordinate Rotation Digital Computer (CORDIC) to attain high computation speed. The most complex and time-consuming update stage and convergence check required for computation of the nth weight vector are eliminated in the proposed methodology. Using the Gram-Schmidt orthogonalization stage and normalization stage to calculate the nth weight vector in an entirely sequential procedure of CORDIC-based FastICA results in a significant gain in computation time. The proposed methodology has been functionally verified and validated by applying it to separating 6D speech signals. It has been implemented in hardware using Verilog HDL and synthesized using UMC 180 nm technology. The average improvement in computation time obtained by using the proposed methodology for 4D to 6D FastICA with 1024 samples, considering the minimum case of two iterations for the nth stage, was found to be 98.79%.
Citations: 4
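CORDIC, which the accelerator above builds on, computes vector magnitudes and angles with only shifts, adds, and a table of arctangents — exactly the operations needed for the normalization in FastICA. A floating-point reference model of vectoring-mode CORDIC (the iteration count is an arbitrary choice; a hardware version would use integer shifts instead of divisions):

```python
import math

def cordic_vector_mode(x, y, iterations=16):
    """Vectoring-mode CORDIC: rotate (x, y) onto the x-axis by micro-rotations.

    Returns (magnitude, angle). Each micro-rotation uses only a shift
    (division by 2**i) and adds; the accumulated CORDIC gain is divided
    out at the end.
    """
    angle = 0.0
    for i in range(iterations):
        d = -1 if y > 0 else 1          # rotate toward y == 0
        x, y = x - d * (y / (1 << i)), y + d * (x / (1 << i))
        angle -= d * math.atan(2.0 ** -i)
    # Gain of the applied micro-rotations: prod sqrt(1 + 2^-2i).
    k = 1.0
    for i in range(iterations):
        k *= math.sqrt(1 + 2.0 ** (-2 * i))
    return x / k, angle

mag, ang = cordic_vector_mode(3.0, 4.0)   # expect ~5.0 and ~atan2(4, 3)
```

Chaining such rotations over all coordinate pairs is what lets a sequential Gram-Schmidt/normalization pipeline avoid explicit square roots and divisions in hardware.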
FPGA implementation of object recognition processor for HDTV resolution video using sparse FIND feature
Pub Date : 2017-10-01 DOI: 10.1109/SiPS.2017.8109993
Yuri Nishizumi, Go Matsukawa, K. Kajihara, T. Kodama, S. Izumi, H. Kawaguchi, C. Nakanishi, Toshio Goto, Takeo Kato, M. Yoshimoto
This paper describes an FPGA implementation of an object recognition processor for HDTV-resolution 30 fps video using the Sparse FIND feature. Two-stage feature extraction by HOG and Sparse FIND, a highly parallel classification in the support vector machine (SVM), and block-parallel processing for RAM access cycle reduction are proposed to perform real-time object recognition with enormous computational complexity. From the implementation of the proposed architecture on the FPGA, it was confirmed that detection using the Sparse FIND feature was performed for HDTV images at 47.63 fps, on average, at 90 MHz. The recognition accuracy degradation from the original Sparse FIND-based object detection algorithm implemented in software was 0.5%, which shows that the FPGA system provides sufficient accuracy for practical use.
Citations: 5
Robust compressed analysis using subspace-based dictionary for ECG telemonitoring systems
Pub Date : 2017-10-01 DOI: 10.1109/SiPS.2017.8110016
Meng-Ya Tsai, Ching-Yao Chou, A. Wu
To realize Electrocardiography (ECG) signal monitoring systems, compressive sensing (CS) is a new technique to reduce the power of biosensors and data transmission. Instead of spending high complexity on reconstructing signals back to the data domain for analysis, compressed analysis (CA) exploits the data structure preserved by CS to analyze directly in the compressed domain. However, compressively-sensed signals contaminated by interference cause learning performance degradation. Meanwhile, traditional interference removal methods are developed for signals in the data domain and involve reconstruction. In this paper, we propose a new CA framework using a pre-trained subspace-based dictionary to project interfered and compressed data onto a subspace with high learnability and low complexity. Through simulations, we show that our technique enables a 5.64% improvement in detection accuracy compared with conventional CA, and reduces complexity by 99% compared with reconstructed analysis.
Citations: 7
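The premise of compressed analysis above is that a random sensing matrix roughly preserves distances, so classification can run on the compressed measurements y = Φx without ever reconstructing x. A toy sketch — the sensing matrix, "ECG" templates, and nearest-template classifier are all hypothetical stand-ins for the paper's subspace-based dictionary:

```python
import random

random.seed(0)

# Random Gaussian sensing matrix Phi: N-sample frames -> M measurements.
N, M = 16, 6
Phi = [[random.gauss(0, 1 / M ** 0.5) for _ in range(N)] for _ in range(M)]

def sense(x):
    """Compressive measurement y = Phi @ x."""
    return [sum(p * xi for p, xi in zip(row, x)) for row in Phi]

def nearest(y, templates):
    """Classify in the compressed domain by nearest template."""
    def d2(name):
        return sum((a - b) ** 2 for a, b in zip(y, templates[name]))
    return min(templates, key=d2)

# Two crude, hypothetical waveform classes, stored compressed.
normal = [0, 0, 1, 5, 1, 0, 0, 0, 0, 0, 2, 0, 0, 0, 0, 0]
arrhythmic = [0, 2, 0, 1, 0, 3, 0, 0, 4, 0, 0, 0, 1, 0, 0, 0]
templates = {"normal": sense(normal), "arrhythmic": sense(arrhythmic)}

# A lightly noisy "normal" frame is classified without reconstruction.
noisy = [v + random.gauss(0, 0.1) for v in normal]
label = nearest(sense(noisy), templates)
```

The paper's contribution sits on top of this picture: projecting the compressed measurements onto a pre-trained subspace first, so that interference does not corrupt the compressed-domain geometry the classifier relies on.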