A Hybrid GPU + FPGA System Design for Autonomous Driving Cars
Cong Hao, Junli Gu, Deming Chen, A. Sarwari, Zhijie Jin, Husam Abu-Haimed, Daryl Sew, Yuhong Li, Xinheng Liu, Bryan Wu, Dongdong Fu
DOI: 10.1109/SiPS47522.2019.9020540 · Pub Date: 2019-10-01

Autonomous driving cars need highly complex hardware and software systems, which require high-performance computing platforms to enable a real-time AI-based perception and decision-making pipeline. The industry has been exploring various in-vehicle accelerators such as GPUs, ASICs, and FPGAs, yet autonomous driving platform design is far from mature once system reliability, redundancy, and higher levels of autonomy are taken into account. In this work, we propose a hybrid computing system design that integrates a GPU as the primary computing system and an FPGA as a secondary system. This hybrid architecture has multiple advantages: 1) The FPGA can run constantly as a complementary system with very short latency, helping to detect main-system failures and anomalous behavior and contributing to system functionality verification and reliability. 2) If the primary system fails (mostly from sensor or interconnection errors), the FPGA quickly detects the failure and runs a safe-mode task with a subset of sensors. 3) The FPGA can serve as an independent computing system that runs extra algorithm components to improve overall system autonomy. For example, the FPGA can handle driver-monitoring tasks while the GPU focuses on driving functions; together they can boost the driving function from L2 (which constantly requires the driver's attention) to L3 (which allows the driver's attention to be off the road for up to 10 seconds). This paper defines how such a system works, discusses various use cases and potential design challenges, and shares initial results and insights on how to make such a system deliver maximum value for autonomous driving.

Modified Complementary Joint Sparse Representations: A Novel Post-Filtering to MVDR Beamforming
Yuanyuan Zhu, Jiafei Fu, Xu Xu, Z. Ye
DOI: 10.1109/SiPS47522.2019.9020522 · Pub Date: 2019-10-01
Post-filtering is a popular technique in multichannel speech enhancement systems for further improving speech quality and intelligibility after beamforming. This paper presents a novel post-filter for minimum variance distortionless response (MVDR) beamforming: a single-channel modified complementary joint sparse representations (M-CJSR) method. First, an MVDR beamformer is used to suppress interference and noise. Subsequently, the proposed M-CJSR approach, based on joint dictionary learning, is applied as a single-microphone post-filter to process the beamformer output. Unlike existing post-filtering techniques, which rely on assumptions about the noise field, this algorithm considers a more generalized signal model that includes ambient noise, such as diffuse or white noise, as well as point-source interference. Moreover, the original CJSR method is extended to jointly learn dictionaries not only for the mappings from mixture to speech and to noise, but also for the mapping from mixture to interference. To exploit the complementary advantages of different sparse representations, we design the weighting parameters based on the residual components of the estimated signals. An experimental study consisting of objective evaluations under various conditions verifies the superiority of the proposed algorithm over other state-of-the-art methods.
{"title":"Modified Complementary Joint Sparse Representations: A Novel Post-Filtering to MVDR Beamforming","authors":"Yuanyuan Zhu, Jiafei Fu, Xu Xu, Z. Ye","doi":"10.1109/SiPS47522.2019.9020522","DOIUrl":"https://doi.org/10.1109/SiPS47522.2019.9020522","url":null,"abstract":"Post-filtering is a popular technique for multichannel speech enhancement system, in order to further improve the speech quality and intelligibility after beamforming. This paper presents a novel post-filtering to a minimum variance distortionless response (MVDR) beamforming which is a single-channel modified complementary joint sparse representations (M-CJSR) method. First, MVDR beamformer is used to suppress interference and noise. Subsequently, the proposed M-CJSR approach based on joint dictionary learning is applied as a single microphone post-filter to process the beamformer output. Different from the existing post-filtering techniques which rely on the assumptions about the noise field, this algorithm considers a more generalized signal model including the ambient noise, like diffuse noise or white noise, as well as the point-source interference. Moreover, the original CJSR method is extended to jointly learn dictionaries for not only the mappings from mixture to speech and noise, but also the mapping from mixture to interference. In order to take the complementary advantages of different sparse representations, we design the weighting parameters based on the residual components of the estimated signals. An experimental study which consists of objective evaluations under various conditions verifies the superiority of the proposed algorithm compared to other state-of-the-art methods.","PeriodicalId":256971,"journal":{"name":"2019 IEEE International Workshop on Signal Processing Systems (SiPS)","volume":"8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128079066","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

SiPS 2019 Conference Committee
DOI: 10.1109/sips47522.2019.9020636 · Pub Date: 2019-10-01

An Efficient Polynomial Multiplier Architecture for the Bootstrapping Algorithm in a Fully Homomorphic Encryption Scheme
Weihang Tan, Gengran Hu, Benjamin Case, S. Gao, Yingjie Lao
DOI: 10.1109/SiPS47522.2019.9020592 · Pub Date: 2019-10-01
The bootstrapping algorithm, the intermediate refreshing procedure for a processed ciphertext, has been the performance bottleneck of various existing Fully Homomorphic Encryption (FHE) schemes. Specifically, the external product of polynomials is the most computationally expensive step of bootstrapping algorithms based on the Ring Learning With Errors (RLWE) problem. In this paper, we design a novel and scalable polynomial multiplier architecture for a bootstrapping algorithm, along with a conflict-free memory management scheme that reduces latency while achieving full utilization of the processing elements (PEs). Each PE is a modified radix-2 butterfly unit from the fast Fourier transform (FFT), which can be reconfigured for both the number theoretic transform (NTT) and the basic modular multiplication of polynomial multiplication in the external product step. Experimental results show that our design yields a 33% lower area-time product than prior designs.
{"title":"An Efficient Polynomial Multiplier Architecture for the Bootstrapping Algorithm in a Fully Homomorphic Encryption Scheme","authors":"Weihang Tan, Aengran Au, Benjamin Aase, S. Aao, Yingjie Lao","doi":"10.1109/SiPS47522.2019.9020592","DOIUrl":"https://doi.org/10.1109/SiPS47522.2019.9020592","url":null,"abstract":"Bootstrapping algorithm, which is the intermediate refreshing procedure of a processed ciphertext, has been the performance bottleneck among various existing Fully Homomorphic Encryption (FHE) schemes. Specifically, the external product of polynomials is the most computationally expensive step of bootstrapping algorithms that are based on the Ring Learning With Error (RLWE) problem. In this paper, we design a novel and scalable polynomial multiplier architecture for a bootstrapping algorithm along with a conflict-free memory management scheme to reduce the latency, while achieving a full utilization of the processing elements (PEs). Each PE is a modified radix-2 butterfly unit from fast Fourier transform (FFT), which can be reconfigured to use in both the number theoretic transform (NTT) and the basic modular multiplication of polynomial multiplication in the external product step. The experimental results show that our design yields 33% less area-time product than prior designs.","PeriodicalId":256971,"journal":{"name":"2019 IEEE International Workshop on Signal Processing Systems (SiPS)","volume":"72 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115884045","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

Improving Reliability of ReRAM-Based DNN Implementation through Novel Weight Distribution
Jingtao Li, Manqing Mao, C. Chakrabarti
DOI: 10.1109/SiPS47522.2019.9020318 · Pub Date: 2019-10-01
Binary deep neural networks implemented in resistive random access memory (ReRAM) for storage efficiency suffer from poor recognition performance in the presence of hardware errors. This paper addresses the problem by deriving a novel weight distribution and representation scheme that mitigates errors due to faulty ReRAM cells with minimal storage overhead. In the proposed scheme, the weight matrix is partitioned into grains, and each weight in a grain is represented by the sum of a multi-bit mean and a 1-bit deviation. The grain size, as well as the mean-to-deviation ratio of the weights in a grain, can be chosen such that the network is resilient to hardware errors. A hybrid processing-in-memory (PIM) architecture is proposed to support this scheme: the mean values are stored in a small SRAM and processed by a CMOS unit, while the deviations are stored and processed by the ReRAM unit. Compared to the baseline binary neural network, which fails in the presence of severe hardware errors, the proposed hybrid scheme suffers only a mild degradation in recognition performance. Simulation results show the proposed scheme achieves 97.84% test accuracy (a 0.84% accuracy drop) on the MNIST dataset and 88.07% test accuracy (a 1.10% accuracy drop) on the CIFAR-10 dataset under 9.04% stuck-at-1 and 1.75% stuck-at-0 faults.
{"title":"Improving Reliability of ReRAM-Based DNN Implementation through Novel Weight Distribution","authors":"Jingtao Li, Manqing Mao, C. Chakrabarti","doi":"10.1109/SiPS47522.2019.9020318","DOIUrl":"https://doi.org/10.1109/SiPS47522.2019.9020318","url":null,"abstract":"Binary deep neural networks, that have been implemented in resistive random access memory (ReRAM) for storage efficiency, suffer from poor recognition performance in the presence of hardware errors. This paper addresses this problem by deriving a novel weight distribution and representation scheme that mitigates errors due to faulty ReRAM cells with minimal storage overhead. In the proposed scheme, the weight matrix is partitioned into grains, and each weight in a grain is represented by the sum of a multi-bit mean and a 1-bit deviation. The grain size as well as the mean to deviation ratio of the weights in a grain can be chosen such that the network is resilient to hardware errors. A hybrid processing-in-memory (PIM) architecture is proposed to support this scheme. The mean values are stored in a small SRAM and processed by a CMOS unit, and the deviations are stored and processed by the ReRAM unit. Compared to the baseline binary neural network which fails in the presence of severe hardware errors, the proposed hybrid scheme has only a mild recognition performance degradation. Simulation results show the proposed scheme achieves 97.84% test accuracy (a 0.84% accuracy drop) on a MNIST dataset, and 88.07% test accuracy (a 1.10% accuracy drop) on a CIFAR-10 dataset under 9.04% stuck-at-1 and 1.75% stuck-at-0 faults.","PeriodicalId":256971,"journal":{"name":"2019 IEEE International Workshop on Signal Processing Systems (SiPS)","volume":"26 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121669337","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

[SiPS 2019 Title Page]
DOI: 10.1109/sips47522.2019.9020313 · Pub Date: 2019-10-01

Pilot-Assisted Methods for Channel Estimation in MIMO-V-OFDM Systems
Wei Zhang, Xuyang Gao, Yibing Shi
DOI: 10.1109/SiPS47522.2019.9020482 · Pub Date: 2019-10-01
Multiple-input multiple-output (MIMO) combined with Orthogonal Frequency Division Multiplexing (OFDM) retains the advantages of both MIMO and OFDM, and Vector OFDM (V-OFDM) is an extension of OFDM that makes data transmission more flexible. For MIMO systems using V-OFDM, we propose different novel schemes to improve channel estimation performance under different degrees of channel sparsity. For non-sparse channels, a 2-D Kriging interpolation scheme is proposed, which significantly improves the performance of the conventional Least Squares (LS) and Minimum Mean Square Error (MMSE) algorithms. When the channel is sparse, the estimation process can be modeled as a sparse recovery problem using compressed sensing (CS) theory. In this case, the measurement matrix is determined by the pilot locations, and a pilot search algorithm based on a random genetic algorithm (RGA) is proposed to minimize the cross-correlation of the measurement matrix. In addition, a variable-threshold sparsity adaptive matching pursuit (VTSAMP) algorithm is designed to obtain more accurate estimates, achieving better Normalized Mean Square Error (NMSE) performance, higher calculation speed, and lower implementation complexity.
{"title":"Pilot-Assisted Methods for Channel Estimation in MIMO-V-OFDM Systems","authors":"Wei Zhang, Xuyang Gao, Yibing Shi","doi":"10.1109/SiPS47522.2019.9020482","DOIUrl":"https://doi.org/10.1109/SiPS47522.2019.9020482","url":null,"abstract":"Multiple-input multiple-output (MIMO) with Orthogonal Frequency Division Multiplexing (OFDM) technology has both the advantages of MIMO and OFDM. Vector Orthogonal Frequency Division Multiplexing (V-OFDM) is an extension of OFDM, which makes data transmission flexible. In MIMO systems using V-OFDM technology, different novel schemes are proposed to improve channel estimation performance for different channel sparsity. The 2-D Kriging interpolation scheme is proposed for the non-sparse channels, which can significantly improve the performance of conventional Least Square (LS) and Minimum Mean Square Error (MMSE) algorithms. When the channel is sparse, the estimation process can be modeled as a sparse recovery problem using compressed sensing (CS) theory. In this paper, the measurement matrix is determined by pilot locations, and a pilot search algorithm based on random genetic algorithm (RGA) is proposed to minimize the cross-correlation value of the measurement matrix. Besides, a variable threshold sparsity adaptive matching pursuit (VTSAMP) algorithm is designed to obtain more accurate estimates, which achieves better Normalized Mean Square Error (NMSE) performance, higher calculation speed, and lower implementation complexity.","PeriodicalId":256971,"journal":{"name":"2019 IEEE International Workshop on Signal Processing Systems (SiPS)","volume":"49 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130069349","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

A Survey of Computation-Driven Data Encoding
Weikang Qian, Runsheng Wang, Yuan Wang, Marc D. Riedel, Ru Huang
DOI: 10.1109/SiPS47522.2019.9020519 · Pub Date: 2019-10-01
Although the metal-oxide-semiconductor field-effect transistor (MOSFET) has been the dominant device in modern very-large-scale integration (VLSI) circuits for more than six decades, with the dawning of the post-Moore era researchers are seeking replacements. A foundation of modern digital computing is the encoding of digital values in a binary radix representation. However, as we enter the post-Moore era, the challenges of increasing power density, signal noise, and device unreliability raise the question of whether this basic way of encoding data is still the best choice, particularly with novel electronic devices. Prior work has shown that binary radix encoding has some disadvantages, and we argue that it is crucial to rethink the necessity of this representation in the post-Moore era. In this paper, we review recent developments in computation-driven data encoding. We begin with stochastic encoding, a representation proposed long ago, discussing both its advantages and its disadvantages. We then review several recent breakthroughs with variations of stochastic encoding that mitigate many of those disadvantages. Finally, we conclude by extrapolating future directions for effective computation-driven data encoding.
{"title":"A Survey of Computation-Driven Data Encoding","authors":"Weikang Qian, Runsheng Wang, Yuan Wang, Marc D. Riedel, Ru Huang","doi":"10.1109/SiPS47522.2019.9020519","DOIUrl":"https://doi.org/10.1109/SiPS47522.2019.9020519","url":null,"abstract":"Although the metal-oxide-semiconductor field-effect transistor (MOSFET) has been the dominant device for modern very-large scale integration (VLSI) circuits for more than six decades, with the dawning of a post-Moore era, researchers are trying to find replacements. A foundation of modern digital computing is the encoding of digital values through a binary radix representation. However, as we enter into the post-Moore era, the challenges of increasing power density, signal noise, and device unreliability raise the question of whether this basic way of encoding data is still the best choice, particularly with novel electronic devices. Prior work has shown that binary radix encoding has some disadvantages. We argue that it is crucial to rethink the necessity of using this representation in the post-Moore era. In this paper, we review some recent development on computation-driven data encoding. We begin with stochastic encoding, a representation proposed a long time ago, discussing both its advantages and disadvantages. Then, we review several recent breakthroughs with variations of stochastic encoding that mitigate many of its disadvantages. Finally, we conclude the paper by extrapolating future directions for effective computation-driven data encoding.","PeriodicalId":256971,"journal":{"name":"2019 IEEE International Workshop on Signal Processing Systems (SiPS)","volume":"2015 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127674322","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

Towards Algebraic Modeling of GPU Memory Access for Bank Conflict Mitigation
Luca Ferranti, J. Boutellier
DOI: 10.1109/SiPS47522.2019.9020385 · Pub Date: 2019-10-01
Graphics Processing Units (GPUs) are widely used in various fields of scientific computing, such as signal processing. GPUs have a hierarchical memory structure with memory layers that are shared between GPU processing elements. Partly due to this complex memory hierarchy, GPU programming is non-trivial, and several aspects must be taken into account, one being memory access patterns. One of the fastest GPU memory layers, shared memory, is grouped into banks to enable fast, parallel access by the processing elements. Unfortunately, multiple threads of a GPU program may access the same shared memory bank simultaneously, causing a bank conflict. When this happens, program execution slows down, as the conflicting memory accesses must be rescheduled and serialized. Bank conflicts are not handled automatically by the compiler, so the programmer must detect and deal with them before program execution. In this paper, we present an algebraic approach to detecting bank conflicts and prove theoretical results that can be used to predict when bank conflicts happen and how to avoid them. Our experimental results illustrate the resulting savings in computation time.
{"title":"Towards Algebraic Modeling of GPU Memory Access for Bank Conflict Mitigation","authors":"Luca Ferranti, J. Boutellier","doi":"10.1109/SiPS47522.2019.9020385","DOIUrl":"https://doi.org/10.1109/SiPS47522.2019.9020385","url":null,"abstract":"Graphics Processing Units (GPU) have been widely used in various fields of scientific computing, such as in signal processing. GPUs have a hierarchical memory structure with memory layers that are shared between GPU processing elements. Partly due to the complex memory hierarchy, GPU programming is non-trivial, and several aspects must be taken into account, one being memory access patterns. One of the fastest GPU memory layers, shared memory, is grouped into banks to enable fast, parallel access for processing elements. Unfortunately, it may happen that multiple threads of a GPU program may access the same shared memory bank simultaneously causing a bank conflict. If this happens, program execution slows down as memory accesses have to be rescheduled to determine which instruction to execute first. Bank conflicts are not taken into account automatically by the compiler, and hence the programmer must detect and deal with them prior to program execution. In this paper, we present an algebraic approach to detect bank conflicts and prove some theoretical results that can be used to predict when bank conflicts happen and how to avoid them. Also, our experimental results illustrate the savings in computation time.","PeriodicalId":256971,"journal":{"name":"2019 IEEE International Workshop on Signal Processing Systems (SiPS)","volume":"18 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129382316","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

AdaBoost-assisted Extreme Learning Machine for Efficient Online Sequential Classification
Yi-Ta Chen, Yu-Chuan Chuang, A. Wu
DOI: 10.1109/SiPS47522.2019.9020609 · Pub Date: 2019-09-16
In this paper, we propose an AdaBoost-assisted extreme learning machine for efficient online sequential classification (AOS-ELM). To achieve better accuracy in online sequential learning scenarios, we employ the cost-sensitive AdaBoost algorithm, which diversifies the weak classifiers, and add a forgetting mechanism, which stabilizes performance during training. As a result, AOS-ELM adapts better to sequentially arriving data than other voting-based methods. Experimental results show that AOS-ELM achieves 94.41% accuracy on the MNIST dataset, matching the theoretical accuracy bound set by the original batch learning algorithm, AdaBoost-ELM. Moreover, with the forgetting mechanism, the standard deviation of accuracy during online sequential learning is reduced by a factor of 8.26.
{"title":"AdaBoost-assisted Extreme Learning Machine for Efficient Online Sequential Classification","authors":"Yi-Ta Chen, Yu-Chuan Chuang, A. Wu","doi":"10.1109/SiPS47522.2019.9020609","DOIUrl":"https://doi.org/10.1109/SiPS47522.2019.9020609","url":null,"abstract":"In this paper, we propose an AdaBoost-assisted extreme learning machine for efficient online sequential classification (AOS-ELM). In order to achieve better accuracy in online sequential learning scenarios, we utilize the cost-sensitive algorithm-AdaBoost, which diversifying the weak classifiers, and adding the forgetting mechanism, which stabilizing the performance during the training procedure. Hence, AOS-ELM adapts better to sequentially arrived data compared with other voting based methods. The experiment results show AOS-ELM can achieve 94.41% accuracy on MNIST dataset, which is the theoretical accuracy bound performed by original batch learning algorithm, AdaBoost-ELM. Moreover, with the forgetting mechanism, the standard deviation of accuracy during the online sequential learning process is reduced to 8.26x.","PeriodicalId":256971,"journal":{"name":"2019 IEEE International Workshop on Signal Processing Systems (SiPS)","volume":"67 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-09-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115714826","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}