2016 IEEE International Conference on Rebooting Computing (ICRC): Latest Papers

Designing reconfigurable large-scale deep learning systems using stochastic computing
Pub Date : 2016-11-08 DOI: 10.1109/ICRC.2016.7738685
Ao Ren, Zhe Li, Yanzhi Wang, Qinru Qiu, Bo Yuan
Deep learning, an important branch of machine learning and neural networks, plays an increasingly important role in fields such as computer vision and natural language processing. However, large-scale deep learning systems mainly run on high-performance server clusters, which restricts their extension to personal or mobile devices. The solution proposed in this paper takes advantage of stochastic computing, a data representation and processing technique that uses a binary bit stream to represent a probability (the fraction of ones in the stream). In stochastic computing, key arithmetic operations such as multiplication and addition can be implemented with very simple components, namely AND gates and multiplexers respectively. This opens an immense design space for integrating a large number of neurons and enabling fully parallel, scalable hardware implementations of large-scale deep learning systems. We present a reconfigurable large-scale deep learning system based on stochastic computing, including the design of the neuron, the convolution function, the back-propagation function, and other basic operations. We also propose a network-on-chip technique for implementing a large-scale hardware system. Our experiments validate the functionality of reconfigurable deep learning systems using stochastic computing and demonstrate that, with 8192-bit streams, stochastic computing classifies MNIST digits with an error rate as low as that of conventional arithmetic.
Citations: 28
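The abstract above describes the two basic stochastic computing operations. The following is a minimal software sketch of them, not the authors' hardware design: the function names, the unipolar encoding, and the test values 0.6 and 0.5 are illustrative assumptions. It encodes probabilities as random bit streams, multiplies with a bitwise AND, and performs scaled addition with a 2-to-1 multiplexer.

```python
import random

def to_stream(p, length, rng):
    """Encode a probability p in [0, 1] as a unipolar bit stream (assumed encoding)."""
    return [1 if rng.random() < p else 0 for _ in range(length)]

def to_value(stream):
    """Decode a bit stream as the fraction of ones it contains."""
    return sum(stream) / len(stream)

def sc_multiply(a, b):
    """Bitwise AND of two independent streams approximates the product of their values."""
    return [x & y for x, y in zip(a, b)]

def sc_scaled_add(a, b, select):
    """A 2-to-1 multiplexer driven by a p = 0.5 select stream approximates (a + b) / 2."""
    return [x if s else y for x, y, s in zip(a, b, select)]

rng = random.Random(0)   # fixed seed so the sketch is repeatable
N = 8192                 # stream length quoted in the abstract
a = to_stream(0.6, N, rng)
b = to_stream(0.5, N, rng)
sel = to_stream(0.5, N, rng)

product = to_value(sc_multiply(a, b))            # approximates 0.6 * 0.5
scaled_sum = to_value(sc_scaled_add(a, b, sel))  # approximates (0.6 + 0.5) / 2
print(product, scaled_sum)
```

The multiplexer yields a scaled sum rather than a plain sum because a unipolar stream cannot represent values above 1. The estimates are noisy, and the error shrinks as the stream gets longer.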
Bayesian sensor fusion with fast and low power stochastic circuits
Pub Date : 2016-10-17 DOI: 10.1109/ICRC.2016.7738672
Alexandre Coninx, P. Bessière, E. Mazer, J. Droulez, R. Laurent, Awais Aslam, J. Lobo
As the physical limits of Moore's law are being reached, a research effort has been launched to achieve further performance improvements by exploring computation paradigms that depart from standard approaches. The BAMBI project (Bottom-up Approaches to Machines dedicated to Bayesian Inference) aims to develop hardware dedicated to probabilistic computation, which extends the logic computation realised by Boolean gates in current computer chips. Such probabilistic computing devices would make it possible to solve a wide range of Artificial Intelligence applications faster and at a lower energy cost, especially when decisions must be taken from incomplete data in an uncertain environment. This paper describes an architecture in which very simple operators compute on a time coding of probability values as stochastic signals. Simulation tests and a reconfigurable logic hardware implementation demonstrate the feasibility and performance of the proposed inference machine. Hardware results show that this architecture can quickly solve Bayesian sensor fusion problems and is very efficient in terms of energy consumption.
Citations: 20
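As a rough illustration of computing a sensor fusion posterior on stochastic signals, the sketch below compares an exact calculation with one in which each product is estimated by ANDing independent random bit streams. This is an assumption-laden toy, not the BAMBI architecture: the two-hypothesis prior, the likelihood values, the stream length, and all function names are hypothetical.

```python
import random

def bernoulli_stream(p, length, rng):
    """Random bit stream whose fraction of ones approximates p."""
    return [1 if rng.random() < p else 0 for _ in range(length)]

def fuse_exact(prior, likelihoods):
    """Posterior over hypotheses: prior[h] times every likelihoods[k][h], normalized."""
    scores = []
    for h, p in enumerate(prior):
        score = p
        for lik in likelihoods:
            score *= lik[h]
        scores.append(score)
    total = sum(scores)
    return [s / total for s in scores]

def fuse_stochastic(prior, likelihoods, length, rng):
    """Same fusion, with each product estimated by ANDing independent bit streams."""
    counts = []
    for h, p in enumerate(prior):
        stream = bernoulli_stream(p, length, rng)
        for lik in likelihoods:
            other = bernoulli_stream(lik[h], length, rng)
            stream = [x & y for x, y in zip(stream, other)]
        counts.append(sum(stream))  # ones counted per hypothesis
    total = sum(counts)
    return [c / total for c in counts]

prior = [0.5, 0.5]                      # hypothetical prior over two states
likelihoods = [[0.9, 0.2], [0.7, 0.4]]  # hypothetical P(reading | state), one row per sensor
exact = fuse_exact(prior, likelihoods)
approx = fuse_stochastic(prior, likelihoods, 20000, random.Random(1))
print(exact, approx)
```

Normalization here is done by dividing the ones counts at the end. How a hardware design would normalize is not stated in the abstract, so that step is only a convenience of this sketch.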
Accelerating machine learning with Non-Volatile Memory: Exploring device and circuit tradeoffs
Pub Date : 2016-10-01 DOI: 10.1109/ICRC.2016.7738684
Alessandro Fumarola, P. Narayanan, Lucas L. Sanches, Severin Sidler, Junwoo Jang, Kibong Moon, R. Shelby, H. Hwang, G. Burr
Large arrays of the same nonvolatile memories (NVM) being developed for Storage-Class Memory (SCM) - such as Phase Change Memory (PCM) and Resistance RAM (ReRAM) - can also be used in non-Von Neumann neuromorphic computational schemes, with device conductance serving as synaptic “weight.” This allows the all-important multiply-accumulate operation within these algorithms to be performed efficiently at the weight data.
Citations: 29
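The multiply-accumulate operation mentioned above can be sketched for an idealized crossbar, where each device conductance acts as a weight. The code assumes ideal devices with no wire resistance, no nonlinearity, and a single conductance per weight; the function name and the numeric values are illustrative, not taken from the paper.

```python
def crossbar_currents(conductances, voltages):
    """Column currents of an idealized crossbar array.

    conductances[i][j] is the conductance (siemens) of the device joining
    input row i to output column j; voltages[i] is the voltage on row i.
    Ohm's law gives each device current, and Kirchhoff's current law sums
    them along a column: I_j = sum over i of G[i][j] * V[i].
    """
    n_cols = len(conductances[0])
    return [
        sum(row[j] * v for row, v in zip(conductances, voltages))
        for j in range(n_cols)
    ]

G = [[1e-6, 2e-6],   # hypothetical device conductances in siemens
     [3e-6, 4e-6]]
V = [0.2, 0.1]       # hypothetical read voltages in volts
I = crossbar_currents(G, V)
print(I)             # column currents in amperes
```

Every column sum forms in parallel in the analog domain, which is why the weights never have to be moved to a separate arithmetic unit.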
Neuromorphic mixed-signal circuitry for Asynchronous Pulse Processing
Pub Date : 2016-10-01 DOI: 10.1109/ICRC.2016.7738686
P. Petre, J. Cruz-Albrecht
We demonstrate a software-reconfigurable mixed-signal Printed Circuit Board (PCB) prototype and a custom mixed-signal Application Specific Integrated Circuit (ASIC) prototype of a cognitive signal processor that uses neuromorphic methods to run real-time wideband signal processing algorithms based on adaptive nonlinear filtering. The cognitive processor effectively implements a trending computing paradigm called the Reservoir Computer (RC). Hardware implementation of the RC is achieved by a novel analog signal processor architecture called the Asynchronous Pulse Processor (APP).
Citations: 4
Parallel data processing with Magnonic Holographic Co-Processor
Pub Date : 2016-10-01 DOI: 10.1109/ICRC.2016.7738708
M. Balynsky, D. Gutierrez, H. Chiang, A. Khitun, A. Kozhevnikov, Y. Khivintsev, G. Dudko, Y. Filimonov
In this work, we present experimental data demonstrating the capabilities of the Magnonic Holographic Co-Processor for parallel data processing. It is a type of magnetic logic device that uses spin waves for data transfer and processing. Its operation is based on the correlation between the phases and amplitudes of the input spin waves and the output inductive voltages. We present experimental data obtained for an 8-terminal prototype based on a Y3Fe2(FeO4)3 structure. The input of the device is provided by a phased array of spin-wave generating elements, which allows us to produce input phase patterns of arbitrary form. The obtained data demonstrate the capabilities of the Magnonic Holographic Co-Processor for parallel data processing by spin-wave superposition. Potentially, magnonic holographic devices can be implemented as complementary logic units to digital processors. Physical limitations and technological constraints are also discussed.
Citations: 1
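The dependence of the output on input phases can be illustrated with a simple phasor sum. This is a textbook superposition sketch under strong assumptions (equal frequencies, no propagation loss or path-dependent phase shift, no model of the inductive readout), not a model of the reported device; the function name is hypothetical.

```python
import cmath
import math

def output_amplitude(phases, amplitudes=None):
    """Amplitude of the superposition of equal-frequency waves at one output port."""
    if amplitudes is None:
        amplitudes = [1.0] * len(phases)  # assume unit amplitude for every input
    total = sum(a * cmath.exp(1j * phi) for a, phi in zip(amplitudes, phases))
    return abs(total)

in_phase = output_amplitude([0.0, 0.0])          # constructive interference
out_of_phase = output_amplitude([0.0, math.pi])  # destructive interference
print(in_phase, out_of_phase)
```

Two in-phase inputs add to twice the single-wave amplitude, while inputs shifted by pi cancel, so a phase pattern applied at the inputs is mapped onto an amplitude at the output.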
Reducing data movement with approximate computing techniques
Pub Date : 2016-10-01 DOI: 10.1109/ICRC.2016.7738675
S. Crago, D. Yeung
Data movement is the dominant factor that limits performance and efficiency in today's architectures, and we do not expect that to change in future architectures. In this paper, we describe how approximate computing techniques can be applied to communication at the algorithm level, in conventional computer architectures, and in the architectures being explored as we go beyond Moore's Law. We present results that demonstrate potential performance gains and the effect of approximations in traditional computer architectures. We describe how these techniques may be applied to future architectures based on probabilistic, approximate, stochastic, and neuromorphic computing, as well as more conventional heterogeneous and 3D architectures.
Citations: 2
Brain inspired photonic motif networks
Pub Date : 2016-10-01 DOI: 10.1109/ICRC.2016.7738706
F. Monifi, S. Shahin, F. Vallini, Y. Fainman, M. Rabinovich
Here we present a brain-inspired photonic cognitive motif network. The proposed architecture consists of semiconductor lasers coupled through opto-electronic feedback. Competitive interaction among photons and carriers in these coupled lasers leads to dynamics similar to those of many brain activities.
Citations: 0
Processor-in-memory support for artificial neural networks
Pub Date : 2016-10-01 DOI: 10.1109/ICRC.2016.7738697
J. Schabel, Lee Baker, Sumon Dey, Weifu Li, P. Franzon
Hardware acceleration of artificial neural network (ANN) processing has the potential to support applications that benefit from real-time, low-power operation, such as autonomous vehicles, robotics, recognition and data mining. Most interest in ANNs targets acceleration of deep multi-layered ANNs, which can require days of offline training to converge on a desired network behavior. Interest has grown in ANNs capable of supporting unsupervised training, where networks can learn new information from unlabeled data dynamically without the need for offline training. These ANNs require large memories with bandwidths much higher than those supported by modern GPGPUs. Custom hardware acceleration and memory co-design hold the potential to provide real-time performance in cases where the performance requirements cannot be met by modern GPGPUs. This work presents a custom processor solution to accelerate two hetero-associative memories (Sparsey and HTM) capable of unsupervised and one-hot learning. The custom processor is implemented as an expandable ASIP built upon a configurable SIMD engine for exploiting parallelism. Functional specialization is implemented using processor-in-memory techniques, which results in up to a 20× speedup and a 2000× reduction in energy per frame compared to a software implementation operating on a dataset for recognition of human actions.
Citations: 3
A recurrent crossbar of memristive nanodevices implements online novelty detection
Pub Date : 2016-10-01 DOI: 10.1109/ICRC.2016.7738689
C. Bennett, D. Querlioz, Jacques-Olivier Klein
An auto-correlation matrix memory (ACMM) system continuously computes the degree to which a presented input is novel or anomalous relative to past examples. Here we demonstrate that such a filter can be efficiently implemented with memristive nanodevices and accompanying CMOS circuitry. Complete (a full crossbar) and incomplete (an array of memristive devices) variants of the proposed nanofabric are detailed electrically and then simulated on a simple sparse input image test meant to gauge the system's responses to transitions. Both systems demonstrate active novelty filtering with a small level of false positives in the presence of noise, but only the complete system reports all transitions successfully, avoiding false negatives as well. While the system is robust to a noisy channel, degradation towards false positives is more likely when nanodevice variability is also taken into account. In addition to novelty filtering, the proposed system may be a useful building block for larger reservoir or recurrent on-chip learning systems.
Citations: 0
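The idea of scoring an input against a memory of past examples can be sketched with a Hebbian outer-product memory. This is a simplified software analogue under stated assumptions, not the memristive circuit from the paper: the update rule, the normalization of the novelty score, and the example patterns are all hypothetical choices made for illustration.

```python
def outer_add(memory, x):
    """Hebbian update of the auto-correlation memory: memory += x x^T."""
    for i, xi in enumerate(x):
        for j, xj in enumerate(x):
            memory[i][j] += xi * xj

def novelty(memory, x):
    """Score in [0, 1]: 1.0 if x is orthogonal to all stored inputs, 0.0 if x was stored."""
    energy = sum(xi * xi for xi in x)
    if energy == 0:
        return 0.0
    # x^T M x equals the sum of squared overlaps between x and each stored pattern.
    familiarity = sum(
        x[i] * memory[i][j] * x[j] for i in range(len(x)) for j in range(len(x))
    )
    return max(0.0, 1.0 - familiarity / (energy * energy))

n = 4
memory = [[0.0] * n for _ in range(n)]
seen = [1, 1, 0, 0]     # hypothetical sparse input pattern
unseen = [0, 0, 1, 1]   # pattern with no overlap with the stored one

before = novelty(memory, seen)   # nothing stored yet
outer_add(memory, seen)
after = novelty(memory, seen)    # same pattern, now stored
other = novelty(memory, unseen)  # still unfamiliar
print(before, after, other)
```

In a crossbar realization the matrix entries would correspond to device conductances, so the quadratic form above maps onto the same multiply-accumulate structure used for inference.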
Digital neuromorphic design of a Liquid State Machine for real-time processing
Pub Date : 2016-10-01 DOI: 10.1109/ICRC.2016.7738687
Anvesh Polepalli, Nicholas Soures, D. Kudithipudi
The Liquid State Machine (LSM) is a form of reservoir computing that emulates the brain's capability of processing spatio-temporal data. This type of network generates highly descriptive responses to continuous input streams. The response is then used to extract information about the input stream. A single LSM network can be used as a generic intelligent processor that processes different streams of data, or the same stream of data, to extract different features. The LSM has been shown to perform well in tasks that depend on a system's behavior through time. The LSM's intrinsic memory and its reduced training complexity make it a suitable choice for hardware implementations of spatio-temporal applications. Existing behavioral models of the LSM cannot process real-time data because of their hardware complexity, their inability to handle real-time input, or both. The proposed model focuses on a simple liquid design that exploits spatial locality and is capable of processing real-time data. The model is evaluated for EEG seizure detection with an accuracy of 84.2% and for user identification based on walking pattern with an accuracy of 98.4%.
Citations: 18