2018 IEEE International Conference on Rebooting Computing (ICRC): Latest Papers

Thermodynamic Intelligence, a Heretical Theory
Pub Date : 2018-11-01 DOI: 10.1109/ICRC.2018.8638594
N. Ganesh
There is a significant amount of interest in the field of big data and machine learning right now. This has been driven by use of sophisticated learning algorithms along with large datasets and powerful computing hardware to achieve extraordinary success in narrow tasks. Such success has been classified as narrow artificial intelligence (AI), in order to distinguish it from general intelligence, which continues to be the holy grail of computing. If we are to progress from narrow to general AI, it is important to have a better understanding of what intelligence is and what it entails. As we seek to reboot computing across the stack, this is an important question to address, to help us identify the optimal devices, architectures and design techniques that will allow us to build the intelligent systems of the future. In this paper, I will review the fundamental ideas and assumptions that have allowed us to achieve computing in artificial systems over the years. Building off these ideas, I will discuss the important distinction between a good example of a system with general intelligence i.e. ourselves, and the intelligence achieved through our current computational approaches. Following this, I will use recent results to explore a new framework - a physically grounded theory of thermodynamic intelligence, and discuss the design paradigm that seeks to achieve such intelligence in systems.
Citations: 5
Regular Expression Matching with Memristor TCAMs
Pub Date : 2018-11-01 DOI: 10.1109/ICRC.2018.8638603
Catherine E. Graves, W. Ma, X. Sheng, B. Buchanan, Le Zheng, Sity Lam, Xuema Li, S. R. Chalamalasetti, Lennie Kiyama, M. Foltin, Matthew P. Hardy, J. Strachan
Regular expression (RegEx) matching is a key function in network security, where matching packet data against known malicious signatures filters and alerts against active network intrusions. RegExs are widely used in open source and commercial network security systems as they easily and concisely represent complex patterns like those malicious signatures. However, the latency and power required to perform RegEx matching are very high, and approaches to this problem struggle to achieve > 1 Gbps on real-world rulesets while internet wirespeeds continue to increase beyond 100 Gbps. We propose performing RegEx matching using memristor-based ternary content addressable memories (mTCAMs) with compressed finite automata (CFA) to meet this challenge. In this work, we show fabrication of mTCAM circuits with excellent device properties from 100 nm to 20 nm device sizes and validate mTCAM operation. SPICE simulations investigate mTCAM performance at scale, and an mTCAM dynamic power model using 16 nm mTCAM layout parameters demonstrates 0.173 fJ/bit/search energy for a $36 \times 250$ mTCAM array. Using a tiled architecture to implement a Snort ruleset, we estimate performance of our mTCAM approach to be 47.2 Gbps at 1.21 W dynamic search power (39 Gbps/W), compared to a state-of-the-art FPGA approach which achieves 3.9 Gbps at 630 mW (6.2 Gbps/W). Preliminary error analysis shows the mTCAM approach allows for arbitrarily low false positive/negative rates using minimal and standard state refresh techniques. Dynamic search power is also calculated prior to applying standard TCAM power-reduction techniques capable of lowering power by $\sim 10\times$, further demonstrating the promise of mTCAM for wirespeed RegEx matching at low power.
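The ternary matching that a TCAM performs, and the throughput-per-watt figures quoted in this abstract, can be illustrated with a minimal Python sketch. It is not the authors' mTCAM design or their CFA encoding: `tcam_search`, `gbps_per_watt`, and the example rows are hypothetical names and data chosen for illustration, and the loop is only a software stand-in for a comparison that the hardware carries out across all rows at once.

```python
def tcam_search(rows, key):
    """Return the indices of all stored ternary words that match a binary key.

    Each stored word is a string over '0', '1' and 'X', where 'X' is a
    "don't care" cell that matches either key bit (illustrative encoding).
    """
    matches = []
    for index, word in enumerate(rows):
        if len(word) != len(key):
            raise ValueError("key width must equal the stored word width")
        # A row matches when every cell is a wildcard or equals the key bit.
        if all(cell == "X" or cell == bit for cell, bit in zip(word, key)):
            matches.append(index)
    return matches


def gbps_per_watt(throughput_gbps, power_watts):
    """Throughput efficiency, the ratio used to compare the two approaches."""
    return throughput_gbps / power_watts


# Hypothetical 4-bit table; rows 0 and 1 match the key, row 2 does not.
example_rows = ["10X1", "XXXX", "0X10"]
print(tcam_search(example_rows, "1011"))      # [0, 1]

# Figures taken from the abstract: 47.2 Gbps at 1.21 W, 3.9 Gbps at 630 mW.
print(round(gbps_per_watt(47.2, 1.21), 1))    # 39.0
print(round(gbps_per_watt(3.9, 0.630), 1))    # 6.2
```

The two ratios reproduce the 39 Gbps/W and 6.2 Gbps/W values stated in the abstract, which is roughly a sixfold difference in efficiency.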
Citations: 8
Design of Superconducting Optoelectronic Networks for Neuromorphic Computing
Pub Date : 2018-11-01 DOI: 10.1109/ICRC.2018.8638595
S. Buckley, A. McCaughan, J. Chiles, R. Mirin, S. Nam, J. Shainline, Grant Bruer, J. Plank, Catherine D. Schuman
We have previously proposed a novel hardware platform (SOEN) for neuromorphic computing based on superconducting optoelectronics that presents many of the features necessary for information processing in the brain. Here we discuss the design and training of networks of neurons and synapses based on this technology. We present circuit models for the simplest neurons and synapses that we can use to build networks. We discuss the further abstracted integrate-and-fire model that we use for evolutionary optimization of small networks of these neurons. We show that we can use the TENNLab evolutionary optimization programming framework to design small networks for logic, control and classification tasks. We plan to use the results as feedback to inform our neuron design.
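For readers unfamiliar with the integrate-and-fire abstraction mentioned in this abstract, the following Python sketch shows a generic discrete-time version. It is an assumption-laden illustration, not the SOEN circuit model or the TENNLab neuron: the function name `simulate_if_neuron`, the `threshold` and `leak` parameters, and the reset-to-zero rule are all hypothetical choices.

```python
def simulate_if_neuron(inputs, threshold=1.0, leak=0.0):
    """Simulate one integrate-and-fire neuron over a sequence of input values.

    The neuron accumulates input into a potential, optionally loses a fixed
    `leak` per step (never dropping below zero), and emits a spike (1) and
    resets to zero whenever the potential reaches `threshold`.
    """
    potential = 0.0
    spikes = []
    for current in inputs:
        potential = max(0.0, potential - leak) + current
        if potential >= threshold:
            spikes.append(1)
            potential = 0.0   # reset after firing (illustrative choice)
        else:
            spikes.append(0)
    return spikes


# Two inputs of 0.5 reach the threshold at the second step; the neuron then
# needs 0.25 + 0.25 + 0.75 to fire again at the fifth step.
print(simulate_if_neuron([0.5, 0.5, 0.25, 0.25, 0.75]))   # [0, 1, 0, 0, 1]
```

A model this small has only a handful of parameters per neuron, which is one reason such abstractions are convenient targets for evolutionary search over network structure.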
Citations: 15
2018 IEEE International Conference on Rebooting Computing, ICRC 2018, McLean, VA, USA, November 7-9, 2018
Pub Date : 2018-11-01 DOI: 10.1109/icrc.2018.8638615
Citations: 0
Multi-Level Optimization for Large Fan-In Optical Logic Circuits Using Integrated Nanophotonics
Pub Date : 2018-11-01 DOI: 10.1109/ICRC.2018.8638607
Takumi Egawa, T. Ishihara, H. Onodera, A. Shinya, S. Kita, K. Nozaki, K. Takata, M. Notomi
Optical circuits constructed using nanophotonic logic gates have attracted significant attention due to their ultra-low-latency operation. This paper first introduces conventional optical logic circuits and their issues when the number of inputs is large. Then, we propose a method of minimizing the latency of large fan-in optical logic circuits using a multi-level optimization method. The proposed optimization method reduces not only the delay but also the attenuation of light involved in the circuits. Experimental results obtained targeting a 1024-bit pattern matching circuit show that the circuit optimized with our method is 1.72 times faster than the traditional optical circuit composed of the optical logic gates connected in series, and it is 3.40 times faster than the CMOS 32 nm circuit.
Citations: 1
SNRA: A Spintronic Neuromorphic Reconfigurable Array for In-Circuit Training and Evaluation of Deep Belief Networks
Pub Date : 2018-11-01 DOI: 10.1109/ICRC.2018.8638604
Ramtin Zand, R. Demara
In this paper, a spintronic neuromorphic reconfigurable array (SNRA) is developed to fuse together power-efficient probabilistic and in-field programmable deterministic computing during both training and evaluation phases of restricted Boltzmann machines (RBMs). First, probabilistic spin logic devices are used to develop an RBM realization which is adapted to construct deep belief networks (DBNs) having one to three hidden layers of size 10 to 800 neurons each. Second, we design a hardware implementation for the contrastive divergence (CD) algorithm using a four-state finite state machine capable of unsupervised training in N+3 clocks, where N denotes the number of neurons in each RBM. The functionality of our proposed CD hardware implementation is validated using ModelSim simulations. We synthesize the developed Verilog HDL implementation of our proposed test/train control circuitry for various DBN topologies where the maximal RBM dimensions yield resource utilization ranging from 51 to 2,421 lookup tables (LUTs). Next, we leverage spin Hall effect (SHE)-magnetic tunnel junction (MTJ) based non-volatile LUT circuits as an alternative for static random access memory (SRAM)-based LUTs storing the deterministic logic configuration to form a reconfigurable fabric. Finally, we compare the performance of our proposed SNRA with SRAM-based configurable fabrics focusing on the area and power consumption induced by the LUTs used to implement both CD and evaluation modes. The results obtained indicate more than 80% reduction in combined dynamic and static power dissipation, while achieving at least 50% reduction in device count.
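As background for the contrastive divergence (CD) algorithm named in this abstract, here is a minimal software sketch of a single CD-1 weight update for a small binary RBM. It follows the textbook form of the algorithm and makes no claim about the paper's four-state finite state machine, its N+3 clock schedule, or its spintronic devices. All function names, the learning rate `lr`, and the tiny layer sizes are hypothetical; the software random sampling is only an analogue of what a stochastic device would provide in hardware.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def hidden_given_visible(v, W, b_h, rng):
    """Return (activation probabilities, sampled binary states) of hidden units."""
    probs = [sigmoid(b_h[j] + sum(v[i] * W[i][j] for i in range(len(v))))
             for j in range(len(b_h))]
    return probs, [1 if rng.random() < p else 0 for p in probs]


def visible_given_hidden(h, W, b_v, rng):
    """Return (activation probabilities, sampled binary states) of visible units."""
    probs = [sigmoid(b_v[i] + sum(h[j] * W[i][j] for j in range(len(h))))
             for i in range(len(b_v))]
    return probs, [1 if rng.random() < p else 0 for p in probs]


def cd1_update(v0, W, b_v, b_h, rng, lr=0.1):
    """Apply one CD-1 step in place for a single training vector v0."""
    ph0, h0 = hidden_given_visible(v0, W, b_h, rng)   # positive phase
    _, v1 = visible_given_hidden(h0, W, b_v, rng)     # reconstruction
    ph1, _ = hidden_given_visible(v1, W, b_h, rng)    # negative phase
    for i in range(len(b_v)):
        for j in range(len(b_h)):
            # Data-driven correlation minus reconstruction-driven correlation.
            W[i][j] += lr * (v0[i] * ph0[j] - v1[i] * ph1[j])
        b_v[i] += lr * (v0[i] - v1[i])
    for j in range(len(b_h)):
        b_h[j] += lr * (ph0[j] - ph1[j])


# Hypothetical 3-visible, 2-hidden RBM trained on one pattern.
rng = random.Random(0)
weights = [[0.0, 0.0] for _ in range(3)]
visible_bias, hidden_bias = [0.0] * 3, [0.0] * 2
for _ in range(100):
    cd1_update([1, 0, 1], weights, visible_bias, hidden_bias, rng)
```

Because each update is a difference of two quantities in the range 0 to 1, a single step changes any weight by at most `lr`, which keeps the arithmetic simple enough for compact control logic.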
Citations: 2
Merge Network for a Non-Von Neumann Accumulate Accelerator in a 3D Chip
Pub Date : 2018-11-01 DOI: 10.1109/ICRC.2018.8638619
Anirudh Jain, S. Srikanth, E. Debenedictis, T. Krishna
Logic-memory integration helps mitigate the von Neumann bottleneck, and this has enabled a new class of architectures that helps accelerate graph analytics and operations on sparse data streams. These utilize merge networks as a key unit of computation. Such networks are highly parallel and their performance increases with tighter coupling between logic and memory when a bitonic algorithm is used. This paper presents energy-efficient on-chip network architectures for merging key-value pairs using both word-parallel and bit-serial paradigms. The proposed architectures are capable of merging two rows of high bandwidth memory (HBM) worth of data in a manner that is completely overlapped with the reading from and writing back to such a row. Furthermore, their energy consumption is about an order of magnitude lower when compared to a naive crossbar-based design.
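The bitonic merge that this abstract refers to can be sketched in a few lines of Python. This is the standard textbook algorithm applied to key-value pairs, offered as an illustration only: the names `bitonic_merge` and `merge_sorted_rows`, the tuple layout, and the power-of-two length restriction are assumptions, and nothing here reflects the paper's word-parallel or bit-serial circuit details.

```python
def bitonic_merge(pairs):
    """Sort a bitonic sequence of (key, value) pairs into ascending key order.

    The length must be a power of two. Each recursion level performs
    len(pairs) / 2 independent compare-exchange operations, which is what
    makes the network easy to parallelize in hardware.
    """
    n = len(pairs)
    if n <= 1:
        return list(pairs)
    half = n // 2
    lower, upper = [], []
    for a, b in zip(pairs[:half], pairs[half:]):
        if a[0] <= b[0]:
            lower.append(a)
            upper.append(b)
        else:
            lower.append(b)
            upper.append(a)
    return bitonic_merge(lower) + bitonic_merge(upper)


def merge_sorted_rows(row_a, row_b):
    """Merge two ascending rows of equal power-of-two length."""
    n = len(row_a)
    if n != len(row_b) or n & (n - 1):
        raise ValueError("rows must have equal power-of-two lengths")
    # Ascending row followed by the reversed second row is a bitonic sequence.
    return bitonic_merge(row_a + row_b[::-1])


row_a = [(1, "a"), (4, "b"), (6, "c"), (9, "d")]
row_b = [(2, "w"), (3, "x"), (7, "y"), (8, "z")]
merged = merge_sorted_rows(row_a, row_b)
print([key for key, _ in merged])   # [1, 2, 3, 4, 6, 7, 8, 9]
```

The comparison pattern is fixed and does not depend on the data, so the same wiring serves every input, unlike a pointer-chasing software merge.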
Citations: 4
An Oscillatory Neural Network with Programmable Resistive Synapses in 28 nm CMOS
Pub Date : 2018-11-01 DOI: 10.1109/ICRC.2018.8638600
T. C. Jackson, S. Pagliarini, L. Pileggi
Implementing scalable and effective synaptic networks will enable neuromorphic computing to deliver on its promise of revolutionizing computing. RRAM represents the most promising technology for realizing the fully connected synapse network: by using programmable resistive elements as weights, RRAM can modulate the strength of synapses in a neural network architecture. Oscillatory Neural Networks (ONNs) that are based on phase-locked loop (PLL) neurons are compatible with the resistive synapses but otherwise rather impractical. In this paper, a PLL-free ONN is implemented in 28 nm CMOS and compared to its PLL-based counterpart. Our silicon results show that the PLL-free architecture is compatible with resistive synapses, addresses practical implementation issues for improved robustness, and demonstrates favorable energy consumption compared to state-of-the-art NNs.
Citations: 26
ICRC 2018 Committees
Pub Date : 2018-11-01 DOI: 10.1109/icrc.2018.8638620
Citations: 0
Parallel Quantum Computing Emulation
Pub Date : 2018-11-01 DOI: 10.1109/ICRC.2018.8638597
B. Cour, S. Lanham, Corey I. Ostrove
Quantum computers provide a fundamentally new computing paradigm that promises to revolutionize our ability to solve broad classes of problems. Surprisingly, the basic mathematical structures of gate-based quantum computing, such as unitary operations on a finite-dimensional Hilbert space, are not unique to quantum systems but may be found in certain classical systems as well. Previously, it has been shown that one can represent an arbitrary multi-qubit quantum state in terms of classical analog signals using nested quadrature amplitude modulated signals. Furthermore, using digitally controlled analog electronics one may manipulate these signals to perform quantum gate operations and thereby execute quantum algorithms. The computational capacity of a single signal is, however, limited by the required bandwidth, which scales exponentially with the number of qubits when represented using frequency-based encoding. To overcome this limitation, we introduce a method to extend this approach to multiple parallel signals. Doing so allows a larger quantum state to be emulated with the same gate time required for processing frequency-encoded signals. In the proposed representation, each doubling of the number of signals corresponds to an additional qubit in the spatial domain. Single-qubit gate operations are similarly extended so as to operate on qubits represented using either frequency-based or spatial encoding schemes. Furthermore, we describe a method to perform gate operations between pairs of qubits represented using frequency or spatial encoding or between frequency-based and spatially encoded qubits. Finally, we describe how this approach may be extended to represent qubits in the time domain as well.
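The statement that each doubling of the number of signals corresponds to one additional qubit can be made concrete with a small state-vector sketch in Python. This is a purely numerical illustration, not the authors' analog signal processing: the names `apply_single_qubit_gate` and `split_into_signals`, the bit-ordering convention (qubit 0 is the least significant index bit), and the choice to let the most significant qubits select the signal are all assumptions.

```python
import math


def apply_single_qubit_gate(state, gate, target):
    """Apply a 2x2 gate to qubit `target` of a state vector of 2**n amplitudes."""
    new_state = list(state)
    stride = 1 << target
    for index in range(len(state)):
        if (index & stride) == 0:
            a0, a1 = state[index], state[index | stride]
            new_state[index] = gate[0][0] * a0 + gate[0][1] * a1
            new_state[index | stride] = gate[1][0] * a0 + gate[1][1] * a1
    return new_state


def split_into_signals(state, n_signals):
    """Partition the amplitudes into equal blocks, one block per parallel signal.

    With 2**m signals, the m most significant qubits act as the "spatial"
    qubits that select a signal; the rest are encoded within each signal.
    """
    if n_signals <= 0 or len(state) % n_signals:
        raise ValueError("n_signals must evenly divide the state length")
    block = len(state) // n_signals
    return [state[i * block:(i + 1) * block] for i in range(n_signals)]


s = 1.0 / math.sqrt(2.0)
hadamard = [[s, s], [s, -s]]

# Two qubits starting in |00>; a Hadamard on qubit 0 gives (|00> + |01>)/sqrt(2).
state = apply_single_qubit_gate([1.0, 0.0, 0.0, 0.0], hadamard, target=0)

# Spread the 4 amplitudes over 2 signals: each signal then carries one qubit's
# worth of amplitudes, and the signal index plays the role of the second qubit.
signals = split_into_signals(state, 2)
```

Under this convention a gate on a within-signal qubit touches each block independently, while a gate on a spatial qubit combines corresponding amplitudes across blocks.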
Citations: 5