
Latest publications from the 2018 IEEE International Conference on Rebooting Computing (ICRC)

2018 IEEE International Conference on Rebooting Computing (ICRC)
Pub Date : 2018-11-01 DOI: 10.1109/icrc.2018.8638589
{"title":"2018 IEEE International Conference on Rebooting Computing (ICRC)","authors":"","doi":"10.1109/icrc.2018.8638589","DOIUrl":"https://doi.org/10.1109/icrc.2018.8638589","url":null,"abstract":"","PeriodicalId":169413,"journal":{"name":"2018 IEEE International Conference on Rebooting Computing (ICRC)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130663555","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
High-Level Synthesis of Non-Rectangular Multi-Dimensional Nested Loops Using Reshaping and Vectorization
Pub Date : 2018-11-01 DOI: 10.1109/ICRC.2018.8638593
Sahand Salamat, M. Azarbad, B. Alizadeh
High-level synthesis accelerates design space exploration by allowing various transformations and optimizations to be applied to the high-level description. In this paper, a new method is proposed to improve the high-level synthesis of non-rectangular multi-dimensional nested loops using reshaping and vectorization techniques. Because high-level descriptions with non-rectangular iteration spaces do not lend themselves well to an efficient high-level synthesis process, our method introduces a reshaping technique that converts non-rectangular iteration spaces with certain inter-iteration dependencies into rectangular ones. Furthermore, the proposed method employs a vectorization technique that lets different iterations execute simultaneously without violating inter-iteration dependencies. Finally, the paper combines the proposed reshaping and vectorization techniques into a hybrid method that supports both 2D and 3D perfect/imperfect nested loops and can be extended to nested loops of more than three dimensions. According to the experimental results, the proposed hybrid method shows average speed-ups of 51.9%, 50.1%, and 15.9% over state-of-the-art methods for pipelined perfect, pipelined imperfect, and pipelined 3D nested loops, respectively.
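For readers unfamiliar with the reshaping idea, the sketch below is a minimal, generic illustration (plain Python, not the authors' HLS transformation; all names and bounds are invented): a triangular, i.e. non-rectangular, 2D nest is rewritten as a single rectangular loop with a fixed trip count, which is the form a pipelined datapath can iterate with constant bounds.

```python
def triangular_reference(n):
    """Original non-rectangular (triangular) nest: the inner bound depends on i."""
    acc = 0
    for i in range(n):
        for j in range(i + 1):
            acc += i * j
    return acc

def triangular_reshaped(n):
    """Reshaped version: one rectangular loop with a fixed trip count
    n*(n+1)//2, recovering (i, j) from simple bookkeeping instead of a
    data-dependent inner bound."""
    acc = 0
    trip_count = n * (n + 1) // 2
    i, j = 0, 0
    for _ in range(trip_count):
        acc += i * j
        # advance (i, j) through the triangular space without a nested bound
        if j == i:
            i, j = i + 1, 0
        else:
            j += 1
    return acc

assert triangular_reference(10) == triangular_reshaped(10)
```

Both functions visit the same (i, j) pairs in the same order; the reshaped version simply trades the data-dependent inner bound for index bookkeeping.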
Citations: 3
Electric-Field Bit Write-In for Molecular Quantum-Dot Cellular Automata Circuits
Pub Date : 2018-11-01 DOI: 10.1109/ICRC.2018.8638591
Jackson Henry, Joseph Previti, E. Blair
Quantum-dot cellular automata (QCA) was conceptualized to provide low-power, high-speed, general-purpose computing in the post-CMOS era. Here, an elementary device called a “cell” is a system of quantum dots and a few mobile charges. The configuration of charge on a cell encodes a binary state, and cells are networked locally through the electrostatic field. Layouts of QCA cells on a substrate provide non-von-Neumann circuits in which digital logic, interconnections, and memory are intermingled. QCA supports reversible, adiabatic computing with arbitrarily low levels of dissipation. Here, we focus on a molecular implementation of QCA and describe the promise it holds. This discussion includes an outline of an architecture for clocked molecular QCA circuits and some of the technical challenges that remain before molecular QCA computation can be realized. This work focuses on the challenge of using macroscopic devices to write bits into nanoscale QCA molecules. We use an electric field established between electrodes fabricated with standard, mature lithographic processes, and the field need not feature single-molecule specificity. An intercellular Hartree approximation is used to model the state of an $N$-molecule circuit. Simulations of a method for providing bit inputs to clocked molecular circuits are shown.
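As a rough companion to the modeling discussion, here is a minimal sketch of the standard two-state QCA cell response relaxed to self-consistency along a one-dimensional wire (a textbook Hartree-style approximation with arbitrary parameter values, not the authors' molecular model or clocking scheme):

```python
import math

def simulate_qca_wire(num_cells, driver_polarization, ek_over_2gamma=2.0,
                      iterations=100):
    """Relax a 1-D line of QCA cells toward self-consistency.

    Each cell's polarization responds to the sum of its neighbours'
    polarizations through the standard two-state response function
        P = s / sqrt(1 + s**2),   s = (E_k / 2*gamma) * sum(P_neighbours),
    iterated until the cell states stop changing (intercellular,
    Hartree-like treatment).
    """
    p = [0.0] * num_cells          # unknown cell polarizations
    for _ in range(iterations):
        for i in range(num_cells):
            left = driver_polarization if i == 0 else p[i - 1]
            right = 0.0 if i == num_cells - 1 else p[i + 1]
            s = ek_over_2gamma * (left + right)
            p[i] = s / math.sqrt(1.0 + s * s)
    return p

# A driver cell fixed at logic '1' (P = +1) should propagate down the wire.
print(simulate_qca_wire(6, driver_polarization=+1.0))
```

Each cell settles to the polarization of its driver, which is the wire-propagation behavior QCA layouts rely on.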
Citations: 0
An Efficient Adder Architecture with Three-Independent-Gate Field-Effect Transistors
Pub Date : 2018-11-01 DOI: 10.1109/ICRC.2018.8638608
J. R. Gonzalez, P. Gaillardon
Three-Independent-Gate Field-Effect Transistors (TIGFETs) extend the functional diversity of a single transistor by allowing dynamic electrical reconfiguration of its polarity. This property has been shown to unlock unique circuit-level opportunities. In this article, a ripple-carry 32-bit adder is designed using simulated TIGFET technology, and its metrics are compared against CMOS High-Performance (HP) and CMOS Low-Voltage. By exploiting TIGFET's polarity-control characteristic, the proposed ripple-carry adder architecture uses efficient exclusive-OR and majority gates to compute complementary carry signals in parallel, leading to a 38% decrease in logic depth compared to the standard CMOS design. Additionally, a 38% reduction in contacted gates reduces the effects of an interconnect-limited design. The results show that the decrease in logic depth and the reduction in contacted gates lead to a 3.8x lower energy-delay product and a 5.6x lower area-delay product compared with CMOS HP. The performance boost that comes from realizing arithmetic circuits with TIGFET transistors makes them a promising next-generation high-performance device technology.
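The carry/sum decomposition referred to above is the standard full-adder identity carry = MAJ(a, b, c_in), sum = a XOR b XOR c_in. The following bit-level sketch (plain Python, invented function names, no TIGFET or timing modeling) merely shows that a ripple-carry adder built from those two gate types is functionally an ordinary adder:

```python
def maj(a, b, c):
    """Majority gate: output is 1 when at least two inputs are 1."""
    return (a & b) | (a & c) | (b & c)

def xor3(a, b, c):
    """Three-input exclusive OR."""
    return a ^ b ^ c

def ripple_carry_add(x, y, width=32):
    """Add two unsigned integers bit by bit with MAJ/XOR full adders."""
    carry, result = 0, 0
    for i in range(width):
        a = (x >> i) & 1
        b = (y >> i) & 1
        result |= xor3(a, b, carry) << i      # sum bit
        carry = maj(a, b, carry)              # next carry
    return result  # wraps modulo 2**width, like a hardware adder

assert ripple_carry_add(123456789, 987654321) == 123456789 + 987654321
```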
Citations: 6
RNSnet: In-Memory Neural Network Acceleration Using Residue Number System
Pub Date : 2018-11-01 DOI: 10.1109/ICRC.2018.8638592
Sahand Salamat, M. Imani, Saransh Gupta, T. Simunic
We live in a world where technological advances continually create more data than we can deal with. Machine learning algorithms, in particular Deep Neural Networks (DNNs), are essential for processing such large volumes of data. Computing a DNN requires loading the trained network onto the processing element and storing the result in memory, so running these applications needs high memory bandwidth. Traditional cores are limited in terms of memory bandwidth; hence, running DNNs on traditional cores results in high energy consumption and slow processing due to the large amount of data movement between memory and processing units. Several prior works tried to address the data-movement issue by enabling Processing In-Memory (PIM) using crossbar analog multiplication. However, these designs suffer from the large overhead of data conversion between the analog and digital domains. In this work, we propose RNSnet, which uses the Residue Number System (RNS) to execute neural networks entirely in the digital domain, in memory. RNSnet simplifies the fundamental neural network operations and maps them to in-memory addition and data access. We test the efficiency of the proposed design on several popular neural network applications. Our experimental results show that RNSnet consumes 145.5x less energy and obtains a 35.4x speedup compared to an NVIDIA GTX 1080 GPU. In addition, our results show that RNSnet achieves an 8.5x improvement in energy-delay product compared to state-of-the-art neural network accelerators.
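As background on the arithmetic RNSnet builds on (this sketch covers only the residue number system itself, not the paper's in-memory mapping; the moduli are arbitrary choices for illustration):

```python
from math import prod

MODULI = (255, 256, 257)   # pairwise-coprime moduli, chosen for illustration

def to_rns(x, moduli=MODULI):
    """Encode an integer as its tuple of residues."""
    return tuple(x % m for m in moduli)

def rns_add(a, b, moduli=MODULI):
    """Addition acts on each residue independently (and hence in parallel)."""
    return tuple((ai + bi) % m for ai, bi, m in zip(a, b, moduli))

def rns_mul(a, b, moduli=MODULI):
    """Multiplication also acts per residue, with no carries between channels."""
    return tuple((ai * bi) % m for ai, bi, m in zip(a, b, moduli))

def from_rns(r, moduli=MODULI):
    """Decode with the Chinese Remainder Theorem."""
    M = prod(moduli)
    x = 0
    for ri, mi in zip(r, moduli):
        Mi = M // mi
        x += ri * Mi * pow(Mi, -1, mi)   # modular inverse of Mi mod mi
    return x % M

a, b = 1234, 5678
assert from_rns(rns_add(to_rns(a), to_rns(b))) == a + b
assert from_rns(rns_mul(to_rns(a), to_rns(b))) == a * b
```

Addition and multiplication touch each residue channel independently, which is what makes them easy to parallelize; the cost is the Chinese-Remainder-Theorem-style decode at the end.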
Citations: 44
ICRC 2018 Sponsors
Pub Date : 2018-11-01 DOI: 10.1109/icrc.2018.8638617
{"title":"ICRC 2018 Sponsors","authors":"","doi":"10.1109/icrc.2018.8638617","DOIUrl":"https://doi.org/10.1109/icrc.2018.8638617","url":null,"abstract":"","PeriodicalId":169413,"journal":{"name":"2018 IEEE International Conference on Rebooting Computing (ICRC)","volume":"7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128892018","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
ICRC 2018 Plenary Talk
Pub Date : 2018-11-01 DOI: 10.1109/icrc.2018.8638587
{"title":"ICRC 2018 Plenary Talk","authors":"","doi":"10.1109/icrc.2018.8638587","DOIUrl":"https://doi.org/10.1109/icrc.2018.8638587","url":null,"abstract":"","PeriodicalId":169413,"journal":{"name":"2018 IEEE International Conference on Rebooting Computing (ICRC)","volume":"49 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121437420","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
SC-SD: Towards Low Power Stochastic Computing Using Sigma Delta Streams
Pub Date : 2018-11-01 DOI: 10.1109/ICRC.2018.8638611
Patricia Gonzalez-Guerrero, Xinfei Guo, M. Stan
Processing data using Stochastic Computing (SC) requires only $\sim$7% of the area and power of the typical binary approach. However, SC has two major drawbacks that eclipse any area and power savings. First, it takes $\sim$99% more time to finish a computation compared with the binary approach, since data is represented as streams of bits. Second, the Linear Feedback Shift Registers (LFSRs) required to generate the stochastic streams increase the power and area of the overall SC-LFSR system. These drawbacks result in similar or higher area, power, and energy numbers compared with the binary counterpart. In this work, we address these drawbacks by applying SC directly to Pulse Density Modulated (PDM) streams. Most modern Systems on Chip (SoCs) already include Analog-to-Digital Converters (ADCs). The core of a $\Sigma\Delta$-ADC is the $\Sigma\Delta$ modulator, whose output is a PDM stream. Our approach (SC-SD) simplifies the system hardware in two ways. First, we drop the filter stage of the ADC and, second, we replace the costly Stochastic Number Generators (SNGs) with $\Sigma\Delta$ modulators. To further lower system complexity, we adopt an Asynchronous $\Sigma\Delta$ Modulator (A$\Sigma\Delta$M) architecture. We design and simulate the A$\Sigma\Delta$M using an industry-standard 1x FinFET technology with foundry models. (In modern technologies the node number does not refer to any one feature in the process, and foundries use slightly different conventions; we use 1x to denote the 14/16nm FinFET nodes offered by the foundry.) We achieve power savings of 81% in the SNG compared to the LFSR approach. To evaluate how these area and power savings scale to more complex applications, we implement Gamma Correction, a popular image-processing algorithm. For this application, our simulations show that SC-SD can save 98%-11% of the total system latency and 50%-38% of the power consumption compared with the SC-LFSR approach or the binary counterpart.
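To make the PDM idea concrete, here is a minimal first-order Sigma-Delta modulator model in plain Python (a generic textbook construction with made-up parameters, not the asynchronous circuit the paper designs): it turns a constant value in [0, 1] into a bit stream whose density of 1s tracks that value, which is the role the LFSR-based stochastic number generator would otherwise play.

```python
def sigma_delta_stream(value, length):
    """First-order Sigma-Delta modulator for a constant unipolar input in [0, 1].

    The fraction of 1s in the output bit stream tracks `value`; this is the
    kind of pulse-density-modulated stream that can be fed directly to
    stochastic-computing logic instead of an LFSR-generated stream.
    """
    acc, bits = 0.0, []
    for _ in range(length):
        acc += value            # discrete-time integrator
        if acc >= 1.0:          # 1-bit quantizer with feedback
            bits.append(1)
            acc -= 1.0
        else:
            bits.append(0)
    return bits

stream = sigma_delta_stream(0.3, 1000)
print(sum(stream) / len(stream))   # density of 1s is ~0.3
```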
Citations: 11
Image Classification Using Quantum Inference on the D-Wave 2X
Pub Date : 2018-11-01 DOI: 10.1109/ICRC.2018.8638596
N. T. Nguyen, Garrett T. Kenyon
We use a quantum annealing D-Wave 2X computer to obtain solutions to NP-hard sparse coding problems. To reduce the dimensionality of the sparse coding problem to fit on the quantum D-Wave 2X hardware, we passed downsampled MNIST images through a bottleneck autoencoder. To establish a benchmark for classification performance on this reduced-dimensional data set, we built two deep convolutional neural networks (DCNNs). The first DCNN used an AlexNet-like architecture and the second a state-of-the-art residual network (RESNET) model, both implemented in TensorFlow. The two DCNNs yielded classification scores of 94.54 ± 0.7% and 98.8 ± 0.1%, respectively. As a control, we showed that both DCNN architectures produced near-state-of-the-art classification performance ($\sim$99%) on the original MNIST images. To obtain a set of optimized features for inferring sparse representations of the reduced-dimensional MNIST dataset, we imprinted on a random set of 47 image patches and then applied an off-line unsupervised learning algorithm using stochastic gradient descent to optimize for sparse coding. Our single layer of sparse coding matched the stride and patch size of the first convolutional layer of the AlexNet-like DCNN and contained 47 fully-connected features, 47 being the maximum number of dictionary elements that could be embedded onto the D-Wave 2X hardware. When the sparse representations inferred by the D-Wave 2X were passed to a linear support vector machine, we obtained a classification score of 95.68%. We found that the classification performance supported by quantum inference was maximal at an optimal level of sparsity corresponding to a critical value of the sparsity/reconstruction-error trade-off parameter that previous work has associated with a second-order phase transition, an observation supported by a free-energy analysis of D-Wave energy states. We mimicked a transfer-learning protocol by feeding the D-Wave representations into a multilayer perceptron (MLP), yielding 98.48% classification performance. The classification performance supported by a single layer of quantum inference was superior to that supported by a classical matching pursuit algorithm set to the same level of sparsity. Whereas the classification performance of both DCNNs declined as the number of training examples was reduced, the classification performance supported by quantum inference was insensitive to the number of training examples. We thus conclude that quantum inference supports classification of reduced-dimensional MNIST images exceeding that of a size-matched AlexNet-like DCNN and nearly equivalent to a state-of-the-art RESNET DCNN.
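For orientation, binary sparse coding of the kind described above can be written as a QUBO, the problem class a quantum annealer accepts. The sketch below (NumPy, with brute-force minimization standing in for the annealer; sizes, seeds, and the sparsity weight are invented, and this is not the authors' pipeline or the D-Wave API) shows one common way to build such a QUBO from a dictionary D and an input x:

```python
import itertools
import numpy as np

def sparse_coding_qubo(D, x, lam):
    """Map  min_a ||x - D a||^2 + lam * sum(a),  a in {0,1}^N,  to a QUBO.

    Expanding the squared error gives a^T (D^T D) a - 2 x^T D a + const;
    because a_i^2 = a_i for binary variables, the linear terms can be
    folded onto the diagonal of the QUBO matrix Q.
    """
    G = D.T @ D                     # Gram matrix: quadratic couplings
    h = -2.0 * (D.T @ x) + lam      # linear bias per dictionary element
    Q = G.copy()
    np.fill_diagonal(Q, np.diag(G) + h)
    return Q

def brute_force_qubo(Q):
    """Exhaustive minimizer standing in for the annealer (tiny N only)."""
    n = Q.shape[0]
    return min((np.array(bits) for bits in itertools.product((0, 1), repeat=n)),
               key=lambda a: float(a @ Q @ a))

rng = np.random.default_rng(0)
D = rng.normal(size=(8, 6))                 # 6 dictionary elements, 8-dim input
a_true = np.array([1, 0, 0, 1, 0, 0])
x = D @ a_true                              # noiseless signal built from 2 atoms
a_hat = brute_force_qubo(sparse_coding_qubo(D, x, lam=0.1))
print(a_true, a_hat)
```

On a tiny noiseless example like this, the minimizer typically recovers the atoms used to build x; the interesting regime studied in the paper is how the sparsity weight trades off reconstruction error against sparsity at realistic scale.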
Citations: 21
Simple Constraint Embedding for Quantum Annealers
Pub Date : 2018-11-01 DOI: 10.1109/ICRC.2018.8638624
Tomás Vyskocil, H. Djidjev
Quantum annealers such as the D-Wave 2X computer are designed to natively solve Quadratic Unconstrained Binary Optimization (QUBO) problems to optimality or near-optimality. Most NP-hard problems, which are hard for classical computers, can be naturally described as quadratic binary problems that contain a quadratic binary objective function and one or more constraints. Since a QUBO cannot have constraints, each such constraint has to be added to the objective function as a penalty in order to solve the problem on D-Wave. For a minimization problem, for instance, such a penalty can be a quadratic term that takes the value zero if the constraint is satisfied and a large value if it is not. In many cases, however, the penalty can significantly increase the number of quadratic terms in the resulting QUBO and make it too large to embed into the D-Wave hardware. In this paper, we develop an alternative method for formulating and embedding constraints of the type $\sum_{i=1}^{s}x_{i}=1$, which is much more scalable than the existing ones, and analyze the properties of the resulting embeddings.
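For context, the baseline penalty approach described above looks roughly like the sketch below (plain Python/NumPy with invented example values, not the authors' alternative method): the constraint $\sum_i x_i = 1$ is folded into the QUBO matrix as the quadratic penalty $P(\sum_i x_i - 1)^2$, which, using $x_i^2 = x_i$, adds $-P$ to each diagonal entry and $+2P$ to each pairwise coupling.

```python
import itertools
import numpy as np

def add_one_hot_penalty(Q, indices, penalty):
    """Fold the constraint  sum_{i in indices} x_i = 1  into a QUBO matrix
    as the quadratic penalty  penalty * (sum x_i - 1)^2.

    With x_i^2 = x_i for binary variables, the expansion contributes
    -penalty to each diagonal entry and +2*penalty to each pair coupling;
    the constant term does not affect the argmin and is dropped."""
    Q = Q.copy()
    for i in indices:
        Q[i, i] -= penalty
    for i, j in itertools.combinations(indices, 2):
        Q[i, j] += 2.0 * penalty
    return Q

# Tiny demonstration: an unconstrained QUBO whose minimum would set all
# variables to 1, forced by the penalty to pick exactly one of the three.
Q = np.array([[-1.0, 0.0, 0.0],
              [0.0, -2.0, 0.0],
              [0.0, 0.0, -3.0]])
Qc = add_one_hot_penalty(Q, indices=[0, 1, 2], penalty=10.0)
best = min(itertools.product((0, 1), repeat=3),
           key=lambda x: float(np.array(x) @ Qc @ np.array(x)))
print(best)   # (0, 0, 1): exactly one variable set, the one with the lowest bias
```

It is exactly these extra pairwise couplings, one per pair of variables in the constraint, that blow up the number of quadratic terms and motivate the alternative embedding developed in the paper.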
Citations: 8