2018 IEEE International Conference on Rebooting Computing (ICRC)
Pub Date: 2018-11-01  DOI: 10.1109/icrc.2018.8638589
High-Level Synthesis of Non-Rectangular Multi-Dimensional Nested Loops Using Reshaping and Vectorization
Pub Date: 2018-11-01  DOI: 10.1109/ICRC.2018.8638593
Sahand Salamat, M. Azarbad, B. Alizadeh
High-level synthesis accelerates design space exploration by allowing various transformations and optimizations to be applied to the high-level description. In this paper, we propose a new method to improve the high-level synthesis of non-rectangular multi-dimensional nested loops using reshaping and vectorization techniques. Because high-level descriptions with non-rectangular iteration spaces do not lend themselves to an efficient high-level synthesis process, our method uses a reshaping technique to convert non-rectangular iteration spaces with certain inter-iteration dependencies into rectangular ones. Furthermore, the proposed method employs a vectorization technique that lets different iterations execute simultaneously without violating inter-iteration dependencies. Finally, the paper combines the proposed reshaping and vectorization techniques into a hybrid method that supports both 2D and 3D perfect/imperfect nested loops and can be extended to nested loops of more than three dimensions. According to the experimental results, the proposed hybrid method shows average speed-ups of 51.9%, 50.1%, and 15.9% over the state-of-the-art methods for pipelined perfect, pipelined imperfect, and pipelined 3D nested loops, respectively.
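The reshaping idea can be illustrated in software terms: a triangular iteration space can be flattened into a single rectangular index range and mapped back to the original loop indices. The sketch below is only a generic illustration of that flattening, not the authors' transformation or its HLS implementation; the loop bound N and the unflatten helper are illustrative assumptions.

```python
# A generic flattening of a triangular 2D iteration space (j >= i) into one
# rectangular index range; illustrative only, not the authors' reshaping.
N = 8  # assumed loop bound

# Original non-rectangular nest: the inner bound of j depends on i.
triangular = [(i, j) for i in range(N) for j in range(i, N)]

def unflatten(k, n):
    """Map the k-th flattened iteration back to its (i, j) pair."""
    i, row_len = 0, n
    while k >= row_len:      # row i of the triangle holds n - i iterations
        k -= row_len
        i += 1
        row_len -= 1
    return i, i + k

# Reshaped form: one counter over a rectangular range of N*(N+1)//2 steps.
reshaped = [unflatten(k, N) for k in range(N * (N + 1) // 2)]
assert reshaped == triangular
print(len(reshaped), "iterations covered by a single rectangular index range")
```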
{"title":"High-Level Synthesis of Non-Rectangular Multi-Dimensional Nested Loops Using Reshaping and Vectorization","authors":"Sahand Salamat, M. Azarbad, B. Alizadeh","doi":"10.1109/ICRC.2018.8638593","DOIUrl":"https://doi.org/10.1109/ICRC.2018.8638593","url":null,"abstract":"High-Ievel synthesis accelerates the process of design space exploration in which various transformations and optimizations can be applied to the high-level description. In this paper, a new method has been proposed to improve the high-level synthesis process for non-rectangular multi-dimensional nested loops using reshaping and vectorization techniques. As the high-level descriptions with non-rectangular iteration spaces do not lend themselves well to efficient high-level synthesis process, our method proposes a reshaping technique to convert nonrectangular iteration spaces with certain inter-iteration dependencies to the rectangular ones. Furthermore, the proposed method suggests a vectorization technique to let the different iterations be executed simultaneously in a manner which does not violate inter-iteration dependencies. Finally, this paper combines the proposed reshaping and vectorization techniques to a hybrid method which supports both 2D and 3D perfect/imperfect nested loops and can be extended for the nested loops of dimensions more than three. According to the experimental results, the proposed hybrid method shows average speed-up of 51.9%, 50.1%, and 15.9% in comparison with the state-of-the-art methods for the pipelined perfect, the pipelined imperfect, and the pipelined 3D nested loops, respectively.","PeriodicalId":169413,"journal":{"name":"2018 IEEE International Conference on Rebooting Computing (ICRC)","volume":"53 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133284136","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Electric-Field Bit Write-In for Molecular Quantum-Dot Cellular Automata Circuits
Pub Date: 2018-11-01  DOI: 10.1109/ICRC.2018.8638591
Jackson Henry, Joseph Previti, E. Blair
Quantum-dot cellular automata (QCA) was conceptualized to provide low-power, high-speed, general-purpose computing in the post-CMOS era. Here, the elementary device, called a “cell,” is a system of quantum dots and a few mobile charges. The configuration of charge on a cell encodes a binary state, and cells are networked locally through the electrostatic field. Layouts of QCA cells on a substrate provide non-von-Neumann circuits in which digital logic, interconnections, and memory are intermingled. QCA supports reversible, adiabatic computing with arbitrarily low levels of dissipation. Here, we focus on a molecular implementation of QCA and describe the promise it holds. This discussion includes an outline of an architecture for clocked molecular QCA circuits and some technical challenges that remain before molecular QCA computation can be realized. This work focuses on the challenge of using macroscopic devices to write bits into nanoscale QCA molecules. We use an electric field established between electrodes fabricated with standard, mature lithographic processes, and the field need not have single-molecule specificity. An intercellular Hartree approximation is used to model the state of an $N$-molecule circuit. Simulations of a method for providing bit inputs to clocked molecular circuits are shown.
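For readers unfamiliar with how an applied field biases a cell, the textbook two-state approximation of a single QCA cell captures the basic mechanism: a write-in field detunes the two charge configurations, and the ground state polarizes toward the favored one. The sketch below implements only that generic two-state model, with assumed values for the tunneling energy and the field-induced detuning; it is not the intercellular Hartree treatment of $N$-molecule circuits used in the paper.

```python
# Textbook two-state model of a single QCA cell (not the paper's
# intercellular Hartree model): an applied write-in field detunes the two
# charge configurations and polarizes the ground state.
import numpy as np

SIGMA_Z = np.array([[1.0, 0.0], [0.0, -1.0]])
SIGMA_X = np.array([[0.0, 1.0], [1.0, 0.0]])

def ground_state_polarization(detuning_eV, tunneling_eV):
    """Polarization <sigma_z> of the cell ground state for a given
    field-induced detuning and inter-dot tunneling energy (both assumed)."""
    H = -0.5 * detuning_eV * SIGMA_Z - tunneling_eV * SIGMA_X
    _, vecs = np.linalg.eigh(H)          # eigh returns ascending eigenvalues
    ground = vecs[:, 0]
    return float(ground @ SIGMA_Z @ ground)

gamma = 0.05                                   # assumed tunneling energy (eV)
for delta in (-0.2, -0.02, 0.0, 0.02, 0.2):    # assumed write-in detunings (eV)
    p = ground_state_polarization(delta, gamma)
    print(f"detuning {delta:+.2f} eV -> polarization {p:+.3f}")
```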
{"title":"Electric-Field Bit Write-In for Molecular Quantum-Dot Cellular Automata Circuits","authors":"Jackson Henry, Joseph Previti, E. Blair","doi":"10.1109/ICRC.2018.8638591","DOIUrl":"https://doi.org/10.1109/ICRC.2018.8638591","url":null,"abstract":"Quantum-dot cellular automata (QCA)was conceptualized to provide low-power, high-speed, general-purpose computing in the post-CMOS era. Here, an elementary device, called a “cell” is a system of quantum dots and a few mobile charges. The configuration of charge on a cell encodes a binary state, and cells are networked locally using the electrostatic field. Layouts of QCA cells on a substrate provide non-von-Neumann circuits in which digital logic, interconnections, and memory are intermingled. QCA supports reversible, adiabatic computing for arbitrarily low levels of dissipation. Here, we focus on a molecular implementation of QCA and describe the promise this holds. This discussion includes an outline of an architecture for clocked molecular QCA circuits and some technical challenges remaining before molecular QCA computation may be realized. This work focuses on the challenge of using macroscopic devices to write-in bits to nanoscale QCA molecules. We use an electric field established between electrodes fabricated using standard, mature lithographic processes, and the field need not feature single-molecule specificity. An intercellular Hartree approximation is used to model the state of an $N-$ molecule circuit. Simulations of a method for providing bit inputs to clocked molecular circuits are shown.","PeriodicalId":169413,"journal":{"name":"2018 IEEE International Conference on Rebooting Computing (ICRC)","volume":"27 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115568985","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
An Efficient Adder Architecture with Three-Independent-Gate Field-Effect Transistors
Pub Date: 2018-11-01  DOI: 10.1109/ICRC.2018.8638608
J. R. Gonzalez, P. Gaillardon
Three-Independent-Gate Field-Effect Transistors (TIGFETs) extend the functional diversity of a single transistor by allowing dynamic electrical reconfiguration of its polarity. This property has been shown to unlock unique circuit-level opportunities. In this article, a 32-bit ripple-carry adder is designed using simulated TIGFET technology and its metrics are compared against CMOS High-Performance (HP) and CMOS Low-Voltage. By exploiting the TIGFET's polarity-control characteristic, the proposed ripple-carry adder architecture uses efficient exclusive-OR and majority gates to compute complementary carry signals in parallel, leading to a 38% decrease in logic depth compared to the standard CMOS design. Additionally, a 38% reduction in contacted gates reduces the effects of an interconnect-limited design. The results show that the reduced logic depth and fewer contacted gates lead to a 3.8x lower energy-delay product and a 5.6x lower area-delay product compared with CMOS HP. The performance gained by realizing arithmetic circuits with TIGFET transistors makes them a promising next-generation high-performance device technology.
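The exclusive-OR/majority decomposition mentioned above is the standard full-adder identity: sum_i = a_i XOR b_i XOR c_i and carry_{i+1} = MAJ(a_i, b_i, c_i). The sketch below checks that identity at the bit level in plain Python; it is not the TIGFET netlist, and the paper's complementary-carry optimization is not reproduced.

```python
# Bit-level ripple-carry addition from XOR and majority gates; a functional
# check of the decomposition, not the TIGFET gate-level design.
def majority(a, b, c):
    return (a & b) | (b & c) | (a & c)

def ripple_carry_add(a, b, width=32):
    carry, result = 0, 0
    for i in range(width):
        ai, bi = (a >> i) & 1, (b >> i) & 1
        result |= (ai ^ bi ^ carry) << i       # full-adder sum bit
        carry = majority(ai, bi, carry)        # full-adder carry-out
    return result, carry

total, carry_out = ripple_carry_add(0xDEADBEEF, 0x12345678)
assert total == (0xDEADBEEF + 0x12345678) & 0xFFFFFFFF
print(hex(total), carry_out)
```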
{"title":"An Efficient Adder Architecture with Three- Independent-Gate Field-Effect Transistors","authors":"J. R. Gonzalez, P. Gaillardon","doi":"10.1109/ICRC.2018.8638608","DOIUrl":"https://doi.org/10.1109/ICRC.2018.8638608","url":null,"abstract":"Three-Independent-Gate Field-Effect Transistors (TIGFETs)extend the functional diversity of a single transistor by allowing a dynamic electric reconfiguration of the polarity. This property has been shown to unlock unique circuit level opportunities. In this article, a ripple-carry 32-bit adder is uniquely designed using simulated TIGFET technology and its metrics are compared against CMOS High-Performance (HP)and CMOS Low-Voltage. By adopting TIGFET's polarity control characteristic, the proposed ripple-carry adder architecture uses efficient exclusive OR and majority gates to compute complementary carry signals in parallel, leading to a 38% decrease in logic depth as compared to the standard CMOS design. Additionally, a 38% reduction in contacted gates reduces the effects coming from an interconnect-limited design. The results show that the decrease in the logic depth and the reduction in contacted gates lead to a 3.8x lower energy-delay product and a 5.6x lower area-delay product as compared with CMOS HP. The boost in performance coming from realizing arithmetic circuits with TIGFET transistors makes them a promising next-generation high-performance device technology.","PeriodicalId":169413,"journal":{"name":"2018 IEEE International Conference on Rebooting Computing (ICRC)","volume":"37 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123757531","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
RNSnet: In-Memory Neural Network Acceleration Using Residue Number System
Pub Date: 2018-11-01  DOI: 10.1109/ICRC.2018.8638592
Sahand Salamat, M. Imani, Saransh Gupta, T. Simunic
We live in a world where technological advances continually create more data than we can deal with. Machine learning algorithms, in particular Deep Neural Networks (DNNs), are essential to process such large volumes of data. Computing a DNN requires loading the trained network onto the processing element and storing the results in memory, so running these applications needs high memory bandwidth. Traditional cores are limited by memory bandwidth; hence, running DNNs on them results in high energy consumption and slow processing due to the large amount of data movement between memory and processing units. Several prior works tried to address the data-movement issue by enabling Processing In-Memory (PIM) using crossbar analog multiplication. However, these designs suffer from the large overhead of data conversion between the analog and digital domains. In this work, we propose RNSnet, which uses the Residue Number System (RNS) to execute neural networks entirely in the digital domain, in memory. RNSnet simplifies the fundamental neural network operations and maps them to in-memory addition and data access. We test the efficiency of the proposed design on several popular neural network applications. Our experimental results show that RNSnet consumes 145.5x less energy and achieves a 35.4x speedup compared to an NVIDIA GTX 1080 GPU. In addition, our results show that RNSnet achieves an 8.5x improvement in energy-delay product over state-of-the-art neural network accelerators.
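The key property RNSnet relies on is that, with pairwise-coprime moduli, addition and multiplication decompose into independent per-modulus operations. The sketch below shows that decomposition with illustrative moduli and a Chinese Remainder Theorem readback; it is not the paper's in-memory mapping, and the moduli are assumptions rather than the ones used in RNSnet.

```python
# Generic Residue Number System (RNS) arithmetic: per-modulus add/multiply
# plus CRT reconstruction. Moduli are illustrative, not from the paper.
from math import prod

MODULI = (251, 253, 255, 256)          # pairwise coprime; dynamic range = their product

def to_rns(x):
    return tuple(x % m for m in MODULI)

def rns_add(a, b):
    return tuple((ai + bi) % m for ai, bi, m in zip(a, b, MODULI))

def rns_mul(a, b):
    return tuple((ai * bi) % m for ai, bi, m in zip(a, b, MODULI))

def from_rns(r):
    """Chinese Remainder Theorem reconstruction."""
    M = prod(MODULI)
    x = 0
    for ri, m in zip(r, MODULI):
        Mi = M // m
        x = (x + ri * Mi * pow(Mi, -1, m)) % M   # pow(..., -1, m): modular inverse (Python 3.8+)
    return x

x, w = 12345, 678
assert from_rns(rns_mul(to_rns(x), to_rns(w))) == x * w
assert from_rns(rns_add(to_rns(x), to_rns(w))) == x + w
print("RNS add/multiply verified against integer arithmetic")
```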
{"title":"RNSnet: In-Memory Neural Network Acceleration Using Residue Number System","authors":"Sahand Salamat, M. Imani, Saransh Gupta, T. Simunic","doi":"10.1109/ICRC.2018.8638592","DOIUrl":"https://doi.org/10.1109/ICRC.2018.8638592","url":null,"abstract":"We live in a world where technological advances are continually creating more data than what we can deal with. Machine learning algorithms, in particular Deep Neural Networks (DNNs), are essential to process such large data. Computation of DNNs requires loading the trained network on the processing element and storing the result in memory. Therefore, running these applications need a high memory bandwidth. Traditional cores are memory limited in terms of the memory bandwidth. Hence, running DNNs on traditional cores results in high energy consumption and slows down processing speed due to a large amount of data movement between memory and processing units. Several prior works tried to address data movement issue by enabling Processing In-Memory (PIM)using crossbar analog multiplication. However, these designs suffer from the large overhead of data conversion between analog and digital domains. In this work, we propose RNSnet, which uses Residue Number System (RNS)to execute neural network completely in the digital domain in memory. RNSnet simplifies the fundamental neural network operations and maps them to in-memory addition and data access. We test the efficiency of the proposed design on several popular neural network applications. Our experimental result shows that RNSnet consumes 145.5x less energy and obtains 35.4x speedup as compared to NVIDIA GPU GTX 1080. In addition, our results show that RNSnet can achieve 8.5 x higher energy-delay product as compared to the state-of-the-art neural network accelerators.","PeriodicalId":169413,"journal":{"name":"2018 IEEE International Conference on Rebooting Computing (ICRC)","volume":"10 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128398144","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
SC-SD: Towards Low Power Stochastic Computing Using Sigma Delta Streams
Pub Date: 2018-11-01  DOI: 10.1109/ICRC.2018.8638611
Patricia Gonzalez-Guerrero, Xinfei Guo, M. Stan
Processing data using Stochastic Computing (SC) requires only $\sim$7% of the area and power of the typical binary approach. However, SC has two major drawbacks that eclipse any area and power savings. First, it takes $\sim$99% more time to finish a computation than the binary approach, since data is represented as streams of bits. Second, the Linear Feedback Shift Registers (LFSRs) required to generate the stochastic streams increase the power and area of the overall SC-LFSR system. These drawbacks result in similar or higher area, power, and energy numbers than the binary counterpart. In this work, we address these drawbacks by applying SC directly to Pulse Density Modulated (PDM) streams. Most modern Systems on Chip (SoCs) already include Analog-to-Digital Converters (ADCs). The core of a $\Sigma\Delta$-ADC is the $\Sigma\Delta$ modulator, whose output is a PDM stream. Our approach (SC-SD) simplifies the system hardware in two ways: first, we drop the filter stage of the ADC, and second, we replace the costly Stochastic Number Generators (SNGs) with $\Sigma\Delta$ modulators. To further lower the system complexity, we adopt an Asynchronous $\Sigma\Delta$ Modulator ($\mathrm{A}\Sigma\Delta\mathrm{M}$) architecture. We design and simulate the $\mathrm{A}\Sigma\Delta\mathrm{M}$ using an industry-standard 1x FinFET technology with foundry models (in modern technologies the node number does not refer to any one feature in the process, and foundries use slightly different conventions; we use 1x to denote the 14/16 nm FinFET nodes offered by the foundry). We achieve power savings of 81% in the SNG compared to the LFSR approach. To evaluate how these area and power savings scale to more complex applications, we implement Gamma Correction, a popular image-processing algorithm. For this application, our simulations show that SC-SD can save 98%-11% in total system latency and 50%-38% in power consumption compared with the SC-LFSR approach or the binary counterpart.
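A first-order digital sigma-delta modulator shows what a PDM stream looks like and why its ones-density encodes the input value. The sketch below is only that generic digital model plus a naive stochastic AND multiplication; it is not the asynchronous analog $\mathrm{A}\Sigma\Delta\mathrm{M}$ designed in the paper, and the stream lengths and values are illustrative.

```python
# First-order digital sigma-delta modulator producing a PDM bit stream
# whose ones-density tracks the input value in [0, 1]; illustrative only.
def sigma_delta(value, n_bits):
    acc, stream = 0.0, []
    for _ in range(n_bits):
        acc += value                  # integrate the input
        bit = 1 if acc >= 1.0 else 0  # 1-bit quantizer
        acc -= bit                    # feed the quantized output back
        stream.append(bit)
    return stream

stream = sigma_delta(0.30, 1024)
print(sum(stream) / len(stream))      # ~0.30: value recovered by averaging

# Unipolar SC multiplication is conventionally a bit-wise AND of two
# *uncorrelated* streams (expected density 0.5 * 0.25 = 0.125). These
# deterministic PDM streams are strongly correlated, so the naive AND
# lands near 0.25 instead, illustrating why stream generation and
# decorrelation matter in SC designs.
a, b = sigma_delta(0.5, 1024), sigma_delta(0.25, 1024)
print(sum(x & y for x, y in zip(a, b)) / 1024)
```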
{"title":"SC-SD: Towards Low Power Stochastic Computing Using Sigma Delta Streams","authors":"Patricia Gonzalez-Guerrero, Xinfei Guo, M. Stan","doi":"10.1109/ICRC.2018.8638611","DOIUrl":"https://doi.org/10.1109/ICRC.2018.8638611","url":null,"abstract":"Processing data using Stochastic Computing (SC) requires only $sim$ 7% of the area and power of the typical binary approach. However, SC has two major drawbacks that eclipse any area and power savings. First, it takes $sim$ 99% more time to finish a computation when compared with the binary approach, since data is represented as streams of bits. Second, the Linear Feedback Shift Registers (LFSRs) required to generate the stochastic streams increment the power and area of the overall SC-LFSR system. These drawbacks result in similar or higher area, power, and energy numbers when compared with the binary counterpart. In this work, we address these drawbacks by applying SC directly on Pulse Density Modulated (PDM) streams. Most modern Systems on Chip (SoCs) already include Analog to Digital Converters (ADCs). The core of $SigmaDelta$ -ADCs is the $SigmaDelta$ Modulator whose output is a PDM stream. Our approach (SC-SD) simplifies the system hardware in two ways. First, we drop the filter stage at the ADC and, second, we replace the costly Stochastic Number Generators (SNGs) with $SigmaDelta$ -Modulators. To further lower the system complexity, we adopt an Asynchronous $SigmaDelta$ -Modulator $(mathrm{A}SigmaDelta mathrm{M})$ architecture. We design and simulate the $mathrm{A}SigmaDelta mathrm{M}$: using an industry-standard 1×FinFET11In modern technologies the node number does not refer to any one feature in the process, and foundries use slightly different conventions; we use 1x to denote the 14/16nm FinFET nodes offered by the foundry. technology with foundry models. We achieve power savings of 81 % in SNG compared to the LFSR approach. To evaluate how this area and power savings scale to more complex applications, we implement Gamma Correction, a popular image processing algorithm. For this application, our simulations show that SC-SD can save 98%-11% in the total system latency and 50%-38% in power consumption when compared with the SC-LFSR approach or the binary counterpart.","PeriodicalId":169413,"journal":{"name":"2018 IEEE International Conference on Rebooting Computing (ICRC)","volume":"39 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114349983","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Image Classification Using Quantum Inference on the D-Wave 2X
Pub Date: 2018-11-01  DOI: 10.1109/ICRC.2018.8638596
N. T. Nguyen, Garrett T. Kenyon
We use a D-Wave 2X quantum annealer to obtain solutions to NP-hard sparse coding problems. To reduce the dimensionality of the sparse coding problem to fit on the D-Wave 2X hardware, we passed downsampled MNIST images through a bottleneck autoencoder. To establish a benchmark for classification performance on this reduced-dimensional data set, we built two deep convolutional neural networks (DCNNs). The first DCNN used an AlexNet-like architecture and the second a state-of-the-art residual network (ResNet) model, both implemented in TensorFlow. The two DCNNs yielded classification scores of 94.54 ± 0.7% and 98.8 ± 0.1%, respectively. As a control, we showed that both DCNN architectures produce near-state-of-the-art classification performance ($\sim$99%) on the original MNIST images. To obtain a set of optimized features for inferring sparse representations of the reduced-dimensional MNIST dataset, we imprinted on a random set of 47 image patches, followed by an off-line unsupervised learning algorithm using stochastic gradient descent to optimize for sparse coding. Our single layer of sparse coding matched the stride and patch size of the first convolutional layer of the AlexNet-like DCNN and contained 47 fully-connected features, 47 being the maximum number of dictionary elements that could be embedded onto the D-Wave 2X hardware. When the sparse representations inferred by the D-Wave 2X were passed to a linear support vector machine, we obtained a classification score of 95.68%. We found that the classification performance supported by quantum inference was maximal at an optimal level of sparsity corresponding to a critical value of the sparsity/reconstruction-error trade-off parameter that previous work has associated with a second-order phase transition, an observation supported by a free-energy analysis of D-Wave energy states. We mimicked a transfer learning protocol by feeding the D-Wave representations into a multilayer perceptron (MLP), yielding 98.48% classification performance. The classification performance supported by a single layer of quantum inference was superior to that supported by a classical matching pursuit algorithm set to the same level of sparsity. Whereas the classification performance of both DCNNs declined as the number of training examples was reduced, the classification performance supported by quantum inference was insensitive to the number of training examples. We thus conclude that quantum inference supports classification of reduced-dimensional MNIST images exceeding that of a size-matched AlexNet-like DCNN and nearly equivalent to a state-of-the-art ResNet DCNN.
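Binary sparse coding maps onto a QUBO in a standard way: minimizing $\|x - Da\|^2 + \lambda\sum_i a_i$ over binary $a$ is, up to a constant, minimizing $a^{T}Qa$ with $Q_{ii} = (D^{T}D)_{ii} - 2(D^{T}x)_i + \lambda$ and $Q_{ij} = (D^{T}D)_{ij}$. The sketch below builds that matrix for a toy instance and solves it by exhaustive search; it is a generic formulation under assumed sizes, not the D-Wave embedding, imprinting, or learning pipeline described in the paper.

```python
# Generic binary sparse coding as a QUBO (toy sizes, solved exhaustively);
# not the paper's D-Wave embedding or its imprinting/learning pipeline.
import itertools
import numpy as np

rng = np.random.default_rng(0)
n_pixels, n_atoms, lam = 16, 8, 0.1          # assumed toy sizes (paper uses 47 atoms)
D = rng.standard_normal((n_pixels, n_atoms))
D /= np.linalg.norm(D, axis=0)               # unit-norm dictionary atoms
a_true = (rng.random(n_atoms) < 0.3).astype(float)
x = D @ a_true                               # signal generated from a sparse binary code

# ||x - D a||^2 + lam*sum(a) = const + a^T Q a  for binary a (using a_i^2 = a_i)
G = D.T @ D
Q = G.copy()
np.fill_diagonal(Q, np.diag(G) - 2.0 * (D.T @ x) + lam)

best = min((np.array(bits) for bits in itertools.product([0, 1], repeat=n_atoms)),
           key=lambda a: float(a @ Q @ a))
print("true code:     ", a_true.astype(int))
print("recovered code:", best)
print("reconstruction error:", float(np.linalg.norm(x - D @ best)))
```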
{"title":"Image Classification Using Quantum Inference on the D-Wave 2X","authors":"N. T. Nguyen, Garrett T. Kenyon","doi":"10.1109/ICRC.2018.8638596","DOIUrl":"https://doi.org/10.1109/ICRC.2018.8638596","url":null,"abstract":"We use a quantum annealing D-Wave 2X computer to obtain solutions to NP-hard sparse coding problems. To reduce the dimensionality of the sparse coding problem to fit on the quantum D-Wave 2X hardware, we passed downsampled MNIST images through a bottleneck autoencoder. To establish a benchmark for classification performance on this reduced dimensional data set, we built two deep convolutional neural networks (DCNNs). The first DCNN used an AlexNet-like architecture and the second a state-of-the-art residual network (RESNET)model, both implemented in TensorFlow. The two DCNNs yielded classification scores of 94.54 ± 0.7% and 98.8 ± 0.1%, respectively. As a control, we showed that both DCNN architectures produced near-state-of-the-art classification performance $(sim99%)$ on the original MNIST images. To obtain a set of optimized features for inferring sparse representations of the reduced dimensional MNIST dataset, we imprinted on a random set of 47 image patches followed by an off-line unsupervised learning algorithm using stochastic gradient descent to optimize for sparse coding. Our single-layer of sparse coding matched the stride and patch size of the first convolutional layer of the AlexNet-like DCNN and contained 47 fully-connected features, 47 being the maximum number of dictionary elements that could be embedded onto the D-Wave 2X hardware. When the sparse representations inferred by the D-Wave 2X were passed to a linear support vector machine, we obtained a classification score of 95.68%. We found that the classification performance supported by quantum inference was maximal at an optimal level of sparsity corresponding to a critical value of the sparsity/reconstruction error trade-off parameter that previous work has associated with a second order phase transition, an observation supported by a free energy analysis of D-Wave energy states. We mimicked a transfer learning protocol by feeding the D-Wave representations into a multilayer perceptron (MLP), yielding 98.48% classification performance. The classification performance supported by a single-layer of quantum inference was superior to that supported by a classical matching pursuit algorithm set to the same level of sparsity. Whereas the classification performance of both DCNNs declined as the number of training examples was reduced, the classification performance supported by quantum inference was insensitive to the number of training examples. We thus conclude that quantum inference supports classification of reduced dimensional MNIST images exceeding that of a size-matched AlexNet-like DCNN and nearly equivalent to a state-of-the-art RESNET DCNN.","PeriodicalId":169413,"journal":{"name":"2018 IEEE International Conference on Rebooting Computing (ICRC)","volume":"43 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124981324","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Simple Constraint Embedding for Quantum Annealers
Pub Date: 2018-11-01  DOI: 10.1109/ICRC.2018.8638624
Tomás Vyskocil, H. Djidjev
Quantum annealers such as the D-Wave 2X computer are designed to natively solve Quadratic Unconstrained Binary Optimization (QUBO) problems to optimality or near optimality. Most NP-hard problems, which are hard for classical computers, can be naturally described as quadratic binary problems that contain a quadratic binary objective function and one or more constraints. Since a QUBO cannot have constraints, each such constraint has to be added to the objective function as a penalty in order to be solved on the D-Wave. For a minimization problem, for instance, such a penalty can be a quadratic term that takes the value zero if the constraint is satisfied and a large value if it is not. In many cases, however, the penalty can significantly increase the number of quadratic terms in the resulting QUBO and make it too large to embed into the D-Wave hardware. In this paper, we develop an alternative method for formulating and embedding constraints of the type $\sum_{i=1}^{s}x_{i}=1$, which is much more scalable than existing ones, and analyze the properties of the resulting embeddings.
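For contrast with the paper's approach, the conventional penalty for $\sum_{i=1}^{s}x_{i}=1$ is $P(\sum_i x_i - 1)^2$, which (using $x_i^2 = x_i$) expands into $s$ linear terms and $s(s-1)/2$ quadratic couplers; that coupler growth is what makes the standard embedding hard to scale. The sketch below builds this standard penalty into a QUBO dictionary; it illustrates the baseline the paper improves on, not the paper's alternative embedding, and the dictionary format and penalty weight are assumptions.

```python
# Standard penalty method for the one-hot constraint sum(x_i) = 1:
# P*(sum(x_i) - 1)^2 = P*(1 - sum_i x_i + 2 * sum_{i<j} x_i x_j) for binary x,
# i.e. s linear terms plus s*(s-1)/2 quadratic couplers.
from itertools import combinations

def add_one_hot_penalty(qubo, variables, penalty):
    """Add P*(sum(x_i) - 1)^2 to a QUBO stored as {(i, j): coefficient}."""
    for i in variables:
        qubo[(i, i)] = qubo.get((i, i), 0.0) - penalty        # -P * x_i
    for i, j in combinations(variables, 2):
        qubo[(i, j)] = qubo.get((i, j), 0.0) + 2.0 * penalty  # +2P * x_i x_j
    return qubo  # the constant +P is dropped; it does not affect the argmin

qubo = add_one_hot_penalty({}, variables=range(4), penalty=10.0)
print(len([k for k in qubo if k[0] != k[1]]), "quadratic couplers")  # 6 = 4*3/2
```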
{"title":"Simple Constraint Embedding for Quantum Annealers","authors":"Tomás Vyskocil, H. Djidjev","doi":"10.1109/ICRC.2018.8638624","DOIUrl":"https://doi.org/10.1109/ICRC.2018.8638624","url":null,"abstract":"Quantum annealers such as the D-Wave 2X computer are designed to natively solve Quadratic Unconstrained Binary Optimization (QUBO) problems to optimality or near optimality. Most NP-hard problems, which are hard for classical computers, can be naturally described as quadratic binary problems that contain a quadratic binary objective function and one or more constraints. Since a QUBO cannot have constraints, each such constraint has to be added to the objective function as a penalty, in order to solve on D-Wave. For a minimization problem, for instance, such penalty can be a quadratic term that gets a value zero, if the constraint is satisfied, and a large value, if it is not. In many cases, however, the penalty can significantly increase the number of quadratic terms in the resulting QUBO and make it too large to embed into the D-Wave hardware. In this paper, we develop an alternative method for formulating and embedding constraints of the type $sum_{i=1}^{s}x_{i}=1$, which is much more scalable than the existing ones, and analyze the properties of the resulting embeddings.","PeriodicalId":169413,"journal":{"name":"2018 IEEE International Conference on Rebooting Computing (ICRC)","volume":"19 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129254504","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}