A Deep Learning Based Wearable Medicines Recognition System for Visually Impaired People
Pub Date: 2019-03-01 | DOI: 10.1109/AICAS.2019.8771559
W. Chang, Yue-Xun Yu, Jheng-Hao Chen, Zhi-Yao Zhang, S. Ko, Tsung-Han Yang, Chia-Hao Hsu, Liang-Bi Chen, Ming-Che Chen
This paper proposes a deep-learning-based wearable medicine recognition system for visually impaired people. The proposed system is composed of a pair of wearable smart glasses, a wearable waist-mounted pill recognition device, a mobile device application, and a cloud-based management platform. The system uses deep learning to identify drug pills so that users avoid taking the wrong drugs. Experimental results show that the proposed system reaches up to 90% recognition accuracy, which serves the goal of correct medication for visually impaired people.
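The abstract does not describe the recognition model itself; as a rough illustration, the sketch below shows how a wearable device might run a pretrained CNN pill classifier on one camera frame and reject low-confidence predictions. The model, label list, and 0.9 threshold are hypothetical assumptions, not details from the paper.

```python
# Hypothetical inference step for a pill-recognition CNN; the model,
# labels, and confidence threshold are illustrative assumptions.
import torch
from torchvision import transforms
from PIL import Image

PILL_LABELS = ["aspirin_100mg", "metformin_500mg", "unknown"]  # hypothetical classes

preprocess = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
])

def recognize_pill(image_path: str, model: torch.nn.Module) -> str:
    """Classify one camera frame, returning 'unknown' when confidence is low."""
    x = preprocess(Image.open(image_path).convert("RGB")).unsqueeze(0)
    with torch.no_grad():
        probs = torch.softmax(model(x), dim=1)
    conf, idx = probs.max(dim=1)
    # Rejecting low-confidence frames warns the user instead of naming the
    # wrong drug; the paper's ~90% accuracy figure motivates a threshold.
    return PILL_LABELS[idx.item()] if conf.item() > 0.9 else "unknown"
```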
{"title":"A Deep Learning Based Wearable Medicines Recognition System for Visually Impaired People","authors":"W. Chang, Yue-Xun Yu, Jheng-Hao Chen, Zhi-Yao Zhang, S. Ko, Tsung-Han Yang, Chia-Hao Hsu, Liang-Bi Chen, Ming-Che Chen","doi":"10.1109/AICAS.2019.8771559","DOIUrl":"https://doi.org/10.1109/AICAS.2019.8771559","url":null,"abstract":"This paper proposes a deep learning based wearable medicines recognition system for visually impaired people. The proposed system is composed of a pair of wearable smart glasses, a wearable waist-mounted drug pills recognition device, a mobile device application, and a cloud-based management platform. The proposed system uses deep learning technology to identify drug pills to avoid taking wrong drugs. The experimental results show that the accuracy of the proposed system has reached up to 90% that can really be achieved the purpose of correct medication for visually impaired people.","PeriodicalId":273095,"journal":{"name":"2019 IEEE International Conference on Artificial Intelligence Circuits and Systems (AICAS)","volume":"30 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128758755","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Spatial Data Dependence Graph Simulator for Convolutional Neural Network Accelerators
Pub Date: 2019-03-01 | DOI: 10.1109/AICAS.2019.8771561
Jooho Wang, Ji-Won Kim, Sungmin Moon, Sunwoo Kim, Sungkyung Park, C. Park
We propose a spatial data dependence graph (S-DDG) to model accelerator dataflow. A pre-RTL simulator based on the S-DDG helps explore the design space in the early design phase. Simulation results show the impact of memory latency and bandwidth on a convolutional neural network (CNN) accelerator.
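To make the idea concrete, here is a toy pre-RTL estimate in the spirit of an S-DDG: nodes are operations, edges are data dependences, and memory nodes are charged a latency plus a bandwidth-dependent transfer time. All cost numbers below are illustrative assumptions, not the paper's.

```python
# Toy dependence-graph latency estimator: finish time of each node is the
# max finish time of its predecessors plus its own cost. Memory nodes model
# the latency/bandwidth sensitivity the paper studies.
import networkx as nx

MEM_LATENCY = 100      # cycles per DRAM access (illustrative)
BYTES_PER_CYCLE = 16   # memory bandwidth (illustrative)

def node_cost(op):
    if op["type"] == "load":
        return MEM_LATENCY + op["bytes"] // BYTES_PER_CYCLE
    return op["cycles"]  # compute node: MAC-array cycles

def estimate_cycles(g: nx.DiGraph) -> int:
    finish = {}
    for n in nx.topological_sort(g):
        ready = max((finish[p] for p in g.predecessors(n)), default=0)
        finish[n] = ready + node_cost(g.nodes[n])
    return max(finish.values())

g = nx.DiGraph()
g.add_node("ifmap", type="load", bytes=4096)
g.add_node("weights", type="load", bytes=9216)
g.add_node("conv", type="compute", cycles=576)
g.add_edges_from([("ifmap", "conv"), ("weights", "conv")])
print(estimate_cycles(g))  # the conv start is bound by the slower load
```

Sweeping MEM_LATENCY and BYTES_PER_CYCLE over such a graph reproduces, in miniature, the kind of memory-sensitivity study the abstract describes.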
{"title":"Spatial Data Dependence Graph Simulator for Convolutional Neural Network Accelerators","authors":"Jooho Wang, Ji-Won Kim, Sungmin Moon, Sunwoo Kim, Sungkyung Park, C. Park","doi":"10.1109/AICAS.2019.8771561","DOIUrl":"https://doi.org/10.1109/AICAS.2019.8771561","url":null,"abstract":"A spatial data dependence graph (S-DDG) is newly proposed to model an accelerator dataflow. The pre-RTL simulator based on the S-DDG helps to explore the design space in the early design phase. The simulation results show the impact of memory latency and bandwidth on a convolutional neural network (CNN) accelerator.","PeriodicalId":273095,"journal":{"name":"2019 IEEE International Conference on Artificial Intelligence Circuits and Systems (AICAS)","volume":"102 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121333508","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Heart Rate Estimation from Ballistocardiogram Using Hilbert Transform and Viterbi Decoding
Pub Date: 2019-03-01 | DOI: 10.1109/AICAS.2019.8771627
Qingsong Xie, Yongfu Li, Guoxing Wang, Y. Lian
This paper presents a robust algorithm for estimating heart rate (HR) from the ballistocardiogram (BCG). The BCG signal can be easily acquired from a vibration or force sensor embedded in a chair or mattress, without any electrodes attached to the body. The algorithm employs the Hilbert transform to reveal the frequency content of the J-peaks in the BCG signal, and Viterbi decoding (VD) estimates HR by finding the most likely path through the time-frequency state-space plane. The algorithm is evaluated on BCG recordings from 10 subjects, yielding a mean absolute error (MAE) of 1.35 beats per minute (BPM), a standard deviation of absolute error (STD) of 1.99 BPM, and a Pearson correlation coefficient of 0.94 between estimated and true HR.
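A minimal sketch of the two stages, assuming a BCG trace sampled at fs Hz: the Hilbert envelope exposes the J-peak periodicity, a spectrogram turns it into per-window frequency evidence, and Viterbi decoding picks the smoothest high-power frequency path. The window length, HR band, and transition penalty are illustrative choices, not the paper's exact parameters.

```python
# Hilbert envelope + spectrogram emissions + Viterbi smoothing over HR states.
import numpy as np
from scipy.signal import hilbert, spectrogram

def estimate_hr(bcg: np.ndarray, fs: float) -> np.ndarray:
    envelope = np.abs(hilbert(bcg))              # reveals J-peak periodicity
    f, t, S = spectrogram(envelope, fs=fs, nperseg=int(8 * fs))
    band = (f >= 0.7) & (f <= 3.0)               # 42-180 BPM search range
    f, S = f[band], S[band]
    n_states, n_frames = S.shape
    # Transition cost penalizes large HR jumps between consecutive windows.
    penalty = 0.5 * np.abs(f[:, None] - f[None, :])
    scores = np.full((n_states, n_frames), -np.inf)
    back = np.zeros((n_states, n_frames), dtype=int)
    scores[:, 0] = np.log(S[:, 0] + 1e-12)
    for k in range(1, n_frames):
        trans = scores[:, k - 1][None, :] - penalty    # trans[i, j]: come from j to i
        back[:, k] = np.argmax(trans, axis=1)
        scores[:, k] = np.log(S[:, k] + 1e-12) + trans[np.arange(n_states), back[:, k]]
    path = [int(np.argmax(scores[:, -1]))]             # backtrack the best path
    for k in range(n_frames - 1, 0, -1):
        path.append(back[path[-1], k])
    return 60.0 * f[np.array(path[::-1])]              # HR in BPM per window
```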
{"title":"Heart Rate Estimation from Ballistocardiogram Using Hilbert Transform and Viterbi Decoding","authors":"Qingsong Xie, Yongfu Li, Guoxing Wang, Y. Lian","doi":"10.1109/AICAS.2019.8771627","DOIUrl":"https://doi.org/10.1109/AICAS.2019.8771627","url":null,"abstract":"This paper presents a robust algorithm to estimate heart rate (HR) from ballistocardiogram (BCG). The BCG signal can be easily acquired from the vibration or force sensor embedded in a chair or a mattress without any electrode attached to body. The algorithm employs the Hilbert Transform to reveal the frequency content of J-peak in BCG signal. The Viterbi decoding (VD) is used to estimate HR by finding the most likely path through time-frequency state-space plane. The performance of the proposed algorithm is evaluated by BCG recordings from 10 subjects. Mean absolute error (MAE) of 1.35 beats per minute (BPM) and standard deviation of absolute error (STD) of 1.99 BPM are obtained. Pearson correlation coefficient between estimated HR and true HR of 0.94 is also achieved.","PeriodicalId":273095,"journal":{"name":"2019 IEEE International Conference on Artificial Intelligence Circuits and Systems (AICAS)","volume":"8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114165740","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Edge and Fog Computing Enabled AI for IoT-An Overview
Pub Date: 2019-03-01 | DOI: 10.1109/AICAS.2019.8771621
Z. Zou, Yi Jin, P. Nevalainen, Y. Huan, J. Heikkonen, Tomi Westerlund
In recent years, Artificial Intelligence (AI) has been widely deployed across business sectors and industries, yielding a number of revolutionary applications and services that are primarily driven by high-performance computation and storage facilities in the cloud. At the same time, embedding intelligence into edge devices is in high demand from emerging applications such as autonomous systems, human-machine interaction, and the Internet of Things (IoT). In these applications, processing data near or at its source improves energy and spectrum efficiency and security, and decreases latency. Although the computational capability of edge devices has increased tremendously over the past decade, running sophisticated AI algorithms on these resource-constrained devices remains challenging, which calls not only for low-power chips for energy-efficient processing at the edge but also for a system-level framework to distribute resources and tasks along the edge-cloud continuum. In this overview, we summarize dedicated edge hardware for machine learning, from embedded applications to sub-mW “always-on” IoT nodes. Recent advances in circuits and systems that jointly design architectures and algorithms are reviewed. The fog computing paradigm, which enables processing at the edge while still offering the possibility to interact with the cloud, is also covered, with a focus on the opportunities and challenges of exploiting fog computing in AI as a bridge between edge devices and the cloud.
{"title":"Edge and Fog Computing Enabled AI for IoT-An Overview","authors":"Z. Zou, Yi Jin, P. Nevalainen, Y. Huan, J. Heikkonen, Tomi Westerlund","doi":"10.1109/AICAS.2019.8771621","DOIUrl":"https://doi.org/10.1109/AICAS.2019.8771621","url":null,"abstract":"In recent years, Artificial Intelligence (AI) has been widely deployed in a variety of business sectors and industries, yielding numbers of revolutionary applications and services that are primarily driven by high-performance computation and storage facilities in the cloud. On the other hand, embedding intelligence into edge devices is highly demanded by emerging applications such as autonomous systems, human-machine interactions, and the Internet of Things (IoT). In these applications, it is advantageous to process data near or at the source of data to improve energy & spectrum efficiency and security, and decrease latency. Although the computation capability of edge devices has increased tremendously during the past decade, it is still challenging to perform sophisticated AI algorithms in these resource-constrained edge devices, which calls for not only low-power chips for energy efficient processing at the edge but also a system-level framework to distribute resources and tasks along the edge-cloud continuum. In this overview, we summarize dedicated edge hardware for machine learning from embedded applications to sub-mW “always-on” IoT nodes. Recent advances of circuits and systems incorporating joint design of architectures and algorithms will be reviewed. Fog computing paradigm that enables processing at the edge while still offering the possibility to interact with the cloud will be covered, with focus on opportunities and challenges of exploiting fog computing in AI as a bridge between the edge device and the cloud.","PeriodicalId":273095,"journal":{"name":"2019 IEEE International Conference on Artificial Intelligence Circuits and Systems (AICAS)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134532338","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Function-Safe Vehicular AI Processor with Nano Core-In-Memory Architecture
Pub Date: 2019-03-01 | DOI: 10.1109/AICAS.2019.8771603
Youngsu Kwon, Jeongmin Yang, Yong Cheol Peter Cho, Kyoung-Seon Shin, Jaehoon Chung, Jinho Han, C. Lyuh, Hyun-Mi Kim, Chan Kim, Minseok Choi
State-of-the-art neural network accelerators consist of arithmetic engines organized in a mesh-structured datapath surrounded by memory blocks that feed neural data into the datapath. While server-based accelerators coupled with server-class processors can afford large silicon area and consume large amounts of power, electronic control units in autonomous driving vehicles require power-optimized AI processors with a small footprint. Integrating general-purpose processor cores with mesh-structured neural network accelerators and high-speed memory, while achieving high performance under low-power and compact-area constraints, necessitates a novel AI processor architecture. We present the design of an AI processor for electronic systems in autonomous driving vehicles, targeting not only CNN-based object recognition but also MLP-based in-vehicle voice recognition. The AI processor integrates Super-Thread-Cores (STC) for neural network acceleration with function-safe general-purpose cores that satisfy vehicular electronics safety requirements. The STC is composed of 16384 programmable nano-cores organized in a mesh-grid datapath network. Designed from a thorough analysis of neural network computations, the nano-core-in-memory architecture enhances the computational intensity of the STC by efficiently feeding multi-dimensional activation and kernel data into the nano-cores. The quad function-safe general-purpose cores ensure the functional safety of the Super-Thread-Core, complying with the road vehicle safety standard ISO 26262. The AI processor delivers 32 TFLOPS, enabling hyper-real-time execution of CNNs, RNNs, and FCNs.
{"title":"Function-Safe Vehicular AI Processor with Nano Core-In-Memory Architecture","authors":"Youngsu Kwon, Jeongmin Yang, Yong Cheol Peter Cho, Kyoung-Seon Shin, Jaehoon Chung, Jinho Han, C. Lyuh, Hyun-Mi Kim, Chan Kim, Minseok Choi","doi":"10.1109/AICAS.2019.8771603","DOIUrl":"https://doi.org/10.1109/AICAS.2019.8771603","url":null,"abstract":"State-of-the-art neural network accelerators consist of arithmetic engines organized in a mesh structure datapath surrounded by memory blocks that provide neural data to the datapath. While server-based accelerators coupled with server-class processors are accommodated with large silicon area and consume large amounts of power, electronic control units in autonomous driving vehicles require power-optimized, ‘AI processors’ with a small footprint. An AI processor for mobile applications that integrates general-purpose processor cores with mesh-structured neural network accelerators and high speed memory while achieving high-performance with low-power and compact area constraints necessitates designing a novel AI processor architecture. We present the design of an AI processor for electronic systems in autonomous driving vehicles targeting not only CNN-based object recognition but also MLP-based in-vehicle voice recognition. The AI processor integrates Super-Thread-Cores (STC) for neural network acceleration with function-safe general purpose cores that satisfy vehicular electronics safety requirements. The STC is composed of 16384 programmable nano-cores organized in a mesh-grid structured datapath network. Designed based on thorough analysis of neural network computations, the nano-core-in-memory architecture enhances computation intensity of STC with efficient feeding of multi-dimensional activation and kernel data into the nano-cores. The quad function-safe general purpose cores ensure functional safety of Super-Thread-Core to comply with road vehicle safety standard ISO 26262. The AI processor exhibits 32 Tera FLOPS, enabling hyper real-time execution of CNN, RNN, and FCN.","PeriodicalId":273095,"journal":{"name":"2019 IEEE International Conference on Artificial Intelligence Circuits and Systems (AICAS)","volume":"11 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133077838","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Performance Trade-offs in Weight Quantization for Memory-Efficient Inference
Pub Date: 2019-03-01 | DOI: 10.1109/AICAS.2019.8771473
Pablo M. Tostado, B. Pedroni, G. Cauwenberghs
Over the past decade, Deep Neural Networks (DNNs) trained with Deep Learning (DL) frameworks have become the workhorse for a wide variety of computational tasks in big-data environments. To date, DL DNNs have relied on large amounts of computational power to reach peak performance, typically exploiting the high computational bandwidth of GPUs while straining available memory bandwidth and capacity. With ever-increasing data complexity and more stringent energy constraints in Internet-of-Things (IoT) environments, there has been growing interest in DNN inference methods that economize on random-access memory usage for weight access. Herein, we present a systematic analysis of the performance trade-offs of quantized weight representations at variable bit lengths for memory-efficient inference in pre-trained DNN models. We vary the mantissa and exponent bit lengths in the representation of network parameters and examine the effect of DropOut regularization during pre-training, as well as the impact of two weight truncation mechanisms: stochastic and deterministic rounding. We show a drastic reduction in memory requirements, down to 4 bits per weight, while maintaining near-optimal test performance of low-complexity DNNs pre-trained on the MNIST and CIFAR-10 datasets. These results offer a simple methodology for achieving high memory and computational efficiency of inference in dedicated low-power DNN hardware for IoT, directly from pre-trained, high-resolution DNNs using standard DL algorithms.
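The sketch below illustrates the two truncation mechanisms compared in the paper, applied post hoc to a weight tensor. It quantizes only a fixed-point mantissa (the paper also varies exponent bits), and the bit widths and normalization are illustrative assumptions.

```python
# Deterministic vs. stochastic rounding of pre-trained weights to a
# uniform grid with `mantissa_bits` of precision.
import numpy as np

def quantize(w: np.ndarray, mantissa_bits: int, stochastic: bool, rng=None) -> np.ndarray:
    scale = 2.0 ** (mantissa_bits - 1)
    x = w * scale / np.max(np.abs(w))        # normalize into the representable range
    if stochastic:
        rng = rng or np.random.default_rng(0)
        # floor(x + u), u ~ U[0,1): rounds up with probability equal to the
        # fractional part, so the quantizer is unbiased in expectation.
        x = np.floor(x + rng.random(x.shape))
    else:
        x = np.round(x)                       # deterministic round-to-nearest
    return x * np.max(np.abs(w)) / scale

w = np.random.default_rng(1).normal(size=1000).astype(np.float32)
for bits in (8, 4, 2):
    err = np.mean((w - quantize(w, bits, stochastic=True)) ** 2)
    print(f"{bits}-bit stochastic rounding MSE: {err:.2e}")
```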
{"title":"Performance Trade-offs in Weight Quantization for Memory-Efficient Inference","authors":"Pablo M. Tostado, B. Pedroni, G. Cauwenberghs","doi":"10.1109/AICAS.2019.8771473","DOIUrl":"https://doi.org/10.1109/AICAS.2019.8771473","url":null,"abstract":"Over the past decade, Deep Neural Networks (DNNs) trained using Deep Learning (DL) frameworks have become the workhorse to solve a wide variety of computational tasks in big data environments. To date, DL DNNs have relied on large amounts of computational power to reach peak performance, typically relying on the high computational bandwidth of GPUs, while straining available memory bandwidth and capacity. With ever increasing data complexity and more stringent energy constraints in Internet-of-Things (IoT) application environments, there has been a growing interest in the development of more efficient DNN inference methods that economize on random-access memory usage in weight access. Herein, we present a systematic analysis of the performance trade-offs of quantized weight representations at variable bit length for memory-efficient inference in pre-trained DNN models. In this work, we vary the mantissa and exponent bit lengths in the representation of the network parameters and examine the effect of DropOut regularization during pre-training and the impact of two different weight truncation mechanisms: stochastic and deterministic rounding. We show drastic reduction in the memory need, down to 4 bits per weight, while maintaining near-optimal test performance of low-complexity DNNs pre-trained on the MNIST and CIFAR-10 datasets. These results offer a simple methodology to achieve high memory and computation efficiency of inference in DNN dedicated low-power hardware for IoT, directly from pre-trained, high-resolution DNNs using standard DL algorithms.","PeriodicalId":273095,"journal":{"name":"2019 IEEE International Conference on Artificial Intelligence Circuits and Systems (AICAS)","volume":"108 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115073521","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Classification of Cardiac Arrhythmias Based on Artificial Neural Networks and Continuous-in-Time Discrete-in-Amplitude Signal Flow
Pub Date: 2019-03-01 | DOI: 10.1109/AICAS.2019.8771620
Yang Zhao, Simon Lin, Zhongxia Shang, Y. Lian
Conventional Artificial Neural Networks (ANNs) for classification of cardiac arrhythmias operate on Nyquist-sampled electrocardiogram (ECG) signals. The uniform sampling scheme introduces large redundancy into the ANN, which results in high power consumption and large silicon area. To address these issues, we propose a continuous-in-time discrete-in-amplitude (CTDA) sampling scheme as the input to the network. The CTDA scheme significantly reduces the number of sample points on the baseline portion of the signal while providing more detail on the useful features of the ECG. We show that CTDA sampling achieves significant savings in arithmetic operations in the ANN while maintaining classification performance similar to Nyquist sampling. The proposed method is evaluated on the MIT-BIH arrhythmia database following the AAMI recommended practice.
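CTDA sampling is a level-crossing scheme: a sample is emitted only when the signal crosses an amplitude level, so flat baseline segments produce almost no samples while QRS complexes produce many. Below is a minimal sketch under that reading; the level spacing and the synthetic trace are illustrative, not the paper's setup.

```python
# Level-crossing (CTDA-style) sampling of a 1-D signal.
import numpy as np

def ctda_sample(x: np.ndarray, delta: float):
    """Return (indices, values) of level-crossing events in x."""
    events_t, events_v = [0], [float(x[0])]
    last = x[0]
    for i, v in enumerate(x[1:], start=1):
        if abs(v - last) >= delta:            # crossed at least one level
            last += delta * np.sign(v - last) * (abs(v - last) // delta)
            events_t.append(i)
            events_v.append(float(last))
    return np.array(events_t), np.array(events_v)

# Synthetic "ECG-like" trace: flat baseline plus one sharp pulse.
t = np.linspace(0, 1, 1000)
ecg = np.exp(-((t - 0.5) ** 2) / 1e-4)
idx, vals = ctda_sample(ecg, delta=0.05)
print(f"{len(idx)} CTDA samples vs {len(ecg)} uniform samples")
```

The event count, not the record length, then determines how many ANN input operations fire, which is where the claimed arithmetic savings come from.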
{"title":"Classification of Cardiac Arrhythmias Based on Artificial Neural Networks and Continuous-in-Time Discrete-in-Amplitude Signal Flow","authors":"Yang Zhao, Simon Lin, Zhongxia Shang, Y. Lian","doi":"10.1109/AICAS.2019.8771620","DOIUrl":"https://doi.org/10.1109/AICAS.2019.8771620","url":null,"abstract":"Conventional Artificial Neural Networks (ANNs) for classification of cardiac arrhythmias are based on Nyquist sampled electrocardiogram (ECG) signals. The uniform sampling scheme introduces large redundancy in the ANN, which results high power and large silicon area. To address these issues, we propose to use continuous-in-time discrete-in-amplitude (CTDA) sampling scheme as the input of the network. The CTDA sampling scheme significantly reduces the sample points on the baseline part while provides more detail on useful features in the ECG signal. It is shown that the CTDA sampling scheme achieves significant savings on arithmetic operations in the ANN while maintains the similar performance as Nyquist sampling in the classification. The proposed method is evaluated by MIT-BIH arrhythmia database following AAMI recommended practice.","PeriodicalId":273095,"journal":{"name":"2019 IEEE International Conference on Artificial Intelligence Circuits and Systems (AICAS)","volume":"123 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122891669","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Using A Cropping Technique or Not: Impacts on SVM-based AMD Detection on OCT Images
Pub Date: 2019-03-01 | DOI: 10.1109/AICAS.2019.8771609
C. Ko, Po-Han Chen, Wei-Ming Liao, Cheng-Kai Lu, Cheng-Hung Lin, Jing-Wen Liang
This paper compares the performance of detection flows with and without automatic image cropping for age-related macular degeneration (AMD) detection on optical coherence tomography (OCT) images. With image cropping, the computational time of noise removal and feature extraction is significantly reduced at the cost of a small loss in detection accuracy. Simulation results show that applying image cropping as the first stage achieves 93.4% accuracy. Compared to the flow without cropping, the cropped flow loses only 0.5% accuracy but saves about 12 hours of computation time and about half of the memory storage.
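The comparison can be sketched as two SVM pipelines that differ only in a cropping stage. Everything below is a stand-in: the data is synthetic, the crop bounds are arbitrary, and a column-wise mean replaces the paper's noise-removal and feature-extraction stages.

```python
# Toy with/without-cropping comparison for an SVM classifier on fake OCT data.
import time
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
scans = rng.random((200, 128, 256))          # synthetic OCT B-scans
labels = rng.integers(0, 2, size=200)        # synthetic AMD vs. normal labels

def features(imgs, crop=False):
    if crop:
        imgs = imgs[:, 32:96, :]             # keep only a central band (arbitrary bounds)
    # Column-wise mean stands in for denoising + feature extraction; cropping
    # shrinks the volume this stage must touch, which is the claimed time saving.
    return imgs.mean(axis=1)

for crop in (False, True):
    t0 = time.perf_counter()
    acc = cross_val_score(SVC(kernel="rbf"), features(scans, crop), labels, cv=5).mean()
    print(f"crop={crop}: acc={acc:.3f}, time={time.perf_counter() - t0:.2f}s")
```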
{"title":"Using A Cropping Technique or Not: Impacts on SVM-based AMD Detection on OCT Images","authors":"C. Ko, Po-Han Chen, Wei-Ming Liao, Cheng-Kai Lu, Cheng-Hung Lin, Jing-Wen Liang","doi":"10.1109/AICAS.2019.8771609","DOIUrl":"https://doi.org/10.1109/AICAS.2019.8771609","url":null,"abstract":"This paper compares the system performance of distinct flows with automatic image cropping to without automatic image cropping for age-related macular degeneration (AMD) detection on optical coherence tomography (OCT) images. Using the image cropping, the computational time of noise removal and feature extraction can be significantly reduced by a small loss of detection accuracy. The simulation results show that using the image cropping at the first stage achieves 93.4% accuracy. Compared to the flow without image cropping, using the image cropping loses only 0.5% accuracy but saves about 12 hours computational time and about a half of memory storages.","PeriodicalId":273095,"journal":{"name":"2019 IEEE International Conference on Artificial Intelligence Circuits and Systems (AICAS)","volume":"59 21","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"120972119","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Exploration of Automatic Mixed-Precision Search for Deep Neural Networks
Pub Date: 2019-03-01 | DOI: 10.1109/AICAS.2019.8771498
Xuyang Guo, Yuanjun Huang, Hsin-Pai Cheng, Bing Li, W. Wen, Siyuan Ma, H. Li, Yiran Chen
Neural networks have shown great performance in cognitive tasks. When deploying network models on mobile devices with limited computation and storage resources, weight quantization is widely adopted. In practice, 8-bit or 16-bit quantization is most likely to be selected in order to maintain accuracy at the same level as models in 32-bit floating-point precision. Binary quantization, by contrast, aims for the highest compression at the cost of a much larger accuracy drop. Applying different precision to different layers and structures can potentially produce the most efficient model, but finding the best precision configuration is difficult. In this work, we propose an automatic search algorithm to address this challenge. By relaxing the search space of quantization bitwidths from the discrete to the continuous domain, our algorithm generates a mixed-precision quantization scheme that achieves a compression rate close to that of a binary-weighted model while maintaining test accuracy similar to the original full-precision model.
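The core relaxation idea can be sketched in miniature: treat each layer's bitwidth as a continuous variable, minimize a surrogate loss that trades quantization error against model size, then round the solution back to integers. The surrogate objective, loss weight, and layer sizes below are illustrative assumptions; the paper's actual search optimizes the task loss, not this proxy.

```python
# Continuous relaxation of per-layer bitwidth search over a toy objective.
import numpy as np
from scipy.optimize import minimize

layer_weights = [np.random.default_rng(i).normal(size=n)
                 for i, n in enumerate([900, 1600, 250])]   # toy layer tensors

def quant_error(w, bits):
    step = (w.max() - w.min()) / (2.0 ** bits - 1)          # uniform quantizer step
    return np.mean((w - step * np.round(w / step)) ** 2)

def objective(bits):                                        # bits is continuous here
    err = sum(quant_error(w, b) for w, b in zip(layer_weights, bits))
    size = sum(b * w.size for w, b in zip(layer_weights, bits))
    return err + 1e-7 * size                                # accuracy/size trade-off

res = minimize(objective, x0=[8.0] * 3, bounds=[(1.0, 8.0)] * 3)
print("continuous solution:", res.x, "-> rounded scheme:", np.round(res.x))
```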
{"title":"Exploration of Automatic Mixed-Precision Search for Deep Neural Networks","authors":"Xuyang Guo, Yuanjun Huang, Hsin-Pai Cheng, Bing Li, W. Wen, Siyuan Ma, H. Li, Yiran Chen","doi":"10.1109/AICAS.2019.8771498","DOIUrl":"https://doi.org/10.1109/AICAS.2019.8771498","url":null,"abstract":"Neural networks have shown great performance in cognitive tasks. When deploying network models on mobile devices with limited computation and storage resources, the weight quantization technique has been widely adopted. In practice, 8-bit or 16-bit quantization is mostly likely to be selected in order to maintain the accuracy at the same level as the models in 32-bit floating-point precision. Binary quantization, on the contrary, aims to obtain the highest compression at the cost of much bigger accuracy drop. Applying different precision in different layers/structures can potentially produce the most efficient model. Seeking for the best precision configuration, however, is difficult. In this work, we proposed an automatic search algorithm to address the challenge. By relaxing the search space of quantization bitwidth from discrete to continuous domain, our algorithm can generate a mixed-precision quantization scheme which achieves the compression rate close to the one from the binary-weighted model while maintaining the testing accuracy similar to the original full-precision model.","PeriodicalId":273095,"journal":{"name":"2019 IEEE International Conference on Artificial Intelligence Circuits and Systems (AICAS)","volume":"23 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128320736","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Binarized Neural Network with Stochastic Memristors
Pub Date: 2019-03-01 | DOI: 10.1109/AICAS.2019.8771565
O. Krestinskaya, Otaniyoz Otaniyozov, A. P. James
This paper proposes an analog hardware implementation of a Binarized Neural Network (BNN). Most existing hardware implementations of neural networks do not consider memristor variability and its effect on overall system performance. In this work, we investigate the variability of memristive devices in crossbar dot-product computation, as well as leakage currents, in the proposed BNN, and show how they affect overall system performance.
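A minimal sketch of this kind of variability experiment: a crossbar computes the dot product as column currents, and lognormal conductance variation plus a leakage term perturb the ideal result. The conductance values, spread, and leakage model are illustrative assumptions, not the paper's device parameters.

```python
# Crossbar dot product y = x @ G with device variability and leakage.
import numpy as np

rng = np.random.default_rng(0)
w = rng.choice([-1.0, 1.0], size=(64, 10))     # binarized weights
x = rng.choice([0.0, 1.0], size=64)            # binary input vector

def crossbar_output(w, x, sigma=0.1, leak=0.01):
    g = np.where(w > 0, 1.0, 0.1)              # map +1/-1 to high/low conductance
    g = g * rng.lognormal(mean=0.0, sigma=sigma, size=g.shape)  # device variability
    leak_current = leak * x.sum()              # crude sneak-path/leakage term
    return x @ g + leak_current                # column currents

ideal = crossbar_output(w, x, sigma=0.0, leak=0.0)
noisy = crossbar_output(w, x, sigma=0.2, leak=0.02)
# Column currents are thresholded downstream; enough variability can flip
# the sign of a binarized activation and degrade classification accuracy.
print("max column error:", np.max(np.abs(noisy - ideal)))
```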
{"title":"Binarized Neural Network with Stochastic Memristors","authors":"O. Krestinskaya, Otaniyoz Otaniyozov, A. P. James","doi":"10.1109/AICAS.2019.8771565","DOIUrl":"https://doi.org/10.1109/AICAS.2019.8771565","url":null,"abstract":"This paper proposes the analog hardware implementation of Binarized Neural Network (BNN). Most of the existing hardware implementations of neural networks do not consider the memristor variability issue and its effect on the overall system performance. In this work, we investigate the variability in memristive devices in crossbar dot product computation and leakage currents in the proposed BNN, and show how it effects the overall system performance.","PeriodicalId":273095,"journal":{"name":"2019 IEEE International Conference on Artificial Intelligence Circuits and Systems (AICAS)","volume":"532 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132314997","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}