Pub Date: 2019-03-01 | DOI: 10.1109/AICAS.2019.8771480
Sunwoo Kim, Chung-Mok Lee, Haesung Park, Jooho Wang, Sungkyung Park, C. Park
Sparse CNN (SCNN) accelerators tend to suffer from bus contention in their scatter networks. This paper considers optimizations of the scatter network: several network topologies and arbitration algorithms are evaluated in terms of performance and area.
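One common arbitration scheme in such networks is round-robin. As a generic illustration (not the paper's actual evaluation framework), a minimal round-robin arbiter model shows how contending ports share a bus fairly:

```python
def round_robin_arbiter(requests, start):
    """Grant one asserted request line, scanning from `start`.

    requests: list of bools, one per input port.
    Returns (granted_index, next_start), or (None, start) if nothing requests.
    """
    n = len(requests)
    for offset in range(n):
        i = (start + offset) % n
        if requests[i]:
            return i, (i + 1) % n  # the granted port loses priority next cycle
    return None, start

# Simulate contention: ports 0 and 1 request every cycle and alternate grants.
ptr = 0
grants = []
for _ in range(4):
    g, ptr = round_robin_arbiter([True, True, False], ptr)
    grants.append(g)
```

Rotating the priority pointer after each grant is what prevents a single port from starving the others under sustained contention.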
Title: Optimizations of Scatter Network for Sparse CNN Accelerators
Pub Date: 2019-03-01 | DOI: 10.1109/AICAS.2019.8771607
T. Aa, Imen Chakroun, Thomas J. Ashby
Bayesian Matrix Factorization (BMF) is a powerful technique for recommender systems because it produces good results and is relatively robust against overfitting. Yet BMF is computationally intensive and thus challenging to scale to large datasets. In this work we present SMURFF, a high-performance, feature-rich framework for composing and constructing different Bayesian matrix-factorization methods. The framework has been successfully used for large-scale runs of compound-activity prediction. SMURFF is available as open source and can be used both on a supercomputer and on a desktop or laptop machine. Documentation and several examples are provided as Jupyter notebooks using SMURFF’s high-level Python API.
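To make the underlying idea concrete, here is a minimal NumPy sketch of regularized matrix factorization by alternating least squares — a MAP point-estimate analogue of the Bayesian model, not SMURFF's actual Gibbs-sampling implementation:

```python
import numpy as np

# Build a low-rank "ratings" matrix plus noise, then recover it with ALS.
rng = np.random.default_rng(0)
U_true = rng.normal(size=(20, 3))
V_true = rng.normal(size=(15, 3))
R = U_true @ V_true.T + 0.1 * rng.normal(size=(20, 15))

k, lam = 3, 0.1                      # latent dimension and ridge (prior) term
U = rng.normal(size=(20, k))
V = rng.normal(size=(15, k))
for _ in range(30):
    # Closed-form ridge updates; the lam term plays the role of the
    # Gaussian prior that gives BMF its robustness to overfitting.
    U = R @ V @ np.linalg.inv(V.T @ V + lam * np.eye(k))
    V = R.T @ U @ np.linalg.inv(U.T @ U + lam * np.eye(k))

rmse = np.sqrt(np.mean((R - U @ V.T) ** 2))
```

SMURFF replaces the point estimates above with posterior samples over `U` and `V`, averaging predictions over many draws.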
Title: SMURFF: a High-Performance Framework for Matrix Factorization
Pub Date: 2019-03-01 | DOI: 10.1109/AICAS.2019.8771576
Injune Yeo, Sang-gyun Gi, Jung-gyun Kim, Byung-geun Lee
A CMOS-based resistive computing element (RCE) that can be integrated into a crossbar array is presented. The RCE overcomes the hardware constraints of existing memristive devices, such as the limited dynamic range of conductance, I-V nonlinearity, and on/off ratio, without increasing hardware complexity compared to other CMOS implementations. The RCE has been designed in a 65 nm standard CMOS process, and SPICE simulations have been performed to evaluate its feasibility and functionality. In addition, a pulsed neural network employing an RCE crossbar array has been designed and simulated to verify the operation of the RCE.
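The computational primitive a resistive crossbar provides can be sketched in a few lines: with conductances at the cross-points and voltages on the rows, the column currents realize an analog matrix-vector multiply. The function name and clipping range below are illustrative, not taken from the paper's RCE design:

```python
import numpy as np

def crossbar_mvm(G, v, g_min=0.0, g_max=1.0):
    """Idealized crossbar: I = G^T v by Ohm's and Kirchhoff's laws.

    G[i][j] is the conductance at row i, column j; v holds row voltages.
    Conductances are clipped to model a device's limited dynamic range.
    """
    G = np.clip(G, g_min, g_max)
    return G.T @ v          # column currents

G = np.array([[0.2, 0.9],
              [0.5, 0.1]])
v = np.array([1.0, 2.0])
currents = crossbar_mvm(G, v)
```

The appeal of a CMOS RCE is that the effective "conductance" is no longer bound by memristive-device nonidealities such as the narrow `g_min`/`g_max` window modeled here.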
Title: A CMOS-based Resistive Crossbar Array with Pulsed Neural Network for Deep Learning Accelerator
Pub Date: 2019-03-01 | DOI: 10.1109/AICAS.2019.8771533
Yasufumi Sakai, B. Pedroni, Siddharth Joshi, Abraham Akinin, G. Cauwenberghs
DropOut and DropConnect are effective methods for improving the generalization performance of neural networks, by randomly dropping either the states of neural units or the weights of synaptic connections at each time instance throughout training. In this paper, we extend the use of these methods to the design of neuromorphic spiking neural network (SNN) hardware, to further improve the reliability of inference as impacted by resource-constrained errors in network connectivity. Such energy and bandwidth constraints arise from low-power operation in the communication between neural units, which causes dropped spike events due to timeout errors in transmission. The DropOut and DropConnect processes during training are aligned with a statistical model of the network during inference that accounts for these random errors in the transmission of neural states and synaptic connections. Using DropOut and DropConnect during training hence allows two design objectives to be met simultaneously: maximizing bandwidth while minimizing energy of inference in neuromorphic hardware. Simulations of the model with a 5-layer fully connected 784-500-500-500-10 SNN on the MNIST task show a 5-fold and a 10-fold improvement in bandwidth during inference at greater than 98% accuracy, using DropOut and DropConnect respectively during backpropagation training.
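The distinction between the two methods — masking activations versus masking individual weights — can be sketched as follows. This is a generic inverted-scaling formulation assumed for illustration, not the paper's SNN model:

```python
import numpy as np

rng = np.random.default_rng(1)

def dropout(x, p):
    """DropOut: zero each unit activation with probability p."""
    mask = rng.random(x.shape) >= p
    return x * mask / (1.0 - p)   # rescale so the expected value is preserved

def dropconnect(W, p):
    """DropConnect: zero each individual weight with probability p."""
    mask = rng.random(W.shape) >= p
    return W * mask / (1.0 - p)

x = np.ones(1000)
W = np.ones((100, 100))
x_do = dropout(x, 0.5)       # about half the activations are zeroed
W_dc = dropconnect(W, 0.5)   # about half the weights are zeroed
```

The paper's observation is that hardware transmission timeouts impose masks of exactly this form at inference time, so training with matching masks makes the network statistically prepared for them.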
Title: DropOut and DropConnect for Reliable Neuromorphic Inference under Energy and Bandwidth Constraints in Network Connectivity
Pub Date: 2019-03-01 | DOI: 10.1109/AICAS.2019.8771623
Chia-Ching Wang, Hsin-Hua Liu, S. Pei, Kuan-Hsien Liu, Tsung-Jung Liu
In this work, we focus on building style transfer, which transforms ruined buildings into modern architecture. Inspired by Gatys’s style transfer and Goodfellow’s generative adversarial network (GAN), we use CycleGAN to tackle this problem. To avoid artifacts and generate better images, we add a “perceptual loss” to the network: the feature loss extracted by a pre-trained VGG model. We also adjust the cycle loss by changing the ratio of the weighting parameters. Finally, we collect images of both ruined and modern architecture from websites and use unsupervised learning to train the model. The experimental results show that our proposed method indeed realizes modern-architecture style transfer for ruined buildings.
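The objective described above combines three terms. As a sketch (the weighting values here are illustrative, not the ones used in the paper):

```python
def total_loss(adv, cycle, perc, lambda_cyc=10.0, lambda_perc=5.0):
    """Combined CycleGAN objective with an added perceptual term.

    adv   -- adversarial loss from the discriminators
    cycle -- cycle-consistency loss (reconstruct A -> B -> A)
    perc  -- VGG-feature (perceptual) loss between input and output
    """
    return adv + lambda_cyc * cycle + lambda_perc * perc

loss = total_loss(adv=1.0, cycle=0.2, perc=0.1)
```

"Adjusting the ratio of the weighting parameters" corresponds to tuning `lambda_cyc` relative to `lambda_perc`: a larger cycle weight preserves building geometry, while a larger perceptual weight pushes outputs toward natural-looking textures.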
Title: Modern Architecture Style Transfer for Ruin Buildings
Pub Date: 2019-03-01 | DOI: 10.1109/AICAS.2019.8771605
Tong-Yu Hsieh, Yuan-Cheng Lin, Hsin-Yung Shen
Machine learning is expected to play an important role in implementing automotive systems such as Advanced Driver Assistance Systems (ADAS). For machine learning methods to work well, a sufficient amount of training data is essential. However, collecting training data may be difficult or very time-consuming. In this paper we investigate automatic generation of training data for automotive applications. Generative Adversarial Network (GAN) techniques are employed to generate fake yet high-quality data for machine learning. Although using GANs to generate training images has been proposed in the literature, previous work does not consider automotive applications. In this work, a case study on vehicle detection demonstrates the power of GANs and the effectiveness of the training images they generate. The generated fake bus images are employed as training data, and a Support Vector Machine (SVM) is implemented to detect buses. The results show that the SVM trained on fake images achieves almost the same detection accuracy as one trained on real images, and that the GAN can generate training images very quickly. The extension of GANs to generate road images under various conditions, such as fog or night, is also discussed.
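The evaluation protocol — train on generated ("fake") samples, test on held-out real ones — can be illustrated with a toy stand-in. Here Gaussian blobs stand in for GAN-generated bus images and a nearest-centroid rule stands in for the SVM; none of this is the paper's actual pipeline:

```python
import numpy as np

rng = np.random.default_rng(2)
# "Generated" training data and "real" held-out test data from the same
# underlying distribution, mimicking a well-trained generator.
fake_pos = rng.normal(loc=+2.0, size=(200, 5))
fake_neg = rng.normal(loc=-2.0, size=(200, 5))
real_pos = rng.normal(loc=+2.0, size=(50, 5))
real_neg = rng.normal(loc=-2.0, size=(50, 5))

# Fit on fake data only.
mu_pos, mu_neg = fake_pos.mean(0), fake_neg.mean(0)

def predict(x):
    return 1 if np.linalg.norm(x - mu_pos) < np.linalg.norm(x - mu_neg) else 0

# Evaluate on real data.
correct = sum(predict(x) for x in real_pos) + sum(1 - predict(x) for x in real_neg)
accuracy = correct / 100
```

The paper's finding is the analogue of this outcome at full scale: when the generator captures the real data distribution, a classifier trained on its output transfers to real images with little accuracy loss.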
Title: On Automatic Generation of Training Images for Machine Learning in Automotive Applications
Pub Date: 2019-03-01 | DOI: 10.1109/AICAS.2019.8771492
Jongmin Park, Seungsik Moon, Younghoon Byun, Sunggu Lee, Youngjoo Lee
Targeting resource-limited intelligent mobile systems, in this paper we present a multi-level weight-indexing method that relaxes the memory requirements for realizing convolutional neural networks (CNNs). In contrast to previous works, which focus only on the positions of unpruned weights, the proposed work considers consecutive pruned positions to generate group-level validations. By storing the surviving indices only for the valid groups, the proposed multi-level indexing scheme reduces the amount of indexing data. In addition, we introduce indexing-aware multi-level pruning and indexing methods with variable group sizes, which can further optimize the memory overhead. As a result, for the same pruning factor, the memory size for storing the indexing information is reduced by up to 81%, leading to a practical CNN architecture for intelligent mobile devices.
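The group-level idea can be sketched as follows: one validity bit per group of size `g`, with within-group indices stored only for groups that still contain unpruned weights. The encoding below is a simplified illustration, not the paper's exact format:

```python
def multilevel_index(weights, g):
    """Split weights into groups of size g; emit a validity bit per group
    and within-group indices of nonzero weights only for valid groups."""
    valid_bits, indices = [], []
    for start in range(0, len(weights), g):
        group = weights[start:start + g]
        nz = [i for i, w in enumerate(group) if w != 0]
        valid_bits.append(1 if nz else 0)
        if nz:
            indices.append(nz)   # all-zero groups cost only their one bit
    return valid_bits, indices

# Heavily pruned example: only the middle group survives.
w = [0, 0, 0, 0,  0.5, 0, 0, 0.2,  0, 0, 0, 0]
valid, idx = multilevel_index(w, g=4)
```

The saving comes from the all-zero groups: a flat scheme would spend index bits across the whole vector, whereas here pruned-away groups collapse to a single `0` bit each.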
Title: Multi-level Weight Indexing Scheme for Memory-Reduced Convolutional Neural Network
Pub Date: 2019-03-01 | DOI: 10.1109/AICAS.2019.8771589
I-Lok Cheng, Ching-Hwa Cheng, Don-Gey Liu
Most of today's logistics systems require people to control them. If there is not enough manpower, e.g. drivers, or the destination is unfamiliar to the driver, delivery could be delayed or goods may be sent to the wrong location. This paper demonstrates a prototype of a learnable smart system for the precise positioning of unmanned transport machines. The proposed system consists of robotic arms, land vehicles, and unmanned aerial vehicles, which can easily deliver light cargo to a designated place. The proposed design can automatically deliver goods to designated locations while avoiding environmental influences, and the interactive use of unmanned ground and aerial vehicles makes it possible to transport goods to a precise destination. The prototype can be demonstrated to evaluate the feasibility and performance of a learnable unmanned intelligent transportation system.
Title: A Learnable Unmanned Smart Logistics Prototype System Design and Implementation
Pub Date: 2019-03-01 | DOI: 10.1109/AICAS.2019.8771629
Yi-Hsiang Chen, Shao-Yi Chien
To accelerate convolutional neural network (CNN) operations on resource-limited mobile graphics processing units (GPUs), we take advantage of the common characteristics of texture filtering and convolutional layers and propose a configurable texture unit, called the tensor and texture unit (TTU), to offload computation from the shader cores. By adding a new datapath for loading weight parameters into the texture unit, reusing the original texture cache, increasing the flexibility of the filtering unit, and packing the input data and weight parameters into a fixed-point format, we enable the texture unit to support convolutional and pooling layers with only small modifications. The proposed architecture is verified by integrating the TTU into a GPU system at the RTL level. Experimental results show that an 18.54x speedup can be achieved with an overhead of only 8.5% compared with a GPU system with a traditional texture unit.
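Fixed-point packing of this kind can be sketched numerically. The Q3.4 layout below (1 sign, 3 integer, 4 fraction bits) is an assumed example, not necessarily the format the TTU uses:

```python
import numpy as np

def to_fixed(x, frac_bits=4, total_bits=8):
    """Quantize floats to signed fixed-point with saturation."""
    scaled = np.round(np.asarray(x) * (1 << frac_bits)).astype(np.int64)
    lo, hi = -(1 << (total_bits - 1)), (1 << (total_bits - 1)) - 1
    return np.clip(scaled, lo, hi)

def from_fixed(q, frac_bits=4):
    """Recover the approximate real value from the fixed-point code."""
    return q / (1 << frac_bits)

w = [0.75, -1.25, 3.3]
q = to_fixed(w)            # 8-bit integer codes, ready to pack
roundtrip = from_fixed(q)  # values the hardware actually computes with
```

Packing both activations and weights into such narrow integer codes is what lets a texture-filtering datapath, built for small fixed-point texels, evaluate convolution and pooling directly.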
Title: Configurable Texture Unit for Convolutional Neural Networks on Graphics Processing Units
As artificial intelligence (AI) algorithms requiring high accuracy become ever more complex and Edge/IoT-generated data grows ever larger, flexible reconfigurable processing, introduced in this paper, is crucial to the design of efficient low-power smart edge systems. In AI, analytics algorithms are typically used to analyze speech, audio, image, and video data. In current cross-level system design methodology, different algorithmic realizations are analyzed in the form of dataflow graphs (DFGs) to further increase efficiency and flexibility in constituting an “analytics architecture”. Carrying both algorithmic behavior and architectural information, including software and hardware, the DFG provides a mathematical representation which, as opposed to traditional linear difference equations, better models the underlying computational platform for systematic analysis, thus providing flexible and efficient management of computational and storage resources. In our analytics-architecture work, parallel and reconfigurable computing are formulated via DFGs, analogous to the analysis and synthesis equations of the well-known Fourier transform pair. In parallel computing, a connected component is eigen-decomposed into unconnected components for concurrent processing. To save computation resources, commonalities in DFGs are analyzed for reuse when synthesizing or reconfiguring the edge platform. In this paper, we specifically introduce a lightweight edge upon which the convolutions of a convolutional neural network are eigen-transformed into matrix operations with higher symmetry, enabling fewer operations and lower data-transfer rate and storage, anticipating lower power when synthesizing or reconfiguring the eigenvectors.
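Mapping convolution onto regular matrix operations can be illustrated with the standard im2col construction, shown here as a generic sketch rather than the paper's eigen-transform formulation:

```python
import numpy as np

def conv2d_via_matmul(x, k):
    """Valid 2-D convolution (CNN-style cross-correlation) as a matmul.

    Each sliding-window patch of x is flattened into one row of a matrix;
    multiplying by the flattened kernel yields all outputs at once.
    """
    kh, kw = k.shape
    oh, ow = x.shape[0] - kh + 1, x.shape[1] - kw + 1
    cols = np.array([x[i:i + kh, j:j + kw].ravel()
                     for i in range(oh) for j in range(ow)])
    return (cols @ k.ravel()).reshape(oh, ow)

x = np.arange(16, dtype=float).reshape(4, 4)
k = np.array([[1.0, 0.0],
              [0.0, 1.0]])
y = conv2d_via_matmul(x, k)   # each output sums a pixel and its lower-right neighbor
```

Once a layer is in this matrix form, the regularity the paper exploits — reuse, symmetry, and predictable data movement — becomes available to the reconfigurable edge platform.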
Title: Reconfigurable Edge via Analytics Architecture
Shih-Yu Chen, G. Lee, Tai-Ping Wang, Chin-Wei Huang, Jia-Hong Chen, Chang-Ling Tsai
Pub Date: 2019-03-01 | DOI: 10.1109/AICAS.2019.8771528