2019 IEEE International Conference on Artificial Intelligence Circuits and Systems (AICAS): Latest Papers

Optimizations of Scatter Network for Sparse CNN Accelerators
Sunwoo Kim, Chung-Mok Lee, Haesung Park, Jooho Wang, Sungkyung Park, C. Park
Sparse CNN (SCNN) accelerators tend to suffer from bus contention in their scatter networks. This paper considers optimizations of the scatter network: several network topologies and arbitration algorithms are evaluated in terms of performance and area.
DOI: https://doi.org/10.1109/AICAS.2019.8771480 (published 2019-03-01)
Citations: 3
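
The abstract above does not name the arbitration algorithms it evaluates. As a purely illustrative sketch of what an arbiter on a shared bus does, the following shows round-robin arbitration, a common textbook scheme that is an assumption here and not necessarily one of the algorithms studied in the paper. The function name `round_robin_grant` and its arguments are hypothetical.

```python
def round_robin_grant(requests, last_grant):
    """Pick the next requester after `last_grant`, wrapping around.

    requests   -- list of booleans, one per requester on the shared bus
    last_grant -- index of the requester that was granted most recently
    Returns the index to grant this cycle, or None if nobody is requesting.
    """
    n = len(requests)
    # Start searching one position past the previous winner so that
    # every requester eventually gets a turn.
    for offset in range(1, n + 1):
        idx = (last_grant + offset) % n
        if requests[idx]:
            return idx
    return None


# Requesters 0 and 2 both want the bus; requester 0 was served last.
grant = round_robin_grant([True, False, True, False], last_grant=0)
```

Because the search starts just past the previous winner, a requester that was served in the last cycle has the lowest priority in the next one, which bounds how long any single requester can be starved under contention.
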
SMURFF: a High-Performance Framework for Matrix Factorization
T. Aa, Imen Chakroun, Thomas J. Ashby
Bayesian Matrix Factorization (BMF) is a powerful technique for recommender systems because it produces good results and is relatively robust against overfitting. Yet BMF is more computationally intensive and thus more challenging to implement for large datasets. In this work we present SMURFF, a high-performance, feature-rich framework for composing and constructing different Bayesian matrix-factorization methods. The framework has been used successfully for large-scale runs of compound-activity prediction. SMURFF is available as open source and can be used on a supercomputer as well as on a desktop or laptop machine. Documentation and several examples are provided as Jupyter notebooks using SMURFF's high-level Python API.
DOI: https://doi.org/10.1109/AICAS.2019.8771607 (published 2019-03-01)
Citations: 5
A CMOS-based Resistive Crossbar Array with Pulsed Neural Network for Deep Learning Accelerator
Injune Yeo, Sang-gyun Gi, Jung-gyun Kim, Byung-geun Lee
A CMOS-based resistive computing element (RCE) that can be integrated in a crossbar array is presented. The RCE overcomes the hardware constraints of existing memristive devices, such as the dynamic range of conductance, I-V nonlinearity, and on/off ratio, without increasing hardware complexity compared to other CMOS implementations. The RCE has been designed in a standard 65 nm CMOS process, and SPICE simulations have been performed to evaluate its feasibility and functionality. In addition, a pulsed neural network employing an RCE crossbar array has been designed and simulated to verify the operation of the RCE.
DOI: https://doi.org/10.1109/AICAS.2019.8771576 (published 2019-03-01)
Citations: 0
DropOut and DropConnect for Reliable Neuromorphic Inference under Energy and Bandwidth Constraints in Network Connectivity
Yasufumi Sakai, B. Pedroni, Siddharth Joshi, Abraham Akinin, G. Cauwenberghs
DropOut and DropConnect are known as effective methods for improving the generalization performance of neural networks, by dropping either states of neural units or weights of synaptic connections, randomly selected at each time instance throughout the training process. In this paper, we extend the use of these methods to the design of neuromorphic spiking neural network (SNN) hardware, to further improve the reliability of inference as impacted by resource-constrained errors in network connectivity. Such energy and bandwidth constraints arise for low-power operation in the communication between neural units, and cause dropped spike events due to timeout errors in the transmission. The DropOut and DropConnect processes during training of the network are aligned with a statistical model of the network during inference that accounts for these random errors in the transmission of neural states and synaptic connections. The use of DropOut and DropConnect during training hence makes it possible to meet two design objectives simultaneously: maximizing bandwidth while minimizing the energy of inference in neuromorphic hardware. Simulations of the model with a 5-layer fully connected 784-500-500-500-10 SNN on the MNIST task show a 5-fold and 10-fold improvement in bandwidth during inference at greater than 98% accuracy, using DropOut and DropConnect respectively during backpropagation training.
DOI: https://doi.org/10.1109/AICAS.2019.8771533 (published 2019-03-01)
Citations: 4
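
The distinction the abstract draws between dropping unit states (DropOut) and dropping connection weights (DropConnect) can be sketched for a single dense layer as follows. This is a minimal NumPy illustration under stated assumptions, not the authors' SNN training code: the layer is a plain matrix product, the 784-to-500 shape is borrowed from the first layer of the network in the abstract, and the function names and the drop probability `p_drop` are hypothetical. The rescaling of kept values that many implementations apply is omitted for brevity.

```python
import numpy as np


def dropout_forward(x, W, p_drop, rng):
    """DropOut: zero whole input units (states) with probability p_drop."""
    keep = rng.random(x.shape) >= p_drop   # one mask entry per unit state
    return (x * keep) @ W


def dropconnect_forward(x, W, p_drop, rng):
    """DropConnect: zero individual weights (connections) with probability p_drop."""
    keep = rng.random(W.shape) >= p_drop   # one mask entry per connection
    return x @ (W * keep)


rng = np.random.default_rng(0)
x = rng.random((4, 784))              # batch of 4 input vectors
W = rng.standard_normal((784, 500))   # weights of one fully connected layer
y_do = dropout_forward(x, W, 0.5, rng)
y_dc = dropconnect_forward(x, W, 0.5, rng)
```

A dropped unit removes every connection leaving that unit at once, whereas a dropped weight removes a single connection, which is the difference between a lost neural state and a lost synaptic event in the abstract's error model.
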
Modern Architecture Style Transfer for Ruin Buildings
Chia-Ching Wang, Hsin-Hua Liu, S. Pei, Kuan-Hsien Liu, Tsung-Jung Liu
In this work, we focus on building style transfer, which transforms ruined buildings into modern architecture. Inspired by Gatys's style transfer and Goodfellow's generative adversarial network (GAN), we use CycleGAN to tackle this type of problem. To avoid artifacts and generate better images, we add a “perception loss” to the network, which is a feature loss computed with a pre-trained VGG model. We also adjust the cycle loss by changing the ratio of the weighting parameters. Finally, we collect images of both ruins and modern architecture from websites and use unsupervised learning to train the model. The experimental results show that the proposed method indeed realizes modern architecture style transfer for ruined buildings.
DOI: https://doi.org/10.1109/AICAS.2019.8771623 (published 2019-03-01)
Citations: 4
On Automatic Generation of Training Images for Machine Learning in Automotive Applications
Tong-Yu Hsieh, Yuan-Cheng Lin, Hsin-Yung Shen
Machine learning is expected to play an important role in implementing automotive systems such as Advanced Driver Assistance Systems (ADAS). For machine learning methods to work well, a sufficient amount of training data is essential. However, collecting the training data may be difficult or very time-consuming. In this paper we investigate the automatic generation of training data for automotive applications. Generative Adversarial Network (GAN) techniques are employed to generate fake yet high-quality data for machine learning. Although using GANs to generate training images has been proposed in the literature, previous work does not consider automotive applications. In this work a case study on vehicle detection is provided to demonstrate the power of GANs and the effectiveness of the training images they generate. The generated fake bus images are employed as training data, and an SVM (Support Vector Machine) method is implemented to detect buses. The results show that the SVM trained on the fake images achieves almost the same detection accuracy as one trained on real images. The results also show that a GAN can generate the training images very quickly. The extension of GANs to generate road images under various conditions such as fog or night is also discussed.
DOI: https://doi.org/10.1109/AICAS.2019.8771605 (published 2019-03-01)
Citations: 2
Multi-level Weight Indexing Scheme for Memory-Reduced Convolutional Neural Network
Jongmin Park, Seungsik Moon, Younghoon Byun, Sunggu Lee, Youngjoo Lee
Targeting resource-limited intelligent mobile systems, this paper presents a multi-level weight indexing method that relaxes the memory requirements for realizing convolutional neural networks (CNNs). In contrast to previous works, which focus only on the positions of unpruned weights, the proposed work considers consecutive pruned positions to generate group-level validations. By denoting the surviving indices only for the valid groups, the proposed multi-level indexing scheme reduces the amount of indexing data. In addition, we introduce indexing-aware multi-level pruning and indexing methods with variable group sizes, which can further optimize the memory overhead. As a result, for the same pruning factor, the memory size for storing the indexing information is reduced by up to 81%, leading to a practical CNN architecture for intelligent mobile devices.
DOI: https://doi.org/10.1109/AICAS.2019.8771492 (published 2019-03-01)
Citations: 2
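
The idea of recording per-weight indices only for groups that still contain a surviving weight can be sketched with a two-level bitmap. This is a minimal illustration under assumptions: the abstract does not give the paper's actual encoding, so the bitmap layout, the fixed group size, and the function names below are hypothetical.

```python
def flat_index_bits(mask):
    """Baseline: one index bit per weight position (1 = unpruned)."""
    return len(mask)


def two_level_index_bits(mask, group_size):
    """Two-level index: one bit per group, plus per-weight bits only for
    groups that contain at least one unpruned weight."""
    groups = [mask[i:i + group_size] for i in range(0, len(mask), group_size)]
    group_valid = [any(g) for g in groups]
    # Fully pruned groups cost only their single group-level bit.
    inner_bits = sum(len(g) for g, valid in zip(groups, group_valid) if valid)
    return len(groups) + inner_bits, group_valid


# 16 weights, 2 survivors, both inside the second group of four.
mask = [0, 0, 0, 0,  1, 0, 0, 1,  0, 0, 0, 0,  0, 0, 0, 0]
flat_bits = flat_index_bits(mask)
bits, valid = two_level_index_bits(mask, group_size=4)
```

In this toy case the flat bitmap needs 16 bits while the two-level index needs 4 group bits plus 4 bits for the single valid group. The saving depends on how clustered the pruned positions are, which is presumably why the abstract pairs the indexing scheme with indexing-aware pruning.
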
A Learnable Unmanned Smart Logistics Prototype System Design and Implementation
I-Lok Cheng, Ching-Hwa Cheng, Don-Gey Liu
Most of today's logistics systems require people to control them. If there is not enough manpower, e.g. drivers, or the driver is unfamiliar with the destination, delivery could be delayed or the goods may be sent to the wrong location. This paper demonstrates a prototype of a learnable smart system for precise positioning of unmanned transport machines. The proposed system consists of robotic arms, land vehicles, and unmanned aerial vehicles, and can easily deliver light cargo to a designated place. The proposed design can automatically deliver goods to designated locations while avoiding environmental influences. Interactive use of unmanned land vehicles and unmanned aerial vehicles makes it possible to transport goods to a precise destination. The prototype can be used to evaluate the feasibility and performance of a learnable unmanned intelligent transportation system.
DOI: https://doi.org/10.1109/AICAS.2019.8771589 (published 2019-03-01)
Citations: 3
Configurable Texture Unit for Convolutional Neural Networks on Graphics Processing Units
Yi-Hsiang Chen, Shao-Yi Chien
To accelerate Convolutional Neural Network (CNN) operations on resource-limited mobile graphics processing units (GPUs), we take advantage of the characteristics shared by texture filtering and convolutional layers and propose a configurable texture unit, called the tensor and texture unit (TTU), to offload computation from the shader cores. By adding a new datapath for loading weight parameters in the texture unit, reusing the original texture cache, increasing the flexibility of the filtering unit, and packing the input data and weight parameters into a fixed-point format, we enable the texture unit to support convolutional and pooling layers with only small modifications. The proposed architecture is verified by integrating the TTU into a GPU system at the RTL level. Experimental results show that an 18.54x speedup can be achieved with an overhead of only 8.5% compared with a GPU system with a traditional texture unit.
DOI: https://doi.org/10.1109/AICAS.2019.8771629 (published 2019-03-01)
Citations: 1
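
The "common characteristics between texture filtering and convolutional layer" mentioned in the abstract can be made concrete with bilinear filtering, which is a weighted sum over a 2x2 texel window, the same arithmetic as applying a 2x2 convolution kernel at one position. The sketch below is an illustration under assumptions, not the TTU datapath: bilinear filtering is chosen as the example filter, and the function names are hypothetical.

```python
import numpy as np


def bilinear_sample(tex, x, y):
    """Bilinear texture filtering at a fractional coordinate (x, y)."""
    x0, y0 = int(np.floor(x)), int(np.floor(y))
    fx, fy = x - x0, y - y0
    # Filter weights are derived from the fractional position.
    weights = np.array([[(1 - fx) * (1 - fy), fx * (1 - fy)],
                        [(1 - fx) * fy,       fx * fy]])
    return float(np.sum(tex[y0:y0 + 2, x0:x0 + 2] * weights))


def conv_window(tex, x0, y0, kernel):
    """One output of a convolutional layer: a weighted sum over a window."""
    kh, kw = kernel.shape
    return float(np.sum(tex[y0:y0 + kh, x0:x0 + kw] * kernel))


tex = np.arange(16, dtype=float).reshape(4, 4)
filtered = bilinear_sample(tex, 1.5, 2.5)
convolved = conv_window(tex, 1, 2, np.full((2, 2), 0.25))
```

Both calls compute the same multiply-accumulate over the same four texels. The only difference is where the weights come from: the sample position for filtering, and stored parameters for convolution, which is consistent with the abstract's added datapath for loading weight parameters.
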
Reconfigurable Edge via Analytics Architecture
Shih-Yu Chen, G. Lee, Tai-Ping Wang, Chin-Wei Huang, Jia-Hong Chen, Chang-Ling Tsai
As artificial intelligence (AI) algorithms requiring high accuracy become ever more complex and the data generated by Edge/IoT devices grows ever larger, flexible reconfigurable processing is crucial in the design of efficient, low-power smart edge systems, and is introduced in this paper. In AI, analytics algorithms are typically used to analyze speech, audio, image, and video data, etc. In current cross-level system design methodology, different algorithmic realizations are analyzed in the form of dataflow graphs (DFGs) to further increase efficiency and flexibility in constituting an “analytics architecture”. Carrying information on both algorithmic behavior and architecture, including software and hardware, the DFG provides a mathematical representation which, as opposed to traditional linear difference equations, better models the underlying computational platform for systematic analysis, thus providing flexible and efficient management of computational and storage resources. In our analytics architecture work, parallel and reconfigurable computing are formulated via DFGs in a way analogous to the analysis and synthesis equations of the well-known Fourier transform pair. In parallel computing, a connected component is eigen-decomposed into unconnected components for concurrent processing. To save computational resources, commonalities in DFGs are analyzed for reuse when synthesizing or reconfiguring the edge platform. In this paper, we specifically introduce a lightweight edge on which the algorithmic convolutions of a Convolutional Neural Network are eigen-transformed into matrix operations with higher symmetry, which leads to fewer operations, a lower data transfer rate, and less storage, with lower power anticipated when synthesizing or reconfiguring the eigenvectors.
DOI: https://doi.org/10.1109/AICAS.2019.8771528 (published 2019-03-01)
Citations: 2