
Latest publications from the 2019 IEEE International Conference on Artificial Intelligence Circuits and Systems (AICAS)

Autopiler: An AI Based Framework for Program Autotuning and Options Recommendation
Kang-Lin Wang, Chi-Bang Kuan, Jiann-Fuh Liaw, Wei-Liang Kuo
Program autotuning has been shown to achieve significant performance improvements in many compiler usage scenarios. Many autotuning frameworks have been proposed that support fully customizable configuration representations, a wide variety of representations for domain-specific tuning, and a user-friendly interface for interaction between the program and the autotuner. However, tuning programs takes time, whether it is done automatically or manually. Oftentimes, programmers do not have time to wait for autotuners to finish and want reasonably good options they can use instantly. This paper introduces Autopiler, a framework for building non-domain-specific program autotuners with machine-learning-based recommender systems for options prediction. The framework not only supports non-domain-specific tuning techniques, but also learns from previous tuning results and can recommend adequately good options before any tuning happens. We illustrate the architecture of Autopiler and how it leverages a recommender system for compiler options recommendation; in this way, Autopiler learns from the programs it sees and becomes an AI-boosted smart compiler. The experimental results show that Autopiler can deliver up to 19.46% performance improvement on in-house 4G LTE modem workloads.
{"title":"Autopiler: An AI Based Framework for Program Autotuning and Options Recommendation","authors":"Kang-Lin Wang, Chi-Bang Kuan, Jiann-Fuh Liaw, Wei-Liang Kuo","doi":"10.1109/AICAS.2019.8771625","DOIUrl":"https://doi.org/10.1109/AICAS.2019.8771625","url":null,"abstract":"Program autotuning has been proved to achieve great performance improvement in many compiler usage scenarios. Many autotuning frameworks have been provided to support fully-customizable configuration representations, a wide variety of representations for domain-specific tuning, and a user friendly interface for interaction between the program and the autotuner. However, tuning programs takes time, no matter it is autotuned or manually tuned. Oftentimes, programmers don’t have the time waiting for autotuners to finish and want to have rather good options to use instantly. This paper introduces Autopiler, a framework for building non-domain-specific program autotuners with machine learning based recommender systems for options prediction. This framework supports not only non-domain-specific tuning techniques, but also learns from previous tuning results and can make adequate good options recommendation before any tuning happens. We will illustrate the architecture of Autopiler and how to leverage recommender system for compiler options recommendation, in such way Autopiler can learn from the programs and becomes an AI boosted smart compiler. The experiment results show that Autopiler can deliver up to 19.46% performance improvement for in-house 4G LTE modem workloads.","PeriodicalId":273095,"journal":{"name":"2019 IEEE International Conference on Artificial Intelligence Circuits and Systems (AICAS)","volume":"7 1-4","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131492105","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
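To make the recommender idea concrete, here is a minimal Python sketch of options recommendation by similarity to previously tuned programs; the feature vectors, flag strings, and the k-NN model are illustrative assumptions, not details from the paper.

```python
# Hypothetical sketch: recommend compiler flags for a new program by reusing
# the best-known flags of the k most similar previously tuned programs.
import numpy as np
from sklearn.neighbors import NearestNeighbors

# Feature vectors of previously autotuned programs (e.g., loop counts,
# arithmetic intensity, code size) -- toy values, not real measurements.
tuned_features = np.array([
    [120, 0.8, 4_000],
    [300, 0.2, 9_500],
    [110, 0.9, 3_800],
])
# Best flag set found by the autotuner for each tuned program.
tuned_flags = ["-O3 -funroll-loops", "-O2 -flto", "-O3 -funroll-loops"]

model = NearestNeighbors(n_neighbors=2).fit(tuned_features)

def recommend(features):
    """Return an instant flag guess from the nearest tuned programs."""
    _, idx = model.kneighbors(np.asarray([features]))
    # Majority vote over the neighbours' best-known flag sets.
    votes = [tuned_flags[i] for i in idx[0]]
    return max(set(votes), key=votes.count)

print(recommend([115, 0.85, 3_900]))  # -> "-O3 -funroll-loops"
```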
Intelligent Policy Selection for GPU Warp Scheduler
L. Chiou, Tsung-Han Yang, Jian-Tang Syu, Che-Pin Chang, Yeong-Jar Chang
The graphics processing unit (GPU) is widely used in applications that require massive computing resources, such as big data, machine learning, and computer vision. As the diversity of applications grows, the GPU’s performance becomes difficult for its warp scheduler to sustain. Most prior studies of the warp scheduler are based on static analysis of GPU hardware behavior for certain types of benchmarks. We propose, for the first time to the best of our knowledge, a machine learning approach that intelligently selects a suitable scheduling policy for each application at runtime. The simulation results indicate that the proposed approach maintains performance comparable to the best policy across different applications.
{"title":"Intelligent Policy Selection for GPU Warp Scheduler","authors":"L. Chiou, Tsung-Han Yang, Jian-Tang Syu, Che-Pin Chang, Yeong-Jar Chang","doi":"10.1109/AICAS.2019.8771596","DOIUrl":"https://doi.org/10.1109/AICAS.2019.8771596","url":null,"abstract":"The graphics processing unit (GPU) is widely used in applications that require massive computing resources such as big data, machine learning, computer vision, etc. As the diversity of applications grows, the GPU’s performance becomes difficult to maintain by its warp scheduler. Most of the prior studies of the warp scheduler are based on static analysis of GPU hardware behavior for certain types of benchmarks. We propose for the first time (to the best of our knowledge), a machine learning approach to intelligently select suitable policies for various applications in runtime. The simulation results indicate that the proposed approach can maintain performance comparable to the best policy across different applications.","PeriodicalId":273095,"journal":{"name":"2019 IEEE International Conference on Artificial Intelligence Circuits and Systems (AICAS)","volume":"113 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124671993","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
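As an illustration of runtime policy selection, the sketch below trains a small classifier that maps kernel characteristics to a scheduling policy; the features, the GTO/LRR labels, and the decision-tree model are assumptions for demonstration, not the paper's actual method.

```python
# Hypothetical sketch: learn to pick a warp scheduling policy per kernel.
from sklearn.tree import DecisionTreeClassifier

# Per-kernel features: [memory intensity, branch divergence, occupancy]
X = [
    [0.9, 0.1, 0.5],   # memory-bound kernel
    [0.2, 0.7, 0.9],   # divergent compute kernel
    [0.8, 0.2, 0.4],
    [0.1, 0.6, 0.8],
]
# Best policy measured offline for each kernel, e.g. greedy-then-oldest (GTO)
# versus loose round-robin (LRR).
y = ["GTO", "LRR", "GTO", "LRR"]

clf = DecisionTreeClassifier(max_depth=3).fit(X, y)
print(clf.predict([[0.85, 0.15, 0.45]]))  # -> ['GTO']
```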
Improved Hybrid Memory Cube for Weight-Sharing Deep Convolutional Neural Networks
Hao Zhang, Jiongrui He, S. Ko
In recent years, many deep neural network accelerator architectures have been proposed to improve the performance of processing deep neural network models. However, memory bandwidth remains the major issue and performance bottleneck of deep neural network accelerators. Emerging 3D memory, such as the hybrid memory cube (HMC), and processing-in-memory techniques provide new solutions for deep neural network implementation. In this paper, a novel HMC architecture is proposed for weight-sharing deep convolutional neural networks in order to remove the memory bandwidth bottleneck during neural network implementation. The proposed HMC is designed on top of the conventional HMC architecture with only minor changes. In the logic layer, the vault controller is modified to enable parallel vault access. The weight parameters of the pre-trained convolutional neural network are quantized to 16 shared values. During processing, activations that share a weight are accumulated in memory, and only the accumulated results are transferred to the processing elements to be multiplied by the weights. With this architecture, data transfer between main memory and processing elements is reduced, and the throughput of convolution operations is improved by 30% compared to an HMC-based multiply-accumulate design.
{"title":"Improved Hybrid Memory Cube for Weight-Sharing Deep Convolutional Neural Networks","authors":"Hao Zhang, Jiongrui He, S. Ko","doi":"10.1109/AICAS.2019.8771540","DOIUrl":"https://doi.org/10.1109/AICAS.2019.8771540","url":null,"abstract":"In recent years, many deep neural network accelerator architectures are proposed to improve the performance of processing deep neural network models. However, memory bandwidth is still the major issue and performance bottleneck of the deep neural network accelerators. The emerging 3D memory, such as hybrid memory cube (HMC) and processing-in-memory techniques provide new solutions to deep neural network implementation. In this paper, a novel HMC architecture is proposed for weight-sharing deep convolutional neural networks in order to solve the memory bandwidth bottleneck during the neural network implementation. The proposed HMC is designed based on conventional HMC architecture with only minor changes. In the logic layer, the vault controller is modified to enable parallel vault access. The weight parameters of pre-trained convolutional neural network are quantized to 16 numbers. During processing, the accumulation of the activations with shared weights is performed and only the accumulated results are transferred to the processing elements to perform multiplications with weights. By using this proposed architecture, the data transfer between main memory and processing elements can be reduced and the throughout of convolution operations can be improved by 30% compared to using HMC based multiply-accumulate design.","PeriodicalId":273095,"journal":{"name":"2019 IEEE International Conference on Artificial Intelligence Circuits and Systems (AICAS)","volume":"8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116755644","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 3
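The core arithmetic trick, accumulating activations that share a weight before multiplying, can be sketched in a few lines of NumPy; the sizes and the bincount-based implementation are illustrative, not the HMC vault logic itself.

```python
# Sketch of accumulate-before-multiply for a layer whose weights are
# quantized to 16 shared values: activations sharing a weight are summed
# first, so only 16 multiplications remain per output.
import numpy as np

rng = np.random.default_rng(0)
activations = rng.random(1024)
codebook = rng.standard_normal(16)          # the 16 shared weight values
indices = rng.integers(0, 16, size=1024)    # per-connection weight index

# Naive dot product: 1024 multiplies.
naive = np.dot(activations, codebook[indices])

# Weight-sharing version: 1024 additions (bincount) + 16 multiplies.
sums = np.bincount(indices, weights=activations, minlength=16)
shared = np.dot(sums, codebook)

assert np.allclose(naive, shared)
```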
Outstanding Bit Error Tolerance of Resistive RAM-Based Binarized Neural Networks
T. Hirtzlin, M. Bocquet, Jacques-Olivier Klein, E. Nowak, E. Vianello, J. Portal, D. Querlioz
Resistive random access memories (RRAM) are novel nonvolatile memory technologies that can be embedded at the core of CMOS and could be ideal for the in-memory implementation of deep neural networks. A particularly exciting vision is using them to implement Binarized Neural Networks (BNNs), a class of deep neural networks with a highly reduced memory footprint. The challenge of resistive memory, however, is that it is prone to device variation, which can lead to bit errors. In this work we show, through simulations of networks on the MNIST and CIFAR10 tasks, that BNNs can tolerate these bit errors to an outstanding level. If a standard BNN is used, a bit error rate of up to 10⁻⁴ can be tolerated with little impact on recognition performance on both MNIST and CIFAR10. We then show that by adapting the training procedure to the fact that the BNN will be operated on error-prone hardware, this tolerance can be extended to a bit error rate of 4 × 10⁻². The requirements that BNNs place on RRAM are therefore much less stringent than those of more traditional applications. Based on experimental measurements of an HfO2 RRAM technology, we show that this result allows RRAM programming energy to be reduced by a factor of 30.
{"title":"Outstanding Bit Error Tolerance of Resistive RAM-Based Binarized Neural Networks","authors":"T. Hirtzlin, M. Bocquet, Jacques-Olivier Klein, E. Nowak, E. Vianello, J. Portal, D. Querlioz","doi":"10.1109/AICAS.2019.8771544","DOIUrl":"https://doi.org/10.1109/AICAS.2019.8771544","url":null,"abstract":"Resistive random access memories (RRAM) are novel nonvolatile memory technologies, which can be embedded at the core of CMOS, and which could be ideal for the in-memory implementation of deep neural networks. A particularly exciting vision is using them for implementing Binarized Neural Networks (BNNs), a class of deep neural networks with a highly reduced memory footprint. The challenge of resistive memory, however, is that they are prone to device variation, which can lead to bit errors. In this work we show that BNNs can tolerate these bit errors to an outstanding level, through simulations of networks on the MNIST and CIFAR10 tasks. If a standard BNN is used, up to 10−4 bit error rate can be tolerated with little impact on recognition performance on both MNIST and CIFAR10. We then show that by adapting the training procedure to the fact that the BNN will be operated on error-prone hardware, this tolerance can be extended to a bit error rate of 4 × 10−2. The requirements for RRAM are therefore a lot less stringent for BNNs than more traditional applications. We show, based on experimental measurements on a RRAM HfO2 technology, that this result can allow reduce RRAM programming energy by a factor 30.","PeriodicalId":273095,"journal":{"name":"2019 IEEE International Conference on Artificial Intelligence Circuits and Systems (AICAS)","volume":"8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121909140","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 26
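The kind of fault-injection experiment described can be sketched as follows, assuming weights binarized to {-1, +1} and independent sign flips with probability p; the single layer and its sizes are toy choices, not the paper's simulated networks.

```python
# Sketch: flip the sign of each binarized weight with probability p and
# observe how many neuron outputs of one layer change.
import numpy as np

rng = np.random.default_rng(42)
weights = np.sign(rng.standard_normal((128, 64)))  # binarized to {-1, +1}

def inject_bit_errors(w, p):
    """Flip each binary weight independently with probability p."""
    flips = rng.random(w.shape) < p
    return np.where(flips, -w, w)

x = np.sign(rng.standard_normal(128))
clean = np.sign(x @ weights)
for p in (1e-4, 1e-2, 4e-2):
    noisy = np.sign(x @ inject_bit_errors(weights, p))
    print(p, np.mean(noisy == clean))  # fraction of unchanged outputs
```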
Online Anomaly Detection in HPC Systems
Andrea Borghesi, Antonio Libri, L. Benini, Andrea Bartolini
Reliability is a cumbersome problem in the evolution of High Performance Computing systems and data centers. During operation, several types of fault conditions or anomalies can arise, ranging from malfunctioning hardware to improper configurations or imperfect software. Currently, system administrators and end users have to discover them manually. Clearly, this approach does not scale to large supercomputers and facilities: automated methods to detect faults and unhealthy conditions are needed. Our method uses a type of neural network called an autoencoder, trained to learn the normal behavior of a real, in-production HPC system, and deploys it at the edge of each computing node. We obtain very good accuracy (values ranging between 90% and 95%), and we also demonstrate that the approach can be deployed on supercomputer nodes without negatively affecting the performance of the computing units.
{"title":"Online Anomaly Detection in HPC Systems","authors":"Andrea Borghesi, Antonio Libri, L. Benini, Andrea Bartolini","doi":"10.1109/AICAS.2019.8771527","DOIUrl":"https://doi.org/10.1109/AICAS.2019.8771527","url":null,"abstract":"Reliability is a cumbersome problem in High Performance Computing Systems and Data Centers evolution. During operation, several types of fault conditions or anomalies can arise, ranging from malfunctioning hardware to improper configurations or imperfect software. Currently, system administrator and final users have to discover it manually. Clearly this approach does not scale to large scale supercomputers and facilities: automated methods to detect faults and unhealthy conditions is needed. Our method uses a type of neural network called autoncoder trained to learn the normal behavior of a real, in-production HPC system and it is deployed on the edge of each computing node. We obtain a very good accuracy (values ranging between 90% and 95%) and we also demonstrate that the approach can be deployed on the supercomputer nodes without negatively affecting the computing units performance.","PeriodicalId":273095,"journal":{"name":"2019 IEEE International Conference on Artificial Intelligence Circuits and Systems (AICAS)","volume":"139 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-02-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127924945","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 24
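A minimal sketch of the detection principle, assuming a tiny autoencoder trained on healthy telemetry only and a 3-sigma threshold on reconstruction error; the feature set, model size, and threshold rule are illustrative assumptions, not the paper's deployed model.

```python
# Sketch: train an autoencoder on healthy node telemetry, then flag samples
# whose reconstruction error is far above what was seen during training.
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(1)
healthy = rng.normal(0.5, 0.05, size=(500, 8))  # e.g., temps, loads, power

ae = MLPRegressor(hidden_layer_sizes=(3,), max_iter=2000)
ae.fit(healthy, healthy)  # learn to reproduce normal behavior only

# 3-sigma threshold on the training reconstruction error.
train_errs = np.mean((ae.predict(healthy) - healthy) ** 2, axis=1)
threshold = train_errs.mean() + 3 * train_errs.std()

def is_anomalous(sample):
    err = np.mean((ae.predict(sample.reshape(1, -1)) - sample) ** 2)
    return err > threshold

print(is_anomalous(rng.normal(0.5, 0.05, size=8)))  # likely False
print(is_anomalous(np.full(8, 0.95)))               # likely True
```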
Extended Bit-Plane Compression for Convolutional Neural Network Accelerators
L. Cavigelli, L. Benini
After the tremendous success of convolutional neural networks in image classification, object detection, speech recognition, etc., there is now rising demand to deploy these compute-intensive ML models on tightly power-constrained embedded and mobile systems at low cost, as well as to push throughput in data centers. This has triggered a wave of research into specialized hardware accelerators. Their performance is often constrained by I/O bandwidth, and their energy consumption is dominated by I/O transfers to off-chip memory. We introduce and evaluate a novel, hardware-friendly compression scheme for the feature maps present within convolutional neural networks. We show that for ResNet-34 an average compression ratio of 4.4× relative to uncompressed data, and a gain of 60% over the existing method, can be achieved with a compression block requiring fewer than 300 bits of sequential cells and minimal combinational logic.
{"title":"Extended Bit-Plane Compression for Convolutional Neural Network Accelerators","authors":"L. Cavigelli, L. Benini","doi":"10.1109/AICAS.2019.8771562","DOIUrl":"https://doi.org/10.1109/AICAS.2019.8771562","url":null,"abstract":"After the tremendous success of convolutional neural networks in image classification, object detection, speech recognition, etc., there is now rising demand for deployment of these compute-intensive ML models on tightly power constrained embedded and mobile systems at low cost as well as for pushing the throughput in data centers. This has triggered a wave of research towards specialized hardware accelerators. Their performance is often constrained by I/O bandwidth and the energy consumption is dominated by I/O transfers to off-chip memory. We introduce and evaluate a novel, hardware-friendly compression scheme for the feature maps present within convolutional neural networks. We show that an average compression ratio of 4.4× relative to uncompressed data and a gain of 60% over existing method can be achieved for ResNet-34 with a compression block requiring <300 bit of sequential cells and minimal combinational logic.","PeriodicalId":273095,"journal":{"name":"2019 IEEE International Conference on Artificial Intelligence Circuits and Systems (AICAS)","volume":"31 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129430638","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 18
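To illustrate the general idea of bit-plane coding of feature maps, the sketch below splits a uint8 map into binary planes and run-length encodes the zeros in each plane; the paper's extended scheme is more elaborate than this toy encoder.

```python
# Sketch: bit-plane decomposition plus a simple zero run-length code, the
# family of schemes the paper builds on (not its exact encoder).
import numpy as np

def to_bit_planes(fmap_u8):
    """Split a uint8 feature map into 8 binary planes, MSB first."""
    return [(fmap_u8 >> b) & 1 for b in range(7, -1, -1)]

def rle_zeros(bits):
    """Encode a flat binary stream as (zero-run-length, next-bit) pairs."""
    out, run = [], 0
    for bit in bits:
        if bit == 0:
            run += 1
        else:
            out.append((run, 1))
            run = 0
    if run:
        out.append((run, 0))  # trailing zeros
    return out

# ReLU feature maps are mostly zero, so the planes compress well.
fmap = np.array([[0, 0, 3], [0, 16, 0]], dtype=np.uint8)
encoded = [rle_zeros(plane.ravel()) for plane in to_bit_planes(fmap)]
print(encoded)
```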
Elastic Neural Networks for Classification
Yi Zhou, Yue Bai, S. Bhattacharyya, H. Huttunen
In this work we propose a framework for improving the performance of any deep neural network that may suffer from vanishing gradients. To address the vanishing gradient issue, we study a framework in which we insert an intermediate output branch after each layer of the computational graph and use the corresponding prediction loss to feed the gradient to the early layers. The framework, which we name Elastic network, is tested with several well-known networks on the CIFAR10 and CIFAR100 datasets, and the experimental results show that it improves accuracy on both shallow networks (e.g., MobileNet) and deep convolutional neural networks (e.g., DenseNet). We also identify the types of networks where the framework does not improve performance and discuss the reasons. Finally, as a by-product, the computational complexity of the resulting networks can be adjusted elastically by selecting the output branch according to the current computational budget.
{"title":"Elastic Neural Networks for Classification","authors":"Yi Zhou, Yue Bai, S. Bhattacharyya, H. Huttunen","doi":"10.1109/AICAS.2019.8771475","DOIUrl":"https://doi.org/10.1109/AICAS.2019.8771475","url":null,"abstract":"In this work we propose a framework for improving the performance of any deep neural network that may suffer from vanishing gradients. To address the vanishing gradient issue, we study a framework, where we insert an intermediate output branch after each layer in the computational graph and use the corresponding prediction loss for feeding the gradient to the early layers. The framework—which we name Elastic network—is tested with several well-known networks on CIFAR10 and CIFAR100 datasets, and the experimental results show that the proposed framework improves the accuracy on both shallow networks (e.g., MobileNet) and deep convolutional neural networks (e.g., DenseNet). We also identify the types of networks where the framework does not improve the performance and discuss the reasons. Finally, as a side product, the computational complexity of the resulting networks can be adjusted in an elastic manner by selecting the output branch according to current computational budget.","PeriodicalId":273095,"journal":{"name":"2019 IEEE International Conference on Artificial Intelligence Circuits and Systems (AICAS)","volume":"20 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133452455","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 12
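The branching mechanism can be sketched in PyTorch: an auxiliary classifier head after each stage, with the per-branch losses summed so that early layers receive direct gradients; the layer sizes below are arbitrary and not the paper's tested architectures.

```python
# Sketch of the elastic idea: one lightweight classifier per stage, with the
# sum of branch losses driving gradients straight into early layers.
import torch
import torch.nn as nn

class ElasticNet(nn.Module):
    def __init__(self, num_classes=10):
        super().__init__()
        self.stages = nn.ModuleList([
            nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU()),
            nn.Sequential(nn.Conv2d(16, 32, 3, padding=1), nn.ReLU()),
        ])
        self.heads = nn.ModuleList([
            nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                          nn.Linear(c, num_classes))
            for c in (16, 32)
        ])

    def forward(self, x):
        outputs = []
        for stage, head in zip(self.stages, self.heads):
            x = stage(x)
            outputs.append(head(x))  # intermediate prediction per stage
        return outputs

model = ElasticNet()
x = torch.randn(4, 3, 32, 32)
target = torch.randint(0, 10, (4,))
loss = sum(nn.functional.cross_entropy(out, target) for out in model(x))
loss.backward()  # every stage gets a direct gradient from its own head
```

At inference time, stopping at an earlier branch trades accuracy for compute, which is what makes the complexity adjustable under a budget.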
Design of Intelligent EEG System for Human Emotion Recognition with Convolutional Neural Network
Kai-Yen Wang, Yun-Lung Ho, Yu-De Huang, W. Fang
Emotions play a significant role in the fields of affective computing and Human-Computer Interfaces (HCI). In this paper, we propose an intelligent human emotion detection system based on EEG features with multi-channel fused processing. We also propose an advanced convolutional neural network implemented as a VLSI hardware design. This hardware design accelerates both the training and classification processes and meets the real-time requirements of fast emotion detection. The performance of the design was validated on the DEAP [1] database with datasets from 32 subjects; the mean classification accuracy achieved is 83.88%.
{"title":"Design of Intelligent EEG System for Human Emotion Recognition with Convolutional Neural Network","authors":"Kai-Yen Wang, Yun-Lung Ho, Yu-De Huang, W. Fang","doi":"10.1109/AICAS.2019.8771581","DOIUrl":"https://doi.org/10.1109/AICAS.2019.8771581","url":null,"abstract":"Emotions play a significant role in the field of affective computing and Human-Computer Interfaces(HCI). In this paper, we propose an intelligent human emotion detection system based on EEG features with a multi-channel fused processing. We also proposed an advanced convolutional neural network that was implemented in VLSI hardware design. This hardware design can accelerate both the training and classification processes and meet real-time system requirements for fast emotion detection. The performance of this design was validated using DEAP [1] database with datasets from 32 subjects, the mean classification accuracy achieved is 83.88%.","PeriodicalId":273095,"journal":{"name":"2019 IEEE International Conference on Artificial Intelligence Circuits and Systems (AICAS)","volume":"29 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129007619","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 20
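As a rough illustration of a CNN over multi-channel EEG windows, the PyTorch sketch below maps 32-channel segments to a binary emotion label; the window length, layer sizes, and label set are assumptions for illustration, not the paper's VLSI architecture.

```python
# Hypothetical sketch: a compact 1D CNN over 32-channel EEG windows.
import torch
import torch.nn as nn

emotion_cnn = nn.Sequential(
    nn.Conv1d(32, 16, kernel_size=7, padding=3),  # fuse channels, filter in time
    nn.ReLU(),
    nn.MaxPool1d(4),
    nn.Conv1d(16, 32, kernel_size=5, padding=2),
    nn.ReLU(),
    nn.AdaptiveAvgPool1d(1),
    nn.Flatten(),
    nn.Linear(32, 2),  # e.g., high/low valence
)

x = torch.randn(8, 32, 128)   # batch of 8 EEG windows, 128 samples each
print(emotion_cnn(x).shape)   # -> torch.Size([8, 2])
```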
An Energy-Efficient Accelerator with Relative-Indexing Memory for Sparse Compressed Convolutional Neural Network
I-Chen Wu, Po-Tsang Huang, Chin-Yang Lo, W. Hwang
Deep convolutional neural networks (CNNs) are widely used in image recognition and feature classification. However, deep CNNs are hard to deploy fully on edge devices because of their computation-intensive and memory-intensive workloads. The energy efficiency of CNNs is dominated by off-chip memory accesses and convolution computation. In this paper, an energy-efficient accelerator is proposed for sparse compressed CNNs that reduces DRAM accesses and eliminates zero-operand computation. Weight compression is utilized for sparse compressed CNNs to reduce the required memory capacity/bandwidth and a large portion of the connections, while the ReLU function produces zero-valued activations. Additionally, the workloads are distributed across channels to increase the degree of task parallelism, and all-row-to-all-row non-zero element multiplication is adopted to skip redundant computation. Simulation results show that, compared with a dense accelerator, the proposed accelerator achieves a 1.79× speedup and reduces the on-chip memory size, energy, and DRAM accesses of VGG-16 by 23.51%, 69.53%, and 88.67%, respectively.
{"title":"An Energy-Efficient Accelerator with Relative- Indexing Memory for Sparse Compressed Convolutional Neural Network","authors":"I-Chen Wu, Po-Tsang Huang, Chin-Yang Lo, W. Hwang","doi":"10.1109/AICAS.2019.8771600","DOIUrl":"https://doi.org/10.1109/AICAS.2019.8771600","url":null,"abstract":"Deep convolutional neural networks (CNNs) are widely used in image recognition and feature classification. However, deep CNNs are hard to be fully deployed for edge devices due to both computation-intensive and memory-intensive workloads. The energy efficiency of CNNs is dominated by off-chip memory accesses and convolution computation. In this paper, an energy-efficient accelerator is proposed for sparse compressed CNNs by reducing DRAM accesses and eliminating zero-operand computation. Weight compression is utilized for sparse compressed CNNs to reduce the required memory capacity/bandwidth and a large portion of connections. Thus, ReLU function produces zero-valued activations. Additionally, the workloads are distributed based on channels to increase the degree of task parallelism, and all-row- to-all-row non-zero element multiplication is adopted for skipping redundant computation. The simulation results over the dense accelerator show that the proposed accelerator achieves 1.79x speedup and reduces 23.51%, 69.53%, 88.67% on-chip memory size, energy, and DRAM accesses of VGG-16.","PeriodicalId":273095,"journal":{"name":"2019 IEEE International Conference on Artificial Intelligence Circuits and Systems (AICAS)","volume":"3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127244267","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 2
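Relative indexing itself can be sketched compactly: each non-zero weight is stored with its distance from the previous non-zero rather than an absolute position, with zero fillers when a gap overflows the index field; the 4-bit gap limit below is an assumption for illustration, not the paper's format.

```python
# Sketch: delta-encode sparse weights as (relative index, value) pairs so the
# index field stays small and hardware-friendly.
import numpy as np

def compress_relative(weights, max_gap=15):
    """Return (gap, value) pairs; a pair means 'skip gap zeros, then value'.
    Gaps wider than max_gap are bridged with zero-valued filler entries."""
    pairs, last = [], -1
    for i, w in enumerate(weights):
        if w == 0:
            continue
        gap = i - last - 1
        while gap > max_gap:
            pairs.append((max_gap, 0.0))  # filler occupies one slot itself
            gap -= max_gap + 1
        pairs.append((gap, float(w)))
        last = i
    return pairs

w = np.array([0, 0, 0.5, 0, 0, 0, -1.2, 0, 0, 0.7])
print(compress_relative(w))  # [(2, 0.5), (3, -1.2), (2, 0.7)]
```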
A Customized Convolutional Neural Network Design Using Improved Softmax Layer for Real-time Human Emotion Recognition
Kai-Yen Wang, Yu-De Huang, Yun-Lung Ho, W. Fang
This paper proposes an improved softmax layer algorithm and its hardware implementation, applicable to an effective convolutional neural network for EEG-based real-time human emotion recognition. Compared with the standard softmax layer, this hardware design adds threshold layers to accelerate training and replaces Euler’s number as the exponentiation base with a dynamic base value to improve network accuracy. This work also presents a hardware-friendly way to implement the batch normalization layer on chip. On the DEAP [7] EEG emotion database, the maximum and mean classification accuracies achieved are 96.03% and 83.88%, respectively. The improved softmax layer saves up to 15% of the training convergence time and also increases the average accuracy by 3 to 5%.
{"title":"A Customized Convolutional Neural Network Design Using Improved Softmax Layer for Real-time Human Emotion Recognition","authors":"Kai-Yen Wang, Yu-De Huang, Yun-Lung Ho, W. Fang","doi":"10.1109/AICAS.2019.8771616","DOIUrl":"https://doi.org/10.1109/AICAS.2019.8771616","url":null,"abstract":"This paper proposes an improved softmax layer algorithm and hardware implementation, which is applicable to an effective convolutional neural network of EEG-based real-time human emotion recognition. Compared with the general softmax layer, this hardware design adds threshold layers to accelerate the training speed and replace the Euler’s base value with a dynamic base value to improve the network accuracy. This work also shows a hardware-friendly way to implement batch normalization layer on chip. Using the EEG emotion DEAP[7] database, the maximum and mean classification accuracy were achieved as 96.03% and 83.88% respectively. In this work, the usage of improved softmax layer can save up to 15% of training model convergence time and also increase by 3 to 5% the average accuracy.","PeriodicalId":273095,"journal":{"name":"2019 IEEE International Conference on Artificial Intelligence Circuits and Systems (AICAS)","volume":"304 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132649611","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 8
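The base change at the heart of the improved softmax can be sketched numerically: replacing e with a base b amounts to scaling the logits by ln b before a standard softmax; here b is a fixed parameter, whereas the paper derives a dynamic base value.

```python
# Sketch: softmax computed in base b, i.e. b**x / sum(b**x), using the
# identity b**x == exp(x * ln b).
import numpy as np

def softmax_base(x, b=2.0):
    """Numerically stable softmax in base b (b > 1)."""
    x = np.asarray(x, dtype=float)
    z = (x - np.max(x)) * np.log(b)  # subtract max for stability
    e = np.exp(z)
    return e / e.sum()

logits = np.array([2.0, 1.0, 0.1])
print(softmax_base(logits, b=np.e))  # matches the standard softmax
print(softmax_base(logits, b=2.0))   # softer distribution for a smaller base
```

A power-of-two base is also convenient in hardware, since b**x then reduces to shifts, which fits the paper's goal of an efficient on-chip softmax.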