
Latest publications from the 2019 IEEE International Conference on Artificial Intelligence Circuits and Systems (AICAS)

Sub-Word Parallel Precision-Scalable MAC Engines for Efficient Embedded DNN Inference
L. Mei, Mohit Dandekar, D. Rodopoulos, J. Constantin, P. Debacker, R. Lauwereins, M. Verhelst
To enable energy-efficient embedded execution of Deep Neural Networks (DNNs), the critical sections of these workloads, their multiply-accumulate (MAC) operations, need to be carefully optimized. The SotA pursues this through run-time precision-scalable MAC operators, which can support the varying precision needs of DNNs in an energy-efficient way. Yet, to implement the adaptable-precision MAC operation, most SotA solutions rely on separately optimized low-precision multipliers and a precision-variable accumulation scheme, with the possible disadvantages of high control complexity and degraded throughput. This paper first optimizes one of the most effective SotA techniques to support fully-connected DNN layers. This mode, which exploits the transformation of a high-precision multiplier into independent parallel low-precision multipliers, is called the Sum Separate (SS) mode. In addition, this work suggests an alternative low-precision scheme, i.e. the implicit accumulation of multiple low-precision products within the multiplier itself, called the Sum Together (ST) mode. Based on the two types of MAC arrangement explored, corresponding architectures are proposed to implement DNN processing. The two architectures, which yield the same throughput, are compared at different working precisions (2/4/8/16-bit) based on post-synthesis simulation. The results show that the proposed ST-mode-based architecture outperforms the earlier SS mode by up to ×1.6 in energy efficiency (TOPS/W) and ×1.5 in area efficiency (GOPS/mm²).
{"title":"Sub-Word Parallel Precision-Scalable MAC Engines for Efficient Embedded DNN Inference","authors":"L. Mei, Mohit Dandekar, D. Rodopoulos, J. Constantin, P. Debacker, R. Lauwereins, M. Verhelst","doi":"10.1109/AICAS.2019.8771481","DOIUrl":"https://doi.org/10.1109/AICAS.2019.8771481","url":null,"abstract":"To enable energy-efficient embedded execution of Deep Neural Networks (DNNs), the critical sections of these workloads, their multiply-accumulate (MAC) operations, need to be carefully optimized. The SotA pursues this through run-time precision-scalable MAC operators, which can support the varying precision needs of DNNs in an energy-efficient way. Yet, to implement the adaptable precision MAC operation, most SotA solutions rely on separately optimized low precision multipliers and a precision-variable accumulation scheme, with the possible disadvantages of a high control complexity and degraded throughput. This paper, first optimizes one of the most effective SotA techniques to support fully-connected DNN layers. This mode, exploiting the transformation of a high precision multiplier into independent parallel low-precision multipliers, will be called the Sum Separate (SS) mode. In addition, this work suggests an alternative low-precision scheme, i.e. the implicit accumulation of multiple low precision products within the multiplier itself, called the Sum Together (ST) mode. Based on the two types of MAC arrangements explored, corresponding architectures have been proposed to implement DNN processing. The two architectures, yielding the same throughput, are compared in different working precisions (2/4/8/16-bit), based on Post-Synthesis simulation. 
The result shows that the proposed ST-Mode based architecture outperforms the earlier SS-Mode by up to ×1.6 on Energy Efficiency (TOPS/W) and ×1.5 on Area Efficiency (GOPS/mm2).","PeriodicalId":273095,"journal":{"name":"2019 IEEE International Conference on Artificial Intelligence Circuits and Systems (AICAS)","volume":"13 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133298951","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 22
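The Sum Together (ST) idea above, accumulating several low-precision products inside a single wide multiplication, can be sketched with a classic operand-packing trick: the two cross terms that form the dot product land in the same middle bit field of the wide result, and guard bits keep the fields from overlapping. This is an illustrative model under my own assumptions (function name, guard-bit sizing), not the paper's exact hardware datapath:

```python
def st_mac(a_pair, b_pair, n_bits):
    """Compute a0*b0 + a1*b1 with ONE wide multiplication (ST-mode sketch)."""
    a0, a1 = a_pair
    b0, b1 = b_pair
    # Field width: each n-bit product needs 2n bits, and the sum of two
    # such products needs 2n+1 bits, so k guard-sized fields never overlap.
    k = 2 * n_bits + 1
    wide = ((a0 << k) | a1) * ((b1 << k) | b0)
    # wide = a0*b1 * 2^(2k)  +  (a0*b0 + a1*b1) * 2^k  +  a1*b0
    # The middle field is exactly the 2-element dot product.
    return (wide >> k) & ((1 << k) - 1)

print(st_mac((7, 5), (3, 9), 4))  # 7*3 + 5*9 = 66
```

The same packing generalizes to four 4-bit or eight 2-bit products per 16-bit multiplier, which is what makes sub-word parallelism attractive for precision-scalable engines.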
AICAS 2019 Author Index
{"title":"AICAS 2019 Author Index","authors":"","doi":"10.1109/aicas.2019.8771499","DOIUrl":"https://doi.org/10.1109/aicas.2019.8771499","url":null,"abstract":"","PeriodicalId":273095,"journal":{"name":"2019 IEEE International Conference on Artificial Intelligence Circuits and Systems (AICAS)","volume":"44 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133370745","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
AIP: Saving the DRAM Access Energy of CNNs Using Approximate Inner Products
C. Cheng, Ren-Shuo Liu
In this work, we propose AIP (Approximate Inner Product), which approximates the inner products of CNNs’ fully-connected (FC) layers using only a small fraction (e.g., one-sixteenth) of the parameters. We observe that FC layers possess several characteristics that naturally fit AIP: the dropout training strategy, rectified linear units (ReLUs), and the top-n operator. Experimental results show that DRAM access energy can be reduced by 48% at the cost of only a 2% top-5 accuracy loss (for VGG-f).
{"title":"AIP: Saving the DRAM Access Energy of CNNs Using Approximate Inner Products","authors":"C. Cheng, Ren-Shuo Liu","doi":"10.1109/AICAS.2019.8771595","DOIUrl":"https://doi.org/10.1109/AICAS.2019.8771595","url":null,"abstract":"In this work, we propose AIP (Approximate Inner Product), which approximates the inner products of CNNs’ fully-connected (FC) layers by using only a small fraction (e.g., one-sixteenth) of parameters. We observe that FC layers possess several characteristics that naturally fit AIP: the dropout training strategy, rectified linear units (ReLUs), and top-n operator. Experimental results show that 48% of DRAM access energy can be reduced at the cost of only 2% of top-5 accuracy loss (for VGG-f).","PeriodicalId":273095,"journal":{"name":"2019 IEEE International Conference on Artificial Intelligence Circuits and Systems (AICAS)","volume":"57 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131525190","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
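AIP's core trade-off, estimating an FC layer's inner products from a small fraction of the weights so that far fewer parameters are fetched from DRAM, can be sketched as below. The uniform-stride sampling and rescaling here are my own assumptions for illustration; the abstract does not detail the paper's actual selection scheme:

```python
def approx_inner_product(x, w, keep_every=16):
    """Hypothetical AIP-style sketch: use one of every `keep_every`
    weight/input pairs, then rescale the partial sum to estimate the
    full inner product. Only len(w)//keep_every weights are fetched."""
    partial = sum(x[i] * w[i] for i in range(0, len(x), keep_every))
    return partial * keep_every

# With uniform data the estimate is exact; real activations make it
# approximate, which ReLU + top-n selection can tolerate.
print(approx_inner_product([1.0] * 32, [2.0] * 32))  # 64.0
```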
Heterogeneous Activation Function Extraction for Training and Optimization of SNN Systems
A. Zjajo, Sumeet S. Kumar, R. V. Leuken
The energy-efficiency and computation-capability characteristics of analog/mixed-signal spiking neural networks offer a capable platform for implementing cognitive tasks on resource-limited embedded platforms. However, the inherent mismatch in analog devices severely influences the accuracy and reliability of the computing system. In this paper, we devise an efficient algorithm for extracting the heterogeneous activation functions of analog hardware neurons as a set of constraints in an off-line training and optimization process, and examine how compensating for the mismatch effects influences the synchronicity and information-processing capabilities of the system.
{"title":"Heterogeneous Activation Function Extraction for Training and Optimization of SNN Systems","authors":"A. Zjajo, Sumeet S. Kumar, R. V. Leuken","doi":"10.1109/AICAS.2019.8771619","DOIUrl":"https://doi.org/10.1109/AICAS.2019.8771619","url":null,"abstract":"Energy-efficiency and computation capability characteristics of analog/mixed-signal spiking neural networks offer capable platform for implementation of cognitive tasks on resource-limited embedded platforms. However, inherent mismatch in analog devices severely influence accuracy and reliability of the computing system. In this paper, we devise efficient algorithm for extracting of heterogeneous activation functions of analog hardware neurons as a set of constraints in an off-line training and optimization process, and examine how compensation of the mismatch effects influence synchronicity and information processing capabilities of the system.","PeriodicalId":273095,"journal":{"name":"2019 IEEE International Conference on Artificial Intelligence Circuits and Systems (AICAS)","volume":"24 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123047690","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Analog Weights in ReRAM DNN Accelerators
J. Eshraghian, S. Kang, Seungbum Baek, G. Orchard, H. Iu, W. Lei
Artificial neural networks have become ubiquitous in modern life, which has triggered the emergence of a new class of application-specific integrated circuits for their acceleration. ReRAM-based accelerators have gained significant traction due to their ability to leverage in-memory computation. In a crossbar structure, they can perform multiply-and-accumulate operations more efficiently than standard CMOS logic. By virtue of being resistive switches, ReRAM devices can only reliably store one of two states, which severely limits the range of values in a computational kernel. This paper presents a novel scheme for alleviating the single-bit-per-device restriction by exploiting the frequency dependence of v-i plane hysteresis, assigning kernel information not only to the device conductance but also partially distributing it to the frequency of a time-varying input. We show this approach reduces average power consumption for a single crossbar convolution by up to a factor of ×16 for an unsigned 8-bit input image, where each convolutional process consumes a worst-case 1.1 mW, and reduces area by a factor of ×8, without reducing accuracy to the level of binarized neural networks.
{"title":"Analog Weights in ReRAM DNN Accelerators","authors":"J. Eshraghian, S. Kang, Seungbum Baek, G. Orchard, H. Iu, W. Lei","doi":"10.1109/AICAS.2019.8771550","DOIUrl":"https://doi.org/10.1109/AICAS.2019.8771550","url":null,"abstract":"Artificial neural networks have become ubiquitous in modern life, which has triggered the emergence of a new class of application specific integrated circuits for their acceleration. ReRAM-based accelerators have gained significant traction due to their ability to leverage in-memory computations. In a crossbar structure, they can perform multiply-and-accumulate operations more efficiently than standard CMOS logic. By virtue of being resistive switches, ReRAM switches can only reliably store one of two states. This is a severe limitation on the range of values in a computational kernel. This paper presents a novel scheme in alleviating the single-bit-per-device restriction by exploiting frequency dependence of v-i plane hysteresis, and assigning kernel information not only to the device conductance but also partially distributing it to the frequency of a time-varying input.We show this approach reduces average power consumption for a single crossbar convolution by up to a factor of ×16 for an unsigned 8-bit input image, where each convolutional process consumes a worst-case of 1.1mW, and reduces area by a factor of ×8, without reducing accuracy to the level of binarized neural networks. 
This presents a massive saving in computing cost when there are many simultaneous in-situ multiply-and-accumulate processes occurring across different crossbars.","PeriodicalId":273095,"journal":{"name":"2019 IEEE International Conference on Artificial Intelligence Circuits and Systems (AICAS)","volume":"50 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114612841","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 33
AnalogHTM: Memristive Spatial Pooler Learning with Backpropagation
O. Krestinskaya, A. P. James
The spatial pooler is responsible for feature extraction in Hierarchical Temporal Memory (HTM). In this paper, we present analog backpropagation learning circuits integrated into the memristive circuit design of the spatial pooler. Using 0.18 μm CMOS technology and TiOx memristor models, the maximum on-chip area and power consumption of the proposed design are 8335.074 μm² and 51.55 mW, respectively. The system is tested on a face recognition problem using the AR face database, achieving a recognition accuracy of 90%.
{"title":"AnalogHTM: Memristive Spatial Pooler Learning with Backpropagation","authors":"O. Krestinskaya, A. P. James","doi":"10.1109/AICAS.2019.8771628","DOIUrl":"https://doi.org/10.1109/AICAS.2019.8771628","url":null,"abstract":"Spatial pooler is responsible for feature extraction in Hierarchical Temporal Memory (HTM). In this paper, we present analog backpropagation learning circuits integrated to the memristive circuit design of spatial pooler. Using 0.18μm CMOS technology and TiOx memristor models, the maximum on-chip area and power consumption of the proposed design are 8335.074μm2 and 51.55mW, respectively. The system is tested for a face recognition problem AR face database achieving a recognition accuracy of 90%.","PeriodicalId":273095,"journal":{"name":"2019 IEEE International Conference on Artificial Intelligence Circuits and Systems (AICAS)","volume":"111 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128874036","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 3
Conversion of Synchronous Artificial Neural Network to Asynchronous Spiking Neural Network using sigma-delta quantization
A. Yousefzadeh, Sahar Hosseini, Priscila C. Holanda, Sam Leroux, T. Werner, T. Serrano-Gotarredona, B. Linares-Barranco, B. Dhoedt, P. Simoens
Artificial Neural Networks (ANNs) show great performance in several data-analysis tasks, including visual and auditory applications. However, directly implementing these algorithms without considering the sparsity of data requires high processing power, consumes vast amounts of energy, and suffers from scalability issues. Inspired by biology, one method that can reduce power consumption and allow scalability in the implementation of neural networks is asynchronous processing and communication by means of action potentials, so-called spikes. In this work, we use the well-known sigma-delta quantization method and introduce an easy and straightforward solution for converting an Artificial Neural Network into a Spiking Neural Network that can be implemented asynchronously on a neuromorphic platform. Briefly, we use asynchronous spikes to communicate the quantized output activations of the neurons. Despite the fact that our proposed mechanism is simple and applicable to a wide range of different ANNs, it outperforms state-of-the-art implementations from an accuracy and energy-consumption point of view. All source code for this project is available upon request for academic purposes.
{"title":"Conversion of Synchronous Artificial Neural Network to Asynchronous Spiking Neural Network using sigma-delta quantization","authors":"A. Yousefzadeh, Sahar Hosseini, Priscila C. Holanda, Sam Leroux, T. Werner, T. Serrano-Gotarredona, B. Linares-Barranco, B. Dhoedt, P. Simoens","doi":"10.1109/AICAS.2019.8771624","DOIUrl":"https://doi.org/10.1109/AICAS.2019.8771624","url":null,"abstract":"Artificial Neural Networks (ANNs) show great performance in several data analysis tasks including visual and auditory applications. However, direct implementation of these algorithms without considering the sparsity of data requires high processing power, consume vast amounts of energy and suffer from scalability issues. Inspired by biology, one of the methods which can reduce power consumption and allow scalability in the implementation of neural networks is asynchronous processing and communication by means of action potentials, so-called spikes. In this work, we use the well-known sigma-delta quantization method and introduce an easy and straightforward solution to convert an Artificial Neural Network to a Spiking Neural Network which can be implemented asynchronously in a neuromorphic platform. Briefly, we used asynchronous spikes to communicate the quantized output activations of the neurons. Despite the fact that our proposed mechanism is simple and applicable to a wide range of different ANNs, it outperforms the state-of-the-art implementations from the accuracy and energy consumption point of view. 
All source code for this project is available upon request for the academic purpose1.","PeriodicalId":273095,"journal":{"name":"2019 IEEE International Conference on Artificial Intelligence Circuits and Systems (AICAS)","volume":"2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116581818","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 20
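The sigma-delta encoding at the heart of this conversion can be sketched minimally: each neuron quantizes its activation and transmits only the signed change from its previous quantized level as spikes, so unchanged activations generate no traffic. Function names, the step size, and the use of `round` for quantization are illustrative assumptions, not the paper's exact formulation:

```python
def sigma_delta_encode(activations, step):
    """Emit the signed spike count for each new activation: only the
    change from the previous quantized level is transmitted."""
    spikes, prev = [], 0
    for a in activations:
        q = round(a / step)      # quantized activation level
        spikes.append(q - prev)  # delta: signed number of spikes to send
        prev = q
    return spikes

def sigma_decode(spikes, step):
    """Receiver integrates (sigma) the spikes to recover the levels."""
    out, level = [], 0
    for s in spikes:
        level += s
        out.append(level * step)
    return out

# A repeated activation produces zero spikes, exposing temporal sparsity:
print(sigma_delta_encode([0.0, 0.5, 0.5, 1.0, 0.25], 0.25))  # [0, 2, 0, 2, -3]
```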
Special Session: 2018 Low-Power Image Recognition Challenge and Beyond
M. Ardi, A. Berg, Bo Chen, Yen-kuang Chen, Yiran Chen, Donghyun Kang, Junhyeok Lee, Seungjae Lee, Yang Lu, Yung-Hsiang Lu, Fei Sun
The IEEE Low-Power Image Recognition Challenge (LPIRC) is an annual competition started in 2015. The competition identifies the best technologies that can detect objects in images efficiently (short execution time and low energy consumption). This paper summarizes LPIRC in year 2018 by describing the winners’ solutions. The paper also discusses the future of low-power computer vision.
{"title":"Special Session: 2018 Low-Power Image Recognition Challenge and Beyond","authors":"M. Ardi, A. Berg, Bo Chen, Yen-kuang Chen, Yiran Chen, Donghyun Kang, Junhyeok Lee, Seungjae Lee, Yang Lu, Yung-Hsiang Lu, Fei Sun","doi":"10.1109/AICAS.2019.8771606","DOIUrl":"https://doi.org/10.1109/AICAS.2019.8771606","url":null,"abstract":"The IEEE Low-Power Image Recognition Challenge (LPIRC) is an annual competition started in 2015. The competition identifies the best technologies that can detect objects in images efficiently (short execution time and low energy consumption). This paper summarizes LPIRC in year 2018 by describing the winners’ solutions. The paper also discusses the future of low-power computer vision.","PeriodicalId":273095,"journal":{"name":"2019 IEEE International Conference on Artificial Intelligence Circuits and Systems (AICAS)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115968802","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
NeuroPilot: A Cross-Platform Framework for Edge-AI
Tung-Chien Chen, Wei-Ting Wang, Kloze Kao, Chia-Lin Yu, C. Lin, Shu-Hsin Chang, Pei-Kuei Tsung
Artificial intelligence (AI) has moved from cloud servers to edge devices because of its rapid response, privacy, robustness, and efficient use of network bandwidth. However, it is challenging to deploy computation- and memory-bandwidth-intensive AI on edge devices, because power and hardware resources are limited. The varied needs of applications, the diversity of devices, and fragmented supporting tools make integration difficult. In this paper, NeuroPilot, a cross-platform framework for edge AI, is introduced. Technologies at the software, hardware, and integration levels are proposed to achieve high performance while preserving flexibility. The NeuroPilot solution provides superior edge-AI capability for a wide range of applications.
{"title":"NeuroPilot: A Cross-Platform Framework for Edge-AI","authors":"Tung-Chien Chen, Wei-Ting Wang, Kloze Kao, Chia-Lin Yu, C. Lin, Shu-Hsin Chang, Pei-Kuei Tsung","doi":"10.1109/AICAS.2019.8771536","DOIUrl":"https://doi.org/10.1109/AICAS.2019.8771536","url":null,"abstract":"Artificial intelligence (AI) has been applied from cloud servers to edge devices because of its rapid response, privacy, robustness, and the efficient use of network bandwidth. However, it is challengeable to deploy the computation and memory-bandwidth intensive AI to edge devices for the power and hardware resource are limited. The various needs of applications, diverse devices and the fragmented supporting tools make the integration a tough work. In this paper, the NeuroPilot, a cross-platform framework for edge AI, is introduced. Technologies on software, hardware and integration levels are proposed to achieve the high performance and preserve the flexibility meanwhile. The NeuroPilot solution provides the superior edge AI ability for a wide range of applications.","PeriodicalId":273095,"journal":{"name":"2019 IEEE International Conference on Artificial Intelligence Circuits and Systems (AICAS)","volume":"216 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121943031","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 8
Automatic HCC Detection Using Convolutional Network with Multi-Magnification Input Images
Wei-Che Huang, P. Chung, H. Tsai, N. Chow, Y. Juang, H. Tsai, Shih-Hsuan Lin, Cheng-Hsiung Wang
Postoperative pathologic examination of stained liver tissue is an important step in identifying prognostic factors for follow-up care. Traditionally, liver cancer detection is performed by pathologists observing the entire biological tissue, resulting in a heavy workload and potential misjudgment. Accordingly, automatic pathological examination has been studied for a long time. Most existing cancer-detection approaches, however, extract only cell-level information based on a single-scale, high-magnification patch. In liver tissue, common cell-change phenomena such as apoptosis, necrosis, and steatosis appear similar in tumorous and benign regions. Hence, detection may fail when a patch covers only the changed-cell area and cannot provide enough neighboring cell-structure information. To overcome this problem, a convolutional network architecture with multi-magnification input can provide not only cell-level information from high-magnification patches but also cell-structure information from low-magnification patches. The detection algorithm consists of two main stages: 1) extraction of cell-level and cell-structure-level feature maps from high-magnification and low-magnification images, respectively, by separate general convolutional networks, and 2) integration of the multi-magnification features by a fully connected network. In this paper, VGG16 and Inception V4 are applied as the base convolutional networks for the liver tumor detection task. The experimental results show that the VGG16-based multi-magnification input convolutional network achieves 91% mIOU on the HCC tumor detection task. In addition, in a comparison between single-scale CNN (SSCN) and multi-scale CNN (MSCN) approaches, the MSCN demonstrates that multi-scale patches provide better performance on the HCC classification task.
{"title":"Automatic HCC Detection Using Convolutional Network with Multi-Magnification Input Images","authors":"Wei-Che Huang, P. Chung, H. Tsai, N. Chow, Y. Juang, H. Tsai, Shih-Hsuan Lin, Cheng-Hsiung Wang","doi":"10.1109/AICAS.2019.8771535","DOIUrl":"https://doi.org/10.1109/AICAS.2019.8771535","url":null,"abstract":"Liver cancer postoperative pathologic examination of stained tissues is an important step in identifying prognostic factors for follow-up care. Traditionally, liver cancer detection would be performed by pathologists with observing the entire biological tissue, resulting in heavy work loading and potential misjudgment. Accordingly, the studies of the automatic pathological examination have been popular for a long period of time. Most approaches of the existing cancer detection, however, only extract cell level information based on single-scale high-magnification patch. In liver tissues, common cell change phenomena such as apoptosis, necrosis, and steatosis are similar in tumor and benign. Hence, the detection may fail when the patch only covered the changed cells area that cannot provide enough neighboring cell structure information. To conquer this problem, the convolutional network architecture with multi-magnification input can provide not only the cell level information by referencing high-magnification patches, but also the cell structure information by referencing low-magnification patches. The detection algorithm consists of two main structures: 1) extraction of cell level and cell structure level feature maps from high-magnification and low-magnification images respectively by separate general convolutional networks, and 2) integration of multi-magnification features by fully connected network. In this paper, VGG16 and Inception V4 were applied as the based convolutional network for liver tumor detection task. 
The experimental results showed that VGG16 based multi-magnification input convolutional network achieved 91% mIOU on HCC tumor detection task. In addition, with comparison between single-scale CNN (SSCN) and multi-scale CNN (MSCN) approaches, the MSCN demonstrated that the multi-scale patches could provide better performance on HCC classification task.","PeriodicalId":273095,"journal":{"name":"2019 IEEE International Conference on Artificial Intelligence Circuits and Systems (AICAS)","volume":"44 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123598530","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 13
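The fusion step of the multi-magnification architecture, concatenating a cell-level feature vector (from the high-magnification branch) with a cell-structure feature vector (from the low-magnification branch) before a fully connected layer, can be sketched as below. This is a toy illustration with hypothetical names and hand-picked weights, not the paper's VGG16/Inception V4 pipeline:

```python
def fuse_multiscale(feat_cell, feat_structure, weights, bias):
    """Concatenate the two branch feature vectors, then apply one
    fully connected layer (each output = dot(fused, row) + bias)."""
    fused = list(feat_cell) + list(feat_structure)
    return [sum(f * w for f, w in zip(fused, row)) + b
            for row, b in zip(weights, bias)]

# Two cell-level features plus one structure-level feature feed a
# 2-output FC layer (e.g. tumor vs. benign logits):
out = fuse_multiscale([1.0, 2.0], [3.0],
                      [[1.0, 1.0, 1.0], [0.5, 0.0, 1.0]],
                      [0.0, 1.0])
print(out)  # [6.0, 4.5]
```

The point of the design is that neither branch alone sees both the changed cells and their surrounding structure; the FC layer decides from the joint representation.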