
Latest publications: Proceedings of the 2017 International Conference on Compilers, Architectures and Synthesis for Embedded Systems Companion

A quantifiable approach to approximate computing: special session
Chaofan Li, Deepashree Sengupta, F. S. Snigdha, Wenbin Xu, Jiang Hu, S. Sapatnekar
Approximate computing has applications in areas such as image processing, neural computation, distributed systems, and real-time systems, where the results may be acceptable in the presence of controlled levels of error. The promise of approximate computing is in its ability to render just enough performance to meet quality constraints. However, going from this theoretical promise to a practical implementation requires a clear comprehension of the system requirements and matching them to the design of approximations as the system is implemented. This involves the tasks of (a) identifying the design space of potential approximations, (b) modeling the injected error as a function of the level of approximation, and (c) optimizing the system over the design space to maximize a metric, typically the power savings, under constraints on the maximum allowable degradation. Often, the error may be introduced at a low level of design (e.g., at the level of a full adder) but its impact must be percolated up to system-level error metrics (e.g., PSNR in a compressed image), and a practical approach must devise a coherent and quantifiable way of translating between error/power tradeoffs at all levels of design.
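The three tasks above suggest a simple optimization skeleton. The following Python sketch is purely illustrative (not the authors' tool): `error_model` and `power_model` stand in for task (b), and a greedy search over per-component approximation levels stands in for task (c), maximizing power savings under a bound on system-level error such as a PSNR budget.

```python
# Hypothetical sketch: greedy search over approximation levels, maximizing
# power savings under a system-level quality constraint (e.g., a PSNR floor).
# error_model/power_model are toy stand-ins for task (b); a real flow would
# percolate low-level error (e.g., full-adder truncation) up to PSNR.

def error_model(levels):
    """Toy system-level error: grows with the sum of approximation levels."""
    return 0.5 * sum(levels)          # e.g., dB of PSNR lost

def power_model(levels):
    """Toy power savings: diminishing returns per extra level."""
    return sum(l ** 0.5 for l in levels)

def optimize(n_components, max_level, max_error):
    levels = [0] * n_components
    while True:
        best = None
        for i in range(n_components):
            if levels[i] >= max_level:
                continue
            trial = levels[:]
            trial[i] += 1                      # try approximating component i more
            if error_model(trial) > max_error:
                continue                       # would violate the quality bound
            gain = power_model(trial) - power_model(levels)
            if best is None or gain > best[0]:
                best = (gain, trial)
        if best is None:                       # no feasible move left
            return levels, power_model(levels), error_model(levels)
        levels = best[1]

print(optimize(n_components=4, max_level=8, max_error=3.0))
```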
Citations: 3
Advanced ahead-of-time compilation for Javascript engine: work-in-progress
Hyukwoo Park, SungKook Kim, Soo-Mook Moon
JavaScript is heavily used in the web, yet it is much slower than other languages. To improve JavaScript performance, ahead-of-time compilation (AOTC) has been used, either to reuse the bytecode or the machine code generated by the baseline just-in-time compilation (JITC). JavaScript engines today employ high-performance optimizing JITC. So, we propose an AOTC that reuses the code generated by the optimizing JITC. It is more challenging than existing AOTCs since we need to handle more complex address relocation issues. Our preliminary evaluation shows that the proposed AOTC is promising, though.
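The central difficulty named here is address relocation when optimized machine code is reused across engine runs. As a loose illustration only (this abstract does not spell out the paper's mechanism), the sketch below models cached code as bytes plus a relocation table that is re-patched against the current run's symbol addresses at load time.

```python
# Illustrative only: cached optimized code carries a relocation table; at
# load time each entry is patched with the address the symbol has in the
# *current* run, since heap objects and runtime helpers move between runs.
import struct
from dataclasses import dataclass

@dataclass
class Reloc:
    offset: int      # byte offset of the address field inside the code
    symbol: str      # runtime entity the field must point at

@dataclass
class CachedCode:
    code: bytearray
    relocs: list

def load(cached: CachedCode, symbol_table: dict) -> bytes:
    """Patch every relocation site with this run's 64-bit address."""
    for r in cached.relocs:
        addr = symbol_table[r.symbol]     # fails loudly if the symbol is gone
        struct.pack_into("<Q", cached.code, r.offset, addr)
    return bytes(cached.code)

blob = CachedCode(bytearray(16), [Reloc(0, "jit_helper"), Reloc(8, "global_obj")])
print(load(blob, {"jit_helper": 0x7F00DEAD0000, "global_obj": 0x7F00BEEF0000}).hex())
```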
Citations: 4
Incremental training of CNNs for user customization: work-in-progress
M. S. Moghaddam, B. Harris, Duseok Kang, Inpyo Bae, Euiseok Kim, Hyemi Min, Hansu Cho, Sukjin Kim, Bernhard Egger, S. Ha, Kiyoung Choi
This paper presents a convolutional neural network architecture that supports transfer learning for user customization. The architecture consists of a large basic inference engine and a small augmenting engine. Initially, both engines are trained using a large dataset. Only the augmenting engine is tuned to the user-specific dataset. To preserve the accuracy for the original dataset, the novel concept of quality factor is proposed. The final network is evaluated with the Caffe framework, and our own implementation on a coarse-grained reconfigurable array (CGRA) processor. Experiments with MNIST, NIST'19, and our user-specific datasets show the effectiveness of the proposed approach and the potential of CGRAs as DNN processors.
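A minimal sketch of the described split, assuming (our reading, not stated verbatim in the abstract) that the quality factor weighs loss on replayed original data against loss on user data while the basic engine stays frozen:

```python
# Minimal sketch, not the authors' code: a frozen base engine, a small
# augmenting engine, and a "quality factor" that trades user-specific
# accuracy against preserved accuracy on the original dataset.
import torch
import torch.nn as nn

class AugmentedNet(nn.Module):
    def __init__(self, base: nn.Module, feat_dim: int, num_classes: int):
        super().__init__()
        self.base = base                             # large basic inference engine
        self.aug = nn.Linear(feat_dim, num_classes)  # small augmenting engine
        for p in self.base.parameters():
            p.requires_grad = False                  # freeze: only 'aug' is tuned

    def forward(self, x):
        return self.aug(self.base(x))                # base is assumed to emit features

def train_step(model, opt, user_batch, orig_batch, quality_factor=0.5):
    """One update; quality_factor weighs original-dataset loss vs user loss."""
    ce = nn.CrossEntropyLoss()
    xu, yu = user_batch
    xo, yo = orig_batch
    loss = (1 - quality_factor) * ce(model(xu), yu) + quality_factor * ce(model(xo), yo)
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()

# usage (hypothetical shapes): base maps inputs to 256-d features, 62 classes
# model = AugmentedNet(my_feature_extractor, feat_dim=256, num_classes=62)
# opt = torch.optim.SGD(model.aug.parameters(), lr=0.01)
```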
Citations: 0
Emerging (un-)reliability based security threats and mitigations for embedded systems: special session
H. Amrouch, P. Krishnamurthy, Naman Patel, J. Henkel, R. Karri, F. Khorrami
This paper addresses two reliability-based security threats and mitigations for embedded systems, namely aging and thermal side channels. Device aging can be used as a hardware attack vector by using voltage scaling or specially crafted instruction sequences to violate embedded processor guard bands. Short-term aging effects can be utilized to cause transient degradation of the embedded device without leaving any trace of the attack. (Thermal) side channels can be used both as an attack vector and as a defense. Specifically, thermal side channels are an effective and secure way to remotely monitor code execution on an embedded processor and/or to possibly leak information. Although various algorithmic means to detect anomalies are available, machine learning tools are particularly effective for anomaly detection. We will show how deep learning networks can be used in conjunction with thermal side channels to detect code injection/modification that manifests as anomalies.
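As a hedged illustration of the defensive use of thermal side channels with deep learning, the sketch below trains a small autoencoder on traces from known-good execution and flags a trace whose reconstruction error exceeds a calibrated threshold. The window length, model size, and thresholding are our assumptions, not the paper's.

```python
# Illustrative anomaly detector over fixed-length thermal traces: an
# autoencoder trained on known-good execution; high reconstruction error
# on a new trace suggests code injection/modification.
import torch
import torch.nn as nn

W = 128  # assumed samples per thermal window
model = nn.Sequential(
    nn.Linear(W, 32), nn.ReLU(),   # compress the trace
    nn.Linear(32, W),              # reconstruct it
)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
mse = nn.MSELoss()

def train(normal_traces: torch.Tensor, epochs: int = 50):
    """normal_traces: (N, W) thermal windows from known-good execution."""
    for _ in range(epochs):
        opt.zero_grad()
        loss = mse(model(normal_traces), normal_traces)
        loss.backward()
        opt.step()

def is_anomalous(trace: torch.Tensor, threshold: float) -> bool:
    """Flag a (1, W) window whose reconstruction error is unusually high."""
    with torch.no_grad():
        err = mse(model(trace), trace).item()
    return err > threshold

# 'threshold' would be calibrated, e.g., as a high percentile of the
# reconstruction errors observed on held-out normal traces.
```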
Citations: 14
Probabilistic reasoning for analysis of approximate computations
Sasa Misailovic
Popular application domains such as multimedia processing, machine learning, and big-data analytics operate on inherently noisy data and make decisions under uncertainty. While these applications are often good candidates for both algorithmic and system-level approximation, a major open challenge is how to analyze the influence of noisy data and candidate approximations on the application's outputs. At the same time, probabilistic programming languages provide an intuitive way to model uncertainty by expressing complex probabilistic models as computer programs. The talk will give an overview of PSI (http://www.psisolver.org), a system for exact symbolic inference. Using static analysis, PSI computes succinct symbolic representations of the joint posterior distribution represented by a probabilistic program. PSI supports programs with both discrete and continuous distributions. It can compute answers to various posterior distribution queries, expectation queries, and assertion queries using its own back-end for symbolic reasoning. This talk will present how we can represent some problems in approximate computing as probabilistic programs and use PSI to automatically obtain symbolic expressions that represent the distributions of the output error.
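PSI performs exact symbolic inference; as a rough stand-in for the kind of query described, the sketch below estimates by Monte Carlo (rather than symbolically) the output-error distribution of a hypothetical adder that truncates the k low bits of its operands.

```python
# Monte Carlo stand-in for an error-distribution query: PSI would return a
# closed-form symbolic distribution; here we just sample it empirically.
import numpy as np

def truncated_add(a: np.ndarray, b: np.ndarray, k: int) -> np.ndarray:
    """Hypothetical approximate adder: drop the k low bits of each operand."""
    mask = ~((1 << k) - 1)
    return (a & mask) + (b & mask)

rng = np.random.default_rng(0)
a = rng.integers(0, 2**16, size=100_000)
b = rng.integers(0, 2**16, size=100_000)

err = (a + b) - truncated_add(a, b, k=4)   # output error of the approximation
print("mean error:", err.mean(), "max error:", err.max())
vals, counts = np.unique(err, return_counts=True)
print("error support size:", len(vals))
```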
Citations: 1
Towards efficient quantized neural network inference on mobile devices: work-in-progress
Yaman Umuroglu, Magnus Jahre
From voice recognition to object detection, Deep Neural Networks (DNNs) are steadily getting better at extracting information from complex raw data. Combined with the popularity of mobile computing and the rise of the Internet-of-Things (IoT), there is enormous potential for widespread deployment of intelligent devices, but a computational challenge remains. A modern DNN can require billions of floating point operations to classify a single image, which is far too costly for energy-constrained mobile devices. Offloading DNNs to powerful servers in the cloud is only a limited solution, as it requires significant energy for data transfer and cannot address applications with low-latency requirements such as augmented reality or navigation for autonomous drones.
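The direction named in the title, quantized inference, replaces floating-point multiply-accumulates with narrow integer ones. A minimal numpy sketch under an assumed symmetric per-tensor int8 scheme (an illustration of the general technique, not this paper's method):

```python
# Symmetric per-tensor int8 quantization: x ≈ scale * q, with matmul done in
# integer arithmetic (int32 accumulation) and dequantized only at the end.
import numpy as np

def quantize(x: np.ndarray):
    scale = np.abs(x).max() / 127.0
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def qmatmul(xq, x_scale, wq, w_scale):
    acc = xq.astype(np.int32) @ wq.astype(np.int32)  # integer MACs
    return acc * (x_scale * w_scale)                 # dequantize once

rng = np.random.default_rng(0)
x, w = rng.standard_normal((1, 64)), rng.standard_normal((64, 10))
xq, xs = quantize(x)
wq, ws = quantize(w)
print("max abs error vs fp32:", np.abs(x @ w - qmatmul(xq, xs, wq, ws)).max())
```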
Citations: 6
Balanced cache bypassing for critical warp reduction: work-in-progress
Sungin Hong, Hyunjun Kim, Hwansoo Han
Warp-level cache bypassing has been proposed to resolve memory resource contention in GPU computing. However, the proposed cache bypassing scheme delivers sub-optimal performance due to the warp criticality problem in balanced workloads. In this paper, we show that warp-level cache bypassing is a sub-optimal solution and propose a balanced cache bypassing scheme to solve this problem.
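The abstract does not spell out the balanced policy, so the following is purely a hypothetical illustration of one plausible rule: treat lagging warps as critical and let them use L1, while warps that are well ahead bypass it so they do not evict the critical warps' lines.

```python
# Hypothetical bypass rule, not the paper's: critical (lagging) warps keep
# using L1; warps far ahead of the slowest warp bypass so cache capacity is
# spent closing the progress gap rather than widening it.
def should_bypass(warp_progress: dict, warp_id: int, slack: int = 64) -> bool:
    """warp_progress maps warp id -> instructions retired so far."""
    slowest = min(warp_progress.values())
    return warp_progress[warp_id] - slowest > slack  # well ahead => bypass L1

progress = {0: 1000, 1: 1310, 2: 990, 3: 1500}
for w in progress:
    print(w, "bypass" if should_bypass(progress, w) else "use L1")
```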
Citations: 1
Code-size-aware mapping for synchronous dataflow graphs on multicore systems: work-in-progress
Mingze Ma, R. Sakellariou
Synchronous Dataflow Graphs (SDFGs) are widely used to model streaming applications (e.g. digital signal processing applications), which are commonly executed by embedded systems. The usage of on-chip resources is always strictly constrained in embedded systems. As the cost of instruction memory is a significant part of on-chip resource costs, code size reduction is an effective way to control the overall costs of on-chip resources. In this work, a code-size-aware mapping heuristic is proposed to decrease the code size for SDFGs on multicore systems. The mapping heuristic is jointly used with a self-timed scheduling heuristic to decrease the code size of the original schedule. In preliminary experiments, the proposed heuristic achieves significant code size reduction for all the tested SDFGs without affecting throughput.
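As an illustration of the underlying lever (the abstract does not give the heuristic's rules): actors mapped to the same core can share one copy of common code, so a mapper can greedily place each actor where it adds the least new code, subject to a per-core load bound. A toy sketch:

```python
# Illustrative greedy mapper: each actor has a code footprint and a load;
# placing an actor on a core that already holds its code adds no new code.
def map_actors(actors, n_cores, load_cap):
    """actors: list of (name, code_id, code_size, load)."""
    core_code = [set() for _ in range(n_cores)]   # code ids resident per core
    core_load = [0.0] * n_cores
    mapping = {}
    for name, code_id, size, load in sorted(actors, key=lambda a: -a[2]):
        def added_code(c):
            return 0 if code_id in core_code[c] else size
        feasible = [c for c in range(n_cores) if core_load[c] + load <= load_cap]
        c = min(feasible, key=lambda c: (added_code(c), core_load[c]))
        core_code[c].add(code_id)
        core_load[c] += load
        mapping[name] = c
    return mapping

# Two FIR actors share code, so they end up co-located on one core.
acts = [("fir1", "fir", 8, 0.3), ("fir2", "fir", 8, 0.3), ("fft", "fft", 20, 0.5)]
print(map_actors(acts, n_cores=2, load_cap=1.0))
```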
Citations: 2
Towards industry strength mapping of AUTOSAR automotive functionality on multicore architectures: work-in-progress
Cosmin Avasalcai, Dhanesh Budhrani, P. Pop
Automotive electronic architectures have moved from federated architectures, where one function is implemented in one ECU (Electronic Control Unit), to distributed architectures consisting of several multicore ECUs. In addition, multicore ECUs are being adopted because of better performance, cost, size, fault-tolerance and power consumption. Automotive manufacturers use the AUTomotive Open System ARchitecture (AUTOSAR) as the standardized software architecture for ECUs. With AUTOSAR, the functionality is modeled as a set of software components composed of subtasks, called runnables. In this paper we propose an approach for the automatic assignment of software functionality to multicore distributed architectures, implemented as a software tool. The tool, AUTOMAP, decides: (i) the mapping of software components to multicore ECUs, (ii) the assignment of runnables to the ECU cores, (iii) the clustering of runnables into tasks and (iv) the mapping of tasks to 'OS-Applications', such that timing and mapping constraints are satisfied. AUTOMAP has been developed to handle large industrial-sized use cases and fine-grained realistic mapping and timing constraints, and to produce outputs that support the system engineer in the mapping task. We have successfully evaluated AUTOMAP on several realistic use cases from Volvo Trucks.
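A toy sketch of decision (ii), assigning runnables to ECU cores; the constraint model here (pinned runnables plus a per-core utilization cap) is our assumption for illustration, not AUTOMAP's actual formulation.

```python
# Toy assignment of runnables to cores: respect pinned mapping constraints,
# then place the rest on the least-loaded core so timing headroom stays
# balanced; a per-core utilization cap stands in for timing constraints.
def assign_runnables(runnables, cores):
    """runnables: list of (name, utilization, pinned_core_or_None)."""
    util = {c: 0.0 for c in cores}
    plan = {}
    for name, u, pin in sorted(runnables, key=lambda r: -r[1]):
        target = pin if pin is not None else min(util, key=util.get)
        if util[target] + u > 1.0:
            raise ValueError(f"no feasible core for {name}")
        util[target] += u
        plan[name] = target
    return plan

rs = [("brake_ctl", 0.4, "core0"), ("diag", 0.2, None), ("logger", 0.3, None)]
print(assign_runnables(rs, cores=["core0", "core1"]))
```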
Citations: 2
Optimizing DCNN FPGA accelerator design for handwritten hangul character recognition: work-in-progress
Hanwool Park, Changdae Lee, Hakkyung Lee, Yechan Yoo, Yoonjin Park, Injung Kim, Kang Yi
Deep Convolutional Neural Networks (DCNNs) are a break-through technology in image recognition. However, because of extreme computing resource requirements, DCNNs need to be implemented on hardware accelerators. In this paper, we present FPGA-based accelerator design techniques for a DCNN handwritten Hangul character recognition engine. We achieved about 11.9 ms recognition time per character with a Xilinx FPGA accelerator. Our design optimization was performed with the Xilinx HLS and SDAccel environment targeting the Kintex XCKU115 FPGA from Xilinx. Our design outperforms a CPU by 6.25x in execution time, and a GPGPU by 4.7x in energy efficiency and by 17x in cooling cost for the computing servers. We think these results imply that deep learning with FPGA accelerators will be an alternative to GPGPU solutions for real-time applications, especially in data centers or server farms.
Citations: 9