
2019 Design, Automation & Test in Europe Conference & Exhibition (DATE): Latest Papers

Fault Localization in Programmable Microfluidic Devices
Pub Date : 2019-03-25 DOI: 10.23919/DATE.2019.8715023
Alessandro Bernardini, Chunfeng Liu, Bing Li, Ulf Schlichtmann
Programmable Microfluidic Devices (PMDs) have revolutionized the traditional biochemical experiment flow. Test algorithms for PMDs have recently been proposed. Test patterns can be generated algorithmically. But an algorithm for fault localization once some faults have been identified is not yet available. When testing a PMD, once a test pattern fails it is unknown where the stuck valve is located. The stuck valve can be any one valve out of many valves forming the test pattern. In this paper, we propose an effective algorithm for the localization of stuck-at-0 faults and stuck-at-1 faults in a PMD. The stuck valve is localized either exactly or within a very small set of candidate valves. Once the locations of faulty valves are known, it becomes possible to continue to use the PMD by resynthesizing the application.
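The abstract does not spell out the localization algorithm, so the following is only a generic illustration of how test outcomes can narrow down a faulty valve, not the authors' method. It assumes a single valve that is stuck closed, that each test pattern is described by the set of valves it needs open, and that a pattern passes only if all of those valves open. The function name `localize_stuck_closed` and the valve labels are hypothetical.

```python
def localize_stuck_closed(results):
    """Narrow down a single stuck-closed valve from test outcomes.

    results: iterable of (valves, passed) pairs, where `valves` is the set of
    valves a test pattern needs open and `passed` says whether flow was seen.
    Returns the set of valves consistent with every outcome.
    """
    candidates = None      # intersection of all failing patterns
    exonerated = set()     # valves proven to open by some passing pattern
    for valves, passed in results:
        if passed:
            exonerated |= set(valves)
        elif candidates is None:
            candidates = set(valves)
        else:
            candidates &= set(valves)
    if candidates is None:  # no pattern failed, so there is nothing to localize
        return set()
    return candidates - exonerated


# Hypothetical outcomes: two failing patterns share only valve "v2".
outcomes = [
    ({"v1", "v2", "v3"}, False),
    ({"v1", "v4"}, True),
    ({"v2", "v5"}, False),
]
suspects = localize_stuck_closed(outcomes)
```

Under these assumptions the faulty valve must lie on every failing pattern and on no passing one, which is why the result is either a single valve or a small candidate set, depending on how well the patterns separate the valves.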
Citations: 4
QBLK: Towards Fully Exploiting the Parallelism of Open-Channel SSDs
Pub Date : 2019-03-25 DOI: 10.23919/DATE.2019.8715049
Hongwei Qin, D. Feng, Wei Tong, Jingning Liu, Yutong Zhao
By exposing physical channels to host software, Open-Channel SSDs show great potential for future high-performance storage systems. However, the existing scheme fails to achieve acceptable performance under heavy workloads. The main reasons lie not only in its single-buffer architecture but also, more importantly, in its line-based physical address management. In addition, the lock on the address mapping table is a performance burden under heavy workloads. We propose QBLK, an open-source driver that tries to better exploit the parallelism of Open-Channel SSDs. In particular, QBLK adopts four key techniques: (1) multi-queue based buffering, (2) per-channel based address management, (3) lock-free address mapping, and (4) fine-grained draining. Experimental results show that QBLK achieves up to 97.4% bandwidth improvement compared with the state-of-the-art PBLK scheme.
Citations: 10
Methodology for EM Fault Injection: Charge-based Fault Model
Pub Date : 2019-03-25 DOI: 10.23919/DATE.2019.8715150
Haohao Liao, C. Gebotys
Recently, electromagnetic fault injection (EMFI) techniques have been found to have significant implications for the security of embedded devices. Unfortunately, there is still a lack of understanding of EM fault models and countermeasures for embedded processors. For the first time, this paper proposes an extended fault model based on the concept of critical charge and a new EMFI backside methodology based on over-clocking. Results show that exact timing of EM pulses can provide reliable, repeatable instruction replacement faults for specific programs. An attack on AES is demonstrated, showing that the EM fault injection requires on average fewer than 222 EM pulses and 5.3 plaintexts to retrieve the full AES key. This research is critical for ensuring that embedded processors and their instruction set architectures are secure and resistant to fault injection attacks.
Citations: 15
A Compiler for Automatic Selection of Suitable Processing-in-Memory Instructions
Pub Date : 2019-03-25 DOI: 10.23919/DATE.2019.8714956
Hameeza Ahmed, P. C. Santos, J. P. C. Lima, Rafael Fão de Moura, M. Alves, A. C. S. Beck, L. Carro
Although Processing-in-Memory (PIM) is not a new technique, the advent of 3D-stacked technologies, which integrate large memories with logic circuitry able to process large amounts of data, has revived it. PIM increases performance while reducing energy consumption when dealing with large amounts of data. Although several PIM designs are available in the literature, their effective implementation still burdens the programmer. Also, various PIM instances are required to take advantage of the internal 3D-stacked memories, which further increases the challenges faced by programmers. To address this, this work presents the Processing-In-Memory cOmpiler (PRIMO). Our compiler is able to efficiently exploit large vector units on a PIM architecture, directly from the original code. PRIMO automatically selects suitable PIM operations, allowing them to be offloaded automatically. Moreover, PRIMO takes several PIM instances into account, selecting the most suitable instance while reducing internal communication between different PIM units. The compilation results for different benchmarks show how PRIMO is able to exploit large vectors while achieving near-optimal performance compared to the ideal execution for the case-study PIM. PRIMO allows a speedup of 38× for specific kernels and achieves 11.8× on average for a set of benchmarks from the PolyBench suite.
Citations: 29
Hot Spot Identification and System Parameterized Thermal Modeling for Multi-Core Processors Through Infrared Thermal Imaging
Pub Date : 2019-03-25 DOI: 10.23919/DATE.2019.8714918
Sheriff Sadiqbatcha, Hengyang Zhao, H. Amrouch, J. Henkel, S. Tan
Accurate thermal models suitable for system-level dynamic thermal, power and reliability regulation and management are vital for many commercial multi-core processors. However, developing such accurate thermal models and identifying the related thermal-power relevant spatial locations for commercial processors is a challenging task due to the lack of information and available tools. Existing tools such as HotSpot-like thermal models may suffer from inaccuracy or inefficiency for online applications, primarily because most rely on parameters that cannot be precisely quantified, such as power traces, while others are numerical methods not suitable for runtime use. In this work, we propose a novel approach to automatically detecting the major heat sources on a commercial multi-core microprocessor using an infrared thermal imaging setup. Our approach involves a number of steps, including a 2D discrete cosine transform filter for noise reduction on the measured thermal maps, and a Laplacian transformation followed by K-means clustering for heat-source identification. Since the identified heat sources are the thermally vulnerable areas of the die, we propose a novel approach to deriving a thermal model capable of predicting their temperatures during runtime. We apply Long Short-Term Memory (LSTM) networks to build a dynamic thermal model which uses system-level variables such as chip frequency, voltage and instruction count as inputs. The model is trained and tested exclusively using measured thermal data from a commercial multi-core processor. Experimental results show that the proposed thermal model achieves very high accuracy (root-mean-square error: 2.04°C to 2.57°C) in predicting the temperature of all the identified heat sources on the chip.
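To illustrate why a Laplacian helps expose heat sources, here is a minimal sketch that covers only that one step; the DCT filtering, the K-means clustering and the LSTM model are left out. It assumes the thermal map is a small list-of-lists grid of temperatures and uses the standard 5-point discrete Laplacian, which is an illustrative choice rather than the operator the authors necessarily used. The helper names and the sample map are hypothetical.

```python
def laplacian(grid):
    """5-point discrete Laplacian; border cells are left at 0.0."""
    rows, cols = len(grid), len(grid[0])
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            out[r][c] = (grid[r - 1][c] + grid[r + 1][c]
                         + grid[r][c - 1] + grid[r][c + 1]
                         - 4.0 * grid[r][c])
    return out


def strongest_source(grid):
    """Return (row, col) of the most negative Laplacian value.

    A strongly negative Laplacian marks a cell that is hotter than the
    average of its neighbours, i.e. a likely local heat source.
    """
    lap = laplacian(grid)
    _, row, col = min((lap[r][c], r, c)
                      for r in range(len(grid))
                      for c in range(len(grid[0])))
    return row, col


# Hypothetical 5x5 thermal map (degrees Celsius) with one hot spot.
thermal_map = [
    [40.0, 40.0, 40.0, 40.0, 40.0],
    [40.0, 40.0, 40.0, 50.0, 40.0],
    [40.0, 40.0, 50.0, 60.0, 50.0],
    [40.0, 40.0, 40.0, 50.0, 40.0],
    [40.0, 40.0, 40.0, 40.0, 40.0],
]
hot_cell = strongest_source(thermal_map)
```

On real infrared data the Laplacian amplifies pixel noise, which is presumably why a smoothing step precedes it in the described pipeline.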
Citations: 12
A Satisfiability-Based Approximate Algorithm for Logic Synthesis Using Switching Lattices
Pub Date : 2019-03-25 DOI: 10.23919/DATE.2019.8714809
L. Aksoy, M. Altun
In recent years, the realization of a logic function on two-dimensional arrays of four-terminal switches, called switching lattices, has attracted considerable interest. Exact and approximate methods have been proposed for the problem of synthesizing Boolean functions on switching lattices of minimum size, called the lattice synthesis (LS) problem. However, the exact method can only handle relatively small instances, and the approximate methods may find solutions that are far from the optimum. This paper introduces an approximate algorithm, called JANUS, that formalizes the problem of realizing a logic function on a given lattice, called the lattice mapping (LM) problem, as a satisfiability problem and explores the search space of the LS problem in a dichotomic search manner, solving LM problems for possible lattice candidates. This paper also presents three methods to improve the initial upper bound and an efficient way to realize multiple logic functions on a single lattice. Experimental results show that JANUS can find solutions very close to the minimum in a reasonable time and obtain better results than the existing approximate methods. JANUS can also find better solutions than the exact method when the latter cannot prove optimality within the given time limit; in those cases the maximum gain in the number of switches reaches 25%.
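Two ideas in this abstract can be sketched in a few lines. First, a switching lattice evaluates to 1 when the switches that are ON form a connected path from the top edge to the bottom edge. Second, a dichotomic search halves the range of candidate sizes at every step. The sketch below assumes a simple string encoding for literals (`"a"`, `"!a"`, `"0"`, `"1"`) and replaces the SAT-based LM check with an arbitrary caller-supplied predicate that is assumed to be monotone in the size; both helper names are hypothetical, and this is not the JANUS implementation.

```python
def lattice_output(lattice, assignment):
    """Evaluate a switching lattice: True iff ON switches link top to bottom.

    lattice: rows of literal strings ("a", "!a", "0", "1").
    assignment: dict mapping variable names to booleans.
    """
    def is_on(literal):
        if literal in ("0", "1"):
            return literal == "1"
        if literal.startswith("!"):
            return not assignment[literal[1:]]
        return assignment[literal]

    rows, cols = len(lattice), len(lattice[0])
    stack = [(0, c) for c in range(cols) if is_on(lattice[0][c])]
    seen = set(stack)
    while stack:  # depth-first search over 4-connected ON switches
        r, c = stack.pop()
        if r == rows - 1:
            return True
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if (0 <= nr < rows and 0 <= nc < cols
                    and (nr, nc) not in seen and is_on(lattice[nr][nc])):
                seen.add((nr, nc))
                stack.append((nr, nc))
    return False


def smallest_feasible(lo, hi, feasible):
    """Smallest n in [lo, hi] with feasible(n) True.

    Assumes feasible is monotone (False ... False True ... True) and that
    feasible(hi) holds; in the paper's setting this check would be a SAT call.
    """
    while lo < hi:
        mid = (lo + hi) // 2
        if feasible(mid):
            hi = mid
        else:
            lo = mid + 1
    return lo


# A 2x2 lattice realizing f = a*c + b*d.
two_by_two = [["a", "b"], ["c", "d"]]
value = lattice_output(two_by_two, {"a": True, "b": False, "c": True, "d": False})
```

The monotonicity assumption is a simplification: real lattices come in row-by-column shapes, so a practical search has to decide how candidate shapes are ordered before it can bisect over them.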
Citations: 3
SAID: A Supergate-Aided Logic Synthesis Flow for Memristive Crossbars
Pub Date : 2019-03-25 DOI: 10.23919/DATE.2019.8714939
V. Tenace, R. G. Rizzo, Debjyoti Bhattacharjee, A. Chattopadhyay, A. Calimera
A memristor is a two-terminal device that can serve as a non-volatile memory element with built-in logic capabilities. Arranged in a crossbar structure, memristive arrays can represent complex Boolean logic functions that adhere to the logic-in-memory paradigm, where data and logic gates are glued together on the same piece of hardware. Needless to say, novel and ad-hoc CAD solutions are required to achieve practical and feasible hardware implementations. Existing techniques aim at optimal mapping strategies for Boolean logic functions described by means of 2-input NOR and NOT gates, thus overlooking the optimization capabilities that a smart and dedicated technology-aware logic synthesis can provide. In this paper, we introduce a novel library-free supergate-aided (SAID) logic synthesis approach with a dedicated mapping strategy tailored to MAGIC crossbars. Supergates are obtained with a Look-Up Table (LUT)-based synthesis that splits a complex logic network into smaller Boolean functions. Those functions are then mapped onto the crossbar array so as to minimize latency. The proposed SAID flow makes it possible to (i) maximize supergate-level parallelism, thus reducing the total number of computing cycles, and (ii) relax mapping constraints, allowing an easy and fast mapping of Boolean functions on memristive crossbars. Experimental results obtained on several benchmarks from the ISCAS’85 and IWLS’93 suites demonstrate that our solution is capable of outperforming other state-of-the-art techniques in terms of speedup (3.89× in the best case), at the expense of a very low area overhead.
Citations: 13
A Camera with Brain – Embedding Machine Learning in 3D Sensors
Pub Date : 2019-03-25 DOI: 10.23919/DATE.2019.8715258
B. Mudassar, Priyabrata Saha, Yun Long, M. Amir, Evan Gebhardt, Taesik Na, J. Ko, M. Wolf, S. Mukhopadhyay
Today's cameras are designed to capture signals with the highest possible accuracy to most faithfully represent what they see. However, many mission-critical autonomous applications, ranging from traffic monitoring to disaster recovery to defense, require quality of information, where useful information depends on the task and is defined using complex features rather than only changes in the captured signal. Such applications require cameras that capture useful information from a scene with the highest quality while meeting system constraints such as power, performance, and bandwidth. This paper discusses the feasibility of a camera that learns how to capture task-dependent information with the highest quality, paving the way to designing a camera with a brain. 3D integration of digital pixel sensors with a massively parallel computing platform for machine learning creates a hardware architecture for such a camera. The paper discusses embedded machine learning algorithms that can run on such a platform to enhance the quality of useful information through real-time control of the sensor parameters. We conclude by identifying critical challenges as well as opportunities for hardware and algorithmic innovations to enable machine learning in the feedback loop of a 3D image sensor based camera.
Citations: 4
DMRM: Distributed Market-Based Resource Management of Edge Computing Systems
Pub Date : 2019-03-25 DOI: 10.23919/DATE.2019.8714930
Manolis Katsaragakis, Dimosthenis Masouros, Vasileios Tsoutsouras, Farzad Samie, L. Bauer, J. Henkel, D. Soudris
Resource management is a key technique for efficiently operating devices in the Internet of Things (IoT). In this paper, we propose DMRM, a new algorithm based on economic and pricing models for dynamic resource management of IoT networks under CPU, memory, bandwidth and latency constraints. We use a supply and demand model, smart data pricing and perceived valued pricing to implement a marketplace where IoT devices and gateways buy and sell the computing and communication resources necessary for task execution. A comparison with relevant approaches shows that our new market-based algorithm not only reaches near-optimal results, but that its scalable, distributed nature also leads to three orders of magnitude lower execution requirements than centralized approaches.
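As a rough illustration of the marketplace idea in this abstract, the sketch below allocates a fixed supply of a resource to buyers in descending order of their bid price. It is not the DMRM algorithm, whose details are not given here; the function name `allocate_by_bid`, the bid format and the example buyers are all hypothetical.

```python
def allocate_by_bid(supply, bids):
    """Greedy highest-bid-first allocation (illustrative only, not DMRM).

    supply: total units of a resource offered for sale.
    bids:   dict mapping a buyer name to (price_per_unit, units_requested).
    Returns (allocation, clearing_price), where clearing_price is the
    lowest bid price that received any units, or None if nothing was sold.
    """
    allocation = {}
    clearing_price = None
    remaining = supply
    # Serve buyers from the highest bid price to the lowest.
    ranked = sorted(bids.items(), key=lambda item: item[1][0], reverse=True)
    for buyer, (price, units) in ranked:
        if remaining <= 0:
            break
        granted = min(units, remaining)
        if granted > 0:
            allocation[buyer] = granted
            clearing_price = price
            remaining -= granted
    return allocation, clearing_price


# Hypothetical example: a gateway sells 10 CPU units to three devices.
example_bids = {
    "sensor_a": (3.0, 4),
    "sensor_b": (5.0, 5),
    "sensor_c": (1.0, 6),
}
example_allocation, example_price = allocate_by_bid(10, example_bids)
```

In this example the two highest bidders are served in full and the lowest bidder receives only the single remaining unit. A distributed scheme would run such exchanges locally between devices and gateways rather than in one central function.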
Citations: 4
Bayesian Optimized Importance Sampling for High Sigma Failure Rate Estimation
Pub Date : 2019-03-25 DOI: 10.23919/DATE.2019.8714879
Dennis D. Weller, Michael Hefenbrock, M. Golanbari, M. Beigl, M. Tahoori
Due to aggressive technology downscaling, process and runtime variations have a strong impact on correct functionality in the field as well as on manufacturing yield. The assessment of yield and failure rate is crucial for design optimization. The common practice is to use Monte Carlo simulations to account for device variations and estimate the failure rate. However, Monte Carlo methods are infeasible for estimating rare events such as high sigma failure rates, and hence various importance sampling methods have been proposed. In this paper, we present an efficient importance sampling approach based on Bayesian optimization. Its advantages include constant complexity independent of the dimensions of the design space, the potential to find the global extrema, and higher trustworthiness of the estimated failure rate. We evaluated the approach on a 6T SRAM cell based on a 28nm FDSOI process. The results show significant speedup and more than two orders of magnitude better accuracy in failure rate estimation, compared to the best state-of-the-art technique.
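To illustrate why importance sampling helps with rare events, here is a minimal sketch that estimates a 5-sigma tail probability of a standard normal variable by sampling from a proposal distribution shifted to the failure threshold. This is plain mean-shifted importance sampling, not the Bayesian-optimized method of the paper; the function names and the choice of proposal are assumptions made for the example.

```python
import math
import random


def normal_pdf(x, mean=0.0, std=1.0):
    """Density of a normal distribution N(mean, std^2) at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def importance_sampling_tail(threshold, n_samples, seed=0):
    """Estimate P(X > threshold) for X ~ N(0, 1).

    Samples are drawn from the proposal N(threshold, 1), so about half of
    them land in the failure region. Each failing sample is weighted by
    the ratio of the target density to the proposal density.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        x = rng.gauss(threshold, 1.0)
        if x > threshold:
            total += normal_pdf(x) / normal_pdf(x, mean=threshold)
    return total / n_samples


def exact_tail(threshold):
    """Closed-form P(X > threshold) for X ~ N(0, 1), used as a reference."""
    return 0.5 * math.erfc(threshold / math.sqrt(2.0))


estimate = importance_sampling_tail(threshold=5.0, n_samples=20000)
reference = exact_tail(5.0)
```

The reference probability at 5 sigma is roughly 2.9e-7, so plain Monte Carlo with 20,000 samples would almost always observe zero failures, while the shifted proposal places about half of its samples in the failure region. The paper's contribution, according to the abstract, lies in using Bayesian optimization within the importance sampling approach, which this sketch does not attempt.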
Citations: 5