
2019 Design, Automation & Test in Europe Conference & Exhibition (DATE): Latest Papers

On-the-fly and DAG-aware: Rewriting Boolean Networks with Exact Synthesis
Pub Date : 2019-07-23 DOI: 10.23919/DATE.2019.8715185
Heinz Riener, Winston Haaswijk, A. Mishchenko, G. Micheli, Mathias Soeken
The paper presents a generalization of DAG-aware AIG rewriting for k-feasible Boolean networks, whose nodes are k-input lookup tables (k-LUTs). We introduce a high-effort DAG-aware rewriting algorithm, called cut rewriting, which uses exact synthesis to compute replacements on the fly, with support for Boolean don’t cares. Cut rewriting pre-computes a large number of possible replacement candidates, but instead of eagerly rewriting the Boolean network, stores the replacements in a conflict graph. Heuristic optimization is used to derive a best, maximal subset of replacements that can be simultaneously applied to the Boolean network from the conflict graph. We optimize LUT mapped Boolean networks obtained from the ISCAS and EPFL combinational benchmark suites. For 3-LUT networks, experiments show that we achieve an average size improvement of 5.58% and up to 40.19% after state-of-the-art Boolean rewriting techniques were applied until saturation. Similarly, for 4-LUT networks, we obtain an average improvement of 4.04% and up to 12.60%.
Citations: 34
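The selection step described in the cut rewriting abstract (picking a subset of stored replacements that can be applied simultaneously) can be illustrated with a small sketch. This is not the authors' heuristic, which the abstract does not specify; it is a plain greedy maximal-independent-set pass over a conflict graph. The replacement names, gain values, and conflict edges are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def select_replacements(
    gains: Dict[str, int],
    conflicts: List[Tuple[str, str]],
) -> List[str]:
    """Greedily pick a maximal set of mutually non-conflicting replacements."""
    # Build adjacency sets of the conflict graph.
    neighbors: Dict[str, Set[str]] = {r: set() for r in gains}
    for a, b in conflicts:
        neighbors[a].add(b)
        neighbors[b].add(a)

    chosen: List[str] = []
    blocked: Set[str] = set()
    # Visit candidates by descending gain; ties are broken by name.
    for r in sorted(gains, key=lambda r: (-gains[r], r)):
        if r in blocked:
            continue
        chosen.append(r)
        blocked.add(r)
        # Everything that conflicts with a chosen replacement is ruled out.
        blocked.update(neighbors[r])
    return chosen


# Hypothetical candidates: gain = number of LUTs saved by the replacement.
gains = {"r1": 3, "r2": 2, "r3": 2, "r4": 1}
conflicts = [("r1", "r2"), ("r2", "r3"), ("r3", "r4")]
selected = select_replacements(gains, conflicts)
```

A greedy pass yields a maximal set (no further replacement can be added) but not necessarily the best one, which is why the abstract speaks of heuristic optimization.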
Circuit Design and Design Automation for Printed Electronics
Pub Date : 2019-05-14 DOI: 10.23919/DATE.2019.8715095
M. Fattori, Joost A. Fijn, L. Hu, E. Cantatore, F. Torricelli, M. Charbonneau
A Process Design Kit (PDK) for gravure-printed Organic Thin-Film Transistor (OTFT) technology is presented in this paper. The transistor model developed in the PDK enables an accurate prediction of static, dynamic and noise performance of complex organic circuits. The developed Electronic Design Automation (EDA) tools exploit an adaptive strategy to improve the versatility of the PDK in relation to the advancements of the manufacturing process. The design and experimental characterization of a Charge Sensitive Amplifier is used to demonstrate the effectiveness of the PDK. The availability of a versatile and accurate Process Design Kit is expected to enable a reliable design process for complex circuits based on an organic printed technology.
Citations: 7
Implementation-aware design of image-based control with on-line measurable variable-delay
Pub Date : 2019-05-14 DOI: 10.23919/DATE.2019.8714840
Róbinson Medina Sánchez, S. Stuijk, Dip Goswami, T. Basten
Image-based control uses image-processing algorithms to acquire sensing information. The sensing delay associated with the image-processing algorithm is typically platform-dependent and time-varying. Modern embedded platforms make it possible to characterize the sensing delay at design time, yielding a delay histogram, and to measure its precise value at run time. We exploit this knowledge to design variable-delay controllers. This design also takes into account the resource configuration of the image-processing algorithm: sequential (with one processing resource) or pipelined (with multiprocessing capabilities). Since the control performance strongly depends on the model quality, we present a simulation benchmark that uses the model uncertainty and the delay histogram to obtain bounds on control performance. Our benchmark is used to select a variable-delay controller and a resource configuration that outperform a constant worst-case delay controller.
Citations: 2
Exploiting System Dynamics for Resource-Efficient Automotive CPS Design
Pub Date : 2019-05-14 DOI: 10.23919/DATE.2019.8715176
L. Maldonado, Wanli Chang, Debayan Roy, A. Annaswamy, Dip Goswami, S. Chakraborty
Automotive embedded systems are safety-critical, while being highly cost-sensitive at the same time. The former requires resource dimensioning that accounts for the worst case, even if such a case occurs infrequently, while this is in conflict with the latter requirement. In order to manage both of these aspects at the same time, one research direction being explored is to dynamically assign a mixture of resources based on needs and priorities of different tasks. Along this direction, in this paper we show that by properly modeling the physical dynamics of the systems that an automotive control software interacts with, it is possible to better save resources while still guaranteeing safety properties. Towards this, we focus on a distributed controller implementation that uses an automotive FlexRay bus. Our approach combines techniques from timing/schedulability analysis and control theory and shows the significance of synergistically combining the cyber component and physical processes in the cyber-physical systems (CPS) design paradigm.
Citations: 7
Chip Health Tracking Using Dynamic In-Situ Delay Monitoring
Pub Date : 2019-05-14 DOI: 10.23919/DATE.2019.8715014
H. A. Balef, K. Goossens, J. P. D. Gyvez
Tracking the gradual effect of silicon aging requires fine-grained slack monitoring. Conventional slack monitoring techniques aim to measure the worst-case static slack, i.e. the slack of the longest timing path. In sharp contrast to the conventional techniques, we propose a novel technique that is based on dynamic excitation of in-situ delay monitors, i.e. dynamic excitation of the timing paths that are monitored. As the circuit ages, the path delays increase and the monitors are excited more frequently. With the proposed technique, a fine-grained signature of the delay degradation is extracted from the excitation rate of the monitors.
Citations: 4
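The link between delay degradation and monitor excitation rate in the abstract above can be made concrete with a toy model. This is only a sketch under stated assumptions, not the authors' monitor design: one monitored path is assumed to be sensitized per cycle, and the monitor is assumed to fire whenever the remaining slack falls below a detection window. The path delays, clock period, window, and aging factor are all hypothetical.

```python
import random
from typing import List


def excitation_rate(path_delays: List[float], aging: float, clock: float,
                    window: float, cycles: int = 10000, seed: int = 0) -> float:
    """Fraction of cycles in which the monitor flags a late transition."""
    rng = random.Random(seed)
    fired = 0
    for _ in range(cycles):
        # One monitored path is sensitized per cycle (toy workload model).
        delay = rng.choice(path_delays) * (1.0 + aging)
        # The monitor fires when the remaining slack is below the window.
        if clock - delay < window:
            fired += 1
    return fired / cycles


paths = [0.60, 0.70, 0.80, 0.85, 0.90]  # hypothetical path delays in ns
fresh = excitation_rate(paths, aging=0.00, clock=1.0, window=0.05)
aged = excitation_rate(paths, aging=0.08, clock=1.0, window=0.05)
```

In this toy setup no path excites the monitor when fresh, while after 8% degradation the slowest of the five paths does, so the rate rises from zero to roughly one cycle in five.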
Hardware Trojans in Emerging Non-Volatile Memories
Pub Date : 2019-05-14 DOI: 10.23919/DATE.2019.8714843
Mohammad Nasim Imtiaz Khan, Karthikeyan Nagarajan, Swaroop Ghosh
Emerging Non-Volatile Memories (NVMs) possess unique characteristics that make them a top target for deploying Hardware Trojans. In this paper, we investigate the knobs that Trojans can target to cause read/write failures. For example, the NVM read operation depends on a clamp voltage that an adversary can manipulate. An adversary can also use the ground bounce generated by an NVM write operation to hamper another parallel read/write operation. We have designed a Trojan that can be activated and deactivated by writing a specific data pattern to a particular address. Once activated, the Trojan can couple two predetermined addresses so that data written to one address (the victim's address space) is copied to another address (the adversary's address space). This can leak sensitive information, e.g., encryption keys. An adversary can also cause read/write failures at predetermined locations (fault injection). Simulation results indicate that the Trojan can be activated by writing a specific data pattern to a specific address 1956 times. Once activated, the attack duration can be as low as 52.4 μs and as high as 1.1 ms (with a reset-enable trigger). We also show that the proposed Trojan can scale down the clamp voltage by 400 mV from its optimum value, which is sufficient to inject a specific data-polarity read error. We also propose techniques to inject noise in the ground/power rail to cause read/write failure.
Citations: 17
An Efficient FPGA-based Floating Random Walk Solver for Capacitance Extraction using SDAccel
Pub Date : 2019-05-14 DOI: 10.23919/DATE.2019.8714992
Xin Wei, Changhao Yan, Hai Zhou, Dian Zhou, Xuan Zeng
The floating random walk (FRW) algorithm is an important method widely used in the capacitance extraction of very large-scale integration (VLSI) interconnects. FRW can be both time-consuming and power-hungry as the circuit scale grows. However, its highly parallel nature prompts us to accelerate it with FPGAs, which have shown great performance and energy-efficiency potential relative to other computing architectures. In this paper, we propose a scalable FPGA/CPU heterogeneous framework for FRW using SDAccel. Large-scale circuits are first partitioned by the CPU into several segments, and these segments are then sent to the FPGA one by one for random walking. The framework addresses the challenge of limited FPGA on-chip resources and combines the merits of FPGAs and CPUs by mapping separate parts of the algorithm to the suitable architecture; the FPGA bitstream is built only once. Several kernel optimization strategies are used to maximize FPGA performance. In addition, the FRW algorithm we use is the naive version with walking on spheres (WOS), which is much simpler and easier to implement than the heavily optimized version with walking on cubes (WOC). The implementation on AWS EC2 F1 (Xilinx VU9P FPGA) shows up to 6.1x performance and 42.6x energy efficiency over a quad-core CPU, and 5.2x energy efficiency over the state-of-the-art WOC implementation on an 8-core CPU.
Citations: 6
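To give a feel for the walking-on-spheres (WOS) transition mentioned in the abstract above, here is a minimal CPU-only sketch. It is not the paper's solver: a real FRW capacitance extractor starts walks from a Gaussian surface and accumulates weighted contributions per conductor, whereas this example only estimates the potential at one point in a hypothetical 2D geometry (two infinite plates at y=0 and y=1, held at 0 V and 1 V), where the exact answer equals the height y. The function name, geometry, and parameters are illustrative assumptions.

```python
import math
import random


def wos_potential(y0: float, n_walks: int = 5000, eps: float = 1e-4,
                  seed: int = 0) -> float:
    """Estimate the potential at height y0 between two infinite plates.

    Toy geometry: the plate at y=0 is at 0 V and the plate at y=1 is at 1 V.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_walks):
        x, y = 0.0, y0
        while True:
            # Radius of the largest circle centred at (x, y) inside the gap.
            d = min(y, 1.0 - y)
            if d < eps:
                break
            # Hop to a uniformly random point on that circle.
            theta = rng.uniform(0.0, 2.0 * math.pi)
            x += d * math.cos(theta)
            y += d * math.sin(theta)
        # Absorb the walk at the nearest plate and record its potential.
        total += 1.0 if y > 0.5 else 0.0
    return total / n_walks


estimate = wos_potential(0.3)
```

Each walk is independent of the others, which is the property that makes the method a natural fit for parallel hardware such as the FPGA kernels the abstract describes.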
An Energy Efficient Non-Volatile Flip-Flop based on CoMET Technology
Pub Date : 2019-05-14 DOI: 10.23919/DATE.2019.8714916
Robert Perricone, Zhaoxin Liang, Meghna G. Mankalale, M. Niemier, S. Sapatnekar, Jianping Wang, X. Hu
As we approach the limits of CMOS scaling, researchers are developing "beyond-CMOS" technologies to sustain the technological benefits associated with device scaling. Spintronic technologies have emerged as a promising beyond-CMOS technology due to their inherent benefits over CMOS such as high integration density, low leakage power, radiation hardness, and non-volatility. These benefits make spintronic devices an attractive successor to CMOS, especially for memory circuits. However, spintronic devices generally suffer from slower switching speeds and higher write energy, which limits their usability. In an effort to close the energy-delay gap between CMOS and spintronics, device concepts such as CoMET (Composite-Input Magnetoelectric-based Logic Technology) have been introduced, which collectively leverage material phenomena such as the spin-Hall effect and the magnetoelectric effect to enable fast, energy-efficient device operation. In this work, we propose a non-volatile flip-flop (NVFF) based on CoMET technology that is capable of achieving up to two orders of magnitude less write energy than CMOS. This low write energy (≈2 aJ) makes our CoMET NVFF especially attractive to architectures that require frequent backup operations, e.g., energy-harvesting non-volatile processors.
Citations: 1
RAFS: A RAID-Aware File System to Reduce the Parity Update Overhead for SSD RAID
Pub Date : 2019-03-25 DOI: 10.23919/DATE.2019.8714938
Chenlei Tang, Ji-guang Wan, Yifeng Zhu, Zhiyuan Liu, Peng Xu, Fei Wu, C. Xie
In a parity-based SSD RAID, small write requests not only accelerate the wear-out of SSDs due to extra writes for updating parities but also degrade performance due to the associated expensive garbage collection. To mitigate the problem of small writes, a buffer is often added at the RAID controller to absorb overwrites and writes performed to the same stripe. However, this approach achieves only suboptimal efficiency because file layout information is invisible at the block level. This paper proposes RAFS, a RAID-aware file system, which utilizes a RAID-friendly data layout to improve the reliability and performance of SSD-based RAID 5. By leveraging the delayed allocation of modern file systems, RAFS employs a stripe-aware buffer policy to coalesce writes to the same file. To reduce parity updates, RAFS compacts buffered updates and flushes them back in stripe units. RAFS adopts a stripe-granularity allocation scheme to align writes to stripe boundaries. Experimental results show that RAFS can improve throughput by up to 90% compared to Ext4.
Citations: 2
Transfer and Online Reinforcement Learning in STT-MRAM Based Embedded Systems for Autonomous Drones
Pub Date : 2019-03-25 DOI: 10.23919/DATE.2019.8715066
Insik Yoon, Malik Aqeel Anwar, Titash Rakshit, A. Raychowdhury
In this paper, we present an algorithm-hardware co-design for camera-based autonomous flight in small drones. We show that the large write latency and write energy of nonvolatile memory (NVM) based embedded systems make them unsuitable for real-time reinforcement learning (RL). We address this by performing transfer learning (TL) on meta-environments and RL on the last few layers of a deep convolutional network. While the NVM stores the meta-model from TL, an on-die SRAM stores the weights of the last few layers. Thus, all real-time updates via RL are carried out on the SRAM arrays. This provides a practical platform with performance comparable to end-to-end RL and 83.4% lower energy per image frame.
Citations: 7
Journal
2019 Design, Automation & Test in Europe Conference & Exhibition (DATE)