On-the-fly and DAG-aware: Rewriting Boolean Networks with Exact Synthesis
Pub Date: 2019-07-23 | DOI: 10.23919/DATE.2019.8715185 | 2019 Design, Automation & Test in Europe Conference & Exhibition (DATE)
Heinz Riener, Winston Haaswijk, A. Mishchenko, G. Micheli, Mathias Soeken
The paper presents a generalization of DAG-aware AIG rewriting to k-feasible Boolean networks, whose nodes are k-input lookup tables (k-LUTs). We introduce a high-effort DAG-aware rewriting algorithm, called cut rewriting, which uses exact synthesis to compute replacements on the fly and supports Boolean don’t cares. Cut rewriting pre-computes a large number of candidate replacements. Instead of eagerly rewriting the Boolean network, it stores these replacements in a conflict graph. Heuristic optimization then selects from the conflict graph a best maximal subset of replacements that can be applied to the Boolean network simultaneously. We optimize LUT-mapped Boolean networks obtained from the ISCAS and EPFL combinational benchmark suites. The baseline networks had already been optimized with state-of-the-art Boolean rewriting techniques applied until saturation. On 3-LUT networks, experiments show an average size improvement of 5.58% over this baseline (up to 40.19%). On 4-LUT networks, the average improvement is 4.04% (up to 12.60%).
Circuit Design and Design Automation for Printed Electronics
Pub Date: 2019-05-14 | DOI: 10.23919/DATE.2019.8715095 | 2019 Design, Automation & Test in Europe Conference & Exhibition (DATE)
M. Fattori, Joost A. Fijn, L. Hu, E. Cantatore, F. Torricelli, M. Charbonneau
This paper presents a Process Design Kit (PDK) for gravure-printed Organic Thin-Film Transistor (OTFT) technology. The transistor model in the PDK accurately predicts the static, dynamic, and noise performance of complex organic circuits. The accompanying Electronic Design Automation (EDA) tools use an adaptive strategy so that the PDK can keep pace with advances in the manufacturing process. The design and experimental characterization of a charge-sensitive amplifier demonstrate the effectiveness of the PDK. A versatile and accurate PDK is expected to enable a reliable design process for complex circuits in organic printed technology.
Implementation-aware design of image-based control with on-line measurable variable-delay
Pub Date: 2019-05-14 | DOI: 10.23919/DATE.2019.8714840 | 2019 Design, Automation & Test in Europe Conference & Exhibition (DATE)
Róbinson Medina Sánchez, S. Stuijk, Dip Goswami, T. Basten
Image-based control uses image-processing algorithms to acquire sensing information. The sensing delay of the image-processing algorithm is typically platform-dependent and time-varying. Modern embedded platforms make it possible to characterize this delay at design time, yielding a delay histogram, and to measure its precise value at run time. We exploit this knowledge to design variable-delay controllers. The design also accounts for the resource configuration of the image-processing algorithm. That configuration is either sequential (one processing resource) or pipelined (multiple processing resources). Control performance depends strongly on model quality. We therefore present a simulation benchmark that uses the model uncertainty and the delay histogram to bound control performance. With this benchmark, we select a variable-delay controller and a resource configuration that outperform a controller designed for a constant worst-case delay.
Exploiting System Dynamics for Resource-Efficient Automotive CPS Design
Pub Date: 2019-05-14 | DOI: 10.23919/DATE.2019.8715176 | 2019 Design, Automation & Test in Europe Conference & Exhibition (DATE)
L. Maldonado, Wanli Chang, Debayan Roy, A. Annaswamy, Dip Goswami, S. Chakraborty
Automotive embedded systems are safety-critical and, at the same time, highly cost-sensitive. Safety requires dimensioning resources for the worst case, even if that case occurs rarely, which conflicts with cost sensitivity. To reconcile both requirements, one research direction dynamically assigns a mix of resources based on the needs and priorities of different tasks. Following this direction, we show that properly modeling the physical dynamics of the systems an automotive control software interacts with saves more resources while still guaranteeing safety properties. We focus on a distributed controller implementation that uses an automotive FlexRay bus. Our approach combines timing/schedulability analysis with control theory. It shows the value of synergistically combining the cyber components and the physical processes in the cyber-physical systems (CPS) design paradigm.
Chip Health Tracking Using Dynamic In-Situ Delay Monitoring
Pub Date: 2019-05-14 | DOI: 10.23919/DATE.2019.8715014 | 2019 Design, Automation & Test in Europe Conference & Exhibition (DATE)
H. A. Balef, K. Goossens, J. P. D. Gyvez
Tracking the gradual effect of silicon aging requires fine-grained slack monitoring. Conventional slack monitoring techniques aim to measure the worst-case static slack, i.e., the slack of the longest timing path. In sharp contrast, we propose a technique based on dynamic excitation of in-situ delay monitors, i.e., dynamic excitation of the monitored timing paths. As the circuit degrades, path delays increase and the monitors are excited more frequently. The proposed technique extracts a fine-grained signature of the delay degradation from the monitors' excitation rate.
Hardware Trojans in Emerging Non-Volatile Memories
Pub Date: 2019-05-14 | DOI: 10.23919/DATE.2019.8714843 | 2019 Design, Automation & Test in Europe Conference & Exhibition (DATE)
Mohammad Nasim Imtiaz Khan, Karthikeyan Nagarajan, Swaroop Ghosh
Emerging Non-Volatile Memories (NVMs) have unique characteristics that make them a prime target for Hardware Trojans. In this paper, we investigate the knobs that Trojans can target to cause read/write failures. For example, the NVM read operation depends on a clamp voltage that an adversary can manipulate. An adversary can also use the ground bounce generated by an NVM write operation to disrupt a parallel read/write operation. We have designed a Trojan that is activated and deactivated by writing a specific data pattern to a particular address. Once activated, the Trojan couples two predetermined addresses, so data written to one address (the victim’s address space) is copied to the other (the adversary’s address space). This leaks sensitive information, e.g., encryption keys. The adversary can also cause read/write failures at predetermined locations (fault injection). Simulation results indicate that the Trojan is activated by writing a specific data pattern to a specific address 1956 times. Once activated, the attack duration ranges from as little as 52.4 μs to as much as 1.1 ms (with a reset-enable trigger). We also show that the proposed Trojan can lower the clamp voltage by 400 mV from its optimum value, which is sufficient to inject read errors of a specific data polarity. Finally, we propose techniques to inject noise into the ground/power rail to cause read/write failures.
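The Python model below is purely behavioral. It mirrors the activation and address-coupling mechanism described above. The class and parameter names are hypothetical. The threshold of 1956 comes from the abstract. The abstract does not say whether the trigger writes must be consecutive, so this sketch assumes they simply accumulate. Circuit-level effects such as clamp-voltage scaling and ground bounce are not modeled.

```python
class TrojanedNVM:
    """Behavioral model of a counter-triggered Trojan in an NVM array (hypothetical)."""

    def __init__(self, size, trigger_addr, trigger_data, threshold,
                 victim_addr, spy_addr):
        self.mem = [0] * size
        self.trigger = (trigger_addr, trigger_data)
        self.threshold = threshold
        self.victim_addr = victim_addr
        self.spy_addr = spy_addr
        self.count = 0
        self.active = False

    def write(self, addr, data):
        self.mem[addr] = data
        if (addr, data) == self.trigger:
            # Assumption: trigger writes need not be consecutive; the counter
            # accumulates until it reaches the threshold.
            self.count += 1
            if self.count == self.threshold:
                self.active = not self.active  # same sequence activates and deactivates
                self.count = 0
        elif self.active and addr == self.victim_addr:
            self.mem[self.spy_addr] = data     # coupled write copies the victim's data

    def read(self, addr):
        return self.mem[addr]


if __name__ == "__main__":
    nvm = TrojanedNVM(size=16, trigger_addr=3, trigger_data=0xA5, threshold=1956,
                      victim_addr=5, spy_addr=12)
    for _ in range(1956):
        nvm.write(3, 0xA5)
    nvm.write(5, 0x42)
    print(nvm.active, hex(nvm.read(12)))
```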
An Efficient FPGA-based Floating Random Walk Solver for Capacitance Extraction using SDAccel
Pub Date: 2019-05-14 | DOI: 10.23919/DATE.2019.8714992 | 2019 Design, Automation & Test in Europe Conference & Exhibition (DATE)
Xin Wei, Changhao Yan, Hai Zhou, Dian Zhou, Xuan Zeng
The floating random walk (FRW) algorithm is an important and widely used method for capacitance extraction of very-large-scale integration (VLSI) interconnects. FRW can become both time-consuming and power-hungry as circuit size grows. However, its highly parallel nature makes it a good candidate for acceleration with FPGAs, which have shown great performance and energy-efficiency potential compared with other computing architectures. In this paper, we propose a scalable heterogeneous FPGA/CPU FRW framework using SDAccel. The CPU first partitions a large-scale circuit into several segments. These segments are then sent one by one to the FPGA, which performs the random walks. The framework overcomes the limited on-chip resources of the FPGA. It combines the merits of FPGAs and CPUs by mapping each part of the algorithm to the architecture that suits it, and the FPGA bitstream needs to be built only once. Several kernel optimization strategies maximize FPGA performance. Moreover, we use the naive FRW variant with walking on spheres (WOS), which is much simpler to implement than the heavily optimized variant with walking on cubes (WOC). The implementation on AWS EC2 F1 (Xilinx VU9P FPGA) achieves up to 6.1x higher performance and 42.6x higher energy efficiency than a quad-core CPU. It also achieves 5.2x higher energy efficiency than the state-of-the-art WOC implementation on an 8-core CPU.
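The sketch below shows the walk-on-spheres kernel that the FRW variant above is built on. It uses a toy 2-D geometry chosen because it has a closed-form answer: an inner conductor of radius 1 at 1 V inside an outer conductor of radius 2 at 0 V. In 2-D the "spheres" are circles. The full FRW capacitance formulation is not shown. That formulation launches walks from a Gaussian surface and weights them, and it handles 3-D interconnects, partitioning, and FPGA kernels. Only the potential-estimation step is illustrated, and all names are hypothetical.

```python
import math
import random

# Hypothetical 2-D test geometry: inner conductor (radius 1, held at 1 V)
# inside an outer conductor (radius 2, held at 0 V).
R_IN, R_OUT = 1.0, 2.0
V_IN, V_OUT = 1.0, 0.0


def dist_to_conductors(x, y):
    """Radius of the largest circle around (x, y) that touches no conductor."""
    r = math.hypot(x, y)
    return min(r - R_IN, R_OUT - r)


def walk_on_spheres(x, y, rng, eps=1e-4):
    """One WOS walk: jump to a uniform point on the largest empty circle until
    the walker is within eps of a conductor, then return that conductor's potential."""
    while True:
        d = dist_to_conductors(x, y)
        if d < eps:
            r = math.hypot(x, y)
            return V_IN if r - R_IN < R_OUT - r else V_OUT
        theta = rng.uniform(0.0, 2.0 * math.pi)
        x += d * math.cos(theta)
        y += d * math.sin(theta)


def potential(x, y, walks=10000, seed=0):
    """Monte Carlo estimate of the electrostatic potential at (x, y)."""
    rng = random.Random(seed)
    return sum(walk_on_spheres(x, y, rng) for _ in range(walks)) / walks


def exact_potential(r):
    """Closed-form potential in the annulus, used to check the estimate."""
    return math.log(R_OUT / r) / math.log(R_OUT / R_IN)


if __name__ == "__main__":
    print(potential(1.5, 0.0), exact_potential(1.5))
```

Each walk is independent and needs only a distance query plus a random jump. This independence is the parallelism that makes the method attractive for FPGA acceleration.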
An Energy Efficient Non-Volatile Flip-Flop based on CoMET Technology
Pub Date: 2019-05-14 | DOI: 10.23919/DATE.2019.8714916 | 2019 Design, Automation & Test in Europe Conference & Exhibition (DATE)
Robert Perricone, Zhaoxin Liang, Meghna G. Mankalale, M. Niemier, S. Sapatnekar, Jianping Wang, X. Hu
As CMOS scaling approaches its limits, researchers are developing "beyond-CMOS" technologies to sustain the benefits associated with device scaling. Spintronic technologies have emerged as promising beyond-CMOS candidates. They offer inherent advantages over CMOS such as high integration density, low leakage power, radiation hardness, and non-volatility. These benefits make spintronic devices an attractive successor to CMOS, especially for memory circuits. However, spintronic devices generally suffer from slower switching and higher write energy, which limits their usability. Device concepts such as CoMET (Composite-Input Magnetoelectric-base Logic Technology) have been introduced to close the energy-delay gap between CMOS and spintronics. CoMET leverages material phenomena such as the spin-Hall effect and the magnetoelectric effect to enable fast, energy-efficient device operation. In this work, we propose a non-volatile flip-flop (NVFF) based on CoMET technology that achieves up to two orders of magnitude lower write energy than CMOS. This low write energy (≈2 aJ) makes our CoMET NVFF especially attractive for architectures that require frequent backup operations, e.g., energy-harvesting non-volatile processors.
RAFS: A RAID-Aware File System to Reduce the Parity Update Overhead for SSD RAID
Pub Date: 2019-03-25 | DOI: 10.23919/DATE.2019.8714938 | 2019 Design, Automation & Test in Europe Conference & Exhibition (DATE)
Chenlei Tang, Ji-guang Wan, Yifeng Zhu, Zhiyuan Liu, Peng Xu, Fei Wu, C. Xie
In a parity-based SSD RAID, small write requests accelerate SSD wear-out because of the extra writes needed to update parity. They also degrade performance because of the expensive garbage collection they trigger. To mitigate the small-write problem, a buffer is often added at the RAID controller to absorb overwrites and writes to the same stripe. However, this approach is only suboptimal because file layout information is invisible at the block level. This paper proposes RAFS, a RAID-aware file system that uses a RAID-friendly data layout to improve the reliability and performance of SSD-based RAID 5. By leveraging the delayed allocation of modern file systems, RAFS employs a stripe-aware buffer policy to coalesce writes to the same file. To reduce parity updates, RAFS compacts buffered updates and flushes them back in stripe units. RAFS also adopts a stripe-granularity allocation scheme that aligns writes to stripe boundaries. Experimental results show that RAFS improves throughput by up to 90% compared to Ext4.
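The sketch below shows why stripe-aware buffering cuts parity updates in RAID 5. It assumes that stripe-granularity allocation has already mapped chunk `lba` to stripe `lba // k` at offset `lba % k`. Buffered overwrites are absorbed. A stripe is flushed as a full-stripe write, with parity computed by XOR of the new data, once all `k` data chunks are buffered. Partial stripes left at the end are only counted, since they would need a read-modify-write of the old parity. `StripeBuffer` and its counters are hypothetical and are not RAFS's actual interface.

```python
from functools import reduce


def xor_blocks(a, b):
    """Byte-wise XOR, the RAID-5 parity operation."""
    return bytes(x ^ y for x, y in zip(a, b))


class StripeBuffer:
    """Hypothetical stripe-aware write buffer for RAID 5 with k data chunks per stripe."""

    def __init__(self, data_disks):
        self.k = data_disks
        self.pending = {}        # stripe index -> {chunk offset: block}
        self.parity = {}         # stripe index -> parity from a full-stripe flush
        self.data_writes = 0
        self.parity_writes = 0

    def write(self, lba, block):
        stripe, offset = divmod(lba, self.k)
        chunks = self.pending.setdefault(stripe, {})
        chunks[offset] = block   # an overwrite of a buffered chunk costs no device write
        if len(chunks) == self.k:
            self._flush(stripe)

    def _flush(self, stripe):
        chunks = self.pending.pop(stripe)
        self.data_writes += len(chunks)
        self.parity_writes += 1  # one parity write per flushed stripe
        if len(chunks) == self.k:
            # Full stripe: parity is computed from the new data alone.
            self.parity[stripe] = reduce(xor_blocks, chunks.values())
        # A partial stripe would need a read-modify-write of the old parity,
        # which this sketch counts but does not model.

    def flush_all(self):
        for stripe in list(self.pending):
            self._flush(stripe)


if __name__ == "__main__":
    buf = StripeBuffer(data_disks=4)
    workload = [0, 1, 1, 2, 3, 4, 5, 6, 7, 8]
    for lba in workload:
        buf.write(lba, bytes([lba + 1] * 2))
    buf.flush_all()
    print(buf.parity_writes, "parity writes vs", len(workload), "with per-write updates")
```

In this workload, 10 small writes become two full-stripe flushes and one partial flush. That means 3 parity writes instead of one parity update per request.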
Transfer and Online Reinforcement Learning in STT-MRAM Based Embedded Systems for Autonomous Drones
Pub Date: 2019-03-25 | DOI: 10.23919/DATE.2019.8715066 | 2019 Design, Automation & Test in Europe Conference & Exhibition (DATE)
Insik Yoon, Malik Aqeel Anwar, Titash Rakshit, A. Raychowdhury
In this paper, we present an algorithm-hardware co-design for camera-based autonomous flight in small drones. We show that the large write latency and write energy of nonvolatile memory (NVM) make NVM-based embedded systems unsuitable for real-time reinforcement learning (RL). We address this by performing transfer learning (TL) on meta-environments and RL on the last few layers of a deep convolutional network. The NVM stores the meta-model from TL, while an on-die SRAM stores the weights of the last few layers. As a result, all real-time RL updates are carried out on the SRAM arrays. This yields a practical platform with performance comparable to end-to-end RL and 83.4% lower energy per image frame.