L. Gantel, M. A. Benkhelifa, F. Verdier, F. Lemonnier
In a Heterogeneous Reconfigurable System-on-Chip (HRSoC) platform, the application is divided into threads managed by an operating system: some threads are implemented as hardware threads and allocated to a partition of the chip, while others run as software threads on embedded processing elements. Relying on the Multicore Resource Management API (MRAPI) specification and client-server mechanisms, we propose solutions that give every thread flexible access to the operating system services, whatever core it is running on. To improve the application deployment process when targeting HRSoCs, we developed both a hardware and a software implementation of this API. Depending on the affinity of each operating system service with each core, this solution allows a fine-grain implementation of these services across the platform.
MRAPI Implementation for Heterogeneous Reconfigurable Systems-on-Chip. L. Gantel, M. A. Benkhelifa, F. Verdier, F. Lemonnier. 2014 IEEE 22nd Annual International Symposium on Field-Programmable Custom Computing Machines. DOI: https://doi.org/10.1109/FCCM.2014.74
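The client-server idea can be illustrated with a minimal Python sketch. It is not the authors' implementation and does not use the MRAPI C interface; `Request`, `ServiceServer` and `call` are hypothetical names. One server thread owns the state of an operating-system service (here, a table of counters), and every client, whatever core it is assumed to run on, reaches the service only through the same request/reply interface:

```python
import queue
import threading
from dataclasses import dataclass, field


# Hypothetical request format: operation, argument, and a private reply queue
# so the server can answer exactly the thread that asked.
@dataclass
class Request:
    op: str
    arg: int
    reply: "queue.Queue[int]" = field(default_factory=queue.Queue)


class ServiceServer:
    """Owns the shared service state; only the server thread mutates it."""

    def __init__(self) -> None:
        self.inbox: "queue.Queue[Request | None]" = queue.Queue()
        self.counters: dict[int, int] = {}
        self._thread = threading.Thread(target=self._serve, daemon=True)
        self._thread.start()

    def _serve(self) -> None:
        while True:
            req = self.inbox.get()
            if req is None:  # shutdown sentinel
                break
            if req.op == "incr":
                self.counters[req.arg] = self.counters.get(req.arg, 0) + 1
                req.reply.put(self.counters[req.arg])
            elif req.op == "read":
                req.reply.put(self.counters.get(req.arg, 0))
            else:
                req.reply.put(-1)  # unknown operation

    def call(self, op: str, arg: int) -> int:
        """Client stub: identical interface for every caller."""
        req = Request(op, arg)
        self.inbox.put(req)
        return req.reply.get()

    def stop(self) -> None:
        self.inbox.put(None)
        self._thread.join()


def client(n_calls: int) -> None:
    for _ in range(n_calls):
        server.call("incr", 0)


server = ServiceServer()
clients = [threading.Thread(target=client, args=(100,)) for _ in range(4)]
for t in clients:
    t.start()
for t in clients:
    t.join()
total = server.call("read", 0)
server.stop()
print(total)  # 400
```

Because only the server thread touches the counter table, the 400 increments from four clients never race. A hardware thread would need an equivalent request/reply channel on the chip interconnect; this sketch models only the software side.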
As technology feature sizes shrink, aggressive voltage scaling is required to contain power density. However, lower voltage also increases the rate of transient upsets, potentially preventing us from scaling voltage down and possibly even requiring voltage increases to maintain reliability. Duplication with checking and triple-modular redundancy are traditional approaches to combat transient errors, but spending 2-3× the energy on redundant computation can diminish or reverse the benefits of voltage scaling. As an alternative, we explore checking computations that are cheaper than the base computation they guard. We identify lightweight checks for a broad set of common FPGA tasks in scientific computing and in signal and image processing, and evaluate their effectiveness. We find that the lightweight checks cost less than 14% of the base computation. Using an exponential model of the relationship between voltage and transient upset rate, we show that aggressive voltage scaling can reduce net energy by over 80% compared to operation at nominal voltage, without compromising reliability.
Energy Reduction through Differential Reliability and Lightweight Checking. E. Kadrić, K. Mahajan, A. DeHon. 2014 IEEE 22nd Annual International Symposium on Field-Programmable Custom Computing Machines. DOI: https://doi.org/10.1109/FCCM.2014.78
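The abstract does not say which checks are used. As a generic example of a check that is cheaper than the computation it guards, the sketch below swaps in a well-known algorithm-based fault tolerance (ABFT) style column-checksum test for matrix multiplication: the column sums of C = AB must equal (column sums of A) times B, which costs O(nk + km + nm) operations instead of the O(nkm) of the product itself. The function names are illustrative:

```python
def matmul(a, b):
    """Base computation: O(n*k*m) multiply-adds for an n x k by k x m product."""
    k, m = len(b), len(b[0])
    return [[sum(row[t] * b[t][j] for t in range(k)) for j in range(m)] for row in a]


def column_checksum_ok(a, b, c):
    """Lightweight check: ones^T C == (ones^T A) B, using O(nk + km + nm) work."""
    k, m = len(b), len(b[0])
    col_sums_a = [sum(row[t] for row in a) for t in range(k)]
    expected = [sum(col_sums_a[t] * b[t][j] for t in range(k)) for j in range(m)]
    actual = [sum(row[j] for row in c) for j in range(m)]
    return expected == actual


a = [[1, 2], [3, 4]]
b = [[5, 6], [7, 8]]
c = matmul(a, b)  # [[19, 22], [43, 50]]
ok = column_checksum_ok(a, b, c)

bad = [row[:] for row in c]
bad[1][0] += 1  # simulate a transient upset in one output word
detected = not column_checksum_ok(a, b, bad)
print(ok, detected)  # True True
```

Integer matrices keep the comparison exact; floating-point data would need a tolerance. A single corrupted entry always changes its column sum, but several errors in the same column can cancel, so this check trades some coverage for a much lower cost.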
Deep Packet Inspection (DPI) has become crucial for providing rich internet services, such as intrusion and phishing protection, but the use of DPI raises concerns about the privacy of internet users. In this paper, a RAM-based hardware anonymizer is proposed for implementation on a Virtex-5 FPGA device. Results for the hardware anonymizer show that the proposed architecture reduces circuit usage by 40%.
High-Throughput and Low-Cost Hardware Accelerator for Privacy Preserving Publishing. Fumito Yamaguchi, H. Nishi. 2014 IEEE 22nd Annual International Symposium on Field-Programmable Custom Computing Machines. DOI: https://doi.org/10.1109/FCCM.2014.77
Vivek D. Tovinakere, O. Sentieys, Steven Derrien, Christophe Huriaux
A key concern in designing controllers for wireless sensor network (WSN) nodes is the flexibility to execute different control tasks involving the sensing, communication and computational resources of the node. In this paper, we present low-power flexible controllers for WSN nodes based on reconfigurable microtasks, each composed of an FSM and a datapath. Coarse-grain power-gating opportunities in the FSM and datapath are exploited for low-power operation of the reconfigurable microtasks. Power estimation results on typical benchmark microtasks show a 2× to 5× improvement in energy efficiency with respect to a microcontroller, at a cost of 5× relative to a microtask implemented as an ASIC, which has higher NRE costs.
Low Power Reconfigurable Controllers for Wireless Sensor Network Nodes. Vivek D. Tovinakere, O. Sentieys, Steven Derrien, Christophe Huriaux. 2014 IEEE 22nd Annual International Symposium on Field-Programmable Custom Computing Machines. DOI: https://doi.org/10.1109/FCCM.2014.68
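To make the FSM-plus-datapath structure and coarse-grain power gating concrete, here is a small behavioral model in Python. The microtask (sample a sensor, accumulate, return to idle), its state names and the datapath units `adc_if`, `adder` and `reg` are hypothetical; each state lists the units that must be powered, and every other unit is treated as gated off for that cycle:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class State:
    powered: frozenset[str]  # datapath units that must be on in this state
    next_state: str


UNITS = ("adc_if", "adder", "reg")

# Hypothetical microtask: IDLE -> SAMPLE (read sensor) -> ACC (accumulate) -> IDLE.
MICROTASK = {
    "IDLE": State(frozenset(), "SAMPLE"),
    "SAMPLE": State(frozenset({"adc_if"}), "ACC"),
    "ACC": State(frozenset({"adder", "reg"}), "IDLE"),
}


def powered_cycles(fsm: dict[str, State], start: str, cycles: int) -> dict[str, int]:
    """Count, per datapath unit, the cycles in which it is not power-gated."""
    on = {unit: 0 for unit in UNITS}
    state = start
    for _ in range(cycles):
        for unit in fsm[state].powered:
            on[unit] += 1
        state = fsm[state].next_state
    return on


on = powered_cycles(MICROTASK, "IDLE", 300)
ungated = 300 * len(UNITS)  # unit-cycles if nothing were ever gated
print(on, sum(on.values()) / ungated)  # each unit is powered for 100 of 300 cycles
```

The model only counts powered cycles; it ignores wake-up latency, state retention and residual leakage, all of which a real power estimate would have to include.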
Hendrik Noll, Sebastian Siegert, Johannes Hiltscher, W. Rehm
Testing and debugging a Field Programmable Gate Array (FPGA)-based Peripheral Component Interconnect Express (PCIe) extension card requires access to its resources and to the system's main memory. Both are accessible via the physical memory address space (PMAS). User-level solutions for accessing this address space exist, but they are proprietary and/or limited to specific address ranges, among other restrictions. Arbitrary user-level access, e.g. for flexible validation of an intellectual property (IP) core, is not possible with them. This paper presents UTOPIA, an open-source Linux tool set that enables such access, including its concept, structure and interfaces. Furthermore, bandwidths and latencies between user-level applications and the PMAS are measured and evaluated.
UTOPIA: Generic User-Level Access to the Physical Memory Address Space for IP Core Debugging and Validation on FPGA Based PCIe Extension Cards. Hendrik Noll, Sebastian Siegert, Johannes Hiltscher, W. Rehm. 2014 IEEE 22nd Annual International Symposium on Field-Programmable Custom Computing Machines. DOI: https://doi.org/10.1109/FCCM.2014.41
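UTOPIA's own interfaces are not described here. For orientation only, one common way to reach a PCIe device's memory-mapped registers from user level on Linux is to `mmap` the BAR's sysfs file `/sys/bus/pci/devices/<BDF>/resourceN`, which requires root privileges. The sketch below reads a 32-bit word that way; `read_u32` is a hypothetical helper, and the demo maps an ordinary temporary file so it runs without hardware:

```python
import mmap
import os
import struct
import tempfile


def read_u32(path: str, offset: int) -> int:
    """Map `path` read-only and return the little-endian 32-bit word at `offset`."""
    fd = os.open(path, os.O_RDONLY)
    try:
        size = os.fstat(fd).st_size  # for a sysfs BAR file this is typically the BAR length
        with mmap.mmap(fd, size, mmap.MAP_SHARED, mmap.PROT_READ) as mm:
            (value,) = struct.unpack_from("<I", mm, offset)
            return value
    finally:
        os.close(fd)


# On real hardware (hypothetical device address, needs root):
# read_u32("/sys/bus/pci/devices/0000:01:00.0/resource0", 0x0)

# Self-contained demo: a regular file stands in for a BAR.
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(struct.pack("<II", 0xDEADBEEF, 0x12345678))
    demo_path = f.name
demo_value = read_u32(demo_path, 4)
os.remove(demo_path)
print(hex(demo_value))  # 0x12345678
```

Python does not promise that `unpack_from` performs a single aligned 32-bit bus read, so for registers with read side effects a C tool dereferencing a `volatile uint32_t *` into the mapping is the safer choice.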
Mixed-precision implementation of computation can deliver area, throughput and power improvements for dataflow computations over homogeneous fixed-precision circuits without any loss in accuracy. When designing circuits for reconfigurable hardware, we can independently control the bitwidth of each variable in the computation. However, selecting the best precision for each variable is an NP-hard problem. While traditional solutions use automated techniques such as simulated annealing or integer linear programming, they still rely on manually formulated resource models, which are tedious to build and potentially inaccurate due to unpredictable interactions between different stages of the FPGA CAD flow. To address this challenge, we develop MixFX-SCORE, an automated tool flow based on the FX-SCORE fixed-point compilation framework and simulated annealing. We delegate error analysis (Gappa++) and resource model generation (Vivado HLS, logic synthesis, Xilinx place-and-route) to external tools that offer a more accurate representation of error behavior (backed by proofs) and resource usage (based on actual utilization). Compared to homogeneous fixed-point implementations, we demonstrate 1.1-3.5× savings in LUT count, 1-1.8× reductions in DSP count and 1-3.9× improvements in dynamic power, while still satisfying the accuracy constraints.
MixFX-SCORE: Heterogeneous Fixed-Point Compilation of Dataflow Computations. Deheng Ye, Nachiket Kapre. 2014 IEEE 22nd Annual International Symposium on Field-Programmable Custom Computing Machines. DOI: https://doi.org/10.1109/FCCM.2014.64
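The search loop can be sketched as simulated annealing over per-variable bitwidths. This is not MixFX-SCORE: the toy error bound (each variable contributes `sens[i] * 2**-w[i]`) stands in for Gappa++, the linear cost `weights[i] * w[i]` stands in for synthesis and place-and-route results, and all names and numbers are hypothetical. Moves that violate the accuracy constraint are rejected outright:

```python
import math
import random


def error_bound(widths, sens):
    # Toy error model (hypothetical stand-in for Gappa++): sum of sens_i * 2^-w_i.
    return sum(s * 2.0 ** -w for s, w in zip(sens, widths))


def cost(widths, weights):
    # Toy resource model (hypothetical stand-in for synthesis/P&R): linear in bitwidth.
    return sum(c * w for c, w in zip(weights, widths))


def anneal(sens, weights, eps, wmin=4, wmax=32, steps=20000, t0=10.0, seed=0):
    rng = random.Random(seed)
    cur = [wmax] * len(sens)
    if error_bound(cur, sens) > eps:
        raise ValueError("even the widest bitwidths violate the accuracy constraint")
    cur_cost = cost(cur, weights)
    best, best_cost = cur[:], cur_cost
    for step in range(steps):
        t = t0 * (1 - step / steps) + 1e-9  # linear cooling schedule
        cand = cur[:]
        i = rng.randrange(len(cand))
        cand[i] = min(wmax, max(wmin, cand[i] + rng.choice((-1, 1))))
        if error_bound(cand, sens) > eps:
            continue  # infeasible: reject without evaluating cost
        cand_cost = cost(cand, weights)
        delta = cand_cost - cur_cost
        # Metropolis rule: always accept improvements, sometimes accept worse moves.
        if delta <= 0 or rng.random() < math.exp(-delta / t):
            cur, cur_cost = cand, cand_cost
            if cur_cost < best_cost:
                best, best_cost = cur[:], cur_cost
    return best, best_cost


sens = [1.0, 8.0, 0.5]  # hypothetical error sensitivities
weights = [1, 3, 2]     # hypothetical per-bit resource costs
eps = 2.0 ** -10        # accuracy constraint

uniform_w = next(w for w in range(4, 33) if error_bound([w] * 3, sens) <= eps)
uniform_cost = cost([uniform_w] * 3, weights)
best, best_cost = anneal(sens, weights, eps)
print(uniform_w, uniform_cost, best, best_cost)
```

The uniform baseline is the narrowest single bitwidth meeting the same bound (14 bits here). The annealer is not guaranteed to find the optimum, but it returns the cheapest feasible assignment it visited. In a real flow each cost evaluation is a synthesis run, so the number of annealing steps matters far more than in this toy.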
This paper presents a novel floorplanner based on Mixed-Integer Linear Programming (MILP), with a formulation that makes the problem tractable for state-of-the-art solvers. The proposed method takes into account an accurate description of the heterogeneous resources and the partial-reconfiguration constraints of recent FPGAs. For small instances, a global optimum can be found quickly. For large instances, a time-limited search achieves an average improvement of 20% over floorplanners based on simulated annealing. The designer can customize the objective function to be minimized by assigning different weights to a linear combination of metrics such as total wire length, aspect ratio and area occupancy.
Floorplanning for Partially-Reconfigurable FPGA Systems via Mixed-Integer Linear Programming. Marco Rabozzi, J. Lillis, M. Santambrogio. 2014 IEEE 22nd Annual International Symposium on Field-Programmable Custom Computing Machines. DOI: https://doi.org/10.1109/FCCM.2014.61
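The constraints and the weighted objective can be illustrated on a toy instance. Instead of an MILP solver, the sketch below enumerates every placement, which only works for tiny inputs. The column layout, resource names, region requirements and weights are all hypothetical, and aspect ratio is left out of the objective for brevity. Each region is a rectangle that must cover its resource needs, regions may not overlap, and the cost is a weighted sum of occupied area and Manhattan wire length between connected region centers:

```python
from itertools import combinations, product

# Hypothetical column-based fabric: each column holds one resource type in every row.
COLUMNS = ["CLB", "CLB", "BRAM", "CLB", "DSP", "CLB"]
ROWS = 4

Rect = tuple[int, int, int, int]  # (x, y, width, height) in column/row units


def rectangles():
    """Yield every axis-aligned rectangle that fits on the fabric."""
    for x in range(len(COLUMNS)):
        for w in range(1, len(COLUMNS) - x + 1):
            for y in range(ROWS):
                for h in range(1, ROWS - y + 1):
                    yield (x, y, w, h)


def fits(rect: Rect, need: dict[str, int]) -> bool:
    """True if the rectangle contains at least `need` tiles of each resource type."""
    x, _, w, h = rect
    have: dict[str, int] = {}
    for col in COLUMNS[x:x + w]:
        have[col] = have.get(col, 0) + h
    return all(have.get(res, 0) >= n for res, n in need.items())


def overlap(a: Rect, b: Rect) -> bool:
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    return ax < bx + bw and bx < ax + aw and ay < by + bh and by < ay + ah


def cost(rects, nets, w_wire: float, w_area: float) -> float:
    """Weighted sum of occupied area and center-to-center Manhattan wire length."""
    area = sum(w * h for _, _, w, h in rects)
    centers = [(x + w / 2, y + h / 2) for x, y, w, h in rects]
    wire = sum(abs(centers[i][0] - centers[j][0]) + abs(centers[i][1] - centers[j][1])
               for i, j in nets)
    return w_wire * wire + w_area * area


def floorplan(needs, nets, w_wire=1.0, w_area=1.0):
    """Exhaustive search (stand-in for an MILP solver) over non-overlapping placements."""
    candidates = [[r for r in rectangles() if fits(r, need)] for need in needs]
    best, best_val = None, float("inf")
    for combo in product(*candidates):
        if any(overlap(a, b) for a, b in combinations(combo, 2)):
            continue
        val = cost(combo, nets, w_wire, w_area)
        if val < best_val:
            best, best_val = combo, val
    return best, best_val


needs = [{"CLB": 4, "BRAM": 2}, {"CLB": 2, "DSP": 2}]  # two hypothetical regions
nets = [(0, 1)]  # region 0 communicates with region 1
best, best_val = floorplan(needs, nets)
print(best, best_val)
```

For this two-region example the minimum is 12.5 (area 10 plus wire length 2.5). The enumeration grows as the number of candidate rectangles raised to the number of regions, which is why a solver-based MILP formulation, as described in the abstract, is needed for realistic devices.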