Increasing Memory Efficiency of Hash-Based Pattern Matching for High-Speed Networks
2021 International Conference on Field-Programmable Technology (ICFPT)
Pub Date : 2021-12-06 DOI: 10.1109/ICFPT52863.2021.9609859
Tomás Fukac, J. Matoušek, J. Korenek, Lukás Kekely
The increasing speed of network links continuously raises the performance requirements on network security and monitoring systems, including a typical representative, the intrusion detection system (IDS), and its core function, pattern matching. To allow the operation of IDS applications like Snort and Suricata in networks supporting throughput of 100 Gbps or even more, a recently proposed pre-filtering architecture approximates exact pattern matching using hash-based matching of short strings that represent a given set of patterns. This architecture can scale supported throughput by adjusting the number of parallel hash functions and on-chip memory blocks utilized in the implementation of a hash table. Since each hash function can address every memory block, scaling throughput also increases the total capacity of the hash table. Nevertheless, the original architecture utilizes the available capacity of the hash table inefficiently. We therefore propose three optimization techniques that either reduce the amount of information stored in the hash table or increase its achievable occupancy. Moreover, we also design modifications of the architecture that enable resource-efficient use of all three optimization techniques together. Compared to the original pre-filtering architecture, combined use of the proposed optimizations in the 100 Gbps scenario increases the achievable capacity for short strings by three orders of magnitude. It also reduces the utilization of FPGA logic resources to only a third.
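The pre-filtering idea summarized in this abstract can be illustrated with a minimal software sketch. It is not the authors' hardware architecture: the short-string length `K`, the choice of each pattern's first `K` bytes as its representative, the use of a Python `set` as the hash table, and all function names are assumptions made only for illustration.

```python
# Hypothetical sketch of hash-based pre-filtering; names and K are assumptions.
K = 4  # assumed short-string length in bytes


def build_table(patterns, k=K):
    """Represent each pattern by one k-byte short string (here: its first k bytes)."""
    if any(len(p) < k for p in patterns):
        raise ValueError("this sketch assumes every pattern has at least k bytes")
    return {p[:k] for p in patterns}


def prefilter(payload, table, k=K):
    """Return True if any k-byte window of the payload hits the hash table."""
    return any(payload[i:i + k] in table for i in range(len(payload) - k + 1))


def match(payload, patterns, k=K):
    """Run the exact (expensive) check only on payloads that pass the pre-filter."""
    table = build_table(patterns, k)
    if not prefilter(payload, table, k):
        return []  # payload is cleared by the approximate stage
    return [p for p in patterns if p in payload]  # exact matching stage
```

A pre-filter hit is only a hint: a payload that contains a short string but not the full pattern passes the first stage and is then rejected by the exact stage, which is why the approximation costs no accuracy as long as suspicious payloads are re-checked.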
A unified accelerator design for LiDAR SLAM algorithms for low-end FPGAs
Pub Date : 2021-12-06 DOI: 10.1109/ICFPT52863.2021.9609886
K. Sugiura, Hiroki Matsutani
A fast and reliable LiDAR (Light Detection and Ranging) SLAM (Simultaneous Localization and Mapping) system is a growing need for autonomous mobile robots, which are used for a variety of tasks such as indoor cleaning, navigation, and transportation. To bridge the gap between the limited processing power on such robots and the high computational requirement of the SLAM system, in this paper we propose a unified accelerator design for 2D SLAM algorithms on resource-limited FPGA devices. As scan matching is the heart of these algorithms, the proposed FPGA-based accelerator implements scan matching cores in the programmable logic, and users can switch SLAM algorithms to adapt to performance requirements and environments without modifying and re-synthesizing the logic. We integrate the accelerator into two representative SLAM algorithms, namely particle filter-based and graph-based SLAM. They are evaluated in terms of resource utilization, processing speed, and quality of output results with various real-world datasets, highlighting their algorithmic characteristics. Experimental results on a Pynq-Z2 board demonstrate that scan matching is accelerated by 13.67–14.84x, improving the overall performance of particle filter-based and graph-based SLAM by 4.03–4.67x and 3.09–4.00x respectively, while maintaining accuracy comparable to their software counterparts and even state-of-the-art methods.
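To make "scan matching" concrete, here is a minimal sketch of one simple variant: an exhaustive search over nearby poses that scores each candidate by how many scan points land on occupied grid cells. The abstract does not say which scan matching method the accelerator implements, so the scoring rule, the unit cell size, the translation-only search, and every name below are assumptions.

```python
import math

# Hypothetical brute-force scan matcher on an occupancy grid (cell size 1).


def transform(points, pose):
    """Map scan points from the robot frame into the map frame."""
    x, y, theta = pose
    c, s = math.cos(theta), math.sin(theta)
    return [(x + c * px - s * py, y + s * px + c * py) for px, py in points]


def score(points, pose, occupied):
    """Count scan points that fall on occupied grid cells."""
    return sum((round(wx), round(wy)) in occupied
               for wx, wy in transform(points, pose))


def scan_match(points, guess, occupied, radius=2):
    """Try integer translations around the guess; the heading stays fixed."""
    gx, gy, theta = guess
    candidates = [(gx + dx, gy + dy, theta)
                  for dx in range(-radius, radius + 1)
                  for dy in range(-radius, radius + 1)]
    return max(candidates, key=lambda pose: score(points, pose, occupied))


# Toy map: an L-shaped wall, observed from an assumed true pose of (2, 1, 0.0).
occupied = {(x, 5) for x in range(7)} | {(6, y) for y in range(6)}
true_pose = (2, 1, 0.0)
scan = [(cx - true_pose[0], cy - true_pose[1]) for cx, cy in sorted(occupied)]
estimate = scan_match(scan, (1, 0, 0.0), occupied)
```

Every candidate pose is scored independently of the others, which is the kind of regular, data-parallel work that maps well onto parallel hardware cores.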
FPGAs as General-Purpose Accelerators for Non-Experts via HLS: The Graph Analysis Example
Pub Date : 2021-12-06 DOI: 10.1109/ICFPT52863.2021.9609832
P. Silva, João Bispo, N. Paulino
We discuss the concept of FPGA-unfriendliness, the property of certain algorithms, programs, or domains which may limit their applicability to FPGAs. Specifically, we look at graph analysis, which has recently seen increased interest in combination with High-Level Synthesis (HLS), but has yet to find great success compared to established acceleration mechanisms. To this end, we make use of Xilinx's Vitis Graph Library to implement Single-Source Shortest Paths (SSSP) and PageRank (PR), and present a custom kernel written from the ground up for Distinctiveness Centrality (DC, a novel graph centrality measure). We use public datasets to test these implementations, and analyse power consumption and execution time. Our comparisons against published data for GPU and CPU execution show FPGA slowdowns in execution time between around 18.5x and 328x for SSSP, and around 1.8x and 195x for PR, respectively. In some instances, we obtained FPGA speedups versus CPU of up to 2.5x for PR. Regarding DC, results show speedups from 0.1x to 3.5x, and energy efficiency increases from 0.8x to 6x. Lastly, we provide some insights regarding the applicability of FPGAs in FPGA-unfriendly domains, and comment on the future as FPGA and HLS technology advances.
Efficient Queue-Balancing Switch for FPGAs
Pub Date : 2021-12-06 DOI: 10.1109/ICFPT52863.2021.9609867
Philippos Papaphilippou, K. Sano, B. Adhi, W. Luk
This paper presents a novel FPGA-based switch design that achieves high algorithmic performance and an efficient FPGA implementation. Crossbar switches based on virtual output queues (VOQs) and their variations have been popular for implementing switches on FPGAs, with applications to network-on-chip (NoC) routers and network switches. The efficiency of VOQs is well-documented on ASICs, though we show that their disadvantages can outweigh their advantages on FPGAs. Our proposed design uses an output-queued switch internally to simplify scheduling, and a queue balancing technique to avoid queue fragmentation and reduce the need for memory-sharing VOQs. Our implementation approaches the scheduling performance of the state of the art, while requiring considerably fewer FPGA resources.
A Hexagon-Based Honeycomb Routing Architecture for FPGA
Pub Date : 2021-12-06 DOI: 10.1109/ICFPT52863.2021.9609805
Kaichuang Shi, Hao Zhou, Lingli Wang
Field Programmable Gate Arrays (FPGAs) are widely used for their flexibility and short time to market. FPGA routing architecture design is a key problem because routing plays a dominant role in area, delay, and power. Most modern FPGAs are island-style and provide abundant vertical and horizontal tracks to guarantee that circuit designs can be routed successfully. Most connections in placed netlists are diagonal, so they may pass through extra turning switches, resulting in increased delay and high routing density. In this paper, we propose a hexagon-based honeycomb FPGA routing architecture to improve routability and performance. In the honeycomb architecture, there are three kinds of routing channels, which provide more freedom to reduce the number of turning switches on routing paths. In addition, the router lookahead algorithm is enhanced to support the honeycomb architecture, which is then evaluated with the enhanced VTR and its provided benchmarks. The experimental results show that the honeycomb architecture can improve the minimum routing channel width by 7.7% compared with a traditional rectangular architecture with length-1 wires. In addition, the honeycomb architecture can achieve a 9.9% improvement in routed wirelength, 11.5% in critical path delay, and 12.4% in area-delay product.
Algorithm-Hardware Co-Optimization for Energy-Efficient Drone Detection on Resource-Constrained FPGA
Pub Date : 2021-12-06 DOI: 10.1145/3583074
Han-Sok Suh, Jian Meng, Ty Nguyen, S. Venkataramanaiah, Vijay Kumar, Yu Cao, Jae-sun Seo
Convolutional neural network (CNN) based object detection has achieved very high accuracy; for example, single-shot multi-box detectors (SSD) can efficiently detect and localize various objects in an input image. However, they require a large amount of computation and memory storage, which makes it difficult to perform efficient inference on resource-constrained hardware devices such as drones or unmanned aerial vehicles (UAVs). Drone/UAV detection is an important task for applications including surveillance, defense, and multi-drone self-localization and formation control. In this paper, we designed and co-optimized algorithm and hardware for energy-efficient drone detection on resource-constrained FPGA devices. We trained the SSD object detection algorithm with a custom drone dataset. For inference, we employed low-precision quantization and adapted the width of the SSD CNN model. To improve throughput, we use dual-data-rate operations for DSPs to effectively double the throughput with limited DSP counts. For different SSD algorithm models, we analyze accuracy, measured as mean average precision (mAP), and evaluate the corresponding FPGA hardware utilization, DRAM communication, and throughput optimization. Our proposed design achieves a high mAP of 88.42% on the multi-drone dataset, with a high energy efficiency of 79 GOPS/W and throughput of 158 GOPS using a Xilinx Zynq ZU3EG FPGA device on the Open Vision Computer version 3 (OVC3) platform. Our design achieves 2.7x higher energy efficiency than prior works using the same FPGA device, at a low power consumption of 1.98 W.
An FPGA-Based Image Recognition with Remote Update Functions for Autonomous Driving on “ad-refkit”
Pub Date : 2021-12-06 DOI: 10.1109/ICFPT52863.2021.9609852
Hyuga Hashimoto, Ryoko Naka, Yasutaka Wada
This paper explains the ongoing development of an FPGA-based image recognition system for an autonomous driving robot car for the FPT'21 Design Competition. We are developing an FPGA-based image recognition system on “ad-refkit” equipped with a Zybo-Z7 board to realize two main functionalities: 1) to provide recognition results in the actual environment of the contest through the Internet and 2) to be updated remotely based on the results. With these functionalities, we can realize high-performance and low-power systems using an FPGA.
LETA: A lightweight exchangeable-track accelerator for efficientnet based on FPGA
Pub Date : 2021-12-06 DOI: 10.1109/ICFPT52863.2021.9609919
Jingbo Gao, Yu Qian, Yihan Hu, Xitian Fan, W. Luk, Wei Cao, Lingli Wang
Lightweight convolutional neural networks (CNNs) have become increasingly popular due to their lower computational complexity and fewer memory accesses with equivalent accuracy compared to previous CNN models. However, the newly proposed networks bring new challenges to efficient hardware design; in EfficientNet, these include depthwise convolution, the squeeze-and-excitation (SE) module, and swish/sigmoid functions. Although an individual engine architecture could achieve a high computing efficiency for the standard convolution or the depthwise convolution, it is still not efficient for EfficientNet because the workload imbalance between the two types of convolutional engines causes inevitable idling. To overcome this problem, we present a lightweight reconfigurable computational kernel based on FPGA with an exchangeable-track datapath scheme. In addition, a low-accuracy-loss function replacement strategy is proposed for swish/sigmoid functions. Furthermore, a low-cost hardware architecture to implement the replaced functions is designed. The proposed accelerator (LETA) can implement EfficientNet on Xilinx XCVU37P with a 300 MHz system clock and a 600 MHz kernel clock. The linear growth of resource usage in the 4-kernel implementation in 1 super logic region (SLR) with the same clock frequencies justifies the scalability of LETA. The experimental results show that LETA can achieve 2× throughput/DSP compared to the latest FPGA-based accelerator with 1.6% (0.7%) top-1 (top-5) accuracy loss on EfficientNet-B3.
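The abstract does not state which functions replace swish and sigmoid in LETA. As an illustration of what a hardware-friendly replacement can look like, the sketch below compares swish and sigmoid with the piecewise-linear hard-swish and hard-sigmoid forms known from MobileNetV3; using these particular replacements here is an assumption, not the paper's method.

```python
import math

# Illustrative only: hard-swish / hard-sigmoid are stand-ins, not LETA's functions.


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def swish(x):
    return x * sigmoid(x)


def relu6(x):
    return min(max(x, 0.0), 6.0)


def hard_sigmoid(x):
    """Piecewise-linear approximation: clamp (x + 3) to [0, 6], then divide by 6."""
    return relu6(x + 3.0) / 6.0


def hard_swish(x):
    return x * hard_sigmoid(x)


def max_abs_error(f, g, lo=-8.0, hi=8.0, steps=1601):
    """Largest |f(x) - g(x)| on an evenly spaced grid over [lo, hi]."""
    xs = [lo + (hi - lo) * i / (steps - 1) for i in range(steps)]
    return max(abs(f(x) - g(x)) for x in xs)


sigmoid_error = max_abs_error(sigmoid, hard_sigmoid)
swish_error = max_abs_error(swish, hard_swish)
```

The replacements need only an add, a clamp, a multiply, and a constant scale, avoiding the exponential and the division of the original functions. The resulting approximation error is one plausible source of a small accuracy loss of the kind the abstract reports.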
SoC FPGA implementation of an unmanned mobile vehicle with an image transmission system over VNC
Pub Date : 2021-12-06 DOI: 10.1109/ICFPT52863.2021.9609904
Keigo Motoyoshi, Yuta Imamura, Taichi Saikai, Koki Fujita, Daiki Furukawa, Masatomo Matsuda, Tatsuma Mori, Yasutoshi Araki, Takehiro Miura, Keizo Yamashita, Haruto Ikehara, Kaito Ohira, Katsuaki Kamimae, Takuho Kawazu, Masahiro Nishimura, Shintaro Matsui, Koki Tomonaga, Taito Manabe, Yuichiro Shibata
We are developing an unmanned mobile vehicle implemented on an SoC FPGA for the FPGA design competition. For highly productive development of image-based self-driving mobile vehicles, a remote verification and debugging environment with real-time image transmission is important. This paper presents an image transmission system with which we can monitor onboard camera images of the vehicle and feature detection results over a VNC Wi-Fi connection. We implemented the whole system on a Xilinx Zynq-7000 with a maximum operating frequency of 125 MHz. The evaluation of the system showed that the resolution of 640×720 is the most beneficial for VNC in this experiment in terms of the performance ratio of VNC to SSH X11 forwarding. We also briefly describe other components to be used to develop the autonomous driving system.
Total-ionizing-dose tolerance evaluation of an optoelectronic field programmable gate array VLSI during operation
Pub Date : 2021-12-06 DOI: 10.1109/ICFPT52863.2021.9609910
Hirotoshi Ito, Minoru Watanabe
This paper presents the total-ionizing-dose tolerance evaluation of an optoelectronic field programmable gate array (FPGA) during operation. The optoelectronic FPGA was fabricated using 0.18 $\mu$m standard complementary metal oxide semiconductor (CMOS) process technology. An experiment assessing the total-ionizing-dose tolerance of the optoelectronic FPGA was conducted at a 2.27–2.28 kGy/h dose rate using a $^{60}$Co gamma radiation source. Results clarified that the optoelectronic FPGA can function correctly under a 2.27–2.28 kGy/h dose rate and that the total-ionizing-dose tolerance of the optoelectronic FPGA is greater than 80 Mrad during operation. The total-ionizing-dose tolerance result is 80 times higher than that of typical radiation-hardened very large scale integrated circuits (VLSIs) and typical radiation-hardened FPGAs.
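The abstract mixes two dose units (kGy/h for the rate, Mrad for the total). The short calculation below converts between them using 1 Gy = 100 rad and estimates how long the stated dose rate would take to accumulate 80 Mrad. The exposure time is derived here under the assumption of a constant dose rate; it is not a figure reported in the abstract, and the function names are illustrative.

```python
RAD_PER_GRAY = 100.0  # 1 Gy = 100 rad


def mrad_to_kgy(mrad):
    """Convert a dose in Mrad to kGy."""
    return mrad * 1e6 / RAD_PER_GRAY / 1e3


def hours_to_reach(total_kgy, rate_kgy_per_h):
    """Exposure time needed at a constant dose rate (an assumption of this sketch)."""
    return total_kgy / rate_kgy_per_h


total_kgy = mrad_to_kgy(80)  # 80 Mrad expressed in kGy
hours = [hours_to_reach(total_kgy, rate) for rate in (2.27, 2.28)]
```

Under that assumption, 80 Mrad corresponds to 800 kGy, which the quoted dose rate would deliver in roughly 350 hours of continuous irradiation.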