
2021 IEEE 14th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC): Latest Papers

Acceleration of Gravitation Field Analysis for Asteroids by GPU Computation
Fumiya Kono, N. Nakasato, N. Hirata, K. Matsumoto
Space-probe exploration of asteroids has been pursued actively to approach the origin of the solar system and of life. One method toward this goal is to analyze the structure of solar system bodies by numerical simulation. GFandSlope is a code that calculates the gravitation field, slope, and attraction for given model data of small solar system bodies. With the existing sequential code, analyzing high-resolution models under different initial conditions inevitably takes a long time. This work achieves a speedup of several thousand times over the previous code through a GPU implementation, which will also boost research in space science. This paper presents an evaluation of our GPU codes for fast gravitation field analysis and discusses the numerical precision of floating-point operations on the GPU for practical application.
DOI: https://doi.org/10.1109/MCSoC51149.2021.00010 (published 2021-12-01)
Citations: 0
A Low Cost and Portable Mini Motor Car System with a BNN Accelerator on FPGA
Fumio Hamanaka, Takuto Kanamori, Kenji Kise
Deep neural networks (DNNs) are one of the key technologies for realizing autonomous driving. However, because a DNN requires a large amount of computation, it is challenging for an edge device with limited computation resources to support one. Binarized neural networks (BNNs) have been proposed to reduce latency and parameter size and are suited to hardware implementation. Because DNN technology is still developing and better algorithms appear over time, implementing a DNN on an FPGA is preferable to implementing it on an ASIC. In this paper, we propose a low-cost and portable mini motor car system with a BNN accelerator on an FPGA. We compare its road-tracking demonstration with that of a similar motor car using a Raspberry Pi and show the effectiveness of an FPGA for DNN implementation. The proposed system is implemented on the Nexys A7, one of the most popular FPGA development boards, which uses an Artix-7 FPGA.
DOI: https://doi.org/10.1109/MCSoC51149.2021.00020 (published 2021-12-01)
Citations: 0
A Multi-scale Binarized Neural Network Application Based on All Programmable System on Chip
Maoyang Xiang, T. Teo
Binary neural networks (BNNs) are particularly well suited to low-power embedded devices with limited computational capabilities. Because their weight parameters are binary, they significantly reduce memory footprint and arithmetic logic unit operations. Nevertheless, BNNs have the disadvantages of low accuracy and a sharp optimization space. Several recent studies of BNNs have shown improved accuracy in various tests by using more operations and more complicated topologies. This approach, however, is incompatible with embedded BNN applications because it requires complicated data type translation. Hence, we propose a novel approach for BNN applications on embedded systems with a multi-scale neural network topology, developed from two optimization perspectives: hardware structure and BNN topology. The approach preserves more low-level information during the feed-forward process with few operations. Our network topology achieves 91.3% accuracy on the CIFAR-10 dataset, one of the highest recorded for a BNN, and can process 537 tiny pictures per second when deployed on an All Programmable System on Chip (APSoC) device with 4.4 W power consumption.
DOI: https://doi.org/10.1109/MCSoC51149.2021.00030 (published 2021-12-01)
Citations: 0
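The BNN abstract above notes that binary weights reduce memory footprint and arithmetic logic unit operations. One common way this saving arises, sketched here under the assumption that both weights and activations are restricted to +1/-1, is to replace multiply-accumulate with XNOR and popcount on packed bits. The helper names `pack_bits` and `binary_dot` are hypothetical and are not taken from the paper.

```python
def pack_bits(values):
    """Pack a sequence of +1/-1 values into an int (bit i is 1 when values[i] == +1)."""
    word = 0
    for i, v in enumerate(values):
        if v == 1:
            word |= 1 << i
    return word


def binary_dot(a_bits, w_bits, n):
    """Dot product of two {-1,+1} vectors of length n, given their packed bits.

    XNOR marks positions where the signs agree; each agreement contributes +1
    and each disagreement contributes -1, so the sum is 2 * agreements - n.
    """
    mask = (1 << n) - 1
    agree = ~(a_bits ^ w_bits) & mask  # XNOR limited to the n valid bits
    matches = bin(agree).count("1")    # popcount
    return 2 * matches - n


# Hypothetical example vectors (illustrative only).
activations = [1, -1, -1, 1, 1, -1, 1, 1]
weights = [1, 1, -1, -1, 1, -1, -1, 1]

reference = sum(a * w for a, w in zip(activations, weights))
result = binary_dot(pack_bits(activations), pack_bits(weights), len(weights))
print(result, reference)
```

Each vector occupies one bit per element instead of a full word, and the inner loop reduces to a bitwise operation plus a bit count, which maps naturally onto FPGA lookup tables.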
Surface Type Classification for Autonomous Robots Using Temporal, Statistical and Spectral Feature Extraction and Selection
Md. Al Mehedi Hasan, Fuad Al Abir, Jungpil Shin
Real-time surface recognition has become a crucial component in ensuring that intelligent autonomous robots walk safely in complex indoor human living environments. Numerous recent studies have addressed the problem. Still, there is scope for improvement in classification accuracy and inference time. In this paper, we extract features from accelerometer and gyroscope data in the temporal, statistical, and spectral domains and classify them using a tree-based ensemble classification algorithm. We achieve 80.81% mean accuracy in classifying 9 different surfaces, with 1.0% standard deviation in 10-fold cross-validation, and a 97.25% average AUC score. Our method attains state-of-the-art accuracy while ensuring minimal inference time, which is essential for real-time recognition in autonomous robots.
DOI: https://doi.org/10.1109/MCSoC51149.2021.00029 (published 2021-12-01)
Citations: 0
Evaluation of Recursive Feature Elimination and LASSO Regularization-based optimized feature selection approaches for cervical cancer prediction
Mohamed Hamada, Jesse Jeremiah Tanimu, Mohammed Hassan, H. Kakudi, Patience Robert
Cervical cancer is one of the leading causes of premature mortality among women worldwide, and more than 85% of these deaths occur in developing countries. Several risk factors are associated with cervical cancer. The aim of this research is to develop a predictive model for a patient's cervical cancer results, given risk patterns from individual medical records and preliminary screening. This work presents a machine learning method that uses the Decision Tree (DT) algorithm to analyze the risk factors of cervical cancer. Recursive Feature Elimination (RFE) and least absolute shrinkage and selection operator (LASSO) feature selection techniques were explored to determine the most important attributes for cervical cancer prediction. A comparative analysis of the two feature selection techniques was performed to show the importance of feature selection in cervical cancer prediction. Based on the analysis, we conclude that the proposed model achieved its highest accuracies of 98% and 96% when using DT with the RFE and LASSO feature selection techniques, respectively.
DOI: https://doi.org/10.1109/MCSoC51149.2021.00056 (published 2021-12-01)
Citations: 5
Energy saving in a multi-context coarse grained reconfigurable array with non-volatile flip-flops
Aika Kamei, Takuya Kojima, H. Amano, Daiki Yokoyama, Hisato Miyauchi, K. Usami, Keizo Hiraga, Kenta Suzuki, K. Bessho
In this study, a second-generation coarse-grained reconfigurable array with non-volatile flip-flops (NVFFs), known as the non-volatile cool mega array with multi-context (NVCMA/MC), is proposed. Similar to the previous NVCMA, verify-and-retriable NVFFs (VR-NVFFs) are provided for their configuration memory, constant memory, data memory, and instruction memory. The dedicated instructions for controlling the store, verify, and restore operations of the NVFFs are provided to the microcontroller in addition to power gating functions. Based on experience of the NVCMA, four hardware contexts are introduced to maintain the configuration data for four tasks, without the sacrifice of memory leakage. The array size is expanded, and pipeline registers are introduced to reduce the trade-off between the performance and power consumption. This study mainly focuses on the energy-saving effect of the VR-NVFFs and the multi-context facility of the NVCMA/MC, including the measurement of the break-even point. The evaluation of a real chip implemented with 40 nm MTJ/MOS hybrid process technology demonstrates that the store energy is reduced by 65% with the two-step store control of the VR-NVFFs. Moreover, applications that run intermittently for intervals as short as approximately 3 μs can benefit from the multi-context power gating.
DOI: https://doi.org/10.1109/MCSoC51149.2021.00047 (published 2021-12-01)
Citations: 4
A Distance Estimation Method to Railway Crossing Using Warning Signs
Kaisei Shimura, Yoichi Tomioka, Qiang Zhao
Mobility scooters have come to be used to expand the range of mobility of the elderly. At the same time, accidents involving mobility scooters have become a serious problem. For example, if a mobility scooter stops inside a railway crossing because its battery is exhausted, the situation is very dangerous because accidental contact with a train may occur. Measuring the distance to a railway crossing while driving helps avoid entering the crossing without enough battery charge. In this paper, we propose a method for predicting the distance to a railway crossing based on the railway crossing warning signs in video from a camera installed on the front of the mobility scooter. In experiments, we evaluate the proposed method using images taken at various positions relative to the railway crossing and show that it achieves higher accuracy than distance estimation using a depth sensor.
DOI: https://doi.org/10.1109/MCSoC51149.2021.00034 (published 2021-12-01)
Citations: 0
A Highly Efficient Layout-Aware FPGA Overlay Accelerator Mapping Method
Tanvir Ahmed, Johannes Maximilian Kühn, Ken Namura
FPGAs are gathering traction as a platform for the acceleration of applications requiring both high performance and specialization. However, exploiting the maximum compute potential of FPGAs remains a critical and time-consuming task, usually requiring expert knowledge. Typically, designers seek to maximize the usage of hardened arithmetic blocks (DSP, such as DSP48 in Xilinx devices), but as their number is limited, the critical path quickly increases when portions are mapped to lookup tables (LUT). To mitigate the DSP limitation and to maximize FPGA utilization, we propose combining FPGA overlay accelerators and a mapping method that efficiently exploits the FPGA's layout information and its resources. This mapping method relies on a two-step process: 1. extraction of architectural and layout information of the FPGA, 2. optimized placement of the processing elements (PEs) of the accelerator onto the FPGA resources. The placement step maps the PEs to DSPs and LUTs to reduce the critical path among PEs. We applied our method to implement a systolic array, a multiplier array, and a coarse-grained reconfigurable architecture (CGRA) on a Xilinx FPGA. The proposed method achieves more than 14 x performance and energy efficiency increase over the vendor tool mapping while equally maximizing FPGA utilization by more than 1.5 x compared to DSP limited mappings.
DOI: https://doi.org/10.1109/MCSoC51149.2021.00046 (published 2021-12-01)
Citations: 0
RVCoreP-32IC: An optimized RISC-V soft processor supporting the compressed instructions
Takuto Kanamori, Kenji Kise
The compressed instructions extension in RISC-V reduces program size. However, it requires complicated logic in the instruction fetch unit and affects performance. In this paper, we propose an instruction fetch unit that supports the compressed instructions while achieving high performance. Furthermore, we propose a RISC-V soft processor using this unit. We implement the proposed processor in Verilog HDL and verify its behavior using Verilog simulation and a Xilinx Artix-7 FPGA board. We compare benchmark results and hardware usage with related work. The evaluation results show that the proposed processor achieves a 42.5% performance improvement compared with VexRiscv, a high-performance open-source RV32IC processor.
DOI: https://doi.org/10.1109/MCSoC51149.2021.00014 (published 2021-12-01)
Citations: 1
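The RVCoreP-32IC abstract above says the compressed extension complicates the instruction fetch unit. A small software sketch may help show why: in the RISC-V encoding, an instruction whose two lowest bits are not `11` is 16 bits long, so 32-bit instructions can start on any 2-byte boundary. The function `split_instructions` below is a hypothetical illustration, not the paper's fetch unit, and it assumes only 16-bit and 32-bit encodings appear, as in RV32IC.

```python
def split_instructions(code: bytes):
    """Split a little-endian RISC-V code stream into (offset, length, raw) tuples.

    Assumes only 16-bit and 32-bit encodings are present (as in RV32IC);
    longer reserved encodings are not handled in this sketch.
    """
    out = []
    pc = 0
    while pc + 2 <= len(code):
        low = int.from_bytes(code[pc:pc + 2], "little")
        if low & 0b11 != 0b11:
            # Lowest two bits are 00, 01 or 10: a 16-bit compressed instruction.
            out.append((pc, 2, low))
            pc += 2
        else:
            # Lowest two bits are 11: a 32-bit instruction.
            if pc + 4 > len(code):
                raise ValueError("truncated 32-bit instruction at offset %d" % pc)
            out.append((pc, 4, int.from_bytes(code[pc:pc + 4], "little")))
            pc += 4
    return out


# c.nop (0x0001), then a 32-bit nop (addi x0, x0, 0 = 0x00000013), then c.nop.
stream = bytes.fromhex("0100" "13000000" "0100")
decoded = split_instructions(stream)
for offset, length, raw in decoded:
    print(f"offset {offset}: {length * 8}-bit instruction {raw:#x}")
```

In this stream the 32-bit instruction starts at offset 2, so it straddles what would be a 4-byte fetch boundary. A hardware fetch unit has to handle that case, which is one plausible source of the added logic the abstract mentions.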
Execution Right Delegation Scheduling Algorithm for Multiprocessor
Takaharu Suzuki, Kiyofumi Tanaka
In scheduling algorithms based on the Rate Monotonic (RM) method, which is widely used in the development of real-time systems, tasks with shorter periods have higher priorities. In contrast, tasks with longer periods are likely to suffer from increased response times and jitter because of their lower priorities. We previously proposed the Execution Right Delegation (ERD) method for uniprocessor systems based on RM, in which a high-priority server for a privileged (or important) task is introduced to shorten the response times of that task. In this paper, we propose an extended ERD method for multiprocessor systems. Our system model is based on partitioned systems in which only a privileged task can migrate. The evaluation confirms that the response times of a privileged task are reduced compared with partitioned Fixed-Task-Priority (FTP) and global FTP scheduling.
DOI: https://doi.org/10.1109/MCSoC51149.2021.00015
Citations: 0
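The abstract above states that under Rate Monotonic scheduling, tasks with longer periods tend to see longer response times. A minimal sketch of that effect on a single processor, using the standard fixed-point response-time analysis for fixed-priority scheduling rather than the ERD method itself: the task set, the `Task` type, and the function names are hypothetical, and deadlines are assumed equal to periods.

```python
from typing import Dict, List, NamedTuple, Optional


class Task(NamedTuple):
    name: str
    wcet: int    # worst-case execution time C
    period: int  # period T (deadline assumed equal to the period)


def rm_order(tasks: List[Task]) -> List[Task]:
    """Sort tasks by Rate Monotonic priority: shorter period first."""
    return sorted(tasks, key=lambda t: t.period)


def response_time(task: Task, higher: List[Task]) -> Optional[int]:
    """Worst-case response time, or None if it exceeds the period.

    Iterates R = C + sum(ceil(R / T_j) * C_j) over higher-priority
    tasks j until the value stops changing.
    """
    r = task.wcet
    while True:
        # -(-a // b) is integer ceiling division.
        nxt = task.wcet + sum(-(-r // h.period) * h.wcet for h in higher)
        if nxt > task.period:
            return None
        if nxt == r:
            return r
        r = nxt


def analyse(tasks: List[Task]) -> Dict[str, Optional[int]]:
    """Map each task name to its worst-case response time under RM."""
    ordered = rm_order(tasks)
    return {t.name: response_time(t, ordered[:i]) for i, t in enumerate(ordered)}


tasks = [Task("slow", 5, 20), Task("fast", 1, 4), Task("mid", 2, 10)]
result = analyse(tasks)
print(result)
```

For this task set, `slow` needs only 5 time units of execution but its worst-case response time is 10, because every release of `fast` and `mid` preempts it. That delay on low-priority tasks is what the abstract says ERD targets for a privileged task.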