
Latest papers: 2022 IEEE 15th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC)

Design of Reward Functions for RL-based High-Speed Autonomous Driving
Tanaka Kohsuke, Yuta Shintomi, Y. Okuyama, Taro Suzuki
We aim to design a reward function for reinforcement-learning-based autonomous driving that achieves high-speed driving while maintaining training stability in reaching the racetrack's goal. High-speed driving is aggressive; for example, at corners the car runs along the edge of the road as fast as possible. As a result, in racing situations it is difficult to create reinforcement learning agents that both drive at high speed and reach the goal, because they tend to run off the road or collide with other objects. Human drivers generally look at the road ahead when making control decisions. We therefore design a reward function that considers the road ahead depending on the driving speed. Through experiments in a simulator, we compared our proposed reward function with those proposed in previous works in terms of driving speed and training stability in reaching the goal. Compared with the most stable reward function from previous work, our proposed reward function improves lap time by 0.71 seconds (3%) with only a 4.4% loss in stability in reaching the goal.
DOI: 10.1109/MCSoC57363.2022.00015 (https://doi.org/10.1109/MCSoC57363.2022.00015), published 2022-12-01
Citations: 0
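The abstract does not give the reward formula, so the following is only a hypothetical sketch of what a speed-dependent lookahead reward could look like, not the authors' function. It assumes the track is given as a centerline polyline, lets the lookahead distance grow linearly with speed (`base` and `gain` are invented parameters), and rewards speed projected onto the direction of the lookahead point.

```python
import math


def lookahead_reward(centerline, pos, heading, speed, base=2.0, gain=0.5):
    """Hypothetical speed-dependent lookahead reward (illustration only).

    centerline: list of (x, y) points along the track, in driving order.
    pos: (x, y) of the car; heading: yaw in radians; speed: m/s.
    base, gain: hypothetical parameters, lookahead = base + gain * speed.
    """
    # Index of the centerline point nearest to the car.
    nearest = min(range(len(centerline)),
                  key=lambda i: math.dist(centerline[i], pos))
    # Walk forward along the centerline until the lookahead distance is covered.
    lookahead = base + gain * speed
    travelled, i = 0.0, nearest
    while i + 1 < len(centerline) and travelled < lookahead:
        travelled += math.dist(centerline[i], centerline[i + 1])
        i += 1
    target = centerline[i]
    # Heading error toward the lookahead point, wrapped to [-pi, pi].
    desired = math.atan2(target[1] - pos[1], target[0] - pos[0])
    error = math.atan2(math.sin(desired - heading), math.cos(desired - heading))
    # Speed projected onto the direction of the road ahead.
    return speed * math.cos(error)
```

At higher speed the target point moves farther along the track, so on a curve the agent is rewarded for aligning with the road it will reach soon rather than the road directly beside it.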
Efficient Hardware Architecture for Posit Addition/Subtraction
Susheel Ujwal Siddamshetty, Srinivas Boppu, D. Ghosh
This paper proposes an efficient adder/subtractor architecture for the recently developed universal posit number system. Posits are designed as a direct drop-in replacement for IEEE-754 standard floating-point numbers. They offer compelling advantages over floats, such as a larger dynamic range, higher accuracy than floats of the same bit width, bit-wise identical results across systems, no overflow or underflow, tapered accuracy, and simpler exception handling. A posit format is defined by its word size $N$ and exponent size $ES$. It has a variable exponent consisting of variable-length regime bits and exponent bits, the latter being at most $ES$ bits long. Consequently, the size and position of the mantissa bits also vary. These run-time variations in the lengths of the regime, exponent, and mantissa fields make designing arithmetic hardware units challenging. Although a few adder/subtractors have been proposed in the literature, they are not 100% accurate. The proposed design is 100% accurate and efficient in area, delay, and leakage power: when synthesized with Cadence's 45 nm standard cell library, it is on average 15% more area-efficient and 23% more leakage-power-efficient than recent designs in the literature, with a similar critical path delay.
DOI: 10.1109/MCSoC57363.2022.00068 (https://doi.org/10.1109/MCSoC57363.2022.00068), published 2022-12-01
Citations: 0
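The run-time variation of the regime, exponent, and fraction fields described above is easier to see in a software decoder. The following is a reference sketch of the commonly published posit encoding, not the paper's hardware. It assumes the layout sign | run-length-encoded regime | up to $ES$ exponent bits | fraction with a hidden 1, so the value is $(-1)^s \cdot useed^k \cdot 2^e \cdot (1+f)$ with $useed = 2^{2^{ES}}$. Negative posits are decoded via two's complement.

```python
from fractions import Fraction


def decode_posit(bits: int, n: int, es: int):
    """Decode an n-bit posit with es exponent bits into an exact Fraction.

    Returns None for NaR (Not a Real), encoded as 1 followed by zeros.
    """
    mask = (1 << n) - 1
    bits &= mask
    if bits == 0:
        return Fraction(0)
    if bits == 1 << (n - 1):
        return None  # NaR
    sign = bits >> (n - 1)
    if sign:
        bits = (-bits) & mask  # negative posits are stored in two's complement
    # The n - 1 bits after the sign, most significant first.
    body = [(bits >> i) & 1 for i in range(n - 2, -1, -1)]
    # Regime: a run of identical bits ended by the opposite bit or the word end.
    first = body[0]
    run = 1
    while run < len(body) and body[run] == first:
        run += 1
    k = run - 1 if first == 1 else -run
    rest = body[run + 1:]  # skip the terminating bit, if present
    # Exponent: next es bits; bits cut off by the word size count as zeros.
    exp_bits = rest[:es] + [0] * (es - len(rest[:es]))
    e = 0
    for b in exp_bits:
        e = (e << 1) | b
    # Fraction: whatever bits remain, with a hidden leading 1.
    frac = Fraction(0)
    for i, b in enumerate(rest[es:], start=1):
        frac += Fraction(b, 1 << i)
    scale = k * (1 << es) + e  # useed**k * 2**e == 2**(k * 2**es + e)
    value = (1 + frac) * Fraction(2) ** scale
    return -value if sign else value
```

Because the regime length depends on the data, the exponent and fraction start at a different bit position for almost every input, which is the variability a hardware adder must handle.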
A Lightweight End-to-end Network for Wearing Mask Recognition on Low-resolution Images
Menglei Li, Hongbo Chen, Zixue Cheng
In realistic scenarios, low resolution is still one of the major problems in mask-wearing recognition. Because surveillance cameras are far from the faces they capture, facial images taken by low-power devices usually have low resolution, which leads to poor recognition results. To address this issue, we propose a lightweight end-to-end network that reconstructs super-resolution (SR) images and performs mask-wearing recognition. In addition, to target challenging real-world applications, we combine hardware devices and software to simulate the mask-wearing recognition process in real scenarios. To demonstrate the effectiveness of the method, we comprehensively evaluate it against state-of-the-art methods. With super-resolution, the recognition accuracy is 98.44%, outperforming RepVGG-A2 (97.00%) and ResNet34 (93.75%). Moreover, our recognition model has 9.34 million parameters and 1.85 billion FLOPs, both lower than those of traditional CNN methods (over 20 million parameters and over 3 billion FLOPs). Our recognition system is competitive with state-of-the-art methods while keeping memory usage and computational complexity low, suggesting that it can be deployed cost-effectively and widely in real-world environments, with potential applications in respiratory disease prevention.
DOI: 10.1109/MCSoC57363.2022.00016 (https://doi.org/10.1109/MCSoC57363.2022.00016), published 2022-12-01
Citations: 0
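Parameter and FLOP counts like the ones reported above are sums over layers. The following sketch counts one 2-D convolution layer; it is a generic formula, not the authors' profiling tool. The `groups` argument is included to show how a depthwise convolution (groups equal to the channel count) cuts the cost.

```python
def conv2d_cost(c_in, c_out, k, h_out, w_out, groups=1, bias=True):
    """Return (parameters, MACs) for one k x k 2-D convolution layer.

    groups=c_in with c_out=c_in gives a depthwise convolution.
    """
    weights = k * k * (c_in // groups) * c_out
    params = weights + (c_out if bias else 0)
    macs = weights * h_out * w_out  # one multiply-accumulate per weight per output pixel
    return params, macs


params, macs = conv2d_cost(64, 64, 3, 56, 56)
print(params, macs)
```

Profiling tools differ in whether "FLOPs" means MACs or 2 x MACs, so model sizes such as 1.85 billion FLOPs are only comparable across papers when they use the same convention.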
Systolic Array Based Convolutional Neural Network Inference on FPGA
Shi Hui Chua, T. Teo, Mulat Ayinet Tiruye, I-Chyn Wey
Convolutional Neural Networks (CNNs) have a particular edge over their predecessor, the Multi-Layer Perceptron (MLP). This is due to weight sharing, which allows a CNN to use fewer parameters than an MLP for the same number of outputs. Systolic arrays exploit this weight-sharing property to reuse data while performing convolutions, reducing the power consumed by memory accesses. A kernel-fitting systolic processing element (PE) array was designed with only positive multiplication to increase the throughput and power efficiency of the CNN accelerator, using a weight-stationary dataflow to achieve data reuse in the systolic array. This cost-optimized, lightweight solution is implemented on low-cost FPGA hardware to make it more accessible. The CNN accelerator consumes 0.363 W at a 100 MHz operating frequency. It achieves a peak throughput of 10.98 GOps/s, a peak performance density of 0.200 GOps/s/DSP, and a peak power efficiency of 30.26 GOps/s/W. Even with added support for additional functions, the proposed design achieves up to 1.59x better power efficiency than other systolic implementations and up to 6.17x better power efficiency than non-systolic implementations.
DOI: 10.1109/MCSoC57363.2022.00029 (https://doi.org/10.1109/MCSoC57363.2022.00029), published 2022-12-01
Citations: 1
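The weight-stationary dataflow mentioned above can be illustrated with a small cycle-level model. This is a generic textbook-style sketch, not the authors' kernel-fitting PE array. It computes a matrix product A @ W, which is assumed here to stand in for a convolution lowered to a matrix multiply (for example via im2col). Each PE loads its weight once and reuses it for every input row that streams past.

```python
def systolic_matmul_ws(A, W):
    """Cycle-level sketch of a weight-stationary systolic array computing A @ W.

    PE (k, n) permanently holds W[k][n]. Activations enter row k from the left,
    skewed by k cycles, and move one PE right per cycle; partial sums enter at
    the top of each column and move one PE down per cycle.
    """
    M, K, N = len(A), len(W), len(W[0])
    act = [[0] * N for _ in range(K)]   # activation register of each PE
    psum = [[0] * N for _ in range(K)]  # partial-sum register of each PE
    C = [[0] * N for _ in range(M)]
    for t in range(M + K + N - 2):
        new_act = [[0] * N for _ in range(K)]
        new_psum = [[0] * N for _ in range(K)]
        for k in range(K):
            for n in range(N):
                if n == 0:
                    m = t - k  # skewed feed: row k sees A[m][k] at cycle m + k
                    a_in = A[m][k] if 0 <= m < M else 0
                else:
                    a_in = act[k][n - 1]  # activation from the left neighbor
                p_in = psum[k - 1][n] if k > 0 else 0  # partial sum from above
                new_act[k][n] = a_in
                new_psum[k][n] = p_in + a_in * W[k][n]  # the weight never moves
        act, psum = new_act, new_psum
        # The bottom PE of column n completes output row m = t - (K - 1) - n.
        for n in range(N):
            m = t - (K - 1) - n
            if 0 <= m < M:
                C[m][n] = psum[K - 1][n]
    return C


print(systolic_matmul_ws([[1, 2], [3, 4], [5, 6]], [[1, 0, 2], [0, 1, 3]]))
```

Keeping weights in place means they are fetched from memory once per tile, while only activations and partial sums move between neighboring PEs. This is the kind of data reuse that reduces memory-access power.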
FPGA-Based Prototype of a Quantum Annealing Simulator for Sparse Ising Model
H. M. Waidyasooriya, Yuta Ohma, M. Hariyama
Quantum annealing (QA) is a probabilistic approximation method for finding the global optimum of a combinatorial optimization problem. QA is performed on quantum annealers, such as those from D-Wave, that exploit quantum properties. Because the number of qubits in quantum annealers is limited, it is difficult to use them to solve large-scale real-world problems. Therefore, quantum annealing simulation on digital computers is necessary. In this paper, we discuss an FPGA-based quantum annealing simulator for sparse Ising models. Unlike in a fully connected Ising model, the number of connections among spins in a sparse model is limited. Highly sparse Ising models require significantly fewer computations while allowing more parallel operations. On the other hand, the sparsity and the connections among spins differ between Ising models, so it is difficult to propose a single accelerator architecture for all of them. We propose a method that automatically generates an application-specific accelerator architecture for a given sparse Ising model. The proposed accelerator fully exploits parallelism to increase processing speed. We designed an FPGA prototype of the proposed accelerator and confirmed its correct behavior. In the future, we plan to extend the proposed method to execute large quantum annealing simulations on multiple FPGAs.
DOI: 10.1109/MCSoC57363.2022.00039 (https://doi.org/10.1109/MCSoC57363.2022.00039), published 2022-12-01
Citations: 0
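The abstract does not say which simulation algorithm the accelerator implements. As a hedged illustration, the following sketch substitutes path-integral simulated quantum annealing (Suzuki-Trotter replicas with Metropolis updates), one widely used way to simulate QA on classical hardware. It stores couplings as per-spin neighbor lists, so each spin update costs time proportional to that spin's degree rather than to the total number of spins. That cost model is why sparse models need far fewer computations. All parameter names and default values are assumptions.

```python
import math
import random


def sqa_sparse_ising(n, edges, h=None, trotter=8, sweeps=500,
                     gamma0=3.0, gamma1=0.01, temp=0.05, seed=0):
    """Path-integral simulated quantum annealing on a sparse Ising model.

    Minimizes E(s) = -sum_(i,j) J_ij*s_i*s_j - sum_i h_i*s_i, s_i in {-1, +1}.
    edges: list of (i, j, J_ij). Requires trotter >= 2 and gamma1 > 0.
    Returns the lowest-energy replica and its energy.
    """
    if trotter < 2 or gamma1 <= 0:
        raise ValueError("need trotter >= 2 and gamma1 > 0")
    rng = random.Random(seed)
    h = h or [0.0] * n
    # Neighbor lists: each spin update touches only its few neighbors.
    nbrs = [[] for _ in range(n)]
    for i, j, J in edges:
        nbrs[i].append((j, J))
        nbrs[j].append((i, J))

    def energy(spins):
        e = -sum(h[i] * spins[i] for i in range(n))
        return e - sum(J * spins[i] * spins[j] for i, j, J in edges)

    # Trotter replicas of the spin configuration, coupled in a ring.
    s = [[rng.choice((-1, 1)) for _ in range(n)] for _ in range(trotter)]
    for step in range(sweeps):
        # Transverse field decreases linearly from gamma0 to gamma1.
        gamma = gamma0 + (gamma1 - gamma0) * step / max(1, sweeps - 1)
        # Inter-replica coupling grows as the transverse field vanishes.
        j_perp = -0.5 * temp * math.log(math.tanh(gamma / (trotter * temp)))
        for k in range(trotter):
            prev, nxt = s[(k - 1) % trotter], s[(k + 1) % trotter]
            for i in range(n):
                local = h[i] + sum(J * s[k][j] for j, J in nbrs[i])
                # Energy change of flipping s[k][i] in the effective classical model.
                delta = 2 * s[k][i] * (local / trotter + j_perp * (prev[i] + nxt[i]))
                if delta <= 0 or rng.random() < math.exp(-delta / temp):
                    s[k][i] = -s[k][i]
    best = min(s, key=energy)
    return best, energy(best)


ring = [(i, (i + 1) % 8, 1.0) for i in range(8)]  # 8-spin ferromagnetic ring
print(sqa_sparse_ising(8, ring))
```

Within one replica, spins that share no edge can be updated independently. That independence is the kind of parallelism a model-specific accelerator can exploit.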
Fake Image Detection Using An Ensemble of CNN Models Specialized For Individual Face Parts
Akihisa Kawabe, Ryuto Haga, Yoichi Tomioka, Y. Okuyama, Jungpil Shin
With the rapid advance of deep learning technology, creating human face images with artificial intelligence (AI) is becoming easier. Such generated images are approaching the point where humans cannot distinguish them from authentic ones. An accurate method for detecting such fake images is essential to prevent their abuse. In this paper, we propose a fake image detection method that uses an ensemble of convolutional neural network (CNN) models, each focusing on deepfake detection of an individual face part. Our results show that combining deepfake detection based on different face parts is effective. This idea can also be applied to partially manipulated deepfake images and videos.
DOI: 10.1109/MCSoC57363.2022.00021 (https://doi.org/10.1109/MCSoC57363.2022.00021), published 2022-12-01
Citations: 2
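The abstract does not specify how the part-specific CNN outputs are combined. One common option is soft voting, averaging the per-part fake probabilities, which the following hypothetical sketch shows. The part names, weights, and threshold are illustrative assumptions, and the CNNs themselves are out of scope.

```python
def ensemble_fake_score(part_scores, weights=None, threshold=0.5):
    """Soft-voting ensemble over per-face-part fake probabilities.

    part_scores: dict mapping a face part (e.g. "eyes") to the probability in
    [0, 1] that the part-specific model assigns to the image being fake.
    weights: optional dict of per-part weights (default: equal weights).
    Returns (combined probability, is_fake).
    """
    if not part_scores:
        raise ValueError("need at least one part score")
    weights = weights or {part: 1.0 for part in part_scores}
    total = sum(weights[p] for p in part_scores)
    combined = sum(weights[p] * s for p, s in part_scores.items()) / total
    return combined, combined >= threshold


print(ensemble_fake_score({"eyes": 0.9, "nose": 0.4, "mouth": 0.8}))
```

For partially manipulated images, where only one region may look fake, averaging can dilute a strong local signal. A max rule or learned weights are alternatives worth comparing.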
Hardware Implementation of an Automatic Color Equalization Algorithm for Real-time Image Enhancement
Xiang-Yu Chen, Yu-Hsiang Wang, Yao-Song Zhang, Yen-Jui Chen, Shiann-Rong Kuang
The automatic color equalization (ACE) algorithm is an effective method for color image enhancement, but its computational complexity is extremely high. In this paper, we first modify the ACE algorithm to reduce its computational complexity and implementation cost while maintaining good visual quality. We then propose an efficient VLSI architecture for this hardware-friendly ACE algorithm to meet the requirements of real-time image enhancement. FPGA (field-programmable gate array) implementation results show that the proposed architecture can operate at 120 MHz and achieve a throughput of 60 frames/s for 256×256 images, using about 1.15k logic (LUT) resources and 1.78k register resources. Compared with the existing design, the proposed architecture achieves higher performance with fewer hardware resources and comparable visual quality.
DOI: 10.1109/MCSoC57363.2022.00036 (https://doi.org/10.1109/MCSoC57363.2022.00036), published 2022-12-01
Citations: 0
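The high computational complexity comes from ACE comparing every pixel with every other pixel. The following naive sketch implements the standard, unmodified ACE formulation for one grayscale channel, not the authors' hardware-friendly variant. Each pixel accumulates a saturating comparison $r(t) = \mathrm{clamp}(\alpha t, -1, 1)$ against every other pixel, weighted by inverse Euclidean distance, and the result is linearly rescaled to [0, 1]. The `alpha` default and the min-max rescaling are assumptions.

```python
import math


def ace_gray(img, alpha=5.0):
    """Naive O(P^2) automatic color equalization for one grayscale channel.

    img: 2-D list of intensities in [0, 1].
    alpha: slope of the saturating comparison r(t) = clamp(alpha * t, -1, 1).
    Returns a same-sized 2-D list linearly rescaled to [0, 1].
    """
    h, w = len(img), len(img[0])
    coords = [(y, x) for y in range(h) for x in range(w)]
    R = {}
    for py, px in coords:
        acc = 0.0
        for qy, qx in coords:
            if (qy, qx) == (py, px):
                continue
            r = max(-1.0, min(1.0, alpha * (img[py][px] - img[qy][qx])))
            acc += r / math.hypot(py - qy, px - qx)  # nearer pixels weigh more
        R[(py, px)] = acc
    lo, hi = min(R.values()), max(R.values())
    if hi == lo:  # flat image: no contrast to stretch
        return [[0.5] * w for _ in range(h)]
    return [[(R[(y, x)] - lo) / (hi - lo) for x in range(w)] for y in range(h)]
```

For a 256×256 image this inner loop runs for about 65,536 × 65,536 ≈ 4.3 × 10⁹ pixel pairs per channel. This cost is why reducing complexity is a prerequisite for real-time hardware.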
Evaluation of Different Microarchitectures for Energy-Efficient RISC-V Cores
J. Kadomoto, H. Irie, S. Sakai
The growing number of Internet of Things (IoT) applications has driven the development of energy-efficient embedded SoCs that can run on limited energy sources. Relatively simple general-purpose processor cores are a vital component of such SoCs, and optimizing their power consumption, performance, and area is a key issue in SoC design. This study therefore quantitatively compares the power, performance, and area of several 32-bit RISC-V cores with different microarchitectures. Simulation evaluations were performed for each processor with different pipeline configurations, with and without a multiplier and divider. We present the benchmark execution performance of the processors' register transfer level (RTL) designs, as well as their estimated power consumption and area based on logic synthesis and place-and-route using various CMOS process technologies. Based on the results, we provide a brief guideline for selecting microarchitectures for energy-efficient embedded SoCs.
DOI: 10.1109/MCSoC57363.2022.00022 (https://doi.org/10.1109/MCSoC57363.2022.00022), published 2022-12-01
Citations: 0
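For energy-constrained IoT devices, power and performance are usually combined into energy per task: energy = average power × execution time, with time = cycles / clock frequency. The following sketch computes energy and the energy-delay product for one benchmark run. The two configurations in the example are hypothetical numbers, not results from the paper.

```python
def energy_per_run(power_w, cycles, freq_hz):
    """Energy (J) and energy-delay product (J*s) of one benchmark run."""
    time_s = cycles / freq_hz          # execution time
    energy_j = power_w * time_s        # average power times time
    return energy_j, energy_j * time_s


# Hypothetical cores at 100 MHz: B draws more power but needs fewer cycles
# (e.g. thanks to a hardware multiplier) and finishes sooner.
core_a = energy_per_run(0.002, 1_000_000, 100e6)
core_b = energy_per_run(0.003, 600_000, 100e6)
print(core_a, core_b)
```

In this made-up example the higher-power core still uses less energy per run (18 µJ versus 20 µJ). This trade-off is why adding a multiplier or deepening the pipeline has to be judged per workload.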
Radar and Camera Fusion for Object Forecasting in Driving Scenarios
Albert Budi Christian, Yu-Hsuan Wu, Chih-Yu Lin, Lan-Da Van, Y. Tseng
In this paper, we propose a sensor fusion architecture that combines data collected by the camera and radars and uses radar velocity to predict the trajectories of road users in real-world driving scenarios. The architecture is multi-stage and follows the detect-track-predict paradigm. In the detection stage, two object detection models are applied to camera images and radar point clouds to detect objects around the vehicle. The detected objects are tracked with an online tracking method. We also design a radar association method to extract the radar velocity of each object. In the prediction stage, we build a recurrent neural network that processes an object's temporal sequence of positions and velocities and predicts its future trajectory. Experiments on the real-world nuScenes autonomous driving dataset show that radar velocity mainly affects the center of the bounding box, which represents the object's position, and thus improves prediction performance.
DOI: 10.1109/MCSoC57363.2022.00026 (https://doi.org/10.1109/MCSoC57363.2022.00026), published 2022-12-01
Citations: 0
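The abstract does not describe the radar association method in detail. The following hypothetical sketch shows one simple gating approach. It keeps the radar points that fall inside an object's bird's-eye-view box, enlarged by a margin, and averages their velocities. The axis-aligned box, the margin value, and the (x, y, vx, vy) point format are all assumptions.

```python
def radar_velocity_for_box(box, radar_points, margin=0.5):
    """Average velocity of the radar points that fall inside an object's box.

    box: (cx, cy, length, width), axis-aligned, in bird's-eye-view metres.
    radar_points: iterable of (x, y, vx, vy) in the same frame.
    margin: hypothetical tolerance (m) around the box, since radar returns
    are sparse and noisy.
    Returns (vx, vy), or None if no radar point is associated.
    """
    cx, cy, length, width = box
    half_l, half_w = length / 2 + margin, width / 2 + margin
    inside = [(vx, vy) for x, y, vx, vy in radar_points
              if abs(x - cx) <= half_l and abs(y - cy) <= half_w]
    if not inside:
        return None
    n = len(inside)
    return (sum(v[0] for v in inside) / n, sum(v[1] for v in inside) / n)


points = [(10.5, 0.2, 5.0, 0.0), (9.0, -0.5, 7.0, 1.0), (30.0, 5.0, -2.0, 0.0)]
print(radar_velocity_for_box((10.0, 0.0, 4.0, 2.0), points))
```

The per-object velocity can then be appended to the tracked position sequence fed to the trajectory predictor. A median instead of the mean would be more robust to stray radar returns.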