Pub Date: 2024-08-31. DOI: 10.1007/s11554-024-01543-4
Bing Zhao, Aoran Guo, Ruitao Ma, Yanfei Zhang, Jinliang Gong
YOLOv8s-CFB: a lightweight method for real-time detection of apple fruits in complex environments

With the development of apple-picking robots, deep learning models have become essential for apple detection. However, current detection models are often disrupted by complex backgrounds, leading to low recognition accuracy and slow inference in natural environments. To address these issues, this study proposes YOLOv8s-CFB, an improved model based on YOLOv8s. The model introduces partial convolution (PConv) into the backbone network and enhances the C2f module, forming a new architecture, CSPPC, that reduces computational complexity and improves speed. Additionally, focal modulation replaces the original SPPF module to strengthen the model's ability to attend to key regions. Finally, a bidirectional feature pyramid network (BiFPN) is introduced to adaptively learn the importance of each scale, effectively retaining multi-scale information through bidirectional cross-scale connections and improving detection of occluded targets. Test results show that the improved YOLOv8 network achieves better detection performance, with a mean average precision of 93.86%, 8.83 M parameters, and a detection time of 0.7 ms. The improved algorithm achieves high detection accuracy with a small weight file, making it suitable for deployment on mobile devices. Therefore, the improved model can efficiently and accurately detect apples in complex orchard environments in real time.
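The BiFPN's adaptive per-scale weighting can be illustrated with a minimal pure-Python sketch of fast normalized fusion. The feature values, weights, and function name below are illustrative only; the paper's network learns these weights as tensors during training:

```python
def weighted_fusion(features, weights, eps=1e-4):
    """Fast normalized fusion in the BiFPN style: each input scale gets a
    learnable non-negative weight, and the weights are normalized so the
    fused map is a convex combination of the inputs."""
    w = [max(0.0, wi) for wi in weights]  # ReLU keeps weights non-negative
    total = sum(w) + eps                  # eps avoids division by zero
    return [sum(wi * f[i] for wi, f in zip(w, features)) / total
            for i in range(len(features[0]))]

# Two feature maps from different scales, flattened to 1-D for simplicity
f_small = [1.0, 2.0, 3.0]
f_large = [3.0, 2.0, 1.0]
fused = weighted_fusion([f_small, f_large], weights=[2.0, 1.0])
```

With weight 2.0 on the first map and 1.0 on the second, the fused features lean toward the first input, which is how the network learns to emphasize the most informative scale.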
Pub Date: 2024-08-31. DOI: 10.1007/s11554-024-01545-2
Sompote Youwai, Achitaphon Chaiyaphat, Pawarotorn Chaipetch
YOLO9tr: a lightweight model for pavement damage detection utilizing a generalized efficient layer aggregation network and attention mechanism

Maintaining road pavement integrity is crucial for safe and efficient transportation. Conventional methods for assessing pavement condition are often laborious and susceptible to human error. This paper proposes YOLO9tr, a novel lightweight object detection model for pavement damage detection that leverages recent advances in deep learning. YOLO9tr is based on the YOLOv9 architecture and incorporates a partial attention block that enhances feature extraction and attention mechanisms, improving detection performance in complex scenarios. The model is trained on a comprehensive dataset of road damage images from multiple countries. This dataset extends the standard four damage categories (longitudinal cracks, transverse cracks, alligator cracks, and potholes), providing a more nuanced classification of road damage and thus a more accurate and realistic assessment of pavement conditions. Comparative analysis demonstrates YOLO9tr's superior precision and inference speed relative to state-of-the-art models such as YOLOv8, YOLOv9, and YOLOv10, striking a balance between computational efficiency and detection accuracy. The model reaches frame rates of up to 136 FPS, making it suitable for real-time applications such as video surveillance and automated inspection systems. An ablation study analyzes the impact of architectural modifications and hyperparameter variations on model performance, further validating the effectiveness of the partial attention block. The results highlight YOLO9tr's potential for practical deployment in real-time pavement condition monitoring, contributing to robust and efficient solutions for maintaining safe and functional road infrastructure.
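The 136 FPS figure can be put in context with a back-of-envelope latency check. The 30 FPS camera stream below is an assumption for illustration, not a figure from the paper:

```python
def realtime_headroom_ms(model_fps, stream_fps):
    """Compare a detector's per-frame inference time against the frame
    budget of a video stream. A positive result means the model keeps up."""
    frame_budget_ms = 1000.0 / stream_fps  # time available per frame
    inference_ms = 1000.0 / model_fps      # time the model needs per frame
    return frame_budget_ms - inference_ms

# 136 FPS detector (~7.35 ms/frame) against a hypothetical 30 FPS camera
headroom = realtime_headroom_ms(model_fps=136, stream_fps=30)
```

At 136 FPS the model uses under a quarter of a 30 FPS frame budget, leaving headroom for decoding, tracking, or post-processing on the same device.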
Pub Date: 2024-08-29. DOI: 10.1007/s11554-024-01540-7
Jun Sun, Yifei Peng, Chen Chen, Bing Zhang, Zhaoqi Wu, Yilin Jia, Lei Shi
ESC-YOLO: optimizing apple fruit recognition with efficient spatial and channel features in YOLOX

Accurate localization of apple fruits and recognition of occlusion types in complex orchard environments play an important role in precision agriculture. This work proposes an efficient fruit recognition model called Efficient Spatial and Channel Feature YOLOX (ESC-YOLO). ESC-YOLO is built upon YOLOX and fully leverages spatial-channel information, ensuring coherence between global information and local features. The backbone optimizations involve adopting EfficientViT as the foundational backbone, integrating Spatial and Channel Reconstruction Convolution (SCConv) into the input stem to reorganize spatial-channel features and reduce redundancy, and constructing an Efficient-MBConv module that is combined with the EfficientViT block for feature extraction. The neck optimizations involve adopting the Centralized Feature Pyramid Network (CFPNet) and employing the Simple, Parameter-Free Attention Module (SimAM) to enhance model performance. For performance evaluation we adopt the lightweight variant, ESC-YOLO-S. It achieves a 4.26% improvement in Top-1 mean average precision (mAP) over YOLOX-S and significantly reduces the false and missed detections caused by various types of occlusion. Therefore, the improved model meets the requirements for high-precision identification in complex orchard environments.
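SimAM's appeal is that it weights each activation by how much it stands out from its channel, with no learnable parameters. A minimal 1-D sketch in the spirit of the SimAM energy function follows; the input values are made up, and a real implementation operates per channel over full feature maps:

```python
import math

def simam_weights(x, lam=1e-4):
    """Parameter-free attention in the SimAM spirit: elements that deviate
    from the channel mean get low 'energy' and hence high weight."""
    n = len(x)
    mu = sum(x) / n
    var = sum((t - mu) ** 2 for t in x) / n
    # minimal energy per element; distinctive elements have low energy
    energy = [4 * (var + lam) / ((t - mu) ** 2 + 2 * var + 2 * lam) for t in x]
    # weight = sigmoid(1 / E): low energy -> weight closer to 1
    return [1.0 / (1.0 + math.exp(-1.0 / e)) for e in energy]

w = simam_weights([0.0, 0.0, 0.0, 5.0])  # the outlier gets the largest weight
```

Because the weights come from simple channel statistics, the module adds attention without adding parameters, which matches the lightweight goal of ESC-YOLO-S.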
Pub Date: 2024-08-28. DOI: 10.1007/s11554-024-01539-0
Huaxing Mu, Jueting Liu, Yanyun Guan, Wei Chen, Tingting Xu, Zehua Wang
Slim-YOLO-PR_KD: an efficient pose-varied object detection method for underground coal mine

Real-time object detection in underground coal mines is a crucial task in the development of AI-assisted supervision systems. Because of the complex underground environment, limited computing resources, and the variability of object poses, general object detection algorithms perform poorly there. Hence, an improved pose-varied object detection method named Slim-YOLO-PR_KD is proposed. By designing an efficient pose-varied attention (EPA) module for the backbone network, adding a receptive field block (RFB) module to the neck network, and optimizing the loss function, the pose-varied detection model YOLO-PR is obtained; it achieves good accuracy but at reduced speed. Building on YOLO-PR, the study designs RFB_SK, a lightweight C2f_GSG module, and a shared-parameter detection head, and selectively replaces modules to slim down the whole network, yielding the lightweight detection model Slim-YOLO-PR. Finally, attention-guided knowledge distillation with YOLO-PR as the teacher model produces the efficient pose-varied detection model Slim-YOLO-PR_KD for underground coal mines. Experimental results show that, compared with the baseline model, Slim-YOLO-PR_KD detects faster and more accurately while reducing model parameters and computational complexity by 42% and 46%, respectively, making it capable of real-time underground detection tasks. Compared with other general detection models, Slim-YOLO-PR_KD exhibits excellent performance on real-time pose-varied object detection tasks in the complex environments of underground coal mines.
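The distillation step transfers knowledge from the larger YOLO-PR teacher to the slim student. The paper's method is attention-guided; the sketch below shows only the generic soft-target component that such schemes build on, with made-up logits:

```python
import math

def softmax(logits, temperature):
    """Temperature-softened softmax: higher T spreads probability mass."""
    exps = [math.exp(z / temperature) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_kl(teacher_logits, student_logits, temperature=4.0):
    """KL divergence between softened teacher and student distributions:
    the classic soft-target loss term in knowledge distillation."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

loss_same = distillation_kl([2.0, 0.5, -1.0], [2.0, 0.5, -1.0])  # agreement
loss_diff = distillation_kl([2.0, 0.5, -1.0], [-1.0, 0.5, 2.0])  # disagreement
```

The loss is zero when the student matches the teacher and grows with disagreement, so minimizing it pulls the slim network toward the teacher's predictions.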
Pub Date: 2024-08-28. DOI: 10.1007/s11554-024-01544-3
Yinghuan Li, Jicheng Liu
Energy-efficient real-time visual image adversarial generation and processing algorithm for new energy vehicles

With the rapid development of deep learning over the last decade, generating and processing real-time images has become a critical capability of intelligent driving systems for new energy vehicles. However, the real-time images captured by sensors are susceptible to variation across environments, including different weather and lighting conditions. To enhance real-time image generation for new energy vehicles in complex environments and improve real-time visual image processing, this study proposes an energy-efficient real-time visual image adversarial generation and processing algorithm, termed ENV-GAN. After analyzing driving scenarios under various weather and lighting conditions, it hypothesizes a shared latent domain among the mixed image domains and establishes mappings between the different image domains. A multi-encoder weight-sharing technique strengthens the generative adversarial network model, and an integrated attention module further improves image generation. Experimental results and analysis demonstrate that the new algorithm outperforms existing algorithms on tasks such as defogging, rain removal, and lighting enhancement, offering high energy efficiency and low energy consumption.
Pub Date: 2024-08-27. DOI: 10.1007/s11554-024-01535-4
Ran Tang, Xiaofeng Huang, Yan Cui, Xinnan Guo, Yang Zhou, Haibing Yin, Chenggang Yan
Hardware-friendly fast rate-distortion optimization algorithm for AV1 encoder

The rate-distortion optimization (RDO) process aims to achieve optimal coding performance by selecting the optimal coding mode in AV1 video coding. However, the high computational complexity and strong data dependency of RDO impede real-time applications. To address these issues, a fast RDO algorithm suitable for hardware implementation is proposed. First, a high-frequency coefficient zero-setting approach optimizes hardware memory occupation. Then, in the rate-distortion calculation stage, an efficient rate estimation method is proposed based on a statistical feature of the number of quantized coefficients, and a distortion estimation method is derived from intrinsic features of all-zero blocks. Finally, an approximate reconstruction model resolves the low parallelism caused by the coupling of pixel reconstruction and prediction data. Experimental results show that the proposed algorithm saves 68.49% and 50.77% of encoding time for average Bjøntegaard delta rate (BD-rate) increases of 2.73% and 2.95% under the all-intra (AI) and random-access (RA) configurations, respectively.
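The idea of estimating rate from the count of nonzero quantized coefficients, with all-zero blocks nearly free, can be sketched as follows. The quantizer, the linear rate model, and all constants are illustrative assumptions; the paper's statistical model and AV1's actual entropy coding are considerably more elaborate:

```python
def quantize(coeffs, qstep):
    """Uniform quantization of transform coefficients (toy model)."""
    return [int(c / qstep) for c in coeffs]

def estimate_rate(qcoeffs, bits_per_nonzero=3.5, header_bits=2.0):
    """Linear rate model: estimated bits grow with the number of nonzero
    quantized coefficients; an all-zero block costs only its header."""
    nnz = sum(1 for c in qcoeffs if c != 0)
    return header_bits if nnz == 0 else header_bits + bits_per_nonzero * nnz

# High-frequency coefficients quantize to zero, shrinking the estimate
q = quantize([40.0, -12.0, 3.0, 1.0], qstep=8.0)
rate = estimate_rate(q)
```

Counting nonzero coefficients is a single pass with no entropy-coding state, which is what makes such an estimator attractive for a parallel hardware pipeline.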
Pub Date: 2024-08-23. DOI: 10.1007/s11554-024-01541-6
Yanxiong Su, Qian Zhao
Efficient spatio-temporal network for action recognition

The input tensor of video data spans temporal, spatial, and channel dimensions, which are crucial for extracting the complementary spatial, temporal, and spatio-temporal features needed for video action recognition. To extract and integrate these features efficiently, we propose an efficient spatio-temporal module (ESTM) with three pathways dedicated to spatial, temporal, and spatio-temporal features. Each pathway uses a Cross Global Average Pooling (CGAP) module to compress the current dimension, focusing features on the remaining two dimensions; this strengthens feature extraction and recognition of complex actions. We also introduce a Motion Excitation Module (MEM) that enriches input features by modeling correlations between adjacent frames while reducing computational complexity. Finally, ESTM and MEM are seamlessly integrated into a 2D CNN to form the efficient spatio-temporal network (ESTN), with minimal impact on network parameters and computational cost. Extensive experiments show that ESTN outperforms state-of-the-art methods on datasets such as Something-Something V1 & V2 and HMDB51, validating its effectiveness.
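The compression step behind CGAP, averaging a video tensor over one dimension so the features concentrate on the remaining two, can be sketched in pure Python. The nested-list layout and function name are illustrative; a real implementation pools 5-D batched tensors on the GPU:

```python
def cross_gap(x, axis):
    """Average a 3-D tensor (time x space x channel, as nested lists)
    over one axis, leaving features focused on the other two."""
    T, S, C = len(x), len(x[0]), len(x[0][0])
    if axis == 0:  # pool over time -> spatial/channel features
        return [[sum(x[t][s][c] for t in range(T)) / T for c in range(C)]
                for s in range(S)]
    if axis == 1:  # pool over space -> temporal/channel features
        return [[sum(x[t][s][c] for s in range(S)) / S for c in range(C)]
                for t in range(T)]
    # axis == 2: pool over channels -> temporal/spatial features
    return [[sum(x[t][s][c] for c in range(C)) / C for s in range(S)]
            for t in range(T)]

x = [[[1.0, 2.0]], [[3.0, 4.0]]]        # T=2 frames, S=1 position, C=2 channels
temporal_pooled = cross_gap(x, axis=0)  # spatial/channel view of the clip
```

Pooling each dimension in a separate pathway gives the three complementary views the module fuses, at the cost of simple averages rather than extra 3-D convolutions.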
Pub Date: 2024-08-22. DOI: 10.1007/s11554-024-01538-1
Güner Tatar, Salih Bayar
Energy efficiency assessment in advanced driver assistance systems with real-time image processing on custom Xilinx DPUs

The rapid advancement of embedded AI, driven by the integration of deep neural networks (DNNs) into embedded systems for real-time image and video processing, has been notably pushed by AI-specific platforms such as AMD Xilinx Vitis AI on the MPSoC-FPGA platform. This platform utilizes a configurable Deep Learning Processor Unit (DPU) with scalable resource utilization and operating frequencies. Our study applied a detailed methodology to assess the impact of various DPU configurations and frequencies on resource utilization and energy consumption. The findings reveal that increasing the DPU frequency improves resource-utilization efficiency and performance, whereas lower frequencies significantly reduce resource utilization with only a marginal decrease in performance. These trade-offs are influenced not only by frequency but also by variations in DPU parameters. These findings are critical for developing energy-efficient AI-driven Advanced Driver Assistance Systems (ADAS) based on real-time video processing. Leveraging the capabilities of Xilinx Vitis AI deployed on the Kria KV260 MPSoC platform, we explore the intricacies of optimizing energy efficiency through multi-task learning in real-time ADAS applications.
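Why a higher clock can improve energy efficiency is easy to see in a first-order model: dynamic power scales with frequency while static power is fixed, so faster frames amortize the static draw. Every constant below is hypothetical, chosen only to illustrate the shape of the trade-off, not measured on the KV260:

```python
def energy_per_frame_mj(freq_mhz, frames_per_mhz=0.1,
                        static_mw=500.0, dyn_mw_per_mhz=3.0):
    """Toy first-order accelerator model: throughput scales linearly with
    clock frequency, dynamic power scales with frequency, static power is
    constant. Energy per frame = total power / throughput."""
    fps = frames_per_mhz * freq_mhz
    power_mw = static_mw + dyn_mw_per_mhz * freq_mhz
    return power_mw / fps  # mW / (frames/s) = mJ per frame

low_clock = energy_per_frame_mj(100.0)   # slower, static power dominates
high_clock = energy_per_frame_mj(300.0)  # faster, static power amortized
```

In this model the higher clock spends less energy per frame even though it draws more power, consistent with the study's observation that raising the DPU frequency improves efficiency; in practice voltage scaling and DPU architecture parameters shift the curve.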
Pub Date: 2024-08-20. DOI: 10.1007/s11554-024-01536-3
Youcef Ghelamallah, Azzeddine Rachedi
FPGA-based hardware/firmware co-design for real-time radiometric correction onboard microsatellite

Remote sensing images inevitably contain radiometric artifacts due to the photo-response non-uniformity (PRNU) of charge-coupled device (CCD) sensors. In situations where time constraints demand prompt delivery of imaging products, integrating an onboard radiometric correction system becomes essential. This paper advocates a hardware-firmware co-design approach that performs radiometric correction within the payload front-end electronics (FEE), leveraging the capabilities of field-programmable gate arrays (FPGAs). The selection of an appropriate CCD sensor and optical device is guided by a thorough payload mission analysis, ensuring compliance with specifications derived from Alsat-1B, the Algerian microsatellite launched in September 2016. Simulation results demonstrate that the designed FPGA firmware effectively controls the CCD sensor and configures its settings to achieve real-time radiometric correction of the acquired pixels in accordance with the mission requirements. To ensure efficient use during imaging operations, a hardware solution for onboard storage and in-orbit update of the radiometric coefficients is included in the correction system.
Pub Date : 2024-08-19DOI: 10.1007/s11554-024-01532-7
Yiming Zhou, Callen MacPhee, Wesley Gunawan, Ali Farahani, Bahram Jalali
Real-time low-light video enhancement on smartphones remains an open challenge due to hardware constraints such as limited sensor size and processing power. While night mode cameras have been introduced in smartphones to acquire high-quality images in light-constrained environments, their usability is restricted to static scenes as the camera must remain stationary for an extended period to leverage long exposure times or burst imaging techniques. Concurrently, significant progress has been made in low-light enhancement on images coming out from the camera’s image signal processor (ISP), particularly through neural networks. These methods do not improve the image capture process itself; instead, they function as post-processing techniques to enhance the perceptual brightness and quality of captured imagery for display to human viewers. However, most neural networks are computationally intensive, making their mobile deployment either impractical or requiring considerable engineering efforts. This paper introduces VLight, a novel single-parameter low-light enhancement algorithm that enables real-time video enhancement on smartphones, along with real-time adaptation to changing lighting conditions and user-friendly fine-tuning. Operating as a custom brightness-booster on digital images, VLight provides real-time and device-agnostic enhancement directly on users’ devices. Notably, it delivers real-time low-light enhancement at up to 67 frames per second (FPS) for 4K videos locally on the smartphone.
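The abstract does not disclose VLight's actual transfer curve, but a single-parameter brightness booster can be sketched with a gamma-style tone curve: one knob lifts shadows strongly while leaving highlights nearly untouched. The function name, the `strength` parameter, and the curve itself are assumptions for illustration, not the paper's method.

```python
import numpy as np

def boost_brightness(frame, strength=2.0):
    """Gamma-style single-parameter brightness booster (illustrative only;
    not the paper's actual VLight algorithm).

    Applies out = in ** (1 / strength) on intensities normalized to [0, 1]:
    strength > 1 lifts dark pixels far more than bright ones,
    and strength = 1 is the identity.
    """
    x = frame.astype(np.float32) / 255.0
    y = np.power(x, 1.0 / strength)
    return (y * 255.0 + 0.5).astype(np.uint8)  # round and cast back to 8-bit

# Dark pixels are lifted much more than bright ones; 255 stays fixed.
frame = np.array([[10, 64, 200, 255]], dtype=np.uint8)
print(boost_brightness(frame, strength=2.0))  # → [[ 50 128 226 255]]
```

A per-pixel curve like this needs no neural network and vectorizes trivially, which is the kind of property that makes a real-time, device-agnostic booster plausible on smartphone hardware.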
{"title":"Real-time low-light video enhancement on smartphones","authors":"Yiming Zhou, Callen MacPhee, Wesley Gunawan, Ali Farahani, Bahram Jalali","doi":"10.1007/s11554-024-01532-7","DOIUrl":"https://doi.org/10.1007/s11554-024-01532-7","url":null,"abstract":"<p>Real-time low-light video enhancement on smartphones remains an open challenge due to hardware constraints such as limited sensor size and processing power. While night mode cameras have been introduced in smartphones to acquire high-quality images in light-constrained environments, their usability is restricted to static scenes as the camera must remain stationary for an extended period to leverage long exposure times or burst imaging techniques. Concurrently, significant process has been made in low-light enhancement on images coming out from the camera’s image signal processor (ISP), particularly through neural networks. These methods do not improve the image capture process itself; instead, they function as post-processing techniques to enhance the perceptual brightness and quality of captured imagery for display to human viewers. However, most neural networks are computationally intensive, making their mobile deployment either impractical or requiring considerable engineering efforts. This paper introduces <i>VLight</i>, a novel single-parameter low-light enhancement algorithm that enables real-time video enhancement on smartphones, along with real-time adaptation to changing lighting conditions and user-friendly fine-tuning. Operating as a custom brightness-booster on digital images, VLight provides real-time and device-agnostic enhancement directly on users’ devices. 
Notably, it delivers real-time low-light enhancement at up to 67 frames per second (FPS) for 4K videos locally on the smartphone.</p>","PeriodicalId":51224,"journal":{"name":"Journal of Real-Time Image Processing","volume":"22 1","pages":""},"PeriodicalIF":3.0,"publicationDate":"2024-08-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142222639","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}