GPU-based key-frame selection of pulmonary ultrasound images to detect COVID-19
E. Torti, M. Gazzoni, E. Marenzi, F. Leporati
Journal of Real-Time Image Processing. Pub Date: 2024-06-15. DOI: 10.1007/s11554-024-01493-x
Efficiently adapting large pre-trained models for real-time violence recognition in smart city surveillance
Xiaohui Ren, Wenze Fan, Yinghao Wang
Journal of Real-Time Image Processing. Pub Date: 2024-06-15. DOI: 10.1007/s11554-024-01486-w
LightYOLO-S: a lightweight algorithm for detecting small targets
Liu Zihan, Wu xu, Linyun Zhang, Panlin Yu
Journal of Real-Time Image Processing. Pub Date: 2024-06-14. DOI: 10.1007/s11554-024-01485-x
Realistic real-time processing of anime portraits based on generative adversarial networks
Gaofeng Zhu, Zhiguo Qu, Le Sun, Yuming Liu, Jianfeng Yang
Journal of Real-Time Image Processing. Pub Date: 2024-06-06. DOI: 10.1007/s11554-024-01481-1
ARF-YOLOv8: a novel real-time object detection model for UAV-captured images detection
YaLin Zeng, DongJin Guo, WeiKai He, Tian Zhang, ZhongTao Liu
Journal of Real-Time Image Processing. Pub Date: 2024-06-04. DOI: 10.1007/s11554-024-01483-z

Object detection in Unmanned Aerial Vehicle (UAV) photography poses several difficulties, including the small size of objects, densely distributed objects, and the diverse perspectives from which objects are captured. To tackle these challenges, we propose a real-time algorithm named adjusting overall receptive field enhancement YOLOv8 (ARF-YOLOv8) for object detection in UAV-captured images. Our approach begins with a comprehensive restructuring of the YOLOv8 network architecture, with two primary objectives: mitigating the loss of shallow-level information and establishing an optimal model receptive field. We then design a bi-branch fusion attention module based on Coordinate Attention and integrate it seamlessly into the detection network. This module combines features processed by the Coordinate Attention module with shallow-level features, facilitating the extraction of multi-level feature information. Furthermore, recognizing the influence of target size on bounding-box loss, we refine the CIoU bounding-box loss function employed in YOLOv8. Extensive experiments on the VisDrone2019 dataset provide empirical evidence of the superior performance of ARF-YOLOv8: compared to YOLOv8, our method achieves a noteworthy 6.86% increase in mAP (0.5:0.95) while maintaining a similar detection speed. The code is available at https://github.com/sbzeng/ARF-YOLOv8-for-uav/tree/main.
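The abstract does not spell out the paper's size-aware refinement of the loss, but the baseline it builds on is the standard Complete-IoU (CIoU) box-regression loss used by YOLOv8. A minimal pure-Python sketch of that baseline follows; the corner-coordinate box convention (x1, y1, x2, y2) is an assumption for illustration.

```python
import math

def ciou_loss(box1, box2):
    """Standard CIoU loss between two boxes given as (x1, y1, x2, y2).

    This is the unmodified CIoU formulation (IoU penalised by center
    distance and aspect-ratio mismatch); the paper's refinement for
    small targets is not public and is not reproduced here.
    """
    eps = 1e-9

    # Intersection and union areas
    ix1, iy1 = max(box1[0], box2[0]), max(box1[1], box2[1])
    ix2, iy2 = min(box1[2], box2[2]), min(box1[3], box2[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    w1, h1 = box1[2] - box1[0], box1[3] - box1[1]
    w2, h2 = box2[2] - box2[0], box2[3] - box2[1]
    union = w1 * h1 + w2 * h2 - inter
    iou = inter / (union + eps)

    # Squared distance between box centers
    rho2 = ((box1[0] + box1[2] - box2[0] - box2[2]) ** 2 +
            (box1[1] + box1[3] - box2[1] - box2[3]) ** 2) / 4.0

    # Squared diagonal of the smallest enclosing box
    cw = max(box1[2], box2[2]) - min(box1[0], box2[0])
    ch = max(box1[3], box2[3]) - min(box1[1], box2[1])
    c2 = cw ** 2 + ch ** 2 + eps

    # Aspect-ratio consistency term and its trade-off weight
    v = (4 / math.pi ** 2) * (math.atan(w2 / h2) - math.atan(w1 / h1)) ** 2
    alpha = v / (1.0 - iou + v + eps)

    return 1.0 - (iou - rho2 / c2 - alpha * v)
```

Perfectly overlapping boxes give a loss of 0, while disjoint boxes are pushed above 1 by the center-distance penalty, which is what makes CIoU usable even when boxes do not overlap.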
Fcd-cnn: FPGA-based CU depth decision for HEVC intra encoder using CNN
Hossein Dehnavi, Mohammad Dehnavi, Sajad Haghzad Klidbary
Journal of Real-Time Image Processing. Pub Date: 2024-06-02. DOI: 10.1007/s11554-024-01487-9

Video compression for storage and transmission has long been a focal point for researchers in image processing, whose efforts aim to reduce the data volume required to represent a video while maintaining its quality. HEVC is one of the most efficient video-compression standards, receiving special attention due to the increasing demand for high-resolution video. The main step in video compression involves dividing coding unit (CU) blocks into smaller blocks with uniform texture. Traditional methods apply the Discrete Cosine Transform (DCT), followed by rate-distortion optimization (RDO) to decide on partitioning. This paper presents a novel convolutional neural network (CNN) and its hardware implementation as an alternative to the DCT, aimed at speeding up partitioning and reducing the hardware resources required. The proposed hardware utilizes an efficient, lightweight CNN to partition CUs with low hardware resources in real-time applications. The CNN is trained for different Quantization Parameters (QPs) and block sizes to prevent overfitting. Furthermore, the system's input size is fixed at 16×16, and other input sizes are scaled to this dimension. Loop unrolling, data reuse, and resource sharing are applied in the hardware implementation to save resources. The hardware architecture is fixed for all block sizes and QPs; only the coefficients of the CNN are changed. In terms of compression quality, the proposed hardware achieves a 4.42% BD-BR and a -0.19 dB BD-PSNR compared to HM16.5. The proposed system can process a 64×64 CU at 150 MHz in 4914 clock cycles. The hardware resources utilized by the proposed system include 13,141 LUTs, 15,885 flip-flops, 51 BRAMs, and 74 DSPs.
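As a back-of-the-envelope check of the reported figures, the per-CU latency (4914 cycles at 150 MHz, about 32.8 µs) can be turned into a frame rate. The sketch below assumes strictly sequential CU processing with no pipeline overlap, which the abstract does not confirm, so it is a lower bound on throughput rather than the paper's measured result.

```python
import math

def cu_throughput_fps(width, height, cycles_per_cu=4914,
                      clock_hz=150e6, cu_size=64):
    """Frames per second if every 64x64 CU takes the reported cycle count.

    Assumes CUs are processed one after another (no overlap); partial
    CUs at the frame border are counted as full CUs.
    """
    cus_per_frame = math.ceil(width / cu_size) * math.ceil(height / cu_size)
    seconds_per_frame = cus_per_frame * cycles_per_cu / clock_hz
    return 1.0 / seconds_per_frame
```

For 1080p (1920×1080) this gives 30 × 17 = 510 CUs per frame, i.e. roughly 60 frames/s at the reported per-CU rate, consistent with a real-time claim under these assumptions.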
IoT-based real-time object detection system for crop protection and agriculture field security
Priya Singh, Rajalakshmi Krishnamurthi
Journal of Real-Time Image Processing. Pub Date: 2024-06-02. DOI: 10.1007/s11554-024-01488-8

In farming, clashes between humans and animals create significant challenges, risking crop yields, human well-being, and resource depletion. Farmers use traditional methods such as electric fences to protect their fields, but these can harm animals that are essential to a balanced ecosystem. To address these challenges, our research presents a fresh solution harnessing the power of the Internet of Things (IoT) and deep learning. In this paper, we develop a monitoring system built on an ESP32-CAM and a Raspberry Pi in collaboration with an optimised YOLOv8 model. Our objective is to detect and classify objects such as animals or humans that roam around the field, providing real-time notifications to farmers via Firebase Cloud Messaging (FCM). Ultrasonic sensors first detect any intruder movement, triggering the camera to capture an image. The captured image is transmitted to a server equipped with the object detection model, and the processed image is then forwarded to FCM, which manages the image and sends a notification to the farmer through an Android application. Our optimised YOLOv8 model attains a precision of 97%, recall of 96%, and accuracy of 96%. Having achieved this result, we integrated the model with our IoT infrastructure. This study emphasizes the effectiveness of low-power IoT devices, LoRa devices, and object detection techniques in delivering strong security solutions to the agriculture industry. These technologies hold the potential to significantly decrease crop damage, enhance safety within agricultural fields, and contribute to wildlife conservation.
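The sensor-to-notification pipeline described above hinges on one server-side step: filtering the detector's output down to the detections worth pushing to the farmer's phone. A minimal sketch of that filter follows; the class names, the confidence threshold, and the detection-dict shape are illustrative assumptions, and the actual FCM push (which would consume the returned list) is out of scope here.

```python
def alerts_to_send(detections, conf_threshold=0.5,
                   alert_classes=("person", "animal")):
    """Keep only detections that should trigger a farmer notification.

    `detections` is assumed to be the parsed detector output: a list of
    dicts with a class "label" and a confidence score "conf" in [0, 1].
    Anything below the threshold, or of a non-alert class (e.g. farm
    machinery), is dropped before the FCM push.
    """
    return [d for d in detections
            if d["label"] in alert_classes and d["conf"] >= conf_threshold]

# Example: one confident animal, one irrelevant class, one weak detection.
dets = [{"label": "animal", "conf": 0.91},
        {"label": "tractor", "conf": 0.88},
        {"label": "person", "conf": 0.32}]
alerts = alerts_to_send(dets)  # only the confident "animal" survives
```

Keeping this filter on the server rather than on the ESP32-CAM matches the division of labour in the abstract: the edge device only senses and captures, while classification and notification policy live where the model runs.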