
Latest Articles from the Journal of Real-Time Image Processing

A MEMS-based real-time structured light 3-D measuring architecture on FPGA
IF 3.0 | CAS Tier 4 (Computer Science) | Q2 Computer Science | Pub Date: 2024-05-01 | DOI: 10.1007/s11554-024-01477-x
Wenbiao Zhou, Yunfei Jia, Luyao Fan, Gongyu Fan, Fengchi Lu
{"title":"A MEMS-based real-time structured light 3-D measuring architecture on FPGA","authors":"Wenbiao Zhou, Yunfei Jia, Luyao Fan, Gongyu Fan, Fengchi Lu","doi":"10.1007/s11554-024-01477-x","DOIUrl":"https://doi.org/10.1007/s11554-024-01477-x","url":null,"abstract":"","PeriodicalId":51224,"journal":{"name":"Journal of Real-Time Image Processing","volume":null,"pages":null},"PeriodicalIF":3.0,"publicationDate":"2024-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141133146","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Adaptive histogram equalization in constant time
IF 3.0 | CAS Tier 4 (Computer Science) | Q2 Computer Science | Pub Date: 2024-05-01 | DOI: 10.1007/s11554-024-01465-1
Philipp Härtinger, Carsten Steger
{"title":"Adaptive histogram equalization in constant time","authors":"Philipp Härtinger, Carsten Steger","doi":"10.1007/s11554-024-01465-1","DOIUrl":"https://doi.org/10.1007/s11554-024-01465-1","url":null,"abstract":"","PeriodicalId":51224,"journal":{"name":"Journal of Real-Time Image Processing","volume":null,"pages":null},"PeriodicalIF":3.0,"publicationDate":"2024-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141031442","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
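The paper's constant-time algorithm is not reproduced in this listing; as a point of reference, the sketch below runs the standard tiled approximation of adaptive histogram equalization (CLAHE) via OpenCV. The clip limit, tile grid, and file paths are illustrative assumptions, not the authors' settings.

```python
import cv2

# Minimal sketch: contrast-limited adaptive histogram equalization (CLAHE)
# with OpenCV. Clip limit, tile grid, and "input.png" are placeholder
# values, not parameters from the paper above.
img = cv2.imread("input.png", cv2.IMREAD_GRAYSCALE)

clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
equalized = clahe.apply(img)

cv2.imwrite("equalized.png", equalized)
```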
Empowering individuals with disabilities: a real-time, cost-effective, calibration-free assistive system utilizing eye tracking
IF 3.0 | CAS Tier 4 (Computer Science) | Q2 Computer Science | Pub Date: 2024-05-01 | DOI: 10.1007/s11554-024-01478-w
Govind Ram Chhimpa, Ajay Kumar, S. Garhwal, Dhiraj
{"title":"Empowering individuals with disabilities: a real-time, cost-effective, calibration-free assistive system utilizing eye tracking","authors":"Govind Ram Chhimpa, Ajay Kumar, S. Garhwal, Dhiraj","doi":"10.1007/s11554-024-01478-w","DOIUrl":"https://doi.org/10.1007/s11554-024-01478-w","url":null,"abstract":"","PeriodicalId":51224,"journal":{"name":"Journal of Real-Time Image Processing","volume":null,"pages":null},"PeriodicalIF":3.0,"publicationDate":"2024-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141140425","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Real-time and accurate model of instance segmentation of foods
IF 3.0 | CAS Tier 4 (Computer Science) | Q2 Computer Science | Pub Date: 2024-04-30 | DOI: 10.1007/s11554-024-01459-z
Yuhe Fan, Lixun Zhang, Canxing Zheng, Yunqin Zu, Keyi Wang, Xingyuan Wang

Instance segmentation of foods is an important technology for ensuring the food-acquisition success rate of meal-assisting robots. However, foods exhibit strong intraclass variability, interclass similarity, and complex physical properties, which makes their recognition, localization, and contour acquisition challenging. To address these issues, this paper proposes a novel method for instance segmentation of foods. Specifically, in the backbone network, deformable convolution is introduced to enhance the ability of the YOLOv8 architecture to capture finer-grained spatial information, and efficient multiscale attention (EMA) based on cross-spatial learning is introduced to improve the sensitivity and expressiveness of multiscale inputs. In the neck network, classical convolution and C2f modules are replaced by the lightweight GSConv convolution and an improved VoV-GSCSP aggregation module, respectively, to improve inference speed. We abbreviate the resulting model as DEG-YOLOv8n-seg. The proposed method was compared with the baseline model and several state-of-the-art (SOTA) segmentation models on multiple datasets. The results show that the DEG-YOLOv8n-seg model has higher accuracy, faster speed, and stronger robustness. Specifically, it achieves 84.6% Box_mAP@0.5 and 84.1% Mask_mAP@0.5 at 55.2 FPS and 11.1 GFLOPs. Ablation experiments verified the importance of data augmentation and the effectiveness of introducing deformable convolution, EMA, and VoV-GSCSP. Finally, the DEG-YOLOv8n-seg model was applied to food instance segmentation experiments for meal-assisting robots, where it achieved better instance segmentation of foods. This work can promote the development of intelligent meal-assisting robotics and provides a theoretical foundation, with reference value, for other computer vision tasks.

{"title":"Real-time and accurate model of instance segmentation of foods","authors":"Yuhe Fan, Lixun Zhang, Canxing Zheng, Yunqin Zu, Keyi Wang, Xingyuan Wang","doi":"10.1007/s11554-024-01459-z","DOIUrl":"https://doi.org/10.1007/s11554-024-01459-z","url":null,"abstract":"<p>Instance segmentation of foods is an important technology to ensure the food success rate of meal-assisting robotics. However, due to foods have strong intraclass variability, interclass similarity, and complex physical properties, which leads to more challenges in recognition, localization, and contour acquisition of foods. To address the above issues, this paper proposed a novel method for instance segmentation of foods. Specifically, in backbone network, deformable convolution was introduced to enhance the ability of YOLOv8 architecture to capture finer-grained spatial information, and efficient multiscale attention based on cross-spatial learning was introduced to improve sensitivity and expressiveness of multiscale inputs. In neck network, classical convolution and C2f modules were replaced by lightweight convolution GSConv and improved VoV-GSCSP aggregation module, respectively, to improve inference speed of models. We abbreviated it as the DEG-YOLOv8n-seg model. The proposed method was compared with baseline model and several state-of-the-art (SOTA) segmentation models on datasets, respectively. The results show that the DEG-YOLOv8n-seg model has higher accuracy, faster speed, and stronger robustness. Specifically, the DEG-YOLOv8n-seg model can achieve 84.6% Box_mAP@0.5 and 84.1% Mask_mAP@0.5 accuracy at 55.2 FPS and 11.1 GFLOPs. The importance of adopting data augmentation and the effectiveness of introducing deformable convolution, EMA, and VoV-GSCSP were verified by ablation experiments. Finally, the DEG-YOLOv8n-seg model was applied to experiments of food instance segmentation for meal-assisting robots. The results show that the DEG-YOLOv8n-seg can achieve better instance segmentation of foods. This work can promote the development of intelligent meal-assisting robotics technology and can provide theoretical foundations for other tasks of the computer vision field with some reference value.</p>","PeriodicalId":51224,"journal":{"name":"Journal of Real-Time Image Processing","volume":null,"pages":null},"PeriodicalIF":3.0,"publicationDate":"2024-04-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140829793","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
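The DEG-YOLOv8n-seg weights are not public in this listing; as a minimal sketch, the snippet below runs the stock YOLOv8n-seg baseline that the paper modifies, using the ultralytics package. The image path and confidence threshold are placeholder assumptions.

```python
from ultralytics import YOLO

# Baseline sketch only: the paper's DEG-YOLOv8n-seg (deformable conv + EMA
# attention + GSConv/VoV-GSCSP neck) is not reproduced here, so this runs
# the stock YOLOv8n-seg model it builds on. "foods.jpg" is a placeholder.
model = YOLO("yolov8n-seg.pt")

results = model.predict("foods.jpg", conf=0.25)
for r in results:
    print(r.boxes.cls, r.boxes.conf)  # per-instance class ids and scores
    print(None if r.masks is None else r.masks.data.shape)  # instance masks
```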
A novel image denoising algorithm based on least square generative adversarial network
IF 3.0 | CAS Tier 4 (Computer Science) | Q2 Computer Science | Pub Date: 2024-04-24 | DOI: 10.1007/s11554-024-01447-3
Sharfuddin Waseem Mohammed, Brindha Murugan
{"title":"A novel image denoising algorithm based on least square generative adversarial network","authors":"Sharfuddin Waseem Mohammed, Brindha Murugan","doi":"10.1007/s11554-024-01447-3","DOIUrl":"https://doi.org/10.1007/s11554-024-01447-3","url":null,"abstract":"","PeriodicalId":51224,"journal":{"name":"Journal of Real-Time Image Processing","volume":null,"pages":null},"PeriodicalIF":3.0,"publicationDate":"2024-04-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140661184","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
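As background for the title, the sketch below writes out the least-squares GAN objective (Mao et al., 2017) in PyTorch with the common 0/1 targets; the paper's actual generator, discriminator, and any additional denoising terms are not specified here and are left abstract.

```python
import torch
import torch.nn as nn

# Sketch of the least-squares GAN objective the title points to, using the
# common targets: 0 for fake, 1 for real. The discriminator and generator
# outputs stand in for the paper's (unspecified) networks.
mse = nn.MSELoss()

def discriminator_loss(d_real: torch.Tensor, d_fake: torch.Tensor) -> torch.Tensor:
    # Push D(real) toward 1 and D(fake) toward 0 in the least-squares sense.
    return 0.5 * (mse(d_real, torch.ones_like(d_real))
                  + mse(d_fake, torch.zeros_like(d_fake)))

def generator_loss(d_fake: torch.Tensor) -> torch.Tensor:
    # Push D(G(noisy)) toward 1 so denoised outputs look real to D.
    return 0.5 * mse(d_fake, torch.ones_like(d_fake))
```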
Enhancing UAV tracking: a focus on discriminative representations using contrastive instances
IF 3.0 | CAS Tier 4 (Computer Science) | Q2 Computer Science | Pub Date: 2024-04-21 | DOI: 10.1007/s11554-024-01456-2
Xucheng Wang, Dan Zeng, Yongxin Li, Mingliang Zou, Qijun Zhao, Shuiwang Li

Addressing the core challenges of achieving both high efficiency and high precision in UAV tracking is crucial due to the limited computing resources, battery capacity, and maximum load capacity of UAVs. Discriminative correlation filter (DCF)-based trackers excel in efficiency on a single CPU but lag in precision. In contrast, many lightweight deep learning (DL) trackers built on model compression strike a better balance between efficiency and precision. However, higher compression rates can hinder performance by diminishing discriminative representations. Given these challenges, this paper aims to enhance the discriminative ability of feature representations through an innovative feature-learning approach. We specifically emphasize leveraging contrasting instances to achieve more distinct representations for effective UAV tracking. Our method eliminates the need for manual annotations and facilitates the creation and deployment of lightweight models. To the best of our knowledge, we are the first to explore the possibilities of contrastive learning in UAV tracking. Through extensive experimentation on four UAV benchmarks, namely UAVDT, DTB70, UAV123@10fps, and VisDrone2018, we show that our DRCI (discriminative representation with contrastive instances) tracker outperforms current state-of-the-art UAV tracking methods, underscoring its potential to effectively tackle the persistent challenges in this field.

{"title":"Enhancing UAV tracking: a focus on discriminative representations using contrastive instances","authors":"Xucheng Wang, Dan Zeng, Yongxin Li, Mingliang Zou, Qijun Zhao, Shuiwang Li","doi":"10.1007/s11554-024-01456-2","DOIUrl":"https://doi.org/10.1007/s11554-024-01456-2","url":null,"abstract":"<p>Addressing the core challenges of achieving both high efficiency and precision in UAV tracking is crucial due to limitations in computing resources, battery capacity, and maximum load capacity on UAVs. Discriminative correlation filter (DCF)-based trackers excel in efficiency on a single CPU but lag in precision. In contrast, many lightweight deep learning (DL)-based trackers based on model compression strike a better balance between efficiency and precision. However, higher compression rates can hinder performance by diminishing discriminative representations. Given these challenges, our paper aims to enhance feature representations’ discriminative abilities through an innovative feature-learning approach. We specifically emphasize leveraging contrasting instances to achieve more distinct representations for effective UAV tracking. Our method eliminates the need for manual annotations and facilitates the creation and deployment of lightweight models. As far as our knowledge goes, we are the pioneers in exploring the possibilities of contrastive learning in UAV tracking applications. Through extensive experimentation across four UAV benchmarks, namely, UAVDT, DTB70, UAV123@10fps and VisDrone2018, We have shown that our DRCI (discriminative representation with contrastive instances) tracker outperforms current state-of-the-art UAV tracking methods, underscoring its potential to effectively tackle the persistent challenges in this field.</p>","PeriodicalId":51224,"journal":{"name":"Journal of Real-Time Image Processing","volume":null,"pages":null},"PeriodicalIF":3.0,"publicationDate":"2024-04-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140637100","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
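The abstract does not give the DRCI loss in closed form; the sketch below shows a generic InfoNCE-style contrastive loss, one standard way to learn discriminative representations from contrasting instances. The function name, temperature, and shapes are illustrative assumptions, not the tracker's exact formulation.

```python
import torch
import torch.nn.functional as F

# Generic InfoNCE-style contrastive loss: pull a query toward its positive
# instance and push it away from negatives. Illustrative only; not the
# DRCI tracker's published formulation.
def info_nce(query, positive, negatives, temperature=0.07):
    """query: (D,), positive: (D,), negatives: (N, D)."""
    q = F.normalize(query, dim=-1)
    pos = F.normalize(positive, dim=-1)
    neg = F.normalize(negatives, dim=-1)
    pos_logit = (q * pos).sum(-1, keepdim=True)  # similarity to the positive
    neg_logits = neg @ q                         # similarities to negatives
    logits = torch.cat([pos_logit, neg_logits]) / temperature
    target = torch.zeros(1, dtype=torch.long)    # positive sits at index 0
    return F.cross_entropy(logits.unsqueeze(0), target)

# Example: loss = info_nce(torch.randn(128), torch.randn(128), torch.randn(16, 128))
```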
A novel real-time pixel-level road crack segmentation network
IF 3.0 | CAS Tier 4 (Computer Science) | Q2 Computer Science | Pub Date: 2024-04-20 | DOI: 10.1007/s11554-024-01458-0
Rongdi Wang, Hao Wang, Zhenhao He, Jianchao Zhu, Haiqiang Zuo

Road crack detection plays a vital role in preserving the life of roads and ensuring driver safety. Traditional methods relying on manual observation are subjective and inefficient at quantifying damage. In recent years, advances in deep learning techniques have held promise for automated crack detection, but challenges such as low contrast, small datasets, and inaccurate localization remain. In this paper, we propose a deep learning-based pixel-level road crack segmentation network that achieves excellent performance on multiple datasets. To enrich the receptive fields of conventional convolutional modules, we design a residual asymmetric convolutional module for feature extraction. In addition, a multiple receptive field cascade module and a feature fusion module with non-local attention are proposed. Our network demonstrates superior accuracy and inference speed, achieving 55.60%, 59.01%, 75.65%, and 57.95% IoU on the CrackForest, CrackTree, CDD, and Crack500 datasets, respectively, while processing 143 images per second. Experimental results and analysis validate the effectiveness of our approach. This work contributes to the advancement of road crack detection, providing a valuable tool for road maintenance and safety improvement.

{"title":"A novel real-time pixel-level road crack segmentation network","authors":"Rongdi Wang, Hao Wang, Zhenhao He, Jianchao Zhu, Haiqiang Zuo","doi":"10.1007/s11554-024-01458-0","DOIUrl":"https://doi.org/10.1007/s11554-024-01458-0","url":null,"abstract":"<p>Road crack detection plays a vital role in preserving the life of roads and ensuring driver safety. Traditional methods relying on manual observation have limitations in terms of subjectivity and inefficiency in quantifying damage. In recent years, advances in deep learning techniques have held promise for automated crack detection, but challenges, such as low contrast, small datasets, and inaccurate localization, remain. In this paper, we propose a deep learning-based pixel-level road crack segmentation network that achieves excellent performance on multiple datasets. In order to enrich the receptive fields of conventional convolutional modules, we design a residual asymmetric convolutional module for feature extraction. In addition to this, a multiple receptive field cascade module and a feature fusion module with non-local attention are proposed. Our network demonstrates superior accuracy and inference speed, achieving 55.60%, 59.01%, 75.65%, and 57.95% IoU on the CrackForest, CrackTree, CDD, and Crack500 datasets, respectively. It also has the ability to process 143 images per second. Experimental results and analysis validate the effectiveness of our approach. This work contributes to the advancement of road crack detection, providing a valuable tool for road maintenance and safety improvement.</p>","PeriodicalId":51224,"journal":{"name":"Journal of Real-Time Image Processing","volume":null,"pages":null},"PeriodicalIF":3.0,"publicationDate":"2024-04-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140629057","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
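As an illustration of the residual asymmetric convolution idea mentioned in the abstract, the sketch below factorizes a 3x3 kernel into 3x1 and 1x3 convolutions with a skip connection. Channel widths, normalization, and activation choices are assumptions, not the paper's specification.

```python
import torch
import torch.nn as nn

# Hedged sketch of a residual asymmetric convolution block: a 3x3 kernel
# factorized into stacked 3x1 and 1x3 convolutions, plus a residual sum.
class ResAsymConv(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(channels, channels, (3, 1), padding=(1, 0)),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, (1, 3), padding=(0, 1)),
            nn.BatchNorm2d(channels),
        )
        self.act = nn.ReLU(inplace=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.act(x + self.conv(x))  # residual sum, then activation
```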
Improved feature extraction network in lightweight YOLOv7 model for real-time vehicle detection on low-cost hardware
IF 3.0 | CAS Tier 4 (Computer Science) | Q2 Computer Science | Pub Date: 2024-04-20 | DOI: 10.1007/s11554-024-01457-1
Johan Lela Andika, Anis Salwa Mohd Khairuddin, Harikrishnan Ramiah, Jeevan Kanesan

The advancement of unmanned aerial vehicles (UAVs) has drawn researchers to update object detection algorithms for better accuracy and computational performance. Previous works applying deep learning models to object detection required high graphics processing unit (GPU) computation power. Generally, object detection models face a trade-off between accuracy and model size, and in deep learning models the relationship is not always linear. Various factors, such as architectural design, optimization techniques, and dataset characteristics, can significantly influence the accuracy, model size, and computation cost when adopting object detection models for low-cost embedded devices. Hence, for the solution to be sustainable, it is crucial to employ lightweight object detection models for real-time object identification. In this work, an improved feature extraction network is proposed by incorporating an efficient long-range aggregation network for vehicle detection (ELAN-VD) in the backbone layer. This architectural improvement to the YOLOv7-tiny model aims to increase the accuracy of detecting small vehicles in aerial images. Besides that, the output image size of the second and third prediction boxes is upscaled for better performance. This study showed that the proposed method yields a mean average precision (mAP) of 57.94%, higher than that of the conventional YOLOv7-tiny. In addition, the proposed model performed significantly better than previous works, making it viable for application in low-cost embedded devices.

{"title":"Improved feature extraction network in lightweight YOLOv7 model for real-time vehicle detection on low-cost hardware","authors":"Johan Lela Andika, Anis Salwa Mohd Khairuddin, Harikrishnan Ramiah, Jeevan Kanesan","doi":"10.1007/s11554-024-01457-1","DOIUrl":"https://doi.org/10.1007/s11554-024-01457-1","url":null,"abstract":"<p>The advancement of unmanned aerial vehicles (UAVs) has drawn researchers to update object detection algorithms for better accuracy and computation performance. Previous works applying deep learning models for object detection applications required high graphics processing unit (GPU) computation power. Generally, object detection models suffer trade-off between accuracy and model size where the relationship is not always linear in deep learning models. Various factors such as architectural design, optimization techniques, and dataset characteristics can significantly influence the accuracy, model size, and computation cost in adopting object detection models for low-cost embedded devices. Hence, it is crucial to employ lightweight object detection models for real-time object identification for the solution to be sustainable. In this work, an improved feature extraction network is proposed by incorporating an efficient long-range aggregation network for vehicle detection (ELAN-VD) in the backbone layer. The architecture improvement in YOLOv7-tiny model is proposed to improve the accuracy of detecting small vehicles in the aerial image. Besides that, the image size output of the second and third prediction boxes is upscaled for better performance. This study showed that the proposed method yields a mean average precision (mAP) of 57.94%, which is higher than that of the conventional YOLOv7-tiny. In addition, the proposed model showed significant performance when compared to previous works, making it viable for application in low-cost embedded devices.</p>","PeriodicalId":51224,"journal":{"name":"Journal of Real-Time Image Processing","volume":null,"pages":null},"PeriodicalIF":3.0,"publicationDate":"2024-04-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140630353","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
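The abstract names an ELAN-VD aggregation block but not its internals; the sketch below shows a generic ELAN-style block that concatenates every intermediate branch output and fuses them with a 1x1 convolution. The branch count and channel widths are assumptions, not the paper's design.

```python
import torch
import torch.nn as nn

# Hedged sketch of an ELAN-style long-range aggregation block: stacked
# 3x3 conv branches whose intermediate outputs are all kept, concatenated,
# and fused by a 1x1 conv. Not the paper's ELAN-VD specification.
class ELANBlock(nn.Module):
    def __init__(self, c_in: int, c_mid: int, c_out: int, n_branches: int = 3):
        super().__init__()
        self.stem = nn.Conv2d(c_in, c_mid, 1)
        self.branches = nn.ModuleList(
            nn.Conv2d(c_mid, c_mid, 3, padding=1) for _ in range(n_branches)
        )
        self.fuse = nn.Conv2d(c_mid * (n_branches + 1), c_out, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        y = self.stem(x)
        outs = [y]
        for conv in self.branches:
            y = conv(y)
            outs.append(y)  # keep every intermediate feature map
        return self.fuse(torch.cat(outs, dim=1))
```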
Driver fatigue detection based on improved YOLOv7
IF 3.0 | CAS Tier 4 (Computer Science) | Q2 Computer Science | Pub Date: 2024-04-13 | DOI: 10.1007/s11554-024-01455-3
Xianguo Li, Xueyan Li, Zhenqian Shen, Guangmin Qian

Fatigued driving is one of the main threats to road traffic safety. To address the complex detection pipelines, low accuracy, and susceptibility to lighting interference of current driver fatigue detection algorithms, this paper proposes a YOLO-based driver eye-state detection algorithm, abbreviated ES-YOLO. The algorithm optimizes the structure of YOLOv7, integrates multi-scale features using the convolutional block attention module (CBAM), and improves attention to important spatial locations in the image. Furthermore, Focal-EIOU loss is used instead of CIOU loss to increase attention on difficult samples and reduce the influence of sample class imbalance. Then, based on ES-YOLO, a driver fatigue detection method is proposed whose fatigue judgment logic monitors the fatigue state in real time and raises timely alarms to improve detection accuracy. Experiments on the public CEW dataset and a self-made dataset show that the proposed ES-YOLO obtained mAP values of 99.0% and 98.8%, respectively, outperforming the compared algorithms, and the method achieves real-time, accurate detection of driver fatigue status. Source code is released at https://www.github/driver-fatigue-detection.git.

{"title":"Driver fatigue detection based on improved YOLOv7","authors":"Xianguo Li, Xueyan Li, Zhenqian Shen, Guangmin Qian","doi":"10.1007/s11554-024-01455-3","DOIUrl":"https://doi.org/10.1007/s11554-024-01455-3","url":null,"abstract":"<p>Fatigue driving is one of the main reasons threatening road traffic safety. Aiming at the problems of complex detection process, low accuracy, and susceptibility to light interference in the current driver fatigue detection algorithm, this paper proposes a driver Eye State detection algorithm based on YOLO, abbreviated as ES-YOLO. The algorithm optimizes the structure of YOLOv7, integrates the multi-scale features using the convolutional block attention mechanism (CBAM), and improves the attention to important spatial locations in the image. Furthermore, using the Focal-EIOU Loss instead of CIOU Loss to increase the attention on difficult samples and reduce the influence of sample class imbalance. Then, based on ES-YOLO, a driver fatigue detection method is proposed, and the driver fatigue judgment logic is designed to monitor the fatigue state in real-time and alarm in time to improve the accuracy of detection. The experiments on the public dataset CEW and the self-made dataset show that the proposed ES-YOLO obtained 99.0% and 98.8% mAP values, respectively, which are better than the compared algorithms. And this method achieves real-time and accurate detection of driver fatigue status. Source code is released in https://www.github/driver-fatigue-detection.git.</p>","PeriodicalId":51224,"journal":{"name":"Journal of Real-Time Image Processing","volume":null,"pages":null},"PeriodicalIF":3.0,"publicationDate":"2024-04-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140601871","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
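The paper's fatigue judgment logic is not spelled out in the abstract; the sketch below implements a common PERCLOS-style rule on top of per-frame eye states, treating the ES-YOLO detector as a black box. The window length and closed-eye ratio threshold are assumptions, not the authors' values.

```python
from collections import deque

# Hedged sketch of a PERCLOS-style fatigue rule over per-frame eye states.
# WINDOW (~3 s at 30 FPS) and the 0.4 closed-eye ratio are assumptions.
WINDOW = 90
THRESHOLD = 0.4

states = deque(maxlen=WINDOW)  # True = eyes detected closed in that frame

def update(eyes_closed: bool) -> bool:
    """Push one frame's eye state; return True when fatigue is flagged."""
    states.append(eyes_closed)
    if len(states) < WINDOW:
        return False  # not enough history yet
    return sum(states) / len(states) > THRESHOLD
```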
Real-time semantic segmentation network based on parallel atrous convolution for short-term dense concatenate and attention feature fusion
IF 3.0 | CAS Tier 4 (Computer Science) | Q2 Computer Science | Pub Date: 2024-04-10 | DOI: 10.1007/s11554-024-01453-5
Lijun Wu, Shangdong Qiu, Zhicong Chen

To address the incomplete segmentation of large objects and the mis-segmentation of tiny objects that are widespread in semantic segmentation algorithms, we propose PACAMNet, a real-time segmentation network based on short-term dense concatenation of parallel atrous convolutions and fusion of attentional features. First, parallel atrous convolution is introduced to improve the short-term dense concatenate module: by adjusting the atrous (dilation) factor, multi-scale semantic information is obtained, ensuring that the last layer of the module also receives rich input feature maps. Second, an attention feature fusion module is proposed to align the receptive fields of deep and shallow feature maps via depth-separable convolutions of different sizes, and a channel attention mechanism generates weights to fuse the deep and shallow feature maps effectively. Finally, experiments on the Cityscapes and CamVid datasets achieve segmentation accuracies of 77.4% and 74.0% at inference speeds of 98.7 FPS and 134.6 FPS, respectively. Compared with other methods, PACAMNet improves inference speed while ensuring higher segmentation accuracy, achieving a better balance between the two.

{"title":"Real-time semantic segmentation network based on parallel atrous convolution for short-term dense concatenate and attention feature fusion","authors":"Lijun Wu, Shangdong Qiu, Zhicong Chen","doi":"10.1007/s11554-024-01453-5","DOIUrl":"https://doi.org/10.1007/s11554-024-01453-5","url":null,"abstract":"<p>To address the problem of incomplete segmentation of large objects and miss-segmentation of tiny objects that is universally existing in semantic segmentation algorithms, PACAMNet, a real-time segmentation network based on short-term dense concatenate of parallel atrous convolution and fusion of attentional features is proposed, called PACAMNet. First, parallel atrous convolution is introduced to improve the short-term dense concatenate module. By adjusting the size of the atrous factor, multi-scale semantic information is obtained to ensure that the last layer of the module can also obtain rich input feature maps. Second, attention feature fusion module is proposed to align the receptive fields of deep and shallow feature maps via depth-separable convolutions with different sizes, and the channel attention mechanism is used to generate weights to effectively fuse the deep and shallow feature maps. Finally, experiments are carried out based on both Cityscapes and CamVid datasets, and the segmentation accuracy achieve 77.4% and 74.0% at the inference speeds of 98.7 FPS and 134.6 FPS, respectively. Compared with other methods, PACAMNet improves the inference speed of the model while ensuring higher segmentation accuracy, so PACAMNet achieve a better balance between segmentation accuracy and inference speed.</p>","PeriodicalId":51224,"journal":{"name":"Journal of Real-Time Image Processing","volume":null,"pages":null},"PeriodicalIF":3.0,"publicationDate":"2024-04-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140601724","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
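To make the parallel atrous convolution idea concrete, the sketch below runs 3x3 convolutions with several dilation rates side by side and concatenates the results, so a single module covers multiple receptive-field scales. The rates (1, 2, 4) and channel widths are illustrative, not the paper's configuration.

```python
import torch
import torch.nn as nn

# Hedged sketch of a parallel atrous convolution module: 3x3 convs with
# different dilation rates in parallel, concatenated and fused by a 1x1
# conv. padding=r with dilation=r preserves the spatial resolution.
class ParallelAtrous(nn.Module):
    def __init__(self, c_in: int, c_out: int, rates=(1, 2, 4)):
        super().__init__()
        self.paths = nn.ModuleList(
            nn.Conv2d(c_in, c_out, 3, padding=r, dilation=r) for r in rates
        )
        self.fuse = nn.Conv2d(c_out * len(rates), c_out, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.fuse(torch.cat([p(x) for p in self.paths], dim=1))
```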