Visionary vigilance: Optimized YOLOV8 for fallen person detection with large-scale benchmark dataset

IF 4.2 | Zone 3, Computer Science | Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Image and Vision Computing | Pub Date: 2024-07-26 | DOI: 10.1016/j.imavis.2024.105195


Citations: 0

Abstract




Falls pose a significant risk to elderly people, to patients with conditions such as neurological disorders and cardiovascular diseases, and to disabled children. This highlights the need for real-time intelligent fall detection (FD) systems that enable quick relief and support assisted living. Existing approaches are often multimodal and are computationally expensive because of multi-sensor integration. Computer vision (CV) based FD requires state-of-the-art (SOTA) networks with progressive enhancements to capture falls effectively. However, CV-based systems often cannot operate efficiently in real time, and visual intelligence components are usually not integrated at feasible stages of the networks. More importantly, the lack of large-scale, well-annotated benchmark datasets limits FD performance in challenging and complex environments. To bridge these research gaps, we propose an enhanced version of YOLOV8 for FD, with three key contributions. First, we introduce a comprehensive large-scale dataset of approximately 10,500 image samples with corresponding annotations. The dataset covers diverse environmental conditions and scenarios, which helps models generalize. Second, we propose progressive enhancements to the YOLOV8S model: a focus module is integrated in the backbone to optimize feature extraction, and convolutional block attention modules (CBAMs) are integrated at feasible stages of the network to improve spatial and channel context for more accurate detection, especially in complex scenes. Third, an extensive empirical evaluation shows that the proposed network outperforms 13 SOTA techniques, supported by benchmarking and qualitative validation across varied environments. The empirical findings, together with an analysis of model performance, size, and processing time, indicate that the proposed network performs strongly. The dataset with annotations, the results, and the progressive code modifications will be available to the research community at https://github.com/habib1402/Fall-Detection-DiverseFall10500
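The abstract names two building blocks, a focus module and CBAM, without giving their details. The sketch below illustrates the general ideas only, under stated assumptions: the focus step is taken to be the space-to-depth slicing used by the YOLOv5 Focus layer, and the attention step is the channel-attention half of the original CBAM formulation (a shared two-layer MLP applied to average-pooled and max-pooled channel descriptors). The function names, the pure-Python nested-list tensors, and the weight arguments `w1` and `w2` are hypothetical and are not taken from the authors' code.

```python
import math
from typing import List

Tensor3 = List[List[List[float]]]  # layout: [channels][height][width]


def focus_slice(x: Tensor3) -> Tensor3:
    """Space-to-depth slicing: (C, H, W) -> (4C, H/2, W/2).

    Assumption: same slice ordering as the YOLOv5 Focus layer. A real
    Focus module would follow this with a convolution, omitted here.
    """
    h, w = len(x[0]), len(x[0][0])
    if h % 2 or w % 2:
        raise ValueError("height and width must be even")
    out: Tensor3 = []
    # (row offset, col offset) of each of the four interleaved sub-grids.
    for row_off, col_off in ((0, 0), (1, 0), (0, 1), (1, 1)):
        for ch in x:
            out.append([row[col_off::2] for row in ch[row_off::2]])
    return out


def _sigmoid(v: float) -> float:
    return 1.0 / (1.0 + math.exp(-v))


def _mlp(vec: List[float], w1: List[List[float]], w2: List[List[float]]) -> List[float]:
    """Shared two-layer MLP without biases: ReLU(w1 @ vec), then w2 @ hidden."""
    hidden = [max(0.0, sum(wi * vi for wi, vi in zip(row, vec))) for row in w1]
    return [sum(wi * hi for wi, hi in zip(row, hidden)) for row in w2]


def channel_attention(x: Tensor3, w1: List[List[float]], w2: List[List[float]]) -> Tensor3:
    """CBAM-style channel attention: rescale each channel by a gate in (0, 1).

    w1 has shape [hidden][C] and w2 has shape [C][hidden]; both are
    hypothetical stand-ins for learned weights.
    """
    avg = [sum(map(sum, ch)) / (len(ch) * len(ch[0])) for ch in x]  # global average pool
    mx = [max(map(max, ch)) for ch in x]                            # global max pool
    gates = [_sigmoid(a + m) for a, m in zip(_mlp(avg, w1, w2), _mlp(mx, w1, w2))]
    return [[[g * v for v in row] for row in ch] for g, ch in zip(gates, x)]
```

For a single-channel 4x4 input, `focus_slice` returns four 2x2 channels, one per interleaved sub-grid, so spatial resolution is halved without discarding any pixel. CBAM as originally described also applies a spatial-attention step after the channel step; it is left out here to keep the sketch short.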


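Source journal

Image and Vision Computing (Engineering & Technology - Engineering: Electrical & Electronic)

CiteScore: 8.50

Self-citation rate: 8.50%

Articles published: 143

Review time: 7.8 months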

About the journal: The primary aim of Image and Vision Computing is to provide an effective medium of interchange for the results of high-quality theoretical and applied research fundamental to all aspects of image interpretation and computer vision. The journal publishes work that proposes new image interpretation and computer vision methodology or addresses the application of such methods to real-world scenes. It seeks to deepen understanding in the discipline by encouraging the quantitative comparison and performance evaluation of the proposed methodology. The coverage includes: image interpretation, scene modelling, object recognition and tracking, shape analysis, monitoring and surveillance, active vision and robotic systems, SLAM, biologically-inspired computer vision, motion analysis, stereo vision, document image understanding, character and handwritten text recognition, face and gesture recognition, biometrics, vision-based human-computer interaction, human activity and behavior understanding, data fusion from multiple sensor inputs, image databases.