Learning and Vision-based approach for Human fall detection and classification in naturally occurring scenes using video data

IF 1.8 | Zone 4, Computer Science | Q3, COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | ACM Transactions on Asian and Low-Resource Language Information Processing | Pub Date: 2024-08-10 | DOI: 10.1145/3687125

Shashvat Singh, Kumkum Kumari, A. Vaish

Citations: 0

Learning and Vision-based approach for Human fall detection and classification in naturally occurring scenes using video data

Advances in medicine present new challenges for modern societies, particularly unpredictable falls among elderly people, which can happen anywhere and often stem from serious health issues. Delayed rescue of at-risk elders can be dangerous. Traditional elder-safety methods such as video surveillance or wearable sensors are inefficient and burdensome: they consume human resources and require caregivers to monitor constantly for falls. A more effective and convenient solution is therefore needed to ensure elderly safety. This paper presents a method for detecting human falls in videos of naturally occurring scenes, using a traditional Convolutional Neural Network (CNN) model, Inception-v3, VGG-19, and two versions of the You Only Look Once (YOLO) model. The primary focus of the work is human fall detection with deep learning models. Specifically, YOLO is adopted for object detection and tracking in video scenes: it identifies human subjects and generates bounding boxes around them. Human activities, including falls, are then classified by analyzing deformation features extracted from these bounding boxes. The traditional CNN model achieves 99.83% accuracy in human fall detection, surpassing other state-of-the-art methods. Its training time is longer than that of YOLO-v2 and YOLO-v3 but significantly shorter than that of Inception-v3, at only around 10% of Inception-v3's total training time.
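The abstract does not say which deformation features are extracted from the bounding boxes. As a minimal sketch of the general idea, the Python below assumes two illustrative features, the box aspect ratio and the frame-to-frame change in box height, and flags a fall when a box becomes wide and short abruptly. The `Box` type, the function names, and both thresholds are hypothetical and are not taken from the paper; in the described pipeline the boxes would come from a YOLO detector and the features would feed a trained classifier rather than fixed thresholds.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Box:
    """Axis-aligned bounding box of one tracked person in one frame (pixels)."""
    x: float  # left edge
    y: float  # top edge
    w: float  # width, assumed > 0
    h: float  # height, assumed > 0


def deformation_features(prev: Box, curr: Box) -> Dict[str, float]:
    """Illustrative deformation features between two consecutive frames."""
    return {
        # A standing person gives a tall box (aspect < 1); a lying one a wide box.
        "aspect": curr.w / curr.h,
        # Ratio of current to previous height; values well below 1 mean collapse.
        "height_ratio": curr.h / prev.h,
    }


def looks_like_fall(
    boxes: List[Box],
    aspect_thresh: float = 1.2,   # hypothetical threshold, not from the paper
    shrink_thresh: float = 0.7,   # hypothetical threshold, not from the paper
) -> bool:
    """Return True if any consecutive pair of boxes shows a sudden wide-and-short change."""
    for prev, curr in zip(boxes, boxes[1:]):
        feats = deformation_features(prev, curr)
        if feats["aspect"] > aspect_thresh and feats["height_ratio"] < shrink_thresh:
            return True
    return False


if __name__ == "__main__":
    walking = [Box(100, 50, 40, 120), Box(105, 50, 42, 118)]
    falling = [Box(100, 50, 40, 120), Box(90, 120, 110, 50)]
    print(looks_like_fall(walking), looks_like_fall(falling))
```

Fixed thresholds keep the sketch short; a learned classifier over the same features, as the abstract describes, would replace the `if` test in `looks_like_fall`.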

Source journal

ACM Transactions on Asian and Low-Resource Language Information Processing (Computer Science: General Computer Science)

CiteScore

3.60

Self-citation rate

15.00%

Number of publications

241

Journal description: The ACM Transactions on Asian and Low-Resource Language Information Processing (TALLIP) publishes high quality original archival papers and technical notes in the areas of computation and processing of information in Asian languages, low-resource languages of Africa, Australasia, Oceania and the Americas, as well as related disciplines. The subject areas covered by TALLIP include, but are not limited to:
-Computational Linguistics: including computational phonology, computational morphology, computational syntax (e.g. parsing), computational semantics, computational pragmatics, etc.
-Linguistic Resources: including computational lexicography, terminology, electronic dictionaries, cross-lingual dictionaries, electronic thesauri, etc.
-Hardware and software algorithms and tools for Asian or low-resource language processing, e.g., handwritten character recognition.
-Information Understanding: including text understanding, speech understanding, character recognition, discourse processing, dialogue systems, etc.
-Machine Translation involving Asian or low-resource languages.
-Information Retrieval: including natural language processing (NLP) for concept-based indexing, natural language query interfaces, semantic relevance judgments, etc.
-Information Extraction and Filtering: including automatic abstraction, user profiling, etc.
-Speech processing: including text-to-speech synthesis and automatic speech recognition.
-Multimedia Asian Information Processing: including speech, image, video, image/text translation, etc.
-Cross-lingual information processing involving Asian or low-resource languages.
Papers that deal with theory, systems design, evaluation and applications in the aforesaid subjects are appropriate for TALLIP. Emphasis will be placed on the originality and the practical significance of the reported research.