{"title":"Recognition of Human Interactions in Still Images using AdaptiveDRNet with Multi-level Attention","authors":"Arnab Dey, Samit Biswas, Dac-Nhoung Le","doi":"10.14569/ijacsa.2023.01410103","DOIUrl":null,"url":null,"abstract":"Human-Human Interaction Recognition (H2HIR) is a multidisciplinary field that combines computer vision, deep learning, and psychology. Its primary objective is to decode and understand the intricacies of human-human interactions. H2HIR holds significant importance across various domains as it enables machines to perceive, comprehend, and respond to human social behaviors, gestures, and communication patterns. This study aims to identify human-human interactions from just one frame, i.e. from an image. Diverging from the realm of video-based inter-action recognition, a well-established research domain that relies on the utilization of spatio-temporal information, the complexity of the task escalates significantly when dealing with still images due to the absence of these intrinsic spatio-temporal features. This research introduces a novel deep learning model called AdaptiveDRNet with Multi-level Attention to recognize Human-Human (H2H) interactions. Our proposed method demonstrates outstanding performance on the Human-Human Interaction Image dataset (H2HID), encompassing 4049 meticulously curated images representing fifteen distinct human interactions and on the publicly accessible HII and HIIv2 related benchmark datasets. Notably, our proposed model excels with a validation accuracy of 97.20% in the classification of human-human interaction images, surpassing the performance of EfficientNet, InceptionResNetV2, NASNet Mobile, ConvXNet, ResNet50, and VGG-16 models. H2H interaction recognition’s significance lies in its capacity to enhance communication, improve decision-making, and ultimately contribute to the well-being and efficiency of individuals and society as a whole.","PeriodicalId":13824,"journal":{"name":"International Journal of Advanced Computer Science and Applications","volume":"137 1","pages":"0"},"PeriodicalIF":0.7000,"publicationDate":"2023-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"International Journal of Advanced Computer Science and Applications","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.14569/ijacsa.2023.01410103","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"COMPUTER SCIENCE, THEORY & METHODS","Score":null,"Total":0}
Citations: 0
Abstract
Human-Human Interaction Recognition (H2HIR) is a multidisciplinary field that combines computer vision, deep learning, and psychology. Its primary objective is to decode and understand the intricacies of human-human interactions. H2HIR holds significant importance across various domains, as it enables machines to perceive, comprehend, and respond to human social behaviors, gestures, and communication patterns. This study aims to identify human-human interactions from a single frame, i.e., from a still image. Unlike video-based interaction recognition, a well-established research domain that relies on spatio-temporal information, recognition from still images is considerably more difficult because these intrinsic spatio-temporal cues are absent. This research introduces a novel deep learning model, AdaptiveDRNet with Multi-level Attention, to recognize Human-Human (H2H) interactions. The proposed method demonstrates outstanding performance on the Human-Human Interaction Image dataset (H2HID), comprising 4049 meticulously curated images of fifteen distinct human interactions, as well as on the publicly accessible HII and HIIv2 benchmark datasets. Notably, the proposed model achieves a validation accuracy of 97.20% in classifying human-human interaction images, surpassing the EfficientNet, InceptionResNetV2, NASNet Mobile, ConvXNet, ResNet50, and VGG-16 models. The significance of H2H interaction recognition lies in its capacity to enhance communication, improve decision-making, and ultimately contribute to the well-being and efficiency of individuals and society as a whole.
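The abstract does not describe the internal structure of AdaptiveDRNet, so the sketch below is only a minimal, hypothetical illustration of the general idea named in the title: a CNN image classifier with attention applied at multiple feature levels, producing one logit per interaction class (fifteen in H2HID). The module names, layer sizes, and the squeeze-and-excitation-style attention block are assumptions for illustration, not the authors' architecture.

```python
# Hypothetical sketch (PyTorch) of multi-level attention for 15-way
# human-interaction classification. NOT the authors' AdaptiveDRNet;
# block names and sizes are illustrative assumptions.
import torch
import torch.nn as nn


class ChannelAttention(nn.Module):
    """Squeeze-and-excitation style channel attention (illustrative)."""
    def __init__(self, channels: int, reduction: int = 8):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x):
        b, c, _, _ = x.shape
        w = self.fc(self.pool(x).view(b, c)).view(b, c, 1, 1)
        return x * w  # re-weight feature maps channel-wise


class MultiLevelAttentionNet(nn.Module):
    """Small CNN with an attention block after each convolutional stage."""
    def __init__(self, num_classes: int = 15):
        super().__init__()

        def stage(cin, cout):
            return nn.Sequential(
                nn.Conv2d(cin, cout, 3, padding=1),
                nn.BatchNorm2d(cout),
                nn.ReLU(inplace=True),
                nn.MaxPool2d(2),
            )

        self.stage1, self.att1 = stage(3, 32), ChannelAttention(32)
        self.stage2, self.att2 = stage(32, 64), ChannelAttention(64)
        self.stage3, self.att3 = stage(64, 128), ChannelAttention(128)
        self.head = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
            nn.Linear(128, num_classes),  # one logit per interaction class
        )

    def forward(self, x):
        x = self.att1(self.stage1(x))  # low-level attention
        x = self.att2(self.stage2(x))  # mid-level attention
        x = self.att3(self.stage3(x))  # high-level attention
        return self.head(x)


if __name__ == "__main__":
    model = MultiLevelAttentionNet(num_classes=15)
    logits = model(torch.randn(2, 3, 224, 224))  # batch of two RGB images
    print(logits.shape)  # torch.Size([2, 15])
```

In a design of this kind, each attention block re-weights the channels of its stage's feature map, so that low-, mid-, and high-level cues can each be emphasized before the final fifteen-way classification head; how AdaptiveDRNet actually realizes its multi-level attention is detailed only in the full paper.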
About the Journal
IJACSA is a scholarly computer science journal representing the best in research. Its mission is to provide an outlet for quality research to be publicised and published to a global audience. The journal aims to publish papers selected through rigorous double-blind peer review to ensure originality, timeliness, relevance, and readability. In keeping with the Journal's vision "to be a respected publication that publishes peer reviewed research articles, as well as review and survey papers contributed by International community of Authors", we have drawn reviewers and editors from institutions and universities across the globe. A double-blind peer review process is conducted to ensure that we retain high standards. At IJACSA, we stand strong because we know that global challenges make way for new innovations, new ways, and new talent. The International Journal of Advanced Computer Science and Applications publishes carefully refereed research, review, and survey papers which offer a significant contribution to the computer science literature and which are of interest to a wide audience. Coverage extends to all mainstream branches of computer science and related applications.