
2021 17th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS): Latest Publications

PIDLNet: A Physics-Induced Deep Learning Network for Characterization of Crowd Videos
S. Behera, T. K. Vijay, H. M. Kausik, D. P. Dogra
Human visual perception regarding crowd gatherings can provide valuable information about behavioral movements. Empirical analysis of visual perception of orderly moving crowds has revealed that such movements are often structured in nature, with a relatively higher order parameter and lower entropy compared to an unstructured crowd, and vice-versa. This paper proposes a Physics-Induced Deep Learning Network (PIDLNet), a deep learning framework trained on conventional 3D convolutional features combined with physics-based features. We compute frame-level entropy and order parameter from the motion flows extracted from the crowd videos. These features are then integrated with the 3D convolutional features at a later stage in the feature extraction pipeline to aid the crowd characterization process. Experiments reveal that the proposed network can characterize video segments depicting crowd movements with accuracy as high as 91.63%. We obtained an overall AUC of 0.9913 on a highly challenging publicly available video dataset. The method outperforms existing deep-learning frameworks and conventional crowd characterization frameworks by a notable margin.
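The abstract does not spell out how the order parameter and entropy are obtained from the motion flows; the following is a minimal sketch that assumes the order parameter is the polarization of the per-pixel unit flow vectors and the entropy is the Shannon entropy of the flow-direction histogram (both standard definitions, not necessarily the paper's exact ones):

```python
import numpy as np

def physics_features(flow, eps=1e-8, n_bins=16):
    """Frame-level order parameter and entropy from a dense optical-flow field.

    flow: (H, W, 2) array of per-pixel (dx, dy) motion vectors.
    Assumption: order parameter = magnitude of the mean unit velocity
    (~1 for perfectly aligned motion, ~0 for disordered motion);
    entropy = Shannon entropy of the flow-direction histogram.
    """
    vectors = flow.reshape(-1, 2)
    magnitudes = np.linalg.norm(vectors, axis=1)
    moving = vectors[magnitudes > eps]            # ignore (near-)static pixels
    if len(moving) == 0:
        return 0.0, 0.0

    unit = moving / np.linalg.norm(moving, axis=1, keepdims=True)
    order_parameter = float(np.linalg.norm(unit.mean(axis=0)))

    angles = np.arctan2(moving[:, 1], moving[:, 0])
    hist, _ = np.histogram(angles, bins=n_bins, range=(-np.pi, np.pi))
    p = hist / hist.sum()
    p = p[p > 0]
    entropy = float(-np.sum(p * np.log2(p)))
    return order_parameter, entropy
```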
{"title":"PIDLNet: A Physics-Induced Deep Learning Network for Characterization of Crowd Videos","authors":"S. Behera, T. K. Vijay, H. M. Kausik, D. P. Dogra","doi":"10.1109/AVSS52988.2021.9663817","DOIUrl":"https://doi.org/10.1109/AVSS52988.2021.9663817","url":null,"abstract":"Human visual perception regarding crowd gatherings can provide valuable information about behavioral movements. Empirical analysis on visual perception about orderly moving crowds has revealed that such movements are often structured in nature with relatively higher order parameter and lower entropy as compared to unstructured crowd, and vice-versa. This paper proposes a Physics-Induced Deep Learning Network (PIDLNet), a deep learning framework trained on conventional 3D convolutional features combined with physics-based features. We have computed frame-level entropy and order parameter from the motion flows extracted from the crowd videos. These features are then integrated with the 3D convolutional features at a later stage in the feature extraction pipeline to aid in the crowd characterization process. Experiments reveal that the proposed network can characterize video segments depicting crowd movements with accuracy as high as 91.63%. We have obtained overall AUC of 0.9913 on highly challenging publicly available video dataset. The method outperforms existing deep-learning frameworks and conventional crowd characterization frameworks by a notable margin.","PeriodicalId":246327,"journal":{"name":"2021 17th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS)","volume":"308 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-11-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124388685","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 3
Virtual Inductive Loop: Real time video analytics for vehicular access control
N. Ramanathan, Allison Beach, R. Hastings, Weihong Yin, Sima Taheri, P. Brewer, Dana Eubanks, Kyoung-Jin Park, Hongli Deng, Zhong Zhang, Donald Madden, Gang Qian, Amit Mistry, Huiping Li
Automated access control entails automatically detecting incoming vehicles in real time and allowing access only to authorized vehicles. Access control systems typically adopt one or more sensors, such as inductive loops, light array sensors, and wireless magnetometers, to detect vehicles at access points. This paper provides a detailed account of a real-time video analytics system named the "Virtual Inductive Loop" (VIL), which we developed as an alternative, cost-efficient solution for access control. The VIL system achieves precision and recall rates over 98%, performs on par with current systems in latency for detecting event onset, and further adds a suite of additional capabilities to access control systems such as vehicle classification, tailgate detection, and unusual event detection. The system was tested in live conditions at different sites at a Naval Facility in the United States over a two-year period. The project was funded by the Office of Naval Research (#N00014-17-C-7030).
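A minimal sketch of the virtual-loop triggering logic described above, assuming the detector supplies bounding boxes and an upstream module supplies a vehicle identity; the `Detection` fields and the rectangular loop are illustrative assumptions, not the system's actual interface:

```python
from dataclasses import dataclass

@dataclass
class Detection:
    x1: float
    y1: float
    x2: float
    y2: float
    vehicle_id: str  # e.g. a plate read or re-id match (hypothetical field)

def in_virtual_loop(det, loop):
    """True if the bottom-center of the box lies inside the rectangular loop.

    loop: (lx1, ly1, lx2, ly2) image-plane rectangle marking the access point.
    A polygonal loop would need a point-in-polygon test instead.
    """
    cx = (det.x1 + det.x2) / 2.0
    cy = det.y2                      # bottom edge approximates ground contact
    lx1, ly1, lx2, ly2 = loop
    return lx1 <= cx <= lx2 and ly1 <= cy <= ly2

def access_decision(detections, loop, authorized_ids):
    """Open the gate only when an authorized vehicle occupies the loop."""
    for det in detections:
        if in_virtual_loop(det, loop):
            return "OPEN" if det.vehicle_id in authorized_ids else "DENY"
    return "IDLE"
```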
{"title":"Virtual Inductive Loop: Real time video analytics for vehicular access control","authors":"N. Ramanathan, Allison Beach, R. Hastings, Weihong Yin, Sima Taheri, P. Brewer, Dana Eubanks, Kyoung-Jin Park, Hongli Deng, Zhong Zhang, Donald Madden, Gang Qian, Amit Mistry, Huiping Li","doi":"10.1109/AVSS52988.2021.9663748","DOIUrl":"https://doi.org/10.1109/AVSS52988.2021.9663748","url":null,"abstract":"Automated access control entails automatically detecting incoming vehicles in real-time and allowing access only to authorized vehicles. Access control systems typically adopt one or more sensors such as inductive loops, light array sensors, wireless magnetometers in detecting vehicles at access points. This paper 1 provides a detailed account on a real-time video analytics system named the “ Virtual Inductive Loop ” (VIL), that we developed as an alternative cost-efficient solution for access control. The VIL system poses precision and recall rates over 98%, performs on par with current systems in latency towards detecting event onset and further adds a suite of additional capabilities to access control systems such as vehicle classification, tailgate detection and unusual event detection. The system was tested in live conditions in different site at a Naval Facility in the United States over a two year period. The project was funded by the Office of Naval Research (#N000l4-l7-C-7030).","PeriodicalId":246327,"journal":{"name":"2021 17th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS)","volume":"14 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-11-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125531081","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
A Sample Weighting and Score Aggregation Method for Multi-query Object Matching
Jangwon Lee, Gang Qian, Allison Beach
In this paper, we propose a simple and effective method to properly assign weights to query samples and to compute aggregated matching scores using these weights in multi-query object matching. Multi-query object matching commonly arises in many real-life problems, such as finding suspicious objects in surveillance videos. In this problem, a query object is represented by multiple samples, and the matching candidates in a database are ranked according to their similarities to these query samples. In this context, query samples are not equally effective at finding the target object in the database, so one of the key challenges is how to measure the effectiveness of each query sample in finding the correct matching object. So far, however, very little attention has been paid to this issue. Therefore, we propose a simple but effective measure, Inverse Model Frequency (IMF), to quantify the matching effectiveness of query samples. Furthermore, we introduce a new score aggregation method to boost object matching performance given multiple queries. We tested the proposed method on vehicle re-identification and image retrieval tasks. Our proposed approach achieves state-of-the-art matching accuracy on two vehicle re-identification datasets (VehicleID/VeRi-776) and two image retrieval datasets (the original and revisited Oxford/Paris). The proposed approach can seamlessly plug into many existing multi-query object matching approaches to further boost their performance with minimal effort.
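The paper's exact IMF formula is not given in the abstract; the sketch below assumes an IDF-like weighting in which a query sample that matches many gallery models is down-weighted, followed by a weighted-sum score aggregation:

```python
import numpy as np

def imf_weights(sim, match_thresh=0.5):
    """Inverse-Model-Frequency-style weights for query samples.

    sim: (Q, G) similarity matrix between Q query samples and G gallery items.
    Assumption (IDF-like, not the paper's exact formula): a query sample that
    matches many gallery items above `match_thresh` is less discriminative
    and receives a lower weight.
    """
    n_gallery = sim.shape[1]
    model_freq = (sim > match_thresh).sum(axis=1)            # (Q,)
    weights = np.log(n_gallery / (1.0 + model_freq))
    weights = np.clip(weights, a_min=0.0, a_max=None)
    return weights / (weights.sum() + 1e-8)

def aggregate_scores(sim, weights):
    """Weighted aggregation of per-query scores into one score per candidate."""
    return weights @ sim                                      # (G,)

# usage: rank gallery items by the aggregated score
sim = np.random.rand(5, 100)         # 5 query samples, 100 candidates
scores = aggregate_scores(sim, imf_weights(sim))
ranking = np.argsort(-scores)
```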
{"title":"A Sample Weighting and Score Aggregation Method for Multi-query Object Matching","authors":"Jangwon Lee, Gang Qian, Allison Beach","doi":"10.1109/AVSS52988.2021.9663848","DOIUrl":"https://doi.org/10.1109/AVSS52988.2021.9663848","url":null,"abstract":"In this paper, we propose a simple and effective method to properly assign weights to the query samples and compute aggregated matching scores using these weights in multi-query object matching. Multi-query object matching commonly exists in many real-life problems such as finding suspicious objects in surveillance videos. In this problem, a query object is represented by multiple samples and the matching candidates in a database are ranked according to their similarities to these query samples. In this context, query samples are not equally effective to find the target object in the database, thus one of the key challenges is how to measure the effectiveness of each query to find the correct matching object. So far, however, very little attention has been paid to address this issue. Therefore, we propose a simple but effective way, Inverse Model Frequency (IMF), to measure of matching effectiveness of query samples. Furthermore, we introduce a new score aggregation method to boost the object matching performance given multiple queries. We tested the proposed method for vehicle re-identification and image retrieval tasks. Our proposed approach achieves state-of-the-art matching accuracy on two vehicle re-identification datasets (VehicleID/VeRi-776) and two image retrieval datasets (the original & revisited Oxford/Paris). The proposed approach can seamlessly plug into many existing multi-query object matching approaches to further boost their performance with minimal effort.","PeriodicalId":246327,"journal":{"name":"2021 17th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS)","volume":"75 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-11-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127183895","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Hazardous Events Detection in Automatic Train Doors Vicinity Using Deep Neural Networks
Olivier Laurendin, S. Ambellouis, A. Fleury, Ankur Mahtani, Sanaa Chafik, Clément Strauss
In the field of train transportation, personal injuries due to automatic train doors are still a common occurrence. This paper aims at implementing a computer vision solution, as part of a safety detection system, to identify hazardous events related to automatic doors and thereby reduce their occurrence and severity. Deep anomaly detection algorithms are often applied to CCTV video feeds to identify such hazardous events. However, the anomalous events identified by those algorithms are often simpler than the most common occurrences in transport environments, hindering their widespread usage. Since such events are of quite a diverse nature and no dataset featuring them exists, we create a specifically tailored dataset composed of real-case scenarios of hazardous events near train doors. We then study an anomaly detection algorithm from the literature on this dataset and propose a set of modifications to better adapt it to our railway context and to subsequently ease its application to a wider range of use cases.
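The specific anomaly detection algorithm studied is not named in the abstract; as a generic stand-in, the sketch below scores frames by the reconstruction error of a small convolutional autoencoder, a common baseline for this kind of video anomaly detection:

```python
import torch
import torch.nn as nn

class FrameAutoencoder(nn.Module):
    """Tiny convolutional autoencoder; anomaly score = reconstruction error.

    A generic stand-in, not the specific algorithm studied in the paper.
    Expects frame sizes divisible by 4 so the decoder restores the input shape.
    """
    def __init__(self):
        super().__init__()
        self.enc = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU())
        self.dec = nn.Sequential(
            nn.ConvTranspose2d(32, 16, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(16, 3, 4, stride=2, padding=1), nn.Sigmoid())

    def forward(self, x):
        return self.dec(self.enc(x))

def anomaly_scores(model, frames):
    """frames: (N, 3, H, W) tensor in [0, 1]; returns one score per frame."""
    with torch.no_grad():
        recon = model(frames)
    return ((frames - recon) ** 2).mean(dim=(1, 2, 3))
```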
{"title":"Hazardous Events Detection in Automatic Train Doors Vicinity Using Deep Neural Networks","authors":"Olivier Laurendin, S. Ambellouis, A. Fleury, Ankur Mahtani, Sanaa Chafik, Clément Strauss","doi":"10.1109/AVSS52988.2021.9663863","DOIUrl":"https://doi.org/10.1109/AVSS52988.2021.9663863","url":null,"abstract":"In the field of train transportation, personal injuries due to train automatic doors are still a common occurrence. This paper aims at implementing a computer vision solution as part of a safety detection system to identify automatic doors-related hazardous events to reduce their occurrence and their severity. Deep anomaly detection algorithms are often applied on CCTV video feeds to identify such hazardous events. However, the anomalous events identified by those algorithms are often simpler than most common occurrences in transport environments, hindering their widespread usage. Since such events are of quite a diverse nature and no dataset featuring them exist, we create a specilically-tailored dataset composed of real-case scenarios of hazardous events near train doors. We then study an anomaly detection algorithm from the literature on this dataset and propose a set of modifications to better adapt it to our railway context and to subsequently ease its application to a wider range of use-cases.","PeriodicalId":246327,"journal":{"name":"2021 17th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS)","volume":"28 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-11-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128147144","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 2
D4FLY Multimodal Biometric Database: multimodal fusion evaluation envisaging on-the-move biometric-based border control
Lulu Chen, Jonathan N. Boyle, A. Danelakis, J. Ferryman, Simone Ferstl, Damjan Gicic, A. Grudzien, André Howe, M. Kowalski, Krzysztof Mierzejewski, T. Theoharis
This work presents a novel multimodal biometric dataset with emerging biometric traits including 3D face, thermal face, iris on-the-move, iris mobile, somatotype and smartphone sensors. This dataset was created to resemble on-the-move characteristics in applications such as border control. The five types of biometric traits were selected as they can be captured while on-the-move, are contactless, and show potential for use in a multimodal fusion verification system in a border control scenario. Innovative sensor hardware was used in the data capture. The data featuring these biometric traits will be a valuable contribution to advancing biometric fusion research in general. Baseline evaluation was performed on each unimodal dataset. Multimodal fusion was evaluated based on various scenarios for comparison. Real-time performance is presented based on an Automated Border Control (ABC) scenario.
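A minimal sketch of score-level multimodal fusion of the kind such an evaluation typically uses; min-max normalization followed by a weighted sum is an assumption here, and the paper's fusion rules may differ:

```python
import numpy as np

def fuse_scores(modality_scores, weights=None):
    """Score-level fusion across biometric modalities.

    modality_scores: dict mapping modality name -> (N,) array of match scores
    for N verification attempts. Assumption: min-max normalization per
    modality followed by a weighted sum.
    """
    names = sorted(modality_scores)
    if weights is None:
        weights = {m: 1.0 / len(names) for m in names}
    fused = np.zeros_like(next(iter(modality_scores.values())), dtype=float)
    for m in names:
        s = np.asarray(modality_scores[m], dtype=float)
        s = (s - s.min()) / (s.max() - s.min() + 1e-8)   # min-max normalize
        fused += weights[m] * s
    return fused

def verify(fused_scores, threshold=0.6):
    """Accept a traveller when the fused score clears the threshold."""
    return fused_scores >= threshold
```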
{"title":"D4FLY Multimodal Biometric Database: multimodal fusion evaluation envisaging on-the-move biometric-based border control","authors":"Lulu Chen, Jonathan N. Boyle, A. Danelakis, J. Ferryman, Simone Ferstl, Damjan Gicic, A. Grudzien, André Howe, M. Kowalski, Krzysztof Mierzejewski, T. Theoharis","doi":"10.1109/AVSS52988.2021.9663737","DOIUrl":"https://doi.org/10.1109/AVSS52988.2021.9663737","url":null,"abstract":"This work presents a novel multimodal biometric dataset with emerging biometric traits including 3D face, thermal face, iris on-the-move, iris mobile, somatotype and smartphone sensors. This dataset was created to resemble on-the-move characteristics in applications such as border control. The five types of biometric traits were selected as they can be captured while on-the-move, are contactless, and show potential for use in a multimodal fusion verification system in a border control scenario. Innovative sensor hardware was used in the data capture. The data featuring these biometric traits will be a valuable contribution to advancing biometric fusion research in general. Baseline evaluation was performed on each unimodal dataset. Multimodal fusion was evaluated based on various scenarios for comparison. Real-time performance is presented based on an Automated Border Control (ABC) scenario.","PeriodicalId":246327,"journal":{"name":"2021 17th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS)","volume":"44 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-11-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128270894","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
A Video Analytic System for Rail Crossing Point Protection
Guangliang Zhao, Ashok Pandey, Ming-Ching Chang, Siwei Lyu
With the rise of AI and deep learning, video surveillance based on deep neural networks can provide real-time detection and tracking of vehicles and pedestrians. We present a video analytic system for monitoring railway crossings and providing security protection for rail intersections. Our system can automatically determine the rail-crossing gate status via visual detection and analyze traffic by detecting and tracking passing vehicles, thereby overseeing a set of rail-transportation-related safety events. Assuming a fixed camera view, each gate RoI can be manually annotated once per site during system setup, after which the gate status is detected automatically. Vehicles are detected using YOLOv4, and multi-target tracking is performed using DeepSORT. Safety-related events, including trespassing, are continuously monitored using rule-based triggering. Experimental evaluation is performed on a YouTube rail crossing dataset as well as a private dataset. On the private dataset of 76 total minutes from 38 videos, our system successfully detects 56 of the 58 annotated events. On the public dataset of 14.21 hours of video, it detects 58 out of 62 events.
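A minimal sketch of the rule-based triggering step, assuming the YOLOv4 + DeepSORT output has been collected into per-frame (track_id, box) lists and the gate-status classifier yields an "up"/"down" label per frame; the data layout is an assumption:

```python
def box_in_roi(box, roi):
    """True if the detection box overlaps the annotated crossing RoI.

    box, roi: (x1, y1, x2, y2) rectangles in image coordinates.
    """
    x1, y1, x2, y2 = box
    rx1, ry1, rx2, ry2 = roi
    return x1 < rx2 and rx1 < x2 and y1 < ry2 and ry1 < y2

def trespass_events(tracks, crossing_roi, gate_status):
    """Rule-based trigger: a tracked vehicle inside the crossing RoI while
    the gate is down counts as a trespassing event.

    tracks: dict mapping frame index -> list of (track_id, box) pairs
            (assumed format for the detector/tracker output).
    gate_status: dict mapping frame index -> "up" or "down" from the
            gate-status classifier.
    """
    events = []
    for frame, detections in sorted(tracks.items()):
        if gate_status.get(frame) != "down":
            continue
        for track_id, box in detections:
            if box_in_roi(box, crossing_roi):
                events.append((frame, track_id))
    return events
```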
{"title":"A Video Analytic System for Rail Crossing Point Protection","authors":"Guangliang Zhao, Ashok Pandey, Ming-Ching Chang, Siwei Lyu","doi":"10.1109/AVSS52988.2021.9663781","DOIUrl":"https://doi.org/10.1109/AVSS52988.2021.9663781","url":null,"abstract":"With the rise of AI deep learning, video surveillance based on deep neural networks can provide real-time detection and tracking of vehicles and pedestrians. We present a video analytic system for monitoring railway crossing and providing security protection for rail intersections. Our system can automatically determine the rail-crossing gate status via visual detection and analyze traffic by detecting and tracking passing vehicles, thus to oversee a set of rail-transportation related safety events. Assuming a fixed camera view, each gate RoI can be manually annotated once for each site during system setup, and then gate status can be automatically detected afterwards. Vehicles are detected using YOLOv4 and multi-target tracking is performed using DeepSORT. Safety-related events including trespassing are continuously monitored using rule-based triggering. Experimental evaluation is performed on a Youtube rail crossing dataset as well as a private dataset. On the private dataset of 76 total minutes from 38 videos, our system can successfully detect all 56 events out of 58 annotated events. On the public dataset of 14.21 hrs of videos, it detects 58 out of 62 events.","PeriodicalId":246327,"journal":{"name":"2021 17th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-11-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130107814","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Action Recognition with Domain Invariant Features of Skeleton Image
Han Chen, Yifan Jiang, Hanseok Ko
Due to the fast processing speed and robustness it can achieve, skeleton-based action recognition has recently received the attention of the computer vision community. Recent Convolutional Neural Network (CNN)-based methods have shown commendable performance in learning spatio-temporal representations for skeleton sequences, using a skeleton image as input to a CNN. Since the CNN-based methods mainly encode the temporal dimension and the skeleton joints simply as rows and columns, respectively, the latent correlations among all joints may be lost in the 2D convolution. To solve this problem, we propose a novel CNN-based method with adversarial training for action recognition. We introduce two-level domain adversarial learning to align the features of skeleton images from different view angles or subjects, respectively, and thus further improve generalization. We evaluated our proposed method on NTU RGB+D. It achieves competitive results compared with state-of-the-art methods, with accuracy gains of 2.4% and 1.9% over the baseline for cross-subject and cross-view evaluation, respectively.
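Domain adversarial alignment of this kind is usually implemented with a gradient reversal layer between the feature extractor and a domain (view or subject) classifier; a sketch of one level of such a scheme, with illustrative layer sizes, is shown below:

```python
import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    """Identity in the forward pass, negated (scaled) gradient backward."""
    @staticmethod
    def forward(ctx, x, lamb):
        ctx.lamb = lamb
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lamb * grad_output, None

class DomainHead(nn.Module):
    """Domain classifier attached through gradient reversal, so the feature
    extractor is pushed toward view/subject-invariant features. A sketch of
    one level of the two-level adversarial scheme, assuming `n_domains`
    view angles or subjects; layer sizes are illustrative."""
    def __init__(self, feat_dim, n_domains, lamb=1.0):
        super().__init__()
        self.lamb = lamb
        self.classifier = nn.Sequential(
            nn.Linear(feat_dim, 128), nn.ReLU(), nn.Linear(128, n_domains))

    def forward(self, features):
        reversed_feat = GradReverse.apply(features, self.lamb)
        return self.classifier(reversed_feat)
```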
{"title":"Action Recognition with Domain Invariant Features of Skeleton Image","authors":"Han Chen, Yifan Jiang, Hanseok Ko","doi":"10.1109/AVSS52988.2021.9663824","DOIUrl":"https://doi.org/10.1109/AVSS52988.2021.9663824","url":null,"abstract":"Due to the fast processing-speed and robustness it can achieve, skeleton-based action recognition has recently received the attention of the computer vision community. The recent Convolutional Neural Network (CNN)-based methods have shown commendable performance in learning spatio-temporal representations for skeleton sequence, which use skeleton image as input to a CNN. Since the CNN-based methods mainly encoding the temporal and skeleton joints simply as rows and columns, respectively, the latent correlation related to all joints may be lost caused by the 2D convolution. To solve this problem, we propose a novel CNN-based method with adversarial training for action recognition. We introduce a two-level domain adversarial learning to align the features of skeleton images from different view angles or subjects, respectively, thus further improve the generalization. We evaluated our proposed method on NTU RGB+D. It achieves competitive results compared with state-of-the-art methods and 2.4%, 1.9%accuracy gain than the baseline for cross-subject and cross-view.","PeriodicalId":246327,"journal":{"name":"2021 17th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS)","volume":"44 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-11-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130277943","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 3
Position-aware Location Regression Network for Temporal Video Grounding
Sunoh Kim, Kimin Yun, J. Choi
The key to successful grounding for video surveillance is to understand a semantic phrase corresponding to important actors and objects. Conventional methods ignore comprehensive contexts for the phrase or require heavy computation for multiple phrases. To understand comprehensive contexts with only one semantic phrase, we propose Position-aware Location Regression Network (PLRN) which exploits position-aware features of a query and a video. Specifically, PLRN first encodes both the video and query using positional information of words and video segments. Then, a semantic phrase feature is extracted from an encoded query with attention. The semantic phrase feature and encoded video are merged and made into a context-aware feature by reflecting local and global contexts. Finally, PLRN predicts start, end, center, and width values of a grounding boundary. Our experiments show that PLRN achieves competitive performance over existing methods with less computation time and memory.
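A simplified sketch of the attention-based phrase pooling and boundary regression described above; the layer sizes and the way the phrase and video features are combined are illustrative assumptions, not the paper's exact design:

```python
import torch
import torch.nn as nn

class PhrasePooling(nn.Module):
    """Attention pooling of encoded query words into one semantic phrase
    feature, followed by regression of (start, end, center, width)."""
    def __init__(self, dim):
        super().__init__()
        self.attn = nn.Linear(dim, 1)
        self.regressor = nn.Linear(2 * dim, 4)   # start, end, center, width

    def forward(self, word_feats, video_feat):
        # word_feats: (B, L, D) encoded query words with positional info
        # video_feat: (B, D) context-aware video representation
        weights = torch.softmax(self.attn(word_feats), dim=1)   # (B, L, 1)
        phrase = (weights * word_feats).sum(dim=1)              # (B, D)
        joint = torch.cat([phrase, video_feat], dim=-1)
        return torch.sigmoid(self.regressor(joint))             # normalized times
```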
{"title":"Position-aware Location Regression Network for Temporal Video Grounding","authors":"Sunoh Kim, Kimin Yun, J. Choi","doi":"10.1109/AVSS52988.2021.9663815","DOIUrl":"https://doi.org/10.1109/AVSS52988.2021.9663815","url":null,"abstract":"The key to successful grounding for video surveillance is to understand a semantic phrase corresponding to important actors and objects. Conventional methods ignore comprehensive contexts for the phrase or require heavy computation for multiple phrases. To understand comprehensive contexts with only one semantic phrase, we propose Position-aware Location Regression Network (PLRN) which exploits position-aware features of a query and a video. Specifically, PLRN first encodes both the video and query using positional information of words and video segments. Then, a semantic phrase feature is extracted from an encoded query with attention. The semantic phrase feature and encoded video are merged and made into a context-aware feature by reflecting local and global contexts. Finally, PLRN predicts start, end, center, and width values of a grounding boundary. Our experiments show that PLRN achieves competitive performance over existing methods with less computation time and memory.","PeriodicalId":246327,"journal":{"name":"2021 17th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS)","volume":"67 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-11-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133519614","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
A Splittable DNN-Based Object Detector for Edge-Cloud Collaborative Real-Time Video Inference
Joochan Lee, Yongwoo Kim, Sungtae Moon, J. Ko
While recent advances in deep neural networks (DNNs) have enabled remarkable performance on various computer vision tasks, it is challenging for edge devices to perform real-time inference of complex DNN models due to their stringent resource constraints. To enhance inference throughput, recent studies proposed collaborative intelligence (CI), which splits DNN computation between edge and cloud platforms, mostly for simple tasks such as image classification. However, for general DNN-based object detectors with a branching architecture, CI is highly restricted because of a significant feature transmission overhead. To solve this issue, this paper proposes a splittable object detector that enables edge-cloud collaborative real-time video inference. The proposed architecture includes a feature reconstruction network that can generate the multiple features required for detection from a small-sized feature produced by the edge-side extractor. Asymmetric scaling of the feature extractor and reconstructor further reduces the transmitted feature size and edge inference latency while maintaining detection accuracy. A performance evaluation using Yolov5 shows that the proposed model achieves 28 fps (2.45X and 1.56X higher than edge-only and cloud-only inference, respectively) on the NVIDIA Jetson TX2 platform in a WiFi environment.
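A minimal sketch of the split-computation idea: an edge-side stem emits one compact feature map, and a cloud-side reconstruction network regenerates the multi-scale features a branching detector head expects. The layer shapes are illustrative, not the paper's architecture:

```python
import torch
import torch.nn as nn

class EdgePart(nn.Module):
    """Edge-side extractor that emits one small feature map to transmit."""
    def __init__(self):
        super().__init__()
        self.stem = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 16, 3, stride=2, padding=1))  # 16 channels to send

    def forward(self, x):
        return self.stem(x)

class CloudReconstructor(nn.Module):
    """Cloud-side feature reconstruction: regenerates the multi-scale features
    a branching detector head expects from the single compact feature."""
    def __init__(self):
        super().__init__()
        self.expand = nn.Sequential(nn.Conv2d(16, 128, 3, padding=1), nn.ReLU())
        self.down = nn.Conv2d(128, 256, 3, stride=2, padding=1)

    def forward(self, f):
        p3 = self.expand(f)          # finer-scale feature for the detector
        p4 = self.down(p3)           # coarser-scale feature
        return [p3, p4]

# edge: feature = EdgePart()(frame); transmit the (quantized) feature to cloud
# cloud: features = CloudReconstructor()(feature); run detection heads on them
```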
{"title":"A Splittable DNN-Based Object Detector for Edge-Cloud Collaborative Real-Time Video Inference","authors":"Joochan Lee, Yongwoo Kim, Sungtae Moon, J. Ko","doi":"10.1109/AVSS52988.2021.9663806","DOIUrl":"https://doi.org/10.1109/AVSS52988.2021.9663806","url":null,"abstract":"While recent advances in deep neural networks (DNNs) enabled remarkable performance on various computer vision tasks, it is challenging for edge devices to perform real-time inference of complex DNN models due to their stringent resource constraint. To enhance the inference throughput, recent studies proposed collaborative intelligence (CI) that splits DNN computation into edge and cloud platforms, mostly for simple tasks such as image classification. However, for general DNN-based object detectors with a branching architecture, CI is highly restricted because of a significant feature transmission overhead. To solve this issue, this paper proposes a splittable object detector that enables edge-cloud collaborative real-time video inference. The proposed architecture includes a feature reconstruction network that can generate multiple features required for detection using a small-sized feature from the edge-side extractor. Asymmetric scaling on the feature extractor and reconstructor further reduces the transmitted feature size and edge inference latency, while maintaining detection accuracy. The performance evaluation using Yolov5 shows that the proposed model achieves 28 fps (2.45X and 1.56X higher than edge-only and cloud-only inference, respectively), on the NVIDIA Jetson TX2 platform in WiFi environment.","PeriodicalId":246327,"journal":{"name":"2021 17th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS)","volume":"258 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-11-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116454069","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 5
A Multi-Stream Approach for Seizure Classification with Knowledge Distillation
Jen-Cheng Hou, A. McGonigal, F. Bartolomei, M. Thonnat
In this work, we propose a multi-stream approach with knowledge distillation to classify epileptic seizures and psychogenic non-epileptic seizures. The proposed framework utilizes multi-stream information from keypoints and appearance of both the body and the face. We take the detected keypoints through time as a spatio-temporal graph and train it with an adaptive graph convolutional network to model the spatio-temporal dynamics throughout the seizure event. Besides, we regularize the keypoint features with complementary information from the appearance stream by imposing a knowledge distillation mechanism. We demonstrate the effectiveness of our approach by conducting experiments on real-world seizure videos. The experiments are conducted with both seizure-wise cross-validation and leave-one-subject-out validation; with the proposed model, the F1-score/accuracy is 0.89/0.87 for seizure-wise cross-validation and 0.75/0.72 for leave-one-subject-out validation.
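The abstract states that distillation regularizes the keypoint stream with the appearance stream; a sketch using the standard soft-target distillation loss is given below, where the temperature and mixing weight are illustrative hyper-parameters rather than the paper's values:

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    """Soft-target knowledge distillation: the keypoint (student) stream is
    pulled toward the appearance (teacher) stream while still fitting the
    seizure labels. T and alpha are illustrative hyper-parameters.
    """
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean") * (T * T)
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard
```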
{"title":"A Multi-Stream Approach for Seizure Classification with Knowledge Distillation","authors":"Jen-Cheng Hou, A. McGonigal, F. Bartolomei, M. Thonnat","doi":"10.1109/AVSS52988.2021.9663770","DOIUrl":"https://doi.org/10.1109/AVSS52988.2021.9663770","url":null,"abstract":"In this work, we propose a multi-stream approach with knowledge distillation to classify epileptic seizures and psychogenic non-epileptic seizures. The proposed framework utilizes multi-stream information from keypoints and appearance from both body and face. We take the detected keypoints through time as spatio-temporal graph and train it with an adaptive graph convolutional networks to model the spatio-temporal dynamics throughout the seizure event. Besides, we regularize the keypoint features with complementary information from the appearance stream by imposing a knowledge distillation mechanism. We demonstrate the effectiveness of our approach by conducting experiments on real-world seizure videos. The experiments are conducted by both seizure-wise cross validation and leave-one-subject-out validation, and with the proposed model, the performances of the F1-scorelaccuracy are 0.89/0.87 for seizure-wise cross validation, and 0.75/0.72 for leave-one-subject-out validation.","PeriodicalId":246327,"journal":{"name":"2021 17th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS)","volume":"39 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-11-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122869562","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 2