
Latest publications from the 2021 17th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS)

Single-Stage UAV Detection and Classification with YOLOV5: Mosaic Data Augmentation and PANet
Fardad Dadboud, Vaibhav Patel, Varun Mehta, M. Bolic, I. Mantegh
In the Drone-vs-Bird Detection Challenge, held in conjunction with the 4th International Workshop on Small-Drone Surveillance, Detection and Counteraction Techniques at IEEE AVSS 2021, we proposed a YOLOv5-based object detection model for small UAV detection and classification. YOLOv5 leverages a PANet neck and mosaic augmentation, which help improve the detection of small objects. We combined the challenge dataset with a publicly available air-to-air UAV dataset featuring complex backgrounds and lighting conditions to train the model. The proposed approach achieved a recall of 0.96, $mAP_{0.5}$ of 0.98, and $mAP_{0.5:0.95}$ of 0.71 on a 10% random sample of the combined dataset.
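A minimal sketch of the mosaic augmentation idea the model relies on is shown below: four training images are stitched onto one canvas and their bounding boxes are remapped. The canvas size, the random-center split, and the (x1, y1, x2, y2) pixel box format are illustrative assumptions, not the authors' exact pipeline (YOLOv5 implements its own version).

```python
import random
import numpy as np

def mosaic(images, boxes_per_image, out_size=640):
    """Stitch four (H, W, 3) uint8 images into one mosaic canvas.

    boxes_per_image: list of four (N_i, 4) arrays with pixel boxes
    (x1, y1, x2, y2). Returns the mosaic and the remapped, clipped boxes.
    """
    assert len(images) == 4 and len(boxes_per_image) == 4
    canvas = np.full((out_size, out_size, 3), 114, dtype=np.uint8)
    # A random mosaic center splits the canvas into four unequal quadrants.
    cx = random.randint(out_size // 4, 3 * out_size // 4)
    cy = random.randint(out_size // 4, 3 * out_size // 4)
    regions = [(0, 0, cx, cy), (cx, 0, out_size, cy),
               (0, cy, cx, out_size), (cx, cy, out_size, out_size)]
    remapped = []
    for img, boxes, (x1, y1, x2, y2) in zip(images, boxes_per_image, regions):
        h, w = img.shape[:2]
        tw, th = x2 - x1, y2 - y1
        # Nearest-neighbour resize via index sampling to avoid extra deps;
        # a real pipeline would use cv2.resize with proper interpolation.
        ys = np.linspace(0, h - 1, th).astype(int)
        xs = np.linspace(0, w - 1, tw).astype(int)
        canvas[y1:y2, x1:x2] = img[ys][:, xs]
        if len(boxes):
            b = np.asarray(boxes, dtype=float).copy()
            b[:, [0, 2]] = b[:, [0, 2]] * (tw / w) + x1
            b[:, [1, 3]] = b[:, [1, 3]] * (th / h) + y1
            remapped.append(b)
    out_boxes = np.concatenate(remapped) if remapped else np.zeros((0, 4))
    out_boxes[:, [0, 2]] = np.clip(out_boxes[:, [0, 2]], 0, out_size)
    out_boxes[:, [1, 3]] = np.clip(out_boxes[:, [1, 3]], 0, out_size)
    return canvas, out_boxes
```

Because each mosaic mixes four scenes at reduced scale, small objects such as distant UAVs appear in more varied contexts within every training batch.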
Citations: 22
Person Localisation under Fragmented Occlusion*
R. Pflugfelder, Jonas Auer
Occlusion is a fundamental challenge in object recognition. Fragmented occlusion is much more challenging than ordinary partial occlusion and occurs in natural environments such as forests. Little is known in computer vision about fragmented occlusion and object recognition. Interestingly, human vision research has explored this problem far more, as the human visual system evolved to handle fragmented occlusion when mankind inhabited rainforests. A motivating example of fragmented occlusion is object detection through foliage, which is an essential requirement in green border surveillance. Instead of detection, this paper studies the simpler problem of localising persons. A neural-network-based method achieves a precision above 90% on new image sequences capturing this problem. This is made possible by two observations: (i) fragmented occlusion is unsolvable in single images without temporal information, and (ii) colour quantisation and colour swapping are essential to force the training of the network to learn from the temporal information available in the spatiotemporal data.
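To make observation (ii) concrete, the sketch below shows one plausible form of colour quantisation plus colour swapping applied consistently over an image sequence; the number of quantisation levels and the per-clip channel permutation are assumptions rather than the authors' exact recipe.

```python
import numpy as np

def quantise_colours(frame, levels=8):
    """Reduce each RGB channel of a uint8 frame to `levels` discrete bins."""
    step = 256 // levels
    return (frame // step) * step + step // 2

def colour_swap_sequence(frames, rng=None):
    """Apply one random channel permutation to every frame of a sequence.

    Keeping the permutation fixed within a clip preserves temporal
    consistency while destroying absolute colour cues, so a model must
    rely on motion/temporal information rather than memorised colours.
    """
    rng = rng or np.random.default_rng()
    perm = rng.permutation(3)            # same swap for the whole clip
    return [quantise_colours(f)[..., perm] for f in frames]

# Usage: clip is a list of (H, W, 3) uint8 frames from one image sequence.
# augmented = colour_swap_sequence(clip)
```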
Citations: 0
DSA-PR: Discrete Soft Biometric Attribute-Based Person Retrieval in Surveillance Videos
Hiren Galiyawala, M. Raval, Dhyey Savaliya
Physical characteristics, or soft biometrics, are visually perceptible aspects of the human body. Noticeable attributes such as build, height, complexion, and clothing help in developing a human surveillance system. The paper proposes Discrete Soft biometric Attribute-based Person Retrieval (DSA-PR) from video using the height, gender, torso (clothes) color-1, torso color-2, and torso (clothes) type given in a textual query. DSA-PR uses Mask R-CNN for semantic segmentation and ResNet-50 for attribute classification. Height is estimated using the Tsai camera calibration method. DSA-PR weights the attributes and fuses their probabilities to generate a final score for each detected person. The proposed approach achieves an average Intersection-over-Union (IoU) of 0.602, and retrieval with IoU $\geq 0.4$ of 0.808, on the AVSS Challenge II dataset, which is 5.8% and 2.02% above the state-of-the-art techniques, respectively.
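The weighting-and-fusion step can be pictured with the small sketch below; the linear weighted fusion, the attribute names, and the weight values are hypothetical, chosen only to illustrate how per-attribute probabilities could be combined into a single retrieval score.

```python
from typing import Dict

def fuse_attribute_scores(probs: Dict[str, float],
                          weights: Dict[str, float]) -> float:
    """Weighted fusion of per-attribute classifier probabilities.

    probs   : probability that the detected person matches the queried
              value of each attribute (gender, torso color-1, ...).
    weights : relative importance of each attribute; normalised here so
              the fused score stays in [0, 1].
    """
    total = sum(weights[a] for a in probs)
    return sum(weights[a] * probs[a] for a in probs) / total

# Hypothetical per-attribute probabilities for one detected person.
probs = {"height": 0.90, "gender": 0.80, "torso_color1": 0.70,
         "torso_color2": 0.60, "torso_type": 0.75}
weights = {"height": 1.0, "gender": 1.0, "torso_color1": 2.0,
           "torso_color2": 1.0, "torso_type": 1.5}
print(fuse_attribute_scores(probs, weights))  # fused score for this detection
```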
Citations: 1
Track Boosting and Synthetic Data Aided Drone Detection
F. C. Akyon, Ogulcan Eryuksel, Kamil Anil Ozfuttu, S. Altinuc
As the usage of drones increases with lowered costs and improved drone technology, drone detection emerges as a vital object detection task. However, detecting distant drones under unfavorable conditions, namely weak contrast, long range, and low visibility, requires effective algorithms. Our method approaches the drone detection problem by fine-tuning a YOLOv5 model with real and synthetically generated data, and by using a Kalman-based object tracker to boost detection confidence. Our results indicate that augmenting the real data with an optimal subset of synthetic data can increase performance. Moreover, temporal information gathered by object tracking methods can increase performance further.
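The paper's tracker is Kalman-based; the sketch below substitutes a simple per-track running average purely to illustrate the idea of boosting a detection's confidence with temporal evidence from its track — the blending rule, history length, and parameter values are assumptions.

```python
from collections import defaultdict

class TrackConfidenceBooster:
    """Boost per-detection confidence using the history of its track.

    A detection associated with a long, consistently confident track gets
    its score pulled toward the track's running average; the blending
    weight grows with track length up to `max_history`.
    """

    def __init__(self, max_history=10):
        self.max_history = max_history
        self.history = defaultdict(list)   # track_id -> recent confidences

    def update(self, track_id, confidence):
        h = self.history[track_id]
        h.append(confidence)
        if len(h) > self.max_history:
            h.pop(0)
        track_avg = sum(h) / len(h)
        alpha = len(h) / self.max_history          # trust grows with evidence
        return (1.0 - alpha) * confidence + alpha * max(confidence, track_avg)

# Usage with hypothetical per-frame tracker output (track_id, confidence):
booster = TrackConfidenceBooster()
for frame_dets in [[(1, 0.35)], [(1, 0.60)], [(1, 0.42)]]:
    print([(tid, round(booster.update(tid, c), 3)) for tid, c in frame_dets])
```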
Citations: 8
Bayesian Personalized-Wardrobe Model (BP-WM) for Long-Term Person Re-Identification
K. Lee, Nishant Sankaran, D. Mohan, Kenny Davila, Dennis Fedorishin, S. Setlur, V. Govindaraju
Long-term surveillance applications often involve having to re-identify individuals over several days. The task is made even more challenging by changes in appearance features, such as clothing, over a time span of days or longer. In this paper, we propose a novel approach called the Bayesian Personalized-Wardrobe Model (BP-WM) for long-term person re-identification (re-ID), employing Bayesian Personalized Ranking (BPR) on clothing features extracted from video sequences. In contrast to previous long-term person re-ID works, we exploit the fact that people typically choose their attire based on personal preferences, and that knowing a person's chosen wardrobe can be used as a soft biometric to distinguish identities over the long term. We evaluate the performance of the proposed BP-WM on the extended Indoor Long-term Re-identification Wardrobe (ILRW) dataset. Experimental results show that our method achieves state-of-the-art performance and that BP-WM can be used as a reliable soft biometric for person re-identification.
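The BPR objective underlying the wardrobe model can be written compactly; below is a minimal sketch assuming dot-product scoring between an identity embedding and clothing-item embeddings, which is an illustrative choice rather than the paper's exact formulation.

```python
import numpy as np

def bpr_loss(person_emb, worn_item_emb, other_item_emb):
    """Bayesian Personalized Ranking loss for one (person, worn, not-worn) triple.

    The score of a clothing item for a person is an embedding dot product;
    BPR pushes the worn item's score above the non-worn item's score,
    encoding the person's wardrobe preference.
    """
    x_pos = person_emb @ worn_item_emb
    x_neg = person_emb @ other_item_emb
    return -np.log(1.0 / (1.0 + np.exp(-(x_pos - x_neg))))  # -log sigmoid(diff)

# Hypothetical 8-d embeddings for one identity and two clothing descriptors.
rng = np.random.default_rng(0)
person, worn, other = rng.normal(size=(3, 8))
print(bpr_loss(person, worn, other))
```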
Citations: 1
Far-Sighted BiSeNet V2 for Real-time Semantic Segmentation
Te-Wei Chen, Yen-Ting Huang, W. Liao
Real-time semantic segmentation is one of the most investigated areas in the field of computer vision. In this paper, we focus on improving the performance of BiSeNet V2 by modifying its architecture. BiSeNet V2 is a two-branch segmentation model designed to extract semantic information from high-level feature maps and detailed information from low-level feature maps. The proposed enhancement remains lightweight and real-time, with two main modifications: enlarging the contextual information and breaking the constraint caused by the fixed size of convolutional kernels. Specifically, additional modules known as dilated strip pooling (DSP) and dilated mixed pooling (DMP) are appended to the original BiSeNet V2 model to form the far-sighted BiSeNet V2. The proposed dilated strip pooling block and dilated mixed pooling module are adapted from modules proposed in SPNet, with extra branches composed of dilated convolutions to provide larger receptive fields. The proposed far-sighted BiSeNet V2 improves accuracy from 73.4% to 76.0% at 94 FPS on an Nvidia 1080Ti. Moreover, the proposed dilated mixed pooling block achieves the same performance as the model with two mixed pooling modules while using only 2/3 of the parameters.
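A rough PyTorch sketch of what a dilated strip pooling block could look like is given below; the branch layout, gating, channel counts, and dilation rate are assumptions, not the authors' exact design, but they show how strip pooling and dilated convolutions enlarge the receptive field without growing the kernel.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DilatedStripPooling(nn.Module):
    """Sketch of a dilated strip pooling (DSP) block.

    Horizontal and vertical strip pooling (as in SPNet) capture long-range
    context along one axis; an extra dilated 3x3 branch widens the
    receptive field without enlarging the kernel size.
    """

    def __init__(self, channels, dilation=3):
        super().__init__()
        self.pool_h = nn.AdaptiveAvgPool2d((None, 1))   # (N, C, H, 1)
        self.pool_w = nn.AdaptiveAvgPool2d((1, None))   # (N, C, 1, W)
        self.conv_h = nn.Conv2d(channels, channels, (3, 1), padding=(1, 0))
        self.conv_w = nn.Conv2d(channels, channels, (1, 3), padding=(0, 1))
        self.conv_dil = nn.Conv2d(channels, channels, 3,
                                  padding=dilation, dilation=dilation)
        self.fuse = nn.Conv2d(channels, channels, 1)

    def forward(self, x):
        n, c, h, w = x.shape
        sh = self.conv_h(self.pool_h(x)).expand(-1, -1, h, w)   # row context
        sw = self.conv_w(self.pool_w(x)).expand(-1, -1, h, w)   # column context
        dil = self.conv_dil(x)                                  # dilated branch
        attn = torch.sigmoid(self.fuse(F.relu(sh + sw + dil)))
        return x * attn                  # gated output, same shape as input

# x = torch.randn(1, 64, 32, 32); y = DilatedStripPooling(64)(x)  # same shape
```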
Citations: 1
FlagDetSeg: Multi-Nation Flag Detection and Segmentation in the Wild
Shou-Fang Wu, Ming-Ching Chang, Siwei Lyu, Cheng-Shih Wong, Ashok Pandey, Po-Chi Su
We present a simple and effective flag detection approach for multi-nation flag instance segmentation in the wild, based on data augmentation and Mask-RCNN PointRend. To the best of our knowledge, this is the first multi-nation flag detection work incorporating recent deep object detection, with code and a dataset that will be released for public use. Flag images with binary segmentation are collected from the public domain, including Open Image V6, and annotated for up to 225 countries. Additional flag images are generated from template flag images with cropping, warping, masking, and color adaptation to hallucinate realistic-looking flag images for training and testing. Data augmentation is performed by fusing and transforming the segmented flags on top of natural image backgrounds to synthesize new images. To cope with the large variability of flags and the lack of authentic annotated flags, we combine the trained binary Mask-RCNN segmentation weights with the new multi-nation classifier for fine-tuning. For evaluation, the proposed model is compared with other popular detectors and instance segmentation methods, including YOLACT++. Results show the efficacy of the proposed approach.
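The core augmentation step, pasting segmented flags onto natural backgrounds, might look like the sketch below; the scale range, nearest-neighbour resize, and uniform placement are simplifications of the paper's cropping, warping, masking, and colour-adaptation pipeline.

```python
import random
import numpy as np

def paste_flag(background, flag, mask, scale_range=(0.1, 0.3), rng=None):
    """Paste a segmented flag onto a natural-image background.

    background : (H, W, 3) uint8 scene image
    flag, mask : (h, w, 3) flag crop and its (h, w) binary segmentation
    Returns the composite image and the pasted flag's bounding box.
    """
    rng = rng or random.Random()
    H, W = background.shape[:2]
    scale = rng.uniform(*scale_range)
    th, tw = max(2, int(H * scale)), max(2, int(W * scale))
    # Nearest-neighbour resize of the flag crop and its mask.
    ys = np.linspace(0, flag.shape[0] - 1, th).astype(int)
    xs = np.linspace(0, flag.shape[1] - 1, tw).astype(int)
    flag_r, mask_r = flag[ys][:, xs], mask[ys][:, xs]
    y0 = rng.randint(0, H - th)
    x0 = rng.randint(0, W - tw)
    out = background.copy()
    region = out[y0:y0 + th, x0:x0 + tw]
    m = mask_r[..., None].astype(bool)
    out[y0:y0 + th, x0:x0 + tw] = np.where(m, flag_r, region)
    return out, (x0, y0, x0 + tw, y0 + th)
```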
Citations: 2
A comprehensive maritime benchmark dataset for detection, tracking and threat recognition
J. L. Patino, Tom Cane, J. Ferryman
This paper describes a new multimodal maritime dataset recorded using a multispectral suite of sensors, including AIS, GPS, radar, and visible and thermal cameras. The visible and thermal cameras are mounted on the vessel itself, and surveillance is performed around the vessel in order to protect it from piracy at sea. The dataset corresponds to a series of acted scenarios which simulate attacks on the vessel by small, fast-moving boats ('skiffs'). The scenarios are inspired by real piracy incidents at sea and present a range of technical challenges for the different stages of an automated surveillance system: object detection, object tracking, and event recognition (in this case, threats towards the vessel). The dataset can thus be employed for training and testing at several stages of a threat detection and classification system. We also present baseline results that can be used for benchmarking algorithms performing such tasks. This new dataset fills the lack of publicly available datasets for the development and testing of maritime surveillance applications.
Citations: 0
Learning Sequential Visual Appearance Transformation for Online Multi-Object Tracking
Itziar Sagastiberri, Noud van de Gevel, Jorge García, O. Otaegui
Recent online multi-object tracking approaches combine single-object trackers and affinity networks, with the aim of capturing object motion and associating objects by their appearance, respectively. Those affinity networks often build on complex feature representations (re-ID embeddings) or sophisticated scoring functions, whose objective is to match current detections with previous tracklets using short-term appearance information. However, drastic appearance changes along object trajectories acquired by omnidirectional cameras cause a degradation in performance, since affinity networks ignore variations in long-term appearance information. In this paper, we deal with these appearance changes in a coherent way by proposing a novel affinity model that predicts the new visual appearance of an object by considering long-term appearance information. Our affinity model includes a convolutional LSTM encoder-decoder architecture to learn the space-time appearance transformation metric between consecutive re-ID feature representations along the object trajectory. Experimental results show that it achieves promising performance on several multi-object tracking datasets containing omnidirectional cameras.
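A minimal sketch of a convolutional LSTM encoder-decoder that consumes a track's past re-ID feature maps and predicts the next-step appearance is given below; the single-cell depth, channel counts, and decoder head are assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn

class ConvLSTMCell(nn.Module):
    """Minimal ConvLSTM cell: all gates from one convolution over [x, h]."""

    def __init__(self, in_ch, hid_ch, kernel=3):
        super().__init__()
        self.hid_ch = hid_ch
        self.gates = nn.Conv2d(in_ch + hid_ch, 4 * hid_ch, kernel,
                               padding=kernel // 2)

    def forward(self, x, state):
        h, c = state
        i, f, o, g = torch.chunk(self.gates(torch.cat([x, h], dim=1)), 4, dim=1)
        i, f, o = torch.sigmoid(i), torch.sigmoid(f), torch.sigmoid(o)
        c = f * c + i * torch.tanh(g)
        h = o * torch.tanh(c)
        return h, c

class AppearancePredictor(nn.Module):
    """Encode a track's past re-ID feature maps and predict the next one."""

    def __init__(self, feat_ch=64, hid_ch=64):
        super().__init__()
        self.cell = ConvLSTMCell(feat_ch, hid_ch)
        self.decode = nn.Conv2d(hid_ch, feat_ch, 3, padding=1)

    def forward(self, feats):                  # feats: (T, N, C, H, W)
        T, N, C, H, W = feats.shape
        h = feats.new_zeros(N, self.cell.hid_ch, H, W)
        c = feats.new_zeros(N, self.cell.hid_ch, H, W)
        for t in range(T):                     # encode the appearance history
            h, c = self.cell(feats[t], (h, c))
        return self.decode(h)                  # predicted next-step appearance

# seq = torch.randn(5, 2, 64, 16, 8); next_feat = AppearancePredictor()(seq)
```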
Citations: 1
Moving-Object-Aware Anomaly Detection in Surveillance Videos
Chun-Lung Yang, Tsung-Hsuan Wu, S. Lai
Video anomaly detection plays a crucial role in automatically detecting abnormal actions or events in surveillance video, which can help protect public safety. Deep learning techniques have been extensively employed and have recently achieved excellent anomaly detection results. However, previous image-reconstruction-based models did not fully exploit foreground object regions for video anomaly detection. Some recent works applied pre-trained object detectors to provide local context in the video surveillance scenario for anomaly detection. Nevertheless, these methods require prior knowledge of the object types involved in the anomaly, which is somewhat contradictory to the problem setting of unsupervised anomaly detection. In this paper, we propose a novel framework that learns moving-object feature prediction with a convolutional autoencoder architecture. We train our anomaly detector to be aware of moving-object regions in a scene without using an object detector or requiring prior knowledge of specific object classes for the anomaly. The appearance and motion features in moving-object regions provide comprehensive information about moving foreground objects for unsupervised learning of the video anomaly detector. Besides, the proposed latent representation learning scheme encourages the convolutional autoencoder to learn a more convergent latent representation for normal training data, while anomalous data exhibit quite different representations. We also propose a novel anomaly scoring method based on the feature prediction errors of moving foreground object regions and the latent representation regularity. Our experimental results demonstrate that the proposed approach achieves competitive results compared with SOTA methods on three public datasets for video anomaly detection.
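The scoring idea, combining moving-object feature prediction errors with a latent-regularity term, might be sketched as below; the max-over-regions aggregation, the distance to a single "normal" latent center, and the term weights are illustrative assumptions rather than the authors' exact score.

```python
import numpy as np

def region_prediction_errors(pred_feats, true_feats):
    """Mean squared prediction error for each moving-object region.

    pred_feats, true_feats : (num_regions, feat_dim) arrays holding the
    autoencoder's predicted and actual features per moving-object region.
    """
    return ((pred_feats - true_feats) ** 2).mean(axis=1)

def frame_anomaly_score(pred_feats, true_feats, latents, normal_center,
                        w_err=1.0, w_lat=0.5):
    """Combine the worst region prediction error with a latent-regularity term.

    latents       : (num_regions, latent_dim) latent codes for the frame
    normal_center : (latent_dim,) mean latent of normal training data
    The weighting of the two terms is an illustrative assumption.
    """
    if len(pred_feats) == 0:
        return 0.0                        # no moving objects -> treat as normal
    err = region_prediction_errors(pred_feats, true_feats).max()
    lat = np.linalg.norm(latents - normal_center, axis=1).max()
    return w_err * err + w_lat * lat

# Hypothetical shapes: 3 moving-object regions, 128-d features, 32-d latents.
rng = np.random.default_rng(1)
p, t = rng.normal(size=(2, 3, 128))
z, mu = rng.normal(size=(3, 32)), np.zeros(32)
print(frame_anomaly_score(p, t, z, mu))
```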
Citations: 2