
Latest Publications: 2023 IEEE/CVF Winter Conference on Applications of Computer Vision Workshops (WACVW)

Benefits of Synthetically Pre-trained Depth-Prediction Networks for Indoor/Outdoor Image Classification
Pub Date: 2023-01-01 DOI: 10.1109/WACVW58289.2023.00040
Ke Lin, Irene Cho, Ameya S. Walimbe, Bryan A. Zamora, Alex Rich, Sirius Z. Zhang, Tobias Höllerer
Ground truth depth information is necessary for many computer vision tasks. Collecting this information is challenging, especially for outdoor scenes. In this work, we propose utilizing single-view depth prediction neural networks pre-trained on synthetic scenes to generate relative depth, which we call pseudo-depth. This approach is a less expensive option as the pre-trained neural network obtains accurate depth information from synthetic scenes, which does not require any expensive sensor equipment and takes less time. We measure the usefulness of pseudo-depth from pre-trained neural networks by training indoor/outdoor binary classifiers with and without it. We also compare the difference in accuracy between using pseudo-depth and ground truth depth. We experimentally show that adding pseudo-depth to training achieves a 4.4% performance boost over the non-depth baseline model on DIODE, a large standard test dataset, retaining 63.8% of the performance boost achieved from training a classifier on RGB and ground truth depth. It also boosts performance by 1.3% on another dataset, SUN397, for which ground truth depth is not available. Our result shows that it is possible to take information obtained from a model pre-trained on synthetic scenes and successfully apply it beyond the synthetic domain to real-world data.
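To make the pipeline concrete, here is a minimal sketch, not the authors' code: a frozen single-view depth predictor (stood in for by a dummy convolution so the snippet runs; in practice this would be a network pre-trained on synthetic scenes) produces a normalized pseudo-depth map that is stacked with RGB as a fourth channel for the indoor/outdoor classifier. `IndoorOutdoorNet` and `pseudo_depth` are hypothetical names introduced here.

```python
import torch
import torch.nn as nn

class IndoorOutdoorNet(nn.Module):
    """Tiny binary classifier over a 4-channel RGB + pseudo-depth input."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(4, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.head = nn.Linear(64, 2)  # indoor vs. outdoor

    def forward(self, rgbd):
        return self.head(self.features(rgbd).flatten(1))

@torch.no_grad()
def pseudo_depth(depth_net: nn.Module, rgb: torch.Tensor) -> torch.Tensor:
    """Predict relative depth and min-max normalize it per image."""
    d = depth_net(rgb)  # (B, 1, H, W) relative depth
    d_min = d.amin(dim=(2, 3), keepdim=True)
    d_max = d.amax(dim=(2, 3), keepdim=True)
    return (d - d_min) / (d_max - d_min + 1e-8)

# Dummy stand-in for a synthetically pre-trained depth network.
depth_net = nn.Conv2d(3, 1, 3, padding=1).eval()
rgb = torch.rand(2, 3, 224, 224)
rgbd = torch.cat([rgb, pseudo_depth(depth_net, rgb)], dim=1)  # (2, 4, 224, 224)
logits = IndoorOutdoorNet()(rgbd)  # (2, 2)
```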
Citations: 0
A Transformer-based Late-Fusion Mechanism for Fine-Grained Object Recognition in Videos
Pub Date: 2023-01-01 DOI: 10.1109/WACVW58289.2023.00015
Jannik Koch, Stefan Wolf, Jürgen Beyerer
Fine-grained image classification is limited by considering only a single view, while in many cases, such as surveillance, a whole video providing multiple perspectives is available. However, the potential of videos is mostly considered in the context of action recognition, while fine-grained object recognition is rarely considered as an application for video classification. This leads to recent video classification architectures being ill-suited for the task of fine-grained object recognition. We propose a novel, Transformer-based late-fusion mechanism for fine-grained video classification. Our approach achieves superior results to both early-fusion mechanisms, like the Video Swin Transformer, and a simple consensus-based late-fusion baseline with a modern Swin Transformer backbone. Additionally, we achieve improved efficiency, as our results show a large increase in accuracy with only a slight increase in computational complexity. Code is available at: https://github.com/wolfstefan/tlf.
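As an illustration, the late-fusion idea can be sketched as follows, with all dimensions and the CLS-token pooling being our assumptions rather than the paper's specification: per-frame features from an image backbone (e.g. a Swin Transformer) are treated as a token sequence and fused by a small Transformer encoder instead of a simple consensus average.

```python
import torch
import torch.nn as nn

class TransformerLateFusion(nn.Module):
    def __init__(self, feat_dim=768, num_classes=200, num_layers=2, num_heads=8):
        super().__init__()
        layer = nn.TransformerEncoderLayer(
            d_model=feat_dim, nhead=num_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=num_layers)
        self.cls_token = nn.Parameter(torch.zeros(1, 1, feat_dim))
        self.head = nn.Linear(feat_dim, num_classes)

    def forward(self, frame_feats):  # (B, T, feat_dim) per-frame features
        b = frame_feats.size(0)
        tokens = torch.cat([self.cls_token.expand(b, -1, -1), frame_feats], dim=1)
        fused = self.encoder(tokens)   # frames attend to each other (late fusion)
        return self.head(fused[:, 0])  # classify from the fused CLS token

# Usage: features for 16 frames of a clip from a hypothetical 768-D backbone.
feats = torch.randn(4, 16, 768)
logits = TransformerLateFusion()(feats)  # (4, 200)
```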
Citations: 0
Subjective and Objective Video Quality Assessment of High Dynamic Range Sports Content
Pub Date: 2023-01-01 DOI: 10.1109/WACVW58289.2023.00062
Zaixi Shang, Yixu Chen, Yongjun Wu, Hai Wei, S. Sethuraman
High Dynamic Range (HDR) video streaming has become more popular because of its faithful color and brightness presentation. However, live streaming of HDR, especially of sports content, has unique challenges, as it is usually encoded and distributed in real time without a post-production workflow. A set of problems that occur only in live streaming, e.g. resolution and frame rate crossover, intra-frame pulsing video quality defects, and the complex relationship between rate-control mode and video quality, are more salient when videos are streamed in HDR format. These issues are typically ignored by other subjective databases, disregarding the fact that they have a significant impact on the perceived quality of the videos. In this paper, we present a large-scale HDR video quality dataset for sports content that includes the above-mentioned issues in live streaming, and a method of merging multiple datasets using anchor videos. We also benchmarked existing video quality metrics on the new dataset, particularly over the novel scopes included in the database, to evaluate the effectiveness and efficiency of the existing models. We found that despite strong overall performance over the entire database, most of the tested models perform poorly when predicting human preference for various encoding parameters, such as frame rate and adaptive quantization.
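The abstract does not detail the anchor-based merging procedure; a minimal sketch, under the assumption of a linear scale alignment, would rate a small set of anchor videos in both subjective studies, fit a mapping on those shared anchors, and transfer the second study's scores onto the first study's scale:

```python
import numpy as np

def merge_scores(anchor_a, anchor_b, scores_b):
    """Map dataset-B scores onto dataset A's scale via shared anchor videos.

    anchor_a, anchor_b: mean opinion scores of the same anchor videos as
    rated in studies A and B; scores_b: remaining B scores to remap.
    """
    slope, intercept = np.polyfit(anchor_b, anchor_a, deg=1)  # least squares
    return slope * np.asarray(scores_b) + intercept

# Usage: five anchors rated in both studies align the rest of study B.
mos_a = np.array([2.1, 3.0, 3.8, 4.2, 4.8])
mos_b = np.array([1.8, 2.9, 3.5, 4.0, 4.6])
print(merge_scores(mos_a, mos_b, [2.5, 3.9]))
```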
Citations: 5
Sonar Image Composition for Semantic Segmentation Using Machine Learning
Pub Date: 2023-01-01 DOI: 10.1109/WACVW58289.2023.00031
William Ard, Corina Barbalata
This paper presents an approach for merging side-scan sonar data and bathymetry information to improve automatic shipwreck identification. The steps to combine a raw side-scan sonar image with a 2D relief map into a new composite RGB image are presented in detail, and a supervised image segmentation approach based on the U-Net architecture is implemented to identify shipwrecks. To validate the effectiveness of the approach, two datasets were created from shipwreck surveys: one using side-scan only, and one using the new composite RGB images. The U-Net model was trained and tested on each dataset, and the results were compared. The test results show a mean accuracy that is 15% higher when the RGB composition is used compared with the model trained and tested on the side-scan-only dataset. Furthermore, the mean intersection over union (IoU) increases by 9.5% using the RGB composition model.
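A minimal sketch of the composition step follows, with the exact channel assignment being our assumption (the abstract only states that side-scan and relief are merged into one RGB image): normalize the two co-registered rasters and place them, plus a blend, into the three color channels the U-Net consumes.

```python
import numpy as np

def compose_sonar_rgb(side_scan: np.ndarray, relief: np.ndarray) -> np.ndarray:
    """Stack co-registered side-scan intensity and 2D relief into RGB.

    side_scan, relief: float arrays of identical (H, W) shape.
    Returns (H, W, 3) uint8: R = side-scan, G = relief, B = their blend.
    """
    def norm(x):
        x = x.astype(np.float32)
        return (x - x.min()) / (x.max() - x.min() + 1e-8)

    r, g = norm(side_scan), norm(relief)
    b = 0.5 * (r + g)  # hypothetical third channel mixing both modalities
    return (np.stack([r, g, b], axis=-1) * 255).astype(np.uint8)

# Usage with synthetic data standing in for one survey tile.
rgb = compose_sonar_rgb(np.random.rand(256, 256), np.random.rand(256, 256))
print(rgb.shape, rgb.dtype)  # (256, 256, 3) uint8
```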
Citations: 0
Exploiting Temporal Context for Tiny Object Detection
Pub Date: 2023-01-01 DOI: 10.1109/WACVW58289.2023.00013
Christof W. Corsel, Michel van Lier, L. Kampmeijer, N. Boehrer, E. Bakker
In surveillance applications, the detection of tiny, low-resolution objects remains a challenging task. Most deep learning object detection methods rely on appearance features extracted from still images and struggle to accurately detect tiny objects. In this paper, we address the problem of tiny object detection for real-time surveillance applications by exploiting the temporal context available in video sequences recorded from static cameras. We present a spatiotemporal deep learning model based on YOLOv5 that exploits temporal context by processing sequences of frames at once. The model drastically improves the identification of tiny moving objects in the aerial surveillance and person detection domains, without degrading the detection of stationary objects. Additionally, we propose a two-stream architecture that uses frame difference as explicit motion information, further improving the detection of moving objects down to 4×4 pixels in size. Our approaches outperform previous work on the public WPAFB WAMI dataset, as well as surpassing previous work on an embedded NVIDIA Jetson Nano deployment in both accuracy and inference speed. We conclude that the addition of temporal context to deep learning object detectors is an effective approach to drastically improve the detection of tiny moving objects in static videos.
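The two ingredients named in the abstract, channel-stacked frame sequences and frame differencing, can be sketched directly; the window size and exact tensor layout here are assumptions for illustration:

```python
import numpy as np

def temporal_input(frames: np.ndarray, t: int, k: int = 2) -> np.ndarray:
    """Stack 2k+1 consecutive grayscale frames centered on frame t along the
    channel axis, giving a conv detector short-term temporal context."""
    return frames[t - k : t + k + 1]  # (2k+1, H, W), channels-first

def frame_difference(frames: np.ndarray) -> np.ndarray:
    """Absolute differences of consecutive frames as an explicit motion
    stream: (T, H, W) -> (T-1, H, W); static-camera background goes to ~0."""
    return np.abs(np.diff(frames.astype(np.float32), axis=0))

# Usage: a 5-frame clip from a static camera feeds both streams.
clip = np.random.rand(5, 480, 640).astype(np.float32)
appearance = temporal_input(clip, t=2)  # appearance branch input
motion = frame_difference(clip)         # motion branch input
print(appearance.shape, motion.shape)   # (5, 480, 640) (4, 480, 640)
```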
Citations: 4
Attentive Sensing for Long-Range Face Recognition
Pub Date: 2023-01-01 DOI: 10.1109/WACVW58289.2023.00068
Hélio Perroni Filho, Aleksander Trajcevski, K. Bhargava, Nizwa Javed, J. Elder
To be effective, a social robot must reliably detect and recognize people in all visual directions and in both near and far fields. A major challenge is the resolution/field-of-view tradeoff; here we propose and evaluate a novel attentive sensing solution. Panoramic low-resolution pre-attentive sensing is provided by an array of wide-angle cameras, while attentive sensing is achieved with a high-resolution, narrow field-of-view camera and a mirror-based gaze deflection system. Quantitative evaluation on a novel dataset shows that this attentive sensing strategy can yield good panoramic face recognition performance in the wild out to distances of ~35m.
Citations: 0
Knowledge-based Visual Context-Aware Framework for Applications in Robotic Services
Pub Date: 2023-01-01 DOI: 10.1109/WACVW58289.2023.00012
Doosoo Chang, Bohyung Han
Recently, context awareness in vision technologies has become essential with the increasing demand for real-world applications such as surveillance systems and service robots. However, implementing context awareness with an end-to-end learning-based system limits its extensibility and performance, because the context varies in scope and type while related training data are mostly scarce. To mitigate these limitations, we propose a visual context-aware framework composed of independent processes of visual perception and context inference. The framework performs logical inferences using the abstracted visual information of recognized objects and relationships, based on our knowledge representation. We demonstrate the scalability and utility of the proposed framework through experimental cases that present stepwise context inferences applied to robotic services in different domains.
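To illustrate the separation of perception and inference, here is a minimal sketch with entirely hypothetical facts and rules (the paper's actual knowledge representation is not given in the abstract): detector outputs become symbolic facts, and a forward-chaining loop derives context independently of the vision models.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Fact:
    subject: str
    relation: str
    obj: str

def infer_context(facts: set) -> set:
    """Forward-chain simple rules over abstracted perception facts."""
    inferred = set(facts)
    changed = True
    while changed:
        changed = False
        for f in list(inferred):
            # Hypothetical rule: a person holding a cup implies a drinking context.
            if f.relation == "holding" and f.obj == "cup":
                new = Fact(f.subject, "context", "drinking")
                if new not in inferred:
                    inferred.add(new)
                    changed = True
    return inferred - set(facts)

# Usage: facts abstracted from an object detector and a relation recognizer.
perception = {Fact("person_1", "holding", "cup"), Fact("cup", "on", "table")}
print(infer_context(perception))  # {Fact('person_1', 'context', 'drinking')}
```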
Citations: 0
Observation Centric and Central Distance Recovery for Athlete Tracking
Pub Date: 2023-01-01 DOI: 10.1109/WACVW58289.2023.00050
Hsiang-Wei Huang, Cheng-Yen Yang, Samartha Ramkumar, Chung-I Huang, Jenq-Neng Hwang, Pyong-Kun Kim, Kyoungoh Lee, Kwang-Ik Kim
Multi-object tracking of humans has improved rapidly with the development of object detection and re-identification algorithms. However, multi-actor tracking of humans with similar appearance and non-linear movement can still be very challenging, even for state-of-the-art tracking algorithms. Current motion-based tracking algorithms often use a Kalman filter to predict the motion of an object; however, its linear-motion assumption can cause tracking failures when the target does not move linearly. For multi-player tracking on the sports field, players on the same team usually wear the same color of jersey, making re-identification even harder, both short-term and long-term, during the tracking process. In this work, we propose a motion-based tracking algorithm and three post-processing pipelines for three sports (basketball, football, and volleyball), successfully handling the tracking of players' non-linear movements on the sports field. Experimental results achieve a HOTA of 73.968 on the test set of the ECCV DeeperAction Challenge SportsMOT dataset and a HOTA of 49.97 on the McGill HPT dataset, showing the effectiveness of the proposed framework and its robustness across different sports, including basketball, football, hockey, and volleyball.
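The linear-motion assumption the abstract criticizes comes from the constant-velocity Kalman filter that motion-based trackers typically use. A minimal sketch (with assumed noise covariances) shows why a sharp direction change hurts: the predict step extrapolates the last velocity linearly, and the state is pulled back only once a measurement arrives.

```python
import numpy as np

dt = 1.0  # one frame; state is [x, y, vx, vy]
F = np.array([[1, 0, dt, 0],
              [0, 1, 0, dt],
              [0, 0, 1, 0],
              [0, 0, 0, 1]], dtype=float)  # constant-velocity transition
H = np.array([[1, 0, 0, 0],
              [0, 1, 0, 0]], dtype=float)  # only position is observed
Q = np.eye(4) * 1e-2  # process noise (assumed)
R = np.eye(2) * 1e-1  # measurement noise (assumed)

def predict(x, P):
    return F @ x, F @ P @ F.T + Q

def update(x, P, z):
    S = H @ P @ H.T + R
    K = P @ H.T @ np.linalg.inv(S)  # Kalman gain
    return x + K @ (z - H @ x), (np.eye(4) - K @ H) @ P

# Usage: a player moving right suddenly cuts upward.
x, P = np.array([0.0, 0.0, 1.0, 0.0]), np.eye(4)
x, P = predict(x, P)                       # predicted at (1, 0)
x, P = update(x, P, np.array([0.2, 0.9]))  # actual detection after the cut
print(x[:2])  # estimate corrected toward the non-linear motion
```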
Citations: 8
Multi-IVE: Privacy Enhancement of Multiple Soft-Biometrics in Face Embeddings
Pub Date: 2023-01-01 DOI: 10.1109/WACVW58289.2023.00036
Pietro Melzi, H. O. Shahreza, C. Rathgeb, Rubén Tolosana, R. Vera-Rodríguez, Julian Fierrez, S. Marcel, C. Busch
This study focuses on the protection of soft-biometric attributes related to the demographic information of individuals that can be extracted from compact representations of face images, called embeddings. We consider a state-of-the-art technology for soft-biometric privacy enhancement, Incremental Variable Elimination (IVE), and propose Multi-IVE, a new method based on IVE to secure multiple soft-biometric attributes simultaneously. Several aspects of this technology are investigated, proposing different approaches to effectively identify and discard multiple soft-biometric attributes contained in face embeddings. In particular, we consider a domain transformation using Principal Component Analysis (PCA) and apply IVE in the PCA domain. A complete analysis of the proposed Multi-IVE algorithm is carried out, studying the embeddings generated by state-of-the-art face feature extractors, predicting the soft-biometric attributes contained within them with multiple machine learning classifiers, and providing a cross-database evaluation. The results obtained show the possibility of simultaneously securing multiple soft-biometric attributes and support the application of embedding domain transformations before addressing the enhancement of soft-biometric privacy.
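A simplified sketch of the PCA-domain idea follows; note that IVE proper eliminates variables incrementally with retraining, whereas this stand-in ranks components once with a linear probe, and all shapes and names are our assumptions:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression

def enhance_privacy(embeddings, attribute_labels, n_eliminate=16):
    """Zero the PCA components most predictive of a soft-biometric attribute.

    embeddings: (N, D) float array; attribute_labels: (N,) integer labels.
    """
    pca = PCA(n_components=min(embeddings.shape))
    z = pca.fit_transform(embeddings)

    # Rank components by how strongly a linear probe relies on them.
    probe = LogisticRegression(max_iter=1000).fit(z, attribute_labels)
    importance = np.abs(probe.coef_).sum(axis=0)
    drop = np.argsort(importance)[-n_eliminate:]  # most attribute-revealing

    z[:, drop] = 0.0                 # eliminate in the PCA domain
    return pca.inverse_transform(z)  # back to the embedding space

# Usage with random stand-ins for 512-D face embeddings and a binary attribute.
emb = np.random.randn(200, 512).astype(np.float32)
labels = np.random.randint(0, 2, size=200)
print(enhance_privacy(emb, labels).shape)  # (200, 512)
```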
Citations: 1
An Efficient Approach for Underwater Image Improvement: Deblurring, Dehazing, and Color Correction
Pub Date: 2023-01-01 DOI: 10.1109/WACVW58289.2023.00026
Alejandro Rico Espinosa, Declan McIntosh, A. Albu
As remotely operated underwater vehicles (ROVs) and static underwater video and image collection platforms become more prevalent, there is a significant need for effective ways to increase the quality of underwater images at faster than real-time speeds. To this end, we present a novel state-of-the-art end-to-end deep learning architecture for underwater image enhancement, focused on solving the key image degradations of blur, haze, and color casts while maintaining inference efficiency. Our proposed architecture builds on a minimal encoder-decoder structure to address these main underwater image degradations while remaining efficient. We use discrete wavelet transform skip connections and channel attention modules to address haze and color correction while preserving model efficiency. Our minimal architecture operates at 40 frames per second while scoring a structural similarity index (SSIM) of 0.8703 on the underwater image enhancement benchmark (UIEDB) dataset. These results show our method to be twice as fast as the previous state of the art. We also present a variation of our proposed method with a second parallel deblurring branch for even greater image improvement, which achieves an improved SSIM of 0.8802 while operating more efficiently than almost all comparable methods. The source code is available at https://github.com/alejorico98/underwater_ddc
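Two of the named building blocks are easy to sketch; the wiring below is our assumption, not the released architecture (see the repository above for the actual code): a one-level Haar wavelet decomposition used as a skip-connection tensor, gated by a squeeze-and-excitation style channel attention block.

```python
import torch
import torch.nn as nn

def haar_dwt(x: torch.Tensor) -> torch.Tensor:
    """One-level 2D Haar DWT: (B, C, H, W) -> (B, 4C, H/2, W/2) subbands."""
    a = x[:, :, 0::2, 0::2]
    b = x[:, :, 0::2, 1::2]
    c = x[:, :, 1::2, 0::2]
    d = x[:, :, 1::2, 1::2]
    ll, lh = (a + b + c + d) / 2, (a - b + c - d) / 2
    hl, hh = (a + b - c - d) / 2, (a - b - c + d) / 2
    return torch.cat([ll, lh, hl, hh], dim=1)

class ChannelAttention(nn.Module):
    """Squeeze-and-excitation style reweighting of feature channels."""
    def __init__(self, channels, reduction=4):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(),
            nn.Linear(channels // reduction, channels), nn.Sigmoid())

    def forward(self, x):
        w = self.fc(x.mean(dim=(2, 3)))  # squeeze: global average pool
        return x * w[:, :, None, None]   # excite: per-channel gains

# Usage: wavelet subbands of the input become an attention-gated skip tensor.
x = torch.rand(1, 3, 64, 64)
skip = ChannelAttention(12)(haar_dwt(x))  # (1, 12, 32, 32)
print(skip.shape)
```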
Citations: 2