
Latest publications from the 2023 18th International Conference on Machine Vision and Applications (MVA)

Self-Supervised Pre-Training Boosts Semantic Scene Segmentation on LiDAR data
Pub Date: 2023-07-23 DOI: 10.23919/MVA57639.2023.10216191
Mariona Carós, Ariadna Just, S. Seguí, Jordi Vitrià
Airborne LiDAR systems have the capability to capture the Earth’s surface by generating extensive point cloud data comprised of points mainly defined by 3D coordinates. However, labeling such points for supervised learning tasks is time-consuming. As a result, there is a need to investigate techniques that can learn from unlabeled data to significantly reduce the number of annotated samples. In this work, we propose to train a self-supervised encoder with Barlow Twins and use it as a pre-trained network in the task of semantic scene segmentation. The experimental results demonstrate that our unsupervised pre-training boosts performance once fine-tuned on the supervised task, especially for under-represented categories.
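The abstract gives no implementation details; as a rough illustration of the Barlow Twins objective used for pre-training, a minimal PyTorch sketch might look like this (the batch size, embedding dimension, and `lambd` weight are illustrative assumptions, not the authors' settings):

```python
import torch

def barlow_twins_loss(z1, z2, lambd=5e-3):
    """Barlow Twins: decorrelate embedding dimensions across two views.
    z1, z2: (batch, dim) embeddings of two augmentations of the same input."""
    n, d = z1.shape
    # standardize each dimension over the batch
    z1 = (z1 - z1.mean(0)) / (z1.std(0) + 1e-6)
    z2 = (z2 - z2.mean(0)) / (z2.std(0) + 1e-6)
    c = (z1.T @ z2) / n                                          # cross-correlation, (dim, dim)
    on_diag = (torch.diagonal(c) - 1).pow(2).sum()               # invariance term
    off_diag = (c - torch.diag(torch.diagonal(c))).pow(2).sum()  # redundancy reduction
    return on_diag + lambd * off_diag

# toy usage: embeddings of two augmented views of a point-cloud batch
loss = barlow_twins_loss(torch.randn(32, 128), torch.randn(32, 128))
```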
Citations: 0
MS-VACSNet: A Network for Multi-scale Volcanic Ash Cloud Segmentation in Remote Sensing Images
Pub Date: 2023-07-23 DOI: 10.23919/MVA57639.2023.10215928
G. Swetha, Rajeshreddy Datla, Vishnu Chalavadi, K. C.
The segmentation of volcanic ash clouds in remote sensing images provides valuable insights for studying volcanic deformation, forecasting, tracking, and hazard monitoring. However, delineating the boundary of volcanic eruptions is difficult because the scale of eruptions varies widely across remote sensing images. In this paper, we propose a network for multi-scale volcanic ash cloud segmentation (MS-VACSNet) in remote sensing images. The proposed MS-VACSNet uses U-Net as a baseline, with a few improvements in the encoder and decoder sub-networks. Specifically, we employ dilated convolutions to capture contextual information while delineating volcanic eruptions of different scales. We conducted experiments on 10 active volcanic regions across the globe using MODIS thermal and infrared images. The experimental results show that our MS-VACSNet achieves a 5% improvement in Dice score over state-of-the-art segmentation approaches for segmenting volcanic ash clouds.
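The abstract attributes the multi-scale behavior to dilated convolutions in a U-Net-style encoder/decoder; a minimal sketch of such a block is given below (channel counts and dilation rates are illustrative, not the authors' configuration):

```python
import torch
import torch.nn as nn

class DilatedBlock(nn.Module):
    """Parallel 3x3 convolutions at several dilation rates widen the
    receptive field without losing resolution; a 1x1 conv fuses them."""
    def __init__(self, in_ch, out_ch, rates=(1, 2, 4)):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Conv2d(in_ch, out_ch, 3, padding=r, dilation=r) for r in rates)
        self.fuse = nn.Conv2d(out_ch * len(rates), out_ch, 1)
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        x = torch.cat([branch(x) for branch in self.branches], dim=1)
        return self.act(self.fuse(x))

block = DilatedBlock(64, 64)
y = block(torch.randn(1, 64, 128, 128))   # -> (1, 64, 128, 128)
```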
Citations: 0
Generalizable Solar Irradiation Prediction using Large Transformer Models with Sky Imagery
Pub Date: 2023-07-23 DOI: 10.23919/MVA57639.2023.10216081
Kuber Reddy Gorantla, Aditi Roy
Deploying solar power systems in new locations imposes several challenges on the operation of local and regional power grids due to the inherent variation in ground-level solar irradiance. This work proposes a novel real-time solar now-casting methodology for irradiance prediction based on deep transfer learning from ground-based sky imagery. Existing approaches use statistical methods or Convolutional Neural Networks for irradiance regression trained for a particular location, and these cannot be transferred to new locations that may deploy different imaging sensors. This observation motivated us to introduce a large deep neural network based on Vision Transformers that is generalizable and transferable to different scenarios. The system is developed using multiple years of solar irradiance and sky image recordings from two locations: our own dataset captured in Princeton, NJ, USA, and the open-source ASI16 benchmark dataset captured in Golden, CO, USA. The method is validated across these two locations, which differ in geography, climate, and sensors. Results show that the proposed method is robust and highly accurate (85-90% accuracy) for multi-location deployment while requiring 50% less data from new locations.
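The abstract does not name the exact backbone; as an illustration of casting irradiance now-casting as regression with a Vision Transformer, one could adapt a torchvision ViT as sketched below (the single-output head, input resolution, and choice of `vit_b_16` are assumptions):

```python
import torch
import torch.nn as nn
from torchvision.models import vit_b_16  # ViT_B_16_Weights would load pre-trained weights

# ViT backbone with its classifier replaced by a one-dimensional regression head
model = vit_b_16(weights=None)
model.heads.head = nn.Linear(model.heads.head.in_features, 1)

sky = torch.randn(4, 3, 224, 224)       # a batch of sky images
irradiance = model(sky).squeeze(-1)     # predicted irradiance, shape (4,)
loss = nn.functional.mse_loss(irradiance, torch.rand(4) * 1000.0)
```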
Citations: 1
Deep Randomized Time Warping for Action Recognition
Pub Date: 2023-07-23 DOI: 10.23919/MVA57639.2023.10216189
Yutaro Hiraoka, K. Fukui
This paper proposes an enhanced Randomized Time Warping (RTW) using CNN features, termed Deep RTW, for motion recognition. RTW is a general extension of Dynamic Time Warping (DTW), widely used for matching and comparing sequential patterns. The basic idea of RTW is to simultaneously calculate the similarities between many pairs of warped patterns, i.e., time-elastic (TE) features generated by randomly sampling the sequential pattern while retaining its temporal order. This mechanism enables RTW to handle changes in motion speed flexibly. However, naive TE feature vectors generated from raw images cannot be expected to have high discriminative power, and the dimensionality of TE features grows with the number of concatenated images. To address these limitations, we feed CNN features extracted by 2D/3D CNNs into the RTW framework. Our framework is simple yet effective and applicable to various types of CNN architecture. Extensive experiments on the public motion datasets Jester and Something-Something V2 support the advantage of our method over the original CNNs.
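As a rough illustration of the time-elastic sampling that RTW builds on, the sketch below draws random order-preserving subsequences of per-frame CNN features and scores two sequences by the mean cosine similarity over many warped pairs (the sampling sizes are arbitrary, and this simplifies the paper's actual similarity computation):

```python
import torch
import torch.nn.functional as F

def te_features(seq, m, k):
    """seq: (T, d) per-frame CNN features. Returns (m, k*d) time-elastic
    vectors, each a random k-frame subsequence in original temporal order."""
    T, _ = seq.shape
    warps = []
    for _ in range(m):
        idx = torch.randperm(T)[:k].sort().values   # random but ordered frames
        warps.append(seq[idx].reshape(-1))
    return torch.stack(warps)

def rtw_similarity(seq_a, seq_b, m=64, k=8):
    a, b = te_features(seq_a, m, k), te_features(seq_b, m, k)
    sim = F.cosine_similarity(a.unsqueeze(1), b.unsqueeze(0), dim=-1)  # (m, m)
    return sim.mean()

# toy: two 30-frame sequences of 512-d features
print(rtw_similarity(torch.randn(30, 512), torch.randn(30, 512)))
```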
Citations: 0
Transformer with Task Selection for Continual Learning
Pub Date: 2023-07-23 DOI: 10.23919/MVA57639.2023.10215673
Sheng-Kai Huang, Chun-Rong Huang
The goal of continual learning is to let models continuously learn newly arriving knowledge without catastrophic forgetting. To address this issue, we propose a transformer-based framework with a task selection module. The task selection module selects the corresponding task tokens to assist the learning of incoming samples from new tasks. For previously seen samples, the selected task tokens retain prior knowledge to assist the prediction of samples from learned classes. Compared with state-of-the-art methods, our method achieves good performance on the CIFAR-100 dataset, especially when testing on the last task, showing that it better prevents catastrophic forgetting.
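The abstract leaves the selection mechanism unspecified; one plausible minimal sketch keeps a learnable token per task, picks the token most similar to a summary of the input, and prepends it to the token sequence (the mean-pooled query and cosine matching are assumptions, not the authors' design):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TaskTokenSelector(nn.Module):
    """Select the best-matching task token per sample and prepend it,
    so the transformer can condition its features on the inferred task."""
    def __init__(self, num_tasks, dim):
        super().__init__()
        self.task_tokens = nn.Parameter(torch.randn(num_tasks, dim) * 0.02)

    def forward(self, tokens):                       # tokens: (B, N, dim)
        query = tokens.mean(dim=1)                   # crude sample summary
        sim = F.cosine_similarity(query.unsqueeze(1),
                                  self.task_tokens.unsqueeze(0), dim=-1)  # (B, T)
        chosen = self.task_tokens[sim.argmax(dim=1)]                      # (B, dim)
        return torch.cat([chosen.unsqueeze(1), tokens], dim=1)           # (B, N+1, dim)

selector = TaskTokenSelector(num_tasks=10, dim=256)
out = selector(torch.randn(8, 196, 256))   # -> (8, 197, 256)
```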
Citations: 0
Investigating self-supervised learning for Skin Lesion Classification
Pub Date: 2023-07-23 DOI: 10.23919/MVA57639.2023.10215580
Takumi Morita, X. Han
Skin cancer is one of the most common cancers worldwide and a growing global health issue as natural protection against harmful ultraviolet radiation deteriorates. Early diagnosis and proper treatment can greatly increase the survival rate, even for the deadliest malignant melanoma. Thus, computer-aided diagnosis of skin lesions has been actively explored and has made remarkable progress in medical practice, benefiting from the great advances of deep convolutional neural networks in vision tasks. However, most studies on skin lesion/cancer recognition and detection focus on building a robust prediction model from annotated training samples in a fully supervised manner and cannot make full use of available unlabeled data. This study investigates self-supervised learning that uses a large amount of unlabeled skin lesion images to train a good initial network for representation learning, then transfers the knowledge of this initial model to the supervised skin lesion classification task with a small number of annotated samples to enhance performance. Specifically, we employ a negative-sample-free self-supervised framework that leverages interaction learning between the online and target networks to enforce representation robustness using only positive samples. Moreover, based on the observed potential variations in the target skin images, we select adaptive augmentation methods to produce the transformed positive views for self-supervised learning. Extensive experiments on two benchmark skin lesion datasets demonstrate that the proposed self-supervised pre-training stably improves recognition performance across different numbers of labeled images compared with the baseline models.
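A negative-sample-free online/target scheme of the kind described (in the spirit of BYOL) can be sketched as follows; the stand-in encoder, predictor shape, and momentum value are placeholders rather than the paper's configuration:

```python
import copy
import torch
import torch.nn as nn
import torch.nn.functional as F

online = nn.Sequential(nn.Flatten(), nn.Linear(3 * 64 * 64, 256))   # stand-in encoder
predictor = nn.Sequential(nn.Linear(256, 256), nn.ReLU(), nn.Linear(256, 256))
target = copy.deepcopy(online)              # slow-moving copy, never backpropagated
for p in target.parameters():
    p.requires_grad_(False)

def byol_loss(v1, v2):
    """Negative-cosine loss between online predictions and target projections."""
    p1, p2 = predictor(online(v1)), predictor(online(v2))
    with torch.no_grad():
        t1, t2 = target(v1), target(v2)
    return 2 - F.cosine_similarity(p1, t2, dim=-1).mean() \
             - F.cosine_similarity(p2, t1, dim=-1).mean()

@torch.no_grad()
def ema_update(m=0.996):
    """Move the target network toward the online network."""
    for po, pt in zip(online.parameters(), target.parameters()):
        pt.mul_(m).add_(po, alpha=1 - m)

v1, v2 = torch.randn(16, 3, 64, 64), torch.randn(16, 3, 64, 64)  # two positive views
byol_loss(v1, v2).backward()
ema_update()
```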
Citations: 0
Age Prediction From Face Images Via Contrastive Learning
Pub Date: 2023-07-23 DOI: 10.23919/MVA57639.2023.10216074
Yeongnam Chae, Poulami Raha, Mijung Kim, B. Stenger
This paper presents a novel approach for accurately estimating age from face images, which overcomes the challenge of collecting a large dataset of individuals with the same identity at different ages. Instead, we leverage readily available face datasets of different people at different ages and aim to extract age-related features using contrastive learning. Our method emphasizes these relevant features while suppressing identity-related features using a combination of cosine similarity and triplet margin losses. We demonstrate the effectiveness of our proposed approach by achieving state-of-the-art performance on two public datasets, FG-NET and MORPH II.
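The abstract names cosine similarity and triplet margin losses without further detail; one plausible combination, sketched here, applies a triplet margin loss over age embeddings plus a cosine penalty against a separate identity embedding (the pairing scheme, the shared dimensionality, and the weight `alpha` are assumptions):

```python
import torch
import torch.nn.functional as F

triplet = torch.nn.TripletMarginLoss(margin=0.3)

def age_loss(anchor, pos, neg, identity_feat, alpha=0.5):
    """anchor/pos: same age, different identities; neg: a different age.
    identity_feat: the anchor's embedding from a face-recognition model."""
    l_age = triplet(anchor, pos, neg)
    # discourage the age embedding from encoding identity information
    l_id = F.cosine_similarity(anchor, identity_feat, dim=-1).abs().mean()
    return l_age + alpha * l_id

a, p, n, idf = (torch.randn(32, 128) for _ in range(4))
print(age_loss(a, p, n, idf))
```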
Citations: 0
Hierarchical Spatio-Temporal Neural Network with Displacement Based Refinement for Monocular Head Pose Prediction
Pub Date: 2023-07-23 DOI: 10.23919/MVA57639.2023.10216167
Zhe Xu, Yuan Li, Yuhong Li, Songlin Du, T. Ikenaga
Head pose prediction aims to forecast future head pose from an observed sequence, and plays an increasingly important role in human-computer interaction, virtual reality, and driver monitoring. However, since many motions are possible, current head pose work, which focuses mainly on estimation, fails to provide sufficient temporal information to meet the high demands of accurate prediction. This paper proposes (A) a Spatio-Temporal Encoder (STE), (B) a displacement-based offset generating module, and (C) a time step feature aggregation module. The STE extracts spatial information via a Transformer and temporal information according to the time order of frames. The displacement-based offset generating module uses displacement information, obtained by a frequency-domain process between adjacent frames, to generate an offset that refines the prediction result. Furthermore, the time step feature aggregation module integrates time step features based on information density and hierarchically extracts past motion information as prior knowledge to capture motion recurrence. Extensive experiments show that the proposed network outperforms related methods, achieving a Mean Absolute Error (MAE) of 4.5865° on simple background sequences and 7.1325° on complex background sequences.
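The displacement module is described only as a frequency-domain process between adjacent frames; a classical instance of that idea is phase correlation, sketched below (the paper's exact formulation may differ):

```python
import torch

def phase_correlation(f1, f2):
    """Estimate the integer (dy, dx) shift between two frames from the
    normalized cross-power spectrum, a classical frequency-domain method."""
    F1, F2 = torch.fft.fft2(f1), torch.fft.fft2(f2)
    cross = F1 * torch.conj(F2)
    corr = torch.fft.ifft2(cross / (cross.abs() + 1e-8)).real
    dy, dx = divmod(int(torch.argmax(corr)), f1.shape[-1])
    h, w = f1.shape
    # shifts beyond half the frame wrap around to negative displacements
    return (dy - h if dy > h // 2 else dy), (dx - w if dx > w // 2 else dx)

frame = torch.zeros(64, 64)
frame[20:30, 20:30] = 1.0
shifted = torch.roll(frame, shifts=(3, -5), dims=(0, 1))
print(phase_correlation(shifted, frame))   # -> (3, -5)
```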
Citations: 0
Joint Learning with Group Relation and Individual Action
Pub Date: 2023-07-23 DOI: 10.23919/MVA57639.2023.10215994
Chihiro Nakatani, Hiroaki Kawashima, N. Ukita
This paper proposes a method for group relation learning. Unlike related work, in which manual annotation of group activities is required for supervised learning, we propose group relation learning without group activity annotation, based on the recognition of individual actions, which can be annotated more easily than group activities defined by complex inter-person relationships. Our method extracts features informative for recognizing each person's action by conditioning the group relation on that person's location. A variety of experimental results demonstrate that our method outperforms SOTA methods quantitatively and qualitatively on two public datasets.
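The conditioning step can be pictured with a small sketch: a pooled group feature is fused with one person's normalized image location before that person's action classifier (this fusion-by-concatenation design is purely illustrative, not the paper's architecture):

```python
import torch
import torch.nn as nn

class LocationConditionedHead(nn.Module):
    """Fuse a pooled group feature with a person's (x, y) location to
    produce logits for that person's individual action."""
    def __init__(self, feat_dim, num_actions):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(feat_dim + 2, feat_dim), nn.ReLU(),
            nn.Linear(feat_dim, num_actions))

    def forward(self, group_feat, loc):      # (B, feat_dim), (B, 2) in [0, 1]
        return self.mlp(torch.cat([group_feat, loc], dim=-1))

head = LocationConditionedHead(feat_dim=256, num_actions=9)
logits = head(torch.randn(4, 256), torch.rand(4, 2))   # -> (4, 9)
```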
Citations: 0
Leveraging Embedding Information to Create Video Capsule Endoscopy Datasets
Pub Date: 2023-07-23 DOI: 10.23919/MVA57639.2023.10215919
Pere Gilabert, C. Malagelada, Hagen Wenzek, Jordi Vitrià, S. Seguí
As the field of deep learning continues to expand, it has become increasingly apparent that large volumes of data are needed to train algorithms effectively. This is particularly challenging in the endoscopic capsule field, where obtaining and labeling sufficient data can be expensive and time-consuming. To overcome these challenges, we have developed an automatic method of video selection that uses the diversity of unlabeled videos to identify the most relevant videos for labeling. The findings indicate a significant improvement in performance with the implementation of this new methodology. The system selects relevant and diverse videos, achieving high accuracy in the classification task. This translates to less workload for annotators as they can label fewer videos while maintaining the same accuracy level in the classification task.
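The abstract does not state the selection criterion; a standard way to exploit embedding diversity is greedy farthest-point (k-center) selection, sketched below over hypothetical per-video embeddings:

```python
import numpy as np

def select_diverse(embeddings, k):
    """Greedy k-center selection: start near the centroid, then repeatedly
    take the video whose embedding is farthest from everything chosen so far."""
    center = embeddings.mean(axis=0)
    chosen = [int(np.argmin(np.linalg.norm(embeddings - center, axis=1)))]
    dists = np.linalg.norm(embeddings - embeddings[chosen[0]], axis=1)
    while len(chosen) < k:
        nxt = int(np.argmax(dists))   # farthest from the current selection
        chosen.append(nxt)
        dists = np.minimum(dists,
                           np.linalg.norm(embeddings - embeddings[nxt], axis=1))
    return chosen

videos = np.random.rand(500, 64)          # one embedding per unlabeled video
to_label = select_diverse(videos, k=20)   # indices of videos worth annotating
```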
Citations: 0