
Latest publications from the 2023 18th International Conference on Machine Vision and Applications (MVA)

ViTVO: Vision Transformer based Visual Odometry with Attention Supervision
Pub Date: 2023-07-23 DOI: 10.23919/MVA57639.2023.10215538
Chu-Chi Chiu, Hsuan-Kung Yang, Hao-Wei Chen, Yu-Wen Chen, Chun-Yi Lee
In this paper, we develop a Vision Transformer based visual odometry (VO) framework called ViTVO. ViTVO introduces an attention mechanism to perform visual odometry. Due to the nature of VO, Transformer based VO models tend to over-concentrate on a few points, which may degrade accuracy. In addition, noise from dynamic objects usually makes VO tasks more difficult. To overcome these issues, we propose an attention loss during training, which uses ground-truth masks or self-supervision to guide the attention maps to focus more on static regions of an image. In our experiments, we demonstrate the superior performance of ViTVO on the Sintel validation set and validate the effectiveness of our attention supervision mechanism in performing VO tasks.
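The abstract does not give the form of the attention loss; as a hedged sketch, one way such supervision could work is to pull each attention map toward a distribution concentrated on static pixels (the function name, mask format, and KL formulation below are assumptions, not the paper's definition):

```python
import torch
import torch.nn.functional as F

def attention_supervision_loss(attn_map, static_mask, eps=1e-8):
    """Hypothetical sketch: penalize attention mass on dynamic regions.

    attn_map:    (B, H*W) attention weights from the ViT, summing to 1 per row.
    static_mask: (B, H*W) binary mask, 1 where the pixel is static
                 (from ground truth or a self-supervised estimate).
    """
    # Renormalize the mask into a target distribution over static pixels.
    target = static_mask / (static_mask.sum(dim=1, keepdim=True) + eps)
    # KL divergence pushes the attention toward the static-region distribution.
    return F.kl_div((attn_map + eps).log(), target, reduction="batchmean")
```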
Citations: 1
Weakly-Supervised Deep Image Hashing based on Cross-Modal Transformer
Pub Date: 2023-07-23 DOI: 10.23919/MVA57639.2023.10216160
Ching-Ching Yang, W. Chu, S. Dubey
Weakly-supervised image hashing has emerged recently because web images associated with contextual text or tags are abundant. Text information weakly related to images can be used to guide the learning of a deep hashing network. In this paper, we propose Weakly-supervised deep Hashing based on Cross-Modal Transformer (WHCMT). First, cross-scale attention between image patches is discovered to form more effective visual representations. A baseline transformer is also adopted to find self-attention of tags and form tag representations. Second, the cross-modal attention between images and tags is discovered by the proposed cross-modal transformer. Effective hash codes are then generated by embedding layers. WHCMT is tested on semantic image retrieval, and we show that new state-of-the-art results can be obtained on the MIRFLICKR-25K and NUS-WIDE datasets.
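The abstract leaves the hashing head unspecified; below is a minimal sketch of the final step, assuming the fused cross-modal features are binarized through a tanh embedding (a common deep-hashing pattern, not necessarily WHCMT's exact design):

```python
import torch
import torch.nn as nn

class HashHead(nn.Module):
    """Hypothetical sketch of the hashing step: fused cross-modal
    features are embedded and binarized into K-bit codes."""

    def __init__(self, feat_dim=512, code_bits=64):
        super().__init__()
        self.embed = nn.Linear(feat_dim, code_bits)

    def forward(self, fused_features):
        # tanh keeps activations in (-1, 1) so training stays differentiable.
        h = torch.tanh(self.embed(fused_features))
        # At retrieval time, codes are binarized with sign().
        return h, torch.sign(h)
```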
Citations: 0
Video Anomaly Detection Using Encoder-Decoder Networks with Video Vision Transformer and Channel Attention Blocks
Pub Date: 2023-07-23 DOI: 10.23919/MVA57639.2023.10215921
Shimpei Kobayashi, A. Hizukuri, R. Nakayama
Surveillance cameras have been introduced in various locations for public safety. However, it is tedious for security personnel to keep watching surveillance footage in which abnormal events rarely occur. The purpose of this study is to develop a computerized anomaly detection method for surveillance videos. Our database consisted of three public datasets for anomaly detection: the UCSD Pedestrian 1, UCSD Pedestrian 2, and CUHK Avenue datasets. In the proposed network, channel attention blocks are introduced into TransAnomaly, an existing anomaly detection network, to focus on important channel information. The areas under the receiver operating characteristic curves (AUCs) with the proposed network were 0.827 for UCSD Pedestrian 1, 0.964 for UCSD Pedestrian 2, and 0.854 for CUHK Avenue. These AUCs were greater than those of a conventional TransAnomaly without channel attention blocks (0.767, 0.934, and 0.839, respectively).
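The channel attention blocks are not detailed in the abstract; a plausible squeeze-and-excitation style form, offered only as an illustration of the idea (the paper's block may differ):

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """Hypothetical channel attention block: global pooling produces a
    per-channel descriptor, a small MLP turns it into channel weights."""

    def __init__(self, channels, reduction=16):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)          # squeeze: global context
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),                            # weights in (0, 1)
        )

    def forward(self, x):                            # x: (B, C, H, W)
        b, c, _, _ = x.shape
        w = self.fc(self.pool(x).view(b, c)).view(b, c, 1, 1)
        return x * w                                 # reweight informative channels
```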
Citations: 0
Multi-Prior Based Multi-Scale Condition Network for Single-Image HDR Reconstruction
Pub Date: 2023-07-23 DOI: 10.23919/MVA57639.2023.10216063
Haorong Jiang, Fengshan Zhao, Junda Liao, Qin Liu, T. Ikenaga
High Dynamic Range (HDR) imaging aims to reconstruct the natural appearance of real-world scenes by expanding the bit depth of captured images. However, due to the imaging pipeline of off-the-shelf cameras, information loss in over-exposed areas and noise in under-exposed areas pose significant challenges for single-image HDR imaging. As a result, the key to success lies in restoring over-exposed regions and denoising under-exposed regions. In this paper, a multi-prior based multi-scale condition network is proposed to address this issue. (1) Three types of prior knowledge modulate the intermediate features in the reconstruction network from different perspectives, resulting in improved modulation effects. (2) Multi-scale fusion extracts and integrates deep semantic information from various priors. Experiments on the NTIRE HDR challenge dataset demonstrate that the proposed method achieves state-of-the-art quantitative results.
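How the priors "modulate the intermediate features" is not specified here; one hedged reading, in the spirit of spatial feature transform layers, predicts a per-pixel scale and shift from each prior (the module and names below are assumptions):

```python
import torch
import torch.nn as nn

class PriorModulation(nn.Module):
    """Hypothetical sketch of condition-based feature modulation: a prior
    map (e.g., an over-exposure mask) predicts per-pixel scale and shift
    applied to reconstruction-network features."""

    def __init__(self, prior_ch, feat_ch):
        super().__init__()
        self.to_scale = nn.Conv2d(prior_ch, feat_ch, 3, padding=1)
        self.to_shift = nn.Conv2d(prior_ch, feat_ch, 3, padding=1)

    def forward(self, feat, prior):
        # prior is assumed resized to feat's spatial size beforehand.
        return feat * (1 + self.to_scale(prior)) + self.to_shift(prior)
```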
Citations: 0
Safe height estimation of deformable objects for picking robots by detecting multiple potential contact points
Pub Date: 2023-07-23 DOI: 10.23919/MVA57639.2023.10215690
Jaesung Yang, Daisuke Hagihara, Kiyoto Ito, Nobuhiro Chihara
Object sorting in logistics warehouses is still carried out manually, and there is a great need for automation with arm robots. Target objects should be placed carefully in situations where careful handling of products is important. We propose a method for estimating the height of a picked object with a single depth camera to achieve precise placement of items, such as stacking, especially for deformable objects, e.g., bags. The proposed method detects multiple potential contact points of a picked object and estimates the appropriate height at which to place it, using the point-cloud difference before and after picking. The validity of the proposed method was verified in 26 cases in which deformable objects were placed inside a container, confirming that object-height estimation is possible with an average error of 3.2 mm.
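As a rough illustration of the point-cloud-difference idea only (all names and thresholds below are assumptions, and a real system would use a spatial index rather than brute force):

```python
import numpy as np

def safe_place_height(cloud_before, cloud_after, grip_z, thresh=0.005):
    """Hypothetical sketch: points that disappear from the scene after
    picking belong to the object; the lowest of them is a potential
    contact point, giving how far the object hangs below the gripper.

    cloud_before/after: (N, 3) / (M, 3) scene point clouds, z pointing up.
    grip_z:             gripper height at the moment of picking (m).
    thresh:             distance threshold for "the point disappeared" (m).
    """
    # Brute-force nearest-neighbor check keeps the idea clear.
    d = np.linalg.norm(cloud_before[:, None, :] - cloud_after[None, :, :], axis=2)
    removed = cloud_before[d.min(axis=1) > thresh]   # points taken by the pick
    # The placing height must clear the object's extent below the gripper.
    return grip_z - removed[:, 2].min()
```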
Citations: 0
TinyPedSeg: A Tiny Pedestrian Segmentation Benchmark for Top-Down Drone Images
Pub Date: 2023-07-23 DOI: 10.23919/MVA57639.2023.10215829
Y. Sahin, Elvin Abdinli, M. A. Aydin, Gozde Unal
The usage of Unmanned Aerial Vehicles (UAVs) has significantly increased in various fields such as surveillance, agriculture, transportation, and military operations. However, integrating UAVs into these applications requires the ability to navigate autonomously and detect/segment objects in real time, which can be achieved through the use of neural networks. Although object detection in RGB images/videos obtained from UAVs has been widely studied, limited effort has been made on segmentation from top-down aerial images. When the UAV is extremely high above the ground, the task can be framed as tiny object segmentation. Thus, inspired by the TinyPerson dataset, which focuses on person detection from UAVs, we present TinyPedSeg, which contains 2563 pedestrians in 320 images. Specializing exclusively in pedestrian segmentation, our dataset is more informative than other UAV segmentation datasets. The dataset and the baseline codes are available at https://github.com/ituvisionlab/tinypedseg
Citations: 0
Uncertainty Criteria in Active Transfer Learning for Efficient Video-Specific Human Pose Estimation
Pub Date: 2023-07-23 DOI: 10.23919/MVA57639.2023.10215565
Hiromu Taketsugu, N. Ukita
This paper presents a combination of Active Learning (AL) and Transfer Learning (TL) for efficiently adapting Human Pose (HP) estimators to individual videos. The proposed approach quantifies estimation uncertainty through the temporal changes and unnaturalness of the estimated HPs. These uncertainty criteria are combined with a clustering-based representativeness criterion to avoid the useless selection of similar samples. Experiments demonstrated that the proposed method achieves high learning efficiency and outperforms comparative methods.
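A hedged sketch of how such criteria might be combined: score frames by the temporal change of their estimated poses, then pick the most uncertain frame per cluster so selections stay diverse (the scoring and clustering details below are assumptions, not the paper's exact procedure):

```python
import numpy as np
from sklearn.cluster import KMeans

def select_frames(keypoints, features, budget):
    """Hypothetical active-learning selection sketch.

    keypoints: (T, J, 2) estimated joint positions per frame.
    features:  (T, D) frame embeddings for the representativeness term.
    budget:    number of frames to annotate.
    """
    # Uncertainty proxy: mean joint displacement between consecutive frames.
    motion = np.linalg.norm(np.diff(keypoints, axis=0), axis=2).mean(axis=1)
    uncertainty = np.concatenate([[0.0], motion])
    # One cluster per label to avoid picking near-duplicate frames,
    # then take the most uncertain frame within each cluster.
    labels = KMeans(n_clusters=budget, n_init=10).fit_predict(features)
    picks = [int(np.where(labels == c)[0][np.argmax(uncertainty[labels == c])])
             for c in range(budget)]
    return picks
```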
Citations: 0
Self-Supervised Pre-Training Boosts Semantic Scene Segmentation on LiDAR data
Pub Date: 2023-07-23 DOI: 10.23919/MVA57639.2023.10216191
Mariona Carós, Ariadna Just, S. Seguí, Jordi Vitrià
Airborne LiDAR systems can capture the Earth's surface by generating extensive point cloud data composed of points mainly defined by 3D coordinates. However, labeling such points for supervised learning tasks is time-consuming. As a result, there is a need to investigate techniques that can learn from unlabeled data to significantly reduce the number of annotated samples. In this work, we propose to train a self-supervised encoder with Barlow Twins and use it as a pre-trained network in the task of semantic scene segmentation. The experimental results demonstrate that our unsupervised pre-training boosts performance once fine-tuned on the supervised task, especially for under-represented categories.
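The Barlow Twins objective itself is standard (Zbontar et al., 2021): make the cross-correlation matrix between the embeddings of two augmented views close to the identity, so matching dimensions agree while different dimensions decorrelate. A compact PyTorch form of that loss, independent of the LiDAR encoder it would be attached to:

```python
import torch

def barlow_twins_loss(z1, z2, lambd=5e-3):
    """Barlow Twins loss for two views z1, z2 of shape (N, D)."""
    n, _ = z1.shape
    z1 = (z1 - z1.mean(0)) / (z1.std(0) + 1e-8)   # normalize per dimension
    z2 = (z2 - z2.mean(0)) / (z2.std(0) + 1e-8)
    c = (z1.T @ z2) / n                            # (D, D) cross-correlation
    on_diag = (torch.diagonal(c) - 1).pow(2).sum()                     # invariance
    off_diag = (c - torch.diag_embed(torch.diagonal(c))).pow(2).sum()  # redundancy
    return on_diag + lambd * off_diag
```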
Citations: 0
Generalization of pixel-wise phase estimation by CNN and improvement of phase-unwrapping by MRF optimization for one-shot 3D scan
Pub Date: 2023-07-23 DOI: 10.23919/MVA57639.2023.10215780
Hiroto Harada, M. Mikamo, Furukawa Ryo, R. Sagawa, Hiroshi Kawasaki
Active stereo techniques using single-pattern projection, a.k.a. one-shot 3D scan, have drawn wide attention for industrial and medical purposes, among others. One severe drawback of one-shot 3D scan is sparse reconstruction. In addition, since the spatial pattern becomes complicated for the purpose of efficient embedding, it is easily affected by noise, which results in unstable decoding. To solve these problems, we propose a pixel-wise interpolation technique for one-shot scan, which is applicable to any type of static pattern as long as the pattern is regular and periodic. This is achieved by a U-net pre-trained on CG data with an efficient data augmentation algorithm. To further overcome the decoding instability, we propose a robust correspondence-finding algorithm based on Markov random field (MRF) optimization. We also propose a shape refinement algorithm based on b-spline and Gaussian kernel interpolation using explicitly detected laser curves. Experiments show the effectiveness of the proposed method on real data with strong noise and textures.
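As one hedged illustration of the refinement step, Gaussian-kernel (Nadaraya-Watson) interpolation densifies sparse depth samples such as points on the detected laser curves (function name and parameters are assumptions; the paper combines this with b-splines):

```python
import numpy as np

def gaussian_kernel_interp(sparse_xy, sparse_z, query_xy, sigma=2.0):
    """Hypothetical sketch: fill in dense depth by Gaussian-weighted
    averaging of sparse reconstructed points.

    sparse_xy: (N, 2) pixel coordinates of reconstructed points.
    sparse_z:  (N,)   their depths.
    query_xy:  (M, 2) pixels to fill in.
    """
    # Squared distances between every query pixel and every sample.
    d2 = ((query_xy[:, None, :] - sparse_xy[None, :, :]) ** 2).sum(axis=2)
    w = np.exp(-d2 / (2.0 * sigma**2))                # Gaussian weights
    return (w @ sparse_z) / (w.sum(axis=1) + 1e-8)    # weighted average depth
```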
Citations: 0
LOTS: Litter On The Sand dataset for litter segmentation
Pub Date: 2023-07-23 DOI: 10.23919/MVA57639.2023.10216220
Paola Barra, Alessia Auriemma Citarella, Giosuè Orefice, M. Castrillón-Santana, A. Ciaramella
The marine ecosystem is threatened by human waste released into the sea. Among the most challenging marine litter to identify and remove are the small particles settled on the sand, which may be ingested by local fauna or damage the marine ecosystem. Those particles are not easy to identify because they can be confused with maritime/natural material, i.e., natural elements such as shells and stones, which cannot be classified as "litter". In this work we present the Litter On The Sand (LOTS) dataset, with images of clean, dirty, and wavy sand from 3 different beaches.
Citations: 0