
Latest Publications in Comput. Vis. Image Underst.

Progressive Scene Text Erasing with Self-Supervision
Pub Date : 2022-07-23 DOI: 10.48550/arXiv.2207.11469
Xiangcheng Du, Zhao Zhou, Yingbin Zheng, Xingjiao Wu, Tianlong Ma, Cheng Jin
Scene text erasing seeks to remove text content from scene images, and current state-of-the-art text erasing models are trained on large-scale synthetic data. Although data synthesis engines can provide vast amounts of annotated training samples, there are differences between synthetic and real-world data. In this paper, we employ self-supervision to learn feature representations from unlabeled real-world scene text images. A novel pretext task is designed to enforce consistency among the text stroke masks of image variants. We also design a Progressive Erasing Network to remove residual text: the scene text is erased progressively by leveraging intermediate generated results, which provide the foundation for subsequent higher-quality results. Experiments show that our method significantly improves the generalization of the text erasing task and achieves state-of-the-art performance on public benchmarks.
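The pretext task described above hinges on a consistency objective between the stroke masks predicted for different variants of the same unlabeled image. The sketch below is only an illustration of that idea under assumed choices (a toy mask predictor called StrokeMaskNet and a simple photometric augmentation), not the authors' implementation:

```python
# Minimal sketch (not the paper's code) of a stroke-mask consistency pretext task:
# predicted text-stroke masks of two augmented variants of the same unlabeled
# scene-text image are encouraged to agree.
import torch
import torch.nn as nn
import torch.nn.functional as F

class StrokeMaskNet(nn.Module):
    """Toy, hypothetical stand-in for a text-stroke mask predictor."""
    def __init__(self):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 1, 3, padding=1),
        )

    def forward(self, x):
        return torch.sigmoid(self.body(x))   # soft stroke mask in [0, 1]

def photometric_variant(img):
    # Color-jitter-like perturbation; geometry is unchanged, so the two
    # variants' stroke masks should coincide.
    return (img * torch.empty_like(img).uniform_(0.8, 1.2)).clamp(0, 1)

def mask_consistency_loss(model, img):
    m1 = model(photometric_variant(img))
    m2 = model(photometric_variant(img))
    return F.l1_loss(m1, m2)                  # encourage identical masks

model = StrokeMaskNet()
opt = torch.optim.Adam(model.parameters(), lr=1e-4)
batch = torch.rand(4, 3, 64, 256)             # unlabeled real scene-text crops
loss = mask_consistency_loss(model, batch)
loss.backward()
opt.step()
print(float(loss))
```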
Citations: 2
An Efficient Framework for Few-shot Skeleton-based Temporal Action Segmentation
Pub Date : 2022-07-20 DOI: 10.48550/arXiv.2207.09925
Leiyang Xu, Qianqian Wang, Xiaotian Lin, Lin Yuan
Temporal action segmentation (TAS) aims to classify and locate actions in long, untrimmed action sequences. With the success of deep learning, many deep models for action segmentation have emerged; however, few-shot TAS remains a challenging problem. This study proposes an efficient framework for few-shot skeleton-based TAS, including a data augmentation method and an improved model. The data augmentation approach, based on motion interpolation, addresses the problem of insufficient data and can significantly increase the number of samples by synthesizing action sequences. In addition, we concatenate a Connectionist Temporal Classification (CTC) layer with a network designed for skeleton-based TAS to obtain an optimized model. Leveraging CTC enhances the temporal alignment between predictions and ground truth and further improves the segment-wise metrics of the segmentation results. Extensive experiments on both public and self-constructed datasets, including two small-scale datasets and one large-scale dataset, show the effectiveness of the two proposed methods in improving the performance of the few-shot skeleton-based TAS task.
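To make the CTC idea concrete, here is a minimal PyTorch sketch of attaching nn.CTCLoss to frame-wise logits produced by a skeleton-based segmentation network; the tensor shapes, class count, and segment labels are illustrative assumptions, not values from the paper:

```python
# Hedged sketch: aligning per-frame predictions with the ordered list of
# ground-truth action segments via a CTC loss.
import torch
import torch.nn as nn

T, B, C = 200, 2, 11                   # frames, batch size, classes (index 0 = CTC blank)
frame_logits = torch.randn(T, B, C, requires_grad=True)   # stand-in backbone output

ctc = nn.CTCLoss(blank=0)
log_probs = frame_logits.log_softmax(dim=-1)               # (T, B, C)

# Ordered ground-truth action labels per sequence, concatenated:
# sequence 1 has 4 segments, sequence 2 has 2 segments.
targets = torch.tensor([3, 5, 2, 7, 1, 4])
input_lengths = torch.full((B,), T, dtype=torch.long)
target_lengths = torch.tensor([4, 2])

loss = ctc(log_probs, targets, input_lengths, target_lengths)
loss.backward()    # in practice combined with the usual frame-wise loss
print(float(loss))
```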
Citations: 3
SSMTL++: Revisiting Self-Supervised Multi-Task Learning for Video Anomaly Detection
Pub Date : 2022-07-16 DOI: 10.48550/arXiv.2207.08003
Antonio Bărbălău, Radu Tudor Ionescu, Mariana-Iuliana Georgescu, J. Dueholm, B. Ramachandra, Kamal Nasrollahi, F. Khan, T. Moeslund, M. Shah
A self-supervised multi-task learning (SSMTL) framework for video anomaly detection was recently introduced in the literature. Due to its highly accurate results, the method attracted the attention of many researchers. In this work, we revisit the self-supervised multi-task learning framework, proposing several updates to the original method. First, we study various detection methods, e.g. based on detecting high-motion regions using optical flow or background subtraction, since we believe the currently used pre-trained YOLOv3 is suboptimal, e.g. objects in motion or objects from unknown classes are never detected. Second, we modernize the 3D convolutional backbone by introducing multi-head self-attention modules, inspired by the recent success of vision transformers. As such, we alternatively introduce both 2D and 3D convolutional vision transformer (CvT) blocks. Third, in our attempt to further improve the model, we study additional self-supervised learning tasks, such as predicting segmentation maps through knowledge distillation, solving jigsaw puzzles, estimating body pose through knowledge distillation, predicting masked regions (inpainting), and adversarial learning with pseudo-anomalies. We conduct experiments to assess the performance impact of the introduced changes. Upon finding more promising configurations of the framework, dubbed SSMTL++v1 and SSMTL++v2, we extend our preliminary experiments to more data sets, demonstrating that our performance gains are consistent across all data sets. In most cases, our results on Avenue, ShanghaiTech and UBnormal raise the state-of-the-art performance bar to a new level.
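As a rough illustration of the second update (self-attention inserted into a 3D convolutional backbone), the following sketch combines a Conv3d stage with a multi-head self-attention layer over the flattened spatio-temporal tokens. It is an assumption about the general recipe, not the authors' CvT blocks:

```python
# Minimal sketch: 3D convolution followed by multi-head self-attention
# over flattened spatio-temporal tokens, with a residual connection.
import torch
import torch.nn as nn

class Conv3DSelfAttention(nn.Module):
    def __init__(self, in_ch=3, dim=64, heads=4):
        super().__init__()
        self.conv = nn.Conv3d(in_ch, dim, kernel_size=3, stride=2, padding=1)
        self.norm = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, x):                      # x: (B, C, T, H, W)
        f = self.conv(x)                       # (B, D, T', H', W')
        b, d, t, h, w = f.shape
        tokens = self.norm(f.flatten(2).transpose(1, 2))   # (B, T'*H'*W', D)
        out, _ = self.attn(tokens, tokens, tokens)
        return (tokens + out).transpose(1, 2).reshape(b, d, t, h, w)

clip = torch.rand(2, 3, 8, 64, 64)             # batch of short object-centric clips
print(Conv3DSelfAttention()(clip).shape)       # torch.Size([2, 64, 4, 32, 32])
```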
Citations: 19
SiaTrans: Siamese Transformer Network for RGB-D Salient Object Detection with Depth Image Classification
Pub Date : 2022-07-09 DOI: 10.48550/arXiv.2207.04224
Xin Jia, Changlei Dongye, Yan-Tsung Peng
RGB-D SOD uses depth information to handle challenging scenes and obtain high-quality saliency maps. Existing state-of-the-art RGB-D saliency detection methods overwhelmingly rely on the strategy of directly fusing depth information. Although these methods improve the accuracy of saliency prediction through various cross-modality fusion strategies, misinformation provided by some poor-quality depth images can affect the saliency prediction result. To address this issue, this paper proposes a novel RGB-D salient object detection model (SiaTrans), which is trained on depth image quality classification at the same time as it is trained on SOD. In light of the information that RGB and depth images share about salient objects, SiaTrans uses a Siamese transformer network with shared weight parameters as the encoder and extracts RGB and depth features concatenated along the batch dimension, saving space resources without compromising performance. SiaTrans uses the class token of the backbone network (T2T-ViT) to classify the quality of depth images without preventing the token sequence from proceeding with the saliency detection task. A Transformer-based cross-modality fusion module (CMF) effectively fuses RGB and depth information, and during testing it can choose to fuse cross-modality information or enhance RGB information according to the quality classification signal of the depth image. The greatest benefit of our designed CMF and decoder is that they maintain the consistency of RGB and RGB-D information decoding: SiaTrans decodes RGB-D or RGB information under the same model parameters according to the classification signal obtained during testing. Comprehensive experiments on nine RGB-D SOD benchmark datasets show that SiaTrans has the best overall performance and the least computation compared with recent state-of-the-art methods.
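The weight-sharing trick of the Siamese encoder can be illustrated with a few lines of PyTorch: both modalities pass through one encoder in a single forward call by stacking them along the batch dimension. The encoder below is a toy stand-in for the actual T2T-ViT backbone, and the shapes are assumptions:

```python
# Hedged sketch: one weight-shared encoder processes RGB and depth by
# concatenating them on the batch dimension, then the features are split back.
import torch
import torch.nn as nn

encoder = nn.Sequential(                       # stand-in for the shared transformer encoder
    nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
    nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
)

rgb = torch.rand(4, 3, 224, 224)
depth = torch.rand(4, 1, 224, 224).repeat(1, 3, 1, 1)   # depth replicated to 3 channels

feats = encoder(torch.cat([rgb, depth], dim=0))  # (8, 64, 56, 56), single forward pass
rgb_feat, depth_feat = feats.chunk(2, dim=0)     # split back per modality
print(rgb_feat.shape, depth_feat.shape)
```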
Citations: 6
PoseGU: 3D Human Pose Estimation with Novel Human Pose Generator and Unbiased Learning
Pub Date : 2022-07-07 DOI: 10.48550/arXiv.2207.03618
S. Guan, Haiyan Lu, Linchao Zhu, Gengfa Fang
3D pose estimation has recently gained substantial interest in the computer vision domain. Existing 3D pose estimation methods rely strongly on large, well-annotated 3D pose datasets, and they suffer from poor model generalization on unseen poses due to the limited diversity of 3D poses in training sets. In this work, we propose PoseGU, a novel human pose generator that produces diverse poses with access to only a small set of seed samples, while employing Counterfactual Risk Minimization to pursue an unbiased evaluation objective. Extensive experiments demonstrate that PoseGU outperforms almost all of the state-of-the-art 3D human pose methods under consideration on three popular benchmark datasets. Empirical analysis also shows that PoseGU generates 3D poses with improved data diversity and better generalization ability.
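The abstract does not spell out how Counterfactual Risk Minimization is instantiated; as a generic reference point, a CRM-style objective typically reweights per-sample losses by clipped importance weights, as in the hypothetical sketch below (all names and distributions are illustrative assumptions, not PoseGU's formulation):

```python
# Generic sketch of a counterfactual-risk-minimization-style objective:
# importance-weighted risk with clipped weights to control variance.
import torch

def crm_objective(per_sample_loss, new_prob, logging_prob, clip=10.0):
    weights = (new_prob / logging_prob.clamp_min(1e-8)).clamp(max=clip)
    return (weights * per_sample_loss).mean()

loss = torch.rand(32)            # per-pose estimation losses (placeholder)
new_p = torch.rand(32)           # probability under the current generator
log_p = torch.rand(32)           # probability under the logging/seed distribution
print(crm_objective(loss, new_p, log_p))
```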
Citations: 2
Rich global feature guided network for monocular depth estimation
Pub Date : 2022-07-01 DOI: 10.2139/ssrn.4057946
Bingyuan Wu, Yongxiong Wang
{"title":"Rich global feature guided network for monocular depth estimation","authors":"Bingyuan Wu, Yongxiong Wang","doi":"10.2139/ssrn.4057946","DOIUrl":"https://doi.org/10.2139/ssrn.4057946","url":null,"abstract":"","PeriodicalId":10549,"journal":{"name":"Comput. Vis. Image Underst.","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2022-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"84662054","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 4
CERVI: collaborative editing of raster and vector images
Pub Date : 2022-06-23 DOI: 10.1007/s00371-022-02522-1
Ulrike Bath, Sumit Shekhar, Julian Egbert, Julian Schmidt, Amir Semmo, J. Döllner, Matthias Trapp
{"title":"CERVI: collaborative editing of raster and vector images","authors":"Ulrike Bath, Sumit Shekhar, Julian Egbert, Julian Schmidt, Amir Semmo, J. Döllner, Matthias Trapp","doi":"10.1007/s00371-022-02522-1","DOIUrl":"https://doi.org/10.1007/s00371-022-02522-1","url":null,"abstract":"","PeriodicalId":10549,"journal":{"name":"Comput. Vis. Image Underst.","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2022-06-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"88231839","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
Neural network adaption for depth sensor replication
Pub Date : 2022-06-23 DOI: 10.1007/s00371-022-02531-0
Christian Kunert, Tobias Schwandt, Christon-Ragavan Nadar, W. Broll
{"title":"Neural network adaption for depth sensor replication","authors":"Christian Kunert, Tobias Schwandt, Christon-Ragavan Nadar, W. Broll","doi":"10.1007/s00371-022-02531-0","DOIUrl":"https://doi.org/10.1007/s00371-022-02531-0","url":null,"abstract":"","PeriodicalId":10549,"journal":{"name":"Comput. Vis. Image Underst.","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2022-06-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"80649268","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
Physically-admissible polarimetric data augmentation for road-scene analysis
Pub Date : 2022-06-01 DOI: 10.48550/arXiv.2206.07431
Cyprien Ruffino, Rachel Blin, Samia Ainouz, G. Gasso, Romain Hérault, F. Mériaudeau, S. Canu
Polarimetric imaging, along with deep learning, has shown improved performance on different tasks including scene analysis. However, its robustness may be questioned because of the small size of the training datasets. Though the issue could be solved by data augmentation, polarization modalities are subject to physical feasibility constraints that classical data augmentation techniques do not address. To address this issue, we propose to use CycleGAN, an image translation technique based on deep generative models that relies solely on unpaired data, to transfer large labeled road scene datasets to the polarimetric domain. We design several auxiliary loss terms that, alongside the CycleGAN losses, deal with the physical constraints of polarimetric images. The efficiency of this solution is demonstrated on road scene object detection tasks, where the generated realistic polarimetric images improve car and pedestrian detection performance by up to 9%. The resulting constrained CycleGAN is publicly released, allowing anyone to generate their own polarimetric images.
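As an example of what a physical-feasibility auxiliary term can look like for four-angle polarimetric images, the sketch below penalizes violations of two standard admissibility conditions (I0 + I90 = I45 + I135, and a degree of linear polarization no greater than 1). It is a hedged illustration; the paper's exact auxiliary losses may differ:

```python
# Hedged sketch of a soft "physical admissibility" penalty for generated
# polarimetric images with polarizer angles 0/45/90/135 degrees.
import torch

def admissibility_loss(i0, i45, i90, i135, eps=1e-6):
    s0 = i0 + i90                                   # total intensity
    s1 = i0 - i90
    s2 = i45 - i135
    energy = (s0 - (i45 + i135)).abs().mean()       # the two angle pairs must agree
    dolp = torch.sqrt(s1 ** 2 + s2 ** 2 + eps) / (s0 + eps)
    overshoot = torch.relu(dolp - 1.0).mean()       # degree of linear polarization <= 1
    return energy + overshoot

imgs = torch.rand(2, 4, 128, 128)                   # fake generator output, channels = angles
print(admissibility_loss(*imgs.unbind(dim=1)))
```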
Citations: 0
Accurate and efficient salient object detection via position prior attention
Pub Date : 2022-06-01 DOI: 10.2139/ssrn.4081836
Jin Zhang, Qiuwei Liang, Yanjiao Shi
{"title":"Accurate and efficient salient object detection via position prior attention","authors":"Jin Zhang, Qiuwei Liang, Yanjiao Shi","doi":"10.2139/ssrn.4081836","DOIUrl":"https://doi.org/10.2139/ssrn.4081836","url":null,"abstract":"","PeriodicalId":10549,"journal":{"name":"Comput. Vis. Image Underst.","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2022-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"80701212","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 5