
Latest Publications from ACM Multimedia Asia

Video Saliency Prediction via Deep Eye Movement Learning
Pub Date : 2021-12-01 DOI: 10.1145/3469877.3490597
Jiazhong Chen, Jing Chen, Yuan Dong, Dakai Ren, Shiqi Zhang, Zongyi Li
Existing methods often utilize temporal motion information and spatial layout information in video to predict video saliency. However, fixations are not always consistent with the moving object of interest, because human eye fixations are determined not only by spatio-temporal information, but also by the velocity of eye movement. To address this issue, a new saliency prediction method via deep eye movement learning (EML) is proposed in this paper. Compared with previous methods that use human fixations as ground truth, our method uses the optical flow of fixations between successive frames as an extra ground truth for the purpose of eye movement learning. Experimental results on the DHF1K, Hollywood2, and UCF-sports datasets show that the proposed EML model achieves promising results across a wide range of metrics.
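
A minimal sketch may help picture the extra supervision: the saliency map predicted at frame t is warped by the ground-truth fixation flow and compared with the prediction at frame t+1, so predicted saliency is encouraged to move at the velocity of the recorded eye movements. This is a hypothetical PyTorch rendering under assumed tensor shapes, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def eye_movement_loss(pred_t, pred_t1, fixation_flow):
    """pred_t, pred_t1: (B, 1, H, W) saliency maps of successive frames.
    fixation_flow: (B, 2, H, W) optical flow of ground-truth fixations."""
    B, _, H, W = pred_t.shape
    # Build a pixel-coordinate grid and displace it by the fixation flow.
    ys, xs = torch.meshgrid(torch.arange(H), torch.arange(W), indexing="ij")
    grid = torch.stack((xs, ys), dim=0).unsqueeze(0).to(pred_t)  # (1, 2, H, W)
    target = grid + fixation_flow
    # Normalize coordinates to [-1, 1] as required by grid_sample.
    tx = 2.0 * target[:, 0] / (W - 1) - 1.0
    ty = 2.0 * target[:, 1] / (H - 1) - 1.0
    sample_grid = torch.stack((tx, ty), dim=-1)                  # (B, H, W, 2)
    warped = F.grid_sample(pred_t, sample_grid, align_corners=True)
    # Warped frame-t saliency should match the frame-(t+1) prediction.
    return F.l1_loss(warped, pred_t1)
```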
Citations: 2
Improving Hyperspectral Super-Resolution via Heterogeneous Knowledge Distillation
Pub Date : 2021-12-01 DOI: 10.1145/3469877.3490610
Ziqian Liu, Qing Ma, Junjun Jiang, Xianming Liu
Hyperspectral images (HSI) contain rich spectral information, but their spatial resolution is often limited by the imaging system. Super-resolution (SR) reconstruction has become a hot topic, aiming to increase spatial resolution without extra hardware cost. Fusion-based hyperspectral image super-resolution (FHSR) methods use supplementary high-resolution multispectral images (HR-MSI) to recover spatial details, but well co-registered HR-MSI is hard to collect. Recently, single hyperspectral image super-resolution (SHSR) methods based on deep learning have made great progress. However, the lack of HR-MSI input makes it difficult for these SHSR methods to exploit spatial information. To take advantage of both FHSR and SHSR methods, in this paper we propose a new pipeline that treats HR-MSI as privileged information and improves our SHSR model with knowledge distillation. That is, our model uses paired MSI-HSI data for training and only needs LR-HSI as input during inference. Specifically, we combine SHSR and spectral super-resolution (SSR) and design a novel architecture, the Distillation-Oriented Dual-branch Net (DODN), to make the SHSR model fully employ knowledge transferred from the SSR model. Since mainstream SSR models are 2D CNNs, and a fully 2D CNN causes spectral disorder in the SHSR task, a new mixed 2D/3D block, called the Distillation-Oriented Dual-branch Block (DODB), is proposed, in which the 3D branch extracts spectral-spatial correlation while the 2D branch accepts information from the SSR model through knowledge distillation. The main idea is to distill knowledge of spatial information from HR-MSI into the SHSR model without changing its network architecture. Extensive experiments on two benchmark datasets, CAVE and NTIRE2020, demonstrate that our proposed DODN outperforms state-of-the-art SHSR methods in terms of both quantitative and qualitative analysis.
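
As a rough illustration of the mixed 2D/3D idea, the sketch below pairs a 3D branch (bands as a depth dimension, for spectral-spatial correlation) with a 2D branch (bands as channels) whose features can be matched to an SSR teacher's through a distillation loss. Layer sizes and the fusion scheme are assumptions for illustration, not the paper's exact DODB design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DualBranchBlock(nn.Module):
    def __init__(self, bands):
        super().__init__()
        self.branch2d = nn.Conv2d(bands, bands, 3, padding=1)  # bands as channels
        self.branch3d = nn.Conv3d(1, 1, 3, padding=1)          # bands as depth

    def forward(self, x):                       # x: (B, bands, H, W)
        f2d = F.relu(self.branch2d(x))
        f3d = F.relu(self.branch3d(x.unsqueeze(1))).squeeze(1)
        return f2d + f3d, f2d                   # fused feature, 2D feature for KD

def distillation_loss(student_f2d, teacher_feat):
    # Pull the student's 2D-branch feature toward the frozen teacher feature.
    return F.mse_loss(student_f2d, teacher_feat.detach())

block = DualBranchBlock(bands=31)
fused, f2d = block(torch.randn(2, 31, 32, 32))
```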
Citations: 1
Motion = Video - Content: Towards Unsupervised Learning of Motion Representation from Videos
Pub Date : 2021-12-01 DOI: 10.1145/3469877.3490582
Hehe Fan, Mohan S. Kankanhalli
Motion, according to its definition in physics, is the change in position with respect to time, regardless of the specific moving object and background. In this paper, we aim to learn appearance-independent motion representation in an unsupervised manner. The main idea is to separate motion from videos while leaving objects and background as content. Specifically, we design an encoder-decoder model which consists of a content encoder, a motion encoder and a video generator. To train the model, we leverage a one-step cycle-consistency in reconstruction within the same video and a two-step cycle-consistency in generation across different videos as self-supervised signals, and use adversarial training to remove the content representation from the motion representation. We demonstrate that the proposed framework can be used for conditional video generation and fine-grained action recognition.
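
The separation idea can be sketched as follows: a content encoder reads one frame, a motion encoder reads a frame pair, and a generator reconstructs the second frame from both codes, giving the one-step reconstruction cycle. The toy linear modules and frame sizes are placeholder assumptions; the real model and the adversarial and two-step cross-video terms are omitted.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MotionContentModel(nn.Module):
    def __init__(self, dim=128, frame_dim=3 * 64 * 64):
        super().__init__()
        self.content_enc = nn.Linear(frame_dim, dim)      # frame -> content code
        self.motion_enc = nn.Linear(2 * frame_dim, dim)   # frame pair -> motion code
        self.generator = nn.Linear(2 * dim, frame_dim)    # codes -> frame

    def forward(self, frame_a, frame_b):
        c = self.content_enc(frame_a)
        m = self.motion_enc(torch.cat([frame_a, frame_b], dim=1))
        return self.generator(torch.cat([c, m], dim=1)), c, m

model = MotionContentModel()
fa, fb = torch.randn(4, 3 * 64 * 64), torch.randn(4, 3 * 64 * 64)
recon_b, c, m = model(fa, fb)
# One-step cycle: content of frame_a plus motion a->b should rebuild frame_b.
loss = F.mse_loss(recon_b, fb)
```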
Citations: 3
Hierarchical Graph Representation Learning with Local Capsule Pooling
Pub Date : 2021-12-01 DOI: 10.1145/3469877.3495645
Zidong Su, Zehui Hu, Yangding Li
Hierarchical graph pooling has shown great potential for capturing high-quality graph representations through the node cluster selection mechanism. However, current node cluster selection methods suffer from inadequate clustering, and their scoring methods rely too heavily on the node representation, resulting in excessive loss of graph structure information during pooling. In this paper, a local capsule pooling network (LCPN) is proposed to alleviate these issues. Specifically, (i) a local capsule pooling (LCP) is proposed to alleviate the issue of insufficient clustering; (ii) a task-aware readout (TAR) mechanism is proposed to obtain a more expressive graph representation; (iii) a pooling information loss (PIL) term is proposed to further alleviate the information loss caused by pooling during training. Experimental results on the graph classification task, the graph reconstruction task, and the pooled graph adjacency visualization task show the superior performance of the proposed LCPN and demonstrate its effectiveness and efficiency.
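
Of the three components, the task-aware readout is the simplest to sketch: a learnable task query attends over node embeddings to produce the graph vector. The attention form below is an assumed illustration, not the paper's exact TAR mechanism.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TaskAwareReadout(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.task_query = nn.Parameter(torch.randn(dim))  # learned with the task loss

    def forward(self, node_feats):                 # node_feats: (N, dim)
        attn = F.softmax(node_feats @ self.task_query, dim=0)   # (N,)
        return (attn.unsqueeze(1) * node_feats).sum(dim=0)      # graph vector (dim,)

readout = TaskAwareReadout(64)
graph_vec = readout(torch.randn(10, 64))           # one graph with 10 nodes
```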
Citations: 5
A comparison study: the impact of age and gender distribution on age estimation
Pub Date : 2021-12-01 DOI: 10.1145/3469877.3490576
Chang Kong, Qiuming Luo, Guoliang Chen
Age estimation from a single facial image is a challenging and attractive research area in the computer vision community. Several facial datasets annotated with age and gender attributes have become available in the literature. However, one major drawback is that these datasets do not consider the label distribution during data collection. Therefore, models trained on these datasets inevitably have a bias against the ages with the fewest images. In this work, we analyze the age and gender distribution of previous datasets and publish a Uniform Age and Gender Dataset (UAGD), which has an almost equal number of female and male images at each age. In addition, we investigate the impact of age and gender distribution on age estimation by comparing the DEX CNN model trained on several different datasets. Our experiments show that the UAGD dataset performs well for the age estimation task and is also suitable as an evaluation benchmark.
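
The balancing step itself is easy to picture: cap every (age, gender) bin at the same number of images. The sketch below assumes per-image annotations as dicts; the field names and per-bin cap are illustrative, and the paper's actual collection procedure may differ.

```python
import random
from collections import defaultdict

def uniform_subset(samples, per_bin, seed=0):
    """samples: list of dicts with 'age' and 'gender' keys."""
    random.seed(seed)
    bins = defaultdict(list)
    for s in samples:
        bins[(s["age"], s["gender"])].append(s)
    subset = []
    for items in bins.values():
        random.shuffle(items)
        subset.extend(items[:per_bin])   # equal count per (age, gender) bin
    return subset

data = [{"age": a, "gender": g, "path": f"img_{a}_{g}_{i}.jpg"}
        for a in range(18, 21) for g in ("f", "m") for i in range(5)]
balanced = uniform_subset(data, per_bin=3)
```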
Citations: 1
Hierarchical Composition Learning for Composed Query Image Retrieval
Pub Date : 2021-12-01 DOI: 10.1145/3469877.3490601
Yahui Xu, Yi Bin, Guoqing Wang, Yang Yang
Composed query image retrieval is a growing research topic. The objective is to retrieve images that not only generally resemble the reference image, but also differ according to the desired modification text. Existing methods mainly explore composing the modification text with the global feature or local entity descriptors of the reference image. However, they ignore the fact that modification text is diverse and arbitrary. It not only relates to abstract global features or concrete local entity transformations, but also often involves fine-grained structured visual adjustment. Thus, it is insufficient to emphasize only global or local entity visuals for query composition. In this work, we tackle this task by hierarchical composition learning. Specifically, the proposed method first encodes images into three representations: global-, entity- and structure-level. The structure-level representation is richly explicable; it explicitly describes entities, as well as attributes and relationships in the image, with a directed graph. Based on these, we naturally perform hierarchical composition learning by fusing the modification text and reference image in a global-entity-structure manner. This transforms the visual feature conditioned on the modification text toward the target image in a coarse-to-fine manner, taking advantage of the complementary information among the three levels. Moreover, we introduce hybrid space matching to explore global, entity and structure alignments, which achieves high performance and good interpretability.
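
One way to picture the coarse-to-fine composition is a gated fusion applied per level, in which the modification text modulates each of the global, entity and structure features. The gating form is an assumption for illustration only, not the paper's exact fusion module.

```python
import torch
import torch.nn as nn

class LevelComposer(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.gate = nn.Linear(2 * dim, dim)
        self.update = nn.Linear(2 * dim, dim)

    def forward(self, visual, text):
        joint = torch.cat([visual, text], dim=-1)
        g = torch.sigmoid(self.gate(joint))          # how much to keep vs. modify
        return g * visual + (1 - g) * torch.tanh(self.update(joint))

dim = 256
composers = nn.ModuleList(LevelComposer(dim) for _ in range(3))
text = torch.randn(8, dim)                           # modification-text embedding
feats = [torch.randn(8, dim) for _ in range(3)]      # global, entity, structure
queries = [c(f, text) for c, f in zip(composers, feats)]  # one query per level
```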
Citations: 3
Joint label refinement and contrastive learning with hybrid memory for Unsupervised Marine Object Re-Identification
Pub Date : 2021-12-01 DOI: 10.1145/3469877.3497695
Xiaorui Han, Zhiqi Chen, Ruixue Wang, Pengfei Zhao
Unsupervised object re-identification is a challenging task due to the absence of labels for the dataset. Many unsupervised object re-identification approaches combine clustering-based pseudo-label prediction with feature fine-tuning. These methods have achieved great success in the field of unsupervised object Re-ID. However, the inevitable label noise caused by the clustering procedure is ignored. Such noisy pseudo labels substantially hinder the model's capability to further improve feature representations. To this end, we propose a novel joint label refinement and contrastive learning framework with hybrid memory to alleviate this problem. First, in order to reduce the noise of clustering pseudo labels, we propose a novel noise refinement strategy. This strategy refines pseudo labels at the clustering phase and promotes clustering quality by boosting label purity. In addition, we propose a hybrid memory bank. The hybrid memory dynamically generates prototype-level and un-clustered instance-level supervisory signals for learning feature representations. With all prototype-level and un-clustered instance-level supervision, the re-identification model is trained progressively. Our proposed unsupervised object Re-ID framework significantly reduces the influence of noisy labels and refines the learned features. Our method consistently achieves state-of-the-art performance on benchmark datasets.
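
The hybrid memory can be sketched as a single bank holding cluster prototypes alongside un-clustered instance features, with a shared InfoNCE-style loss over both kinds of entries. The temperature and memory layout are assumptions for illustration.

```python
import torch
import torch.nn.functional as F

def hybrid_contrastive_loss(feat, positive_idx, memory, tau=0.05):
    """feat: (D,) L2-normalized query feature.
    memory: (K, D) prototype entries followed by un-clustered instance entries.
    positive_idx: row of the query's prototype (or its own instance entry)."""
    logits = memory @ feat / tau                     # similarity to every entry
    target = torch.tensor([positive_idx])
    return F.cross_entropy(logits.unsqueeze(0), target)

memory = F.normalize(torch.randn(100, 128), dim=1)  # hybrid memory bank
query = F.normalize(torch.randn(128), dim=0)
loss = hybrid_contrastive_loss(query, positive_idx=3, memory=memory)
```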
Citations: 1
Efficient Proposal Generation with U-shaped Network for Temporal Sentence Grounding
Pub Date : 2021-12-01 DOI: 10.1145/3469877.3490606
Ludan Ruan, Qin Jin
Temporal sentence grounding aims to localize the relevant temporal region in a given video according to the query sentence. It is a challenging task due to the semantic gap between different modalities and the diversity of event durations. Proposal generation plays an important role in previous mainstream methods. However, previous proposal generation methods apply the same feature extraction without considering the diversity of event durations. In this paper, we propose a novel temporal sentence grounding model with a U-shaped network for efficient proposal generation (UN-TSG), which utilizes a U-shaped structure to encode proposals of different lengths hierarchically. Experiments on two benchmark datasets demonstrate that, with a more efficient proposal generation method, our model achieves state-of-the-art grounding performance at higher speed and with lower computation cost.
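
A minimal 1D U-shaped sketch of this idea: temporal features are downsampled and upsampled with a skip connection, and each level scores proposals of a different duration. Channel sizes and depth are illustrative; the actual UN-TSG architecture may differ.

```python
import torch
import torch.nn as nn

class TemporalUNet(nn.Module):
    def __init__(self, dim=256):
        super().__init__()
        self.down = nn.Conv1d(dim, dim, 3, stride=2, padding=1)
        self.up = nn.ConvTranspose1d(dim, dim, 4, stride=2, padding=1)
        self.score_coarse = nn.Conv1d(dim, 1, 1)   # longer proposals
        self.score_fine = nn.Conv1d(dim, 1, 1)     # shorter proposals

    def forward(self, x):                          # x: (B, dim, T)
        mid = torch.relu(self.down(x))             # (B, dim, T/2)
        fine = torch.relu(self.up(mid)) + x        # skip connection back to T
        return self.score_coarse(mid), self.score_fine(fine)

net = TemporalUNet()
coarse, fine = net(torch.randn(2, 256, 64))       # proposal scores at two scales
```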
Citations: 0
Visible-Infrared Cross-Modal Person Re-identification based on Positive Feedback
Pub Date : 2021-12-01 DOI: 10.1145/3469877.3497693
Lingyi Lu, Xin Xu
Visible-infrared person re-identification (VI-ReID) is a challenging cross-modality person retrieval task that is attracting increasing attention. Compared with traditional person ReID, which focuses on person images in a single RGB modality, VI-ReID suffers from an additional cross-modality discrepancy due to the different imaging processes of spectrum cameras. Several effective attempts have been made in recent years to narrow the cross-modality gap and improve re-identification performance, but the key problem of optimizing search results with relevance feedback is rarely studied. In this paper, we present a cross-modality visible-infrared person re-identification method combined with human positive feedback. This method allows the user to quickly optimize the search performance by selecting strong positive samples during the re-identification process. We validated the effectiveness of our method on a public dataset, SYSU-MM01, and the results confirm that the proposed method achieves superior performance compared with current state-of-the-art methods.
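
The feedback loop itself admits a very small sketch: features of the user-confirmed strong positives are averaged into the query and the gallery is re-ranked. The averaging weight and query-expansion form are illustrative assumptions, not the paper's exact update rule.

```python
import torch
import torch.nn.functional as F

def rerank_with_feedback(query, gallery, positive_ids, alpha=0.5):
    """query: (D,); gallery: (N, D); positive_ids: user-selected gallery rows."""
    positives = gallery[positive_ids].mean(dim=0)
    new_query = F.normalize(alpha * query + (1 - alpha) * positives, dim=0)
    scores = gallery @ new_query                  # cosine similarity (unit vectors)
    return scores.argsort(descending=True)       # refined ranking

gallery = F.normalize(torch.randn(50, 128), dim=1)
query = F.normalize(torch.randn(128), dim=0)
order = rerank_with_feedback(query, gallery, positive_ids=[2, 7])
```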
Citations: 2
BRUSH: Label Reconstructing and Similarity Preserving Hashing for Cross-modal Retrieval
Pub Date : 2021-12-01 DOI: 10.1145/3469877.3490589
P. Zhang, Pengfei Zhao, Xin Luo, Xin-Shun Xu
The hashing technique has recently attracted much attention in the information retrieval community due to its high efficiency in terms of storage and query processing. For cross-modal retrieval tasks, existing supervised hashing models either treat the semantic labels as the ground truth and formalize the problem as a classification task, or further add a similarity matrix as supervisory signals to pursue high-quality hash codes for representing coupled data. However, these approaches cannot ensure that the learnt binary codes preserve the semantics and similarity relationships contained in the supervised information. Moreover, the resulting sophisticated discrete optimization problem is typically addressed by continuous relaxation or a bit-wise solver, which leads to large quantization error and inefficient computation. To relieve these issues, in this paper we present a two-step supervised discrete hashing method, laBel ReconstrUcting and Similarity preserving Hashing (BRUSH). We formulate it as an asymmetric pairwise similarity-preserving problem using two latent semantic embeddings, derived from decomposing semantics and reconstructing semantics, respectively. Meanwhile, the unified binary codes are jointly generated from both embeddings with an affinity guarantee, so that the discriminative property of the obtained hash codes is significantly enhanced while semantics are well preserved. In addition, by adopting a two-step hash learning strategy, our method simplifies the learning of the hashing functions and binary codes, improving flexibility and efficiency. The resulting discrete optimization problem is also elegantly solved by the proposed alternating algorithm without any relaxation. Extensive experiments on benchmarks demonstrate that BRUSH outperforms state-of-the-art methods in terms of efficiency and effectiveness.
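
The asymmetric pairwise term can be illustrated with a short sketch: continuous embeddings U are matched against binary codes B so that their inner products reproduce a scaled similarity matrix. The scaling and factorization below are assumptions; the paper's exact objective and alternating solver differ in detail.

```python
import torch

def asymmetric_similarity_loss(U, B, S, n_bits):
    """U: (N, n_bits) continuous embeddings; B: (N, n_bits) codes in {-1, +1};
    S: (N, N) semantic similarity in {0, 1}."""
    target = n_bits * (2 * S - 1)        # map {0,1} similarity to code-range values
    return ((U @ B.t() - target) ** 2).mean()

N, n_bits = 32, 16
U = torch.randn(N, n_bits)
B = torch.sign(torch.randn(N, n_bits))  # stand-in for learned binary codes
S = (torch.rand(N, N) > 0.5).float()
loss = asymmetric_similarity_loss(U, B, S, n_bits)
```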
Citations: 0