
Latest Publications from ACM Multimedia Asia

CFCR: A Convolution and Fusion Model for Cross-platform Recommendation
Pub Date : 2021-12-01 DOI: 10.1145/3469877.3495639
Shengze Yu, Xin Wang, Wenwu Zhu
With the emergence of various online platforms, associating different platforms is playing an increasingly important role in many applications. Cross-platform recommendation aims to improve recommendation accuracy by associating information from different platforms. Existing methods do not fully exploit high-order nonlinear connectivity information in the cross-domain recommendation scenario and suffer from the domain-incompatibility problem. In this paper, we propose an end-to-end convolution and fusion model for cross-platform recommendation (CFCR). The proposed CFCR model utilizes Graph Convolutional Networks (GCN) to extract user and item features from the graphs of different platforms, and fuses cross-platform information with a Multimodal AutoEncoder (MAE) over common latent user features. Therefore, high-order connectivity information is preserved to the greatest extent and domain-invariant user representations are obtained automatically. Domain-incompatible information is discarded spontaneously to avoid corrupting the cross-platform association. Extensive experiments with the proposed CFCR model on real-world datasets demonstrate its advantages over existing cross-platform recommendation methods across various evaluation metrics.
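As a rough sketch of the GCN-plus-autoencoder fusion idea, the PyTorch snippet below runs one graph-convolution layer per platform and fuses the resulting user features through a shared autoencoder. It is not the authors' implementation: the identity adjacency, layer sizes, and plain reconstruction loss are illustrative assumptions.

```python
import torch
import torch.nn as nn

class SimpleGCNLayer(nn.Module):
    """One graph-convolution step: H' = ReLU(A_hat @ H @ W)."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.lin = nn.Linear(in_dim, out_dim, bias=False)

    def forward(self, a_hat, h):
        return torch.relu(a_hat @ self.lin(h))

class FusionAutoencoder(nn.Module):
    """Fuses user features from two platforms into one shared latent space."""
    def __init__(self, dim_a, dim_b, latent_dim):
        super().__init__()
        self.encoder = nn.Linear(dim_a + dim_b, latent_dim)
        self.dec_a = nn.Linear(latent_dim, dim_a)
        self.dec_b = nn.Linear(latent_dim, dim_b)

    def forward(self, ua, ub):
        z = torch.relu(self.encoder(torch.cat([ua, ub], dim=-1)))
        return z, self.dec_a(z), self.dec_b(z)

# toy usage: 5 shared users, two platforms with 16-dim GCN outputs each
a_hat = torch.eye(5)                           # normalized adjacency placeholder
x_a, x_b = torch.randn(5, 32), torch.randn(5, 32)
gcn_a, gcn_b = SimpleGCNLayer(32, 16), SimpleGCNLayer(32, 16)
fuse = FusionAutoencoder(16, 16, 8)
ha, hb = gcn_a(a_hat, x_a), gcn_b(a_hat, x_b)
z, rec_a, rec_b = fuse(ha, hb)                 # z: shared latent user representation
recon_loss = ((rec_a - ha) ** 2).mean() + ((rec_b - hb) ** 2).mean()
```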
Citations: 1
A Coarse-to-fine Approach for Fast Super-Resolution with Flexible Magnification
Pub Date : 2021-12-01 DOI: 10.1145/3469877.3490564
Zhichao Fu, Tianlong Ma, Liang Xue, Yingbin Zheng, Hao Ye, Liang He
We perform fast single-image super-resolution with flexible magnification for natural images. A novel coarse-to-fine super-resolution framework is developed in which the magnification factor is factorized into a maximum integer component and the remaining quotient. Specifically, our framework embeds a light-weight upscale network for super-resolution at the integer scale factor, followed by a fine-grained network that guides interpolation on feature maps and generates the super-resolved image. Compared with previous flexible-magnification super-resolution approaches, the proposed framework achieves a tradeoff between computational complexity and performance. We conduct experiments with the coarse-to-fine framework on standard benchmarks and demonstrate its superiority over previous approaches in terms of effectiveness and efficiency.
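A minimal sketch of the factorization idea, assuming a PyTorch setup: the scale is split into its integer part, handled by a pixel-shuffle upscale head, and the remaining fraction, handled by interpolation plus a small refinement convolution standing in for the fine-grained network.

```python
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

class CoarseToFineSR(nn.Module):
    """Integer-factor upscale network, then interpolation to the exact target scale."""
    def __init__(self, int_scale, channels=3, feat=32):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, feat, 3, padding=1), nn.ReLU(),
            nn.Conv2d(feat, channels * int_scale ** 2, 3, padding=1),
            nn.PixelShuffle(int_scale),            # coarse upscale by the integer factor
        )
        self.refine = nn.Conv2d(channels, channels, 3, padding=1)  # fine-grained stage stand-in

    def forward(self, x, target_scale):
        coarse = self.body(x)                      # H*k x W*k
        out_h = round(x.shape[-2] * target_scale)
        out_w = round(x.shape[-1] * target_scale)
        fine = F.interpolate(coarse, size=(out_h, out_w), mode='bilinear', align_corners=False)
        return self.refine(fine)

scale = 3.5
k = math.floor(scale)                              # maximum integer component
model = CoarseToFineSR(int_scale=k)
sr = model(torch.randn(1, 3, 24, 24), target_scale=scale)   # -> 1 x 3 x 84 x 84
```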
Citations: 0
Video Saliency Prediction via Deep Eye Movement Learning
Pub Date : 2021-12-01 DOI: 10.1145/3469877.3490597
Jiazhong Chen, Jing Chen, Yuan Dong, Dakai Ren, Shiqi Zhang, Zongyi Li
Existing methods often utilize temporal motion information and spatial layout information in video to predict video saliency. However, fixations are not always consistent with the moving object of interest, because human eye fixations are determined not only by spatio-temporal information but also by the velocity of eye movement. To address this issue, a new saliency prediction method via deep eye movement learning (EML) is proposed in this paper. Compared with previous methods that use human fixations as ground truth, our method uses the optical flow of fixations between successive frames as an extra ground truth for eye movement learning. Experimental results on the DHF1K, Hollywood2, and UCF-sports datasets show that the proposed EML model achieves promising results across a wide range of metrics.
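The toy loss below illustrates how a fixation-motion term could supplement the per-frame saliency loss; the temporal difference of saliency maps is only a crude stand-in for the actual optical flow of fixations the paper uses as extra ground truth.

```python
import torch
import torch.nn.functional as F

def saliency_loss(pred_t, pred_t1, gt_t, gt_t1, flow_weight=0.5):
    """Per-frame saliency loss plus a term on the temporal change of fixations.

    pred_*, gt_*: (B, 1, H, W) saliency maps for two consecutive frames, values in [0, 1].
    """
    frame_loss = F.binary_cross_entropy(pred_t, gt_t) + F.binary_cross_entropy(pred_t1, gt_t1)
    pred_motion = pred_t1 - pred_t          # predicted change of saliency over time
    gt_motion = gt_t1 - gt_t                # ground-truth change of fixations over time
    motion_loss = F.l1_loss(pred_motion, gt_motion)
    return frame_loss + flow_weight * motion_loss

# toy usage with random maps
p_t, p_t1 = torch.rand(2, 1, 32, 32), torch.rand(2, 1, 32, 32)
g_t, g_t1 = torch.rand(2, 1, 32, 32), torch.rand(2, 1, 32, 32)
print(saliency_loss(p_t, p_t1, g_t, g_t1))
```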
Citations: 2
Hierarchical Graph Representation Learning with Local Capsule Pooling
Pub Date : 2021-12-01 DOI: 10.1145/3469877.3495645
Zidong Su, Zehui Hu, Yangding Li
Hierarchical graph pooling has shown great potential for capturing high-quality graph representations through the node cluster selection mechanism. However, current node cluster selection methods suffer from inadequate clustering, and their scoring methods rely too heavily on the node representation, resulting in excessive loss of graph structure information during pooling. In this paper, a local capsule pooling network (LCPN) is proposed to alleviate these issues. Specifically, (i) a local capsule pooling (LCP) operation is proposed to alleviate the issue of insufficient clustering; (ii) a task-aware readout (TAR) mechanism is proposed to obtain a more expressive graph representation; (iii) a pooling information loss (PIL) term is proposed to further alleviate the information loss caused by pooling during training. Experimental results on the graph classification task, the graph reconstruction task, and the pooled graph adjacency visualization task show the superior performance of the proposed LCPN and demonstrate its effectiveness and efficiency.
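For orientation, the snippet below shows a generic score-and-select pooling step of the kind hierarchical pooling methods build on; it does not implement LCPN's capsule routing, task-aware readout, or PIL term, and all sizes are illustrative.

```python
import torch
import torch.nn as nn

class TopKPool(nn.Module):
    """Generic hierarchical pooling step: score nodes, keep the top-k, coarsen the graph."""
    def __init__(self, in_dim, ratio=0.5):
        super().__init__()
        self.score = nn.Linear(in_dim, 1)
        self.ratio = ratio

    def forward(self, x, adj):
        s = torch.sigmoid(self.score(x)).squeeze(-1)   # one score per node
        k = max(1, int(self.ratio * x.size(0)))
        idx = torch.topk(s, k).indices
        x_pool = x[idx] * s[idx].unsqueeze(-1)          # gate kept nodes by their score
        adj_pool = adj[idx][:, idx]                     # coarsened adjacency
        return x_pool, adj_pool

x, adj = torch.randn(6, 16), (torch.rand(6, 6) > 0.5).float()
pool = TopKPool(16)
x2, adj2 = pool(x, adj)
print(x2.shape, adj2.shape)   # (3, 16), (3, 3)
```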
Citations: 5
A comparison study: the impact of age and gender distribution on age estimation
Pub Date : 2021-12-01 DOI: 10.1145/3469877.3490576
Chang Kong, Qiuming Luo, Guoliang Chen
Age estimation from a single facial image is a challenging and attractive research area in the computer vision community. Several facial datasets annotated with age and gender attributes are available in the literature. However, one major drawback is that these datasets do not consider the label distribution during data collection. Therefore, models trained on these datasets inevitably exhibit bias with respect to the ages having the fewest images. In this work, we analyze the age and gender distribution of previous datasets and publish a Uniform Age and Gender Dataset (UAGD), which has an almost equal number of female and male images at each age. In addition, we investigate the impact of age and gender distribution on age estimation by comparing the DEX CNN model trained on several different datasets. Our experiments show that the UAGD dataset performs well for the age estimation task and is also suitable as an evaluation benchmark.
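A small helper of the kind one might use to audit such a dataset: it tallies images per (age, gender) cell so under-represented ages are easy to spot. The annotation format is an assumption for illustration, not taken from the paper.

```python
from collections import Counter

def distribution_report(samples):
    """samples: iterable of (age, gender) pairs, gender in {'F', 'M'}.

    Prints the female/male image count and ratio for every age present,
    making imbalanced cells visible at a glance.
    """
    counts = Counter(samples)
    ages = sorted({age for age, _ in counts})
    for age in ages:
        f, m = counts.get((age, 'F'), 0), counts.get((age, 'M'), 0)
        print(f"age {age:3d}: F={f:5d}  M={m:5d}  ratio={f / max(m, 1):.2f}")

# toy usage with a hypothetical annotation list
distribution_report([(25, 'F'), (25, 'M'), (25, 'M'), (60, 'F')])
```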
Citations: 1
Hierarchical Composition Learning for Composed Query Image Retrieval
Pub Date : 2021-12-01 DOI: 10.1145/3469877.3490601
Yahui Xu, Yi Bin, Guoqing Wang, Yang Yang
Composed query image retrieval is a growing research topic. The objective is to retrieve images that not only generally resemble the reference image but also differ from it according to the desired modification text. Existing methods mainly explore composing the modification text with the global feature or local entity descriptors of the reference image. However, they ignore the fact that modification text is diverse and arbitrary: it not only relates to abstract global features or concrete local entity transformations, but also often involves fine-grained structured visual adjustments. Thus, emphasizing only global or local entity visual cues is insufficient for query composition. In this work, we tackle this task by hierarchical composition learning. Specifically, the proposed method first encodes images into three representations: global-, entity-, and structure-level. The structure-level representation is richly interpretable, explicitly describing the entities, attributes, and relationships in the image with a directed graph. Based on these, we naturally perform hierarchical composition learning by fusing the modification text and the reference image in a global-entity-structure manner. It transforms the visual feature conditioned on the modification text toward the target image in a coarse-to-fine manner, taking advantage of the complementary information among the three levels. Moreover, we introduce hybrid space matching to explore global, entity, and structure alignments, which achieves high performance and good interpretability.
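The sketch below illustrates the coarse-to-fine composition idea with three residual fusion blocks applied in global-entity-structure order; the feature extractors, graph encoding, and hybrid space matching are omitted, and all dimensions are assumptions.

```python
import torch
import torch.nn as nn

class LevelFusion(nn.Module):
    """Fuse a text feature with one level of image features (global, entity, or structure)."""
    def __init__(self, dim):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(2 * dim, dim), nn.ReLU(), nn.Linear(dim, dim))

    def forward(self, img_feat, txt_feat):
        return img_feat + self.mlp(torch.cat([img_feat, txt_feat], dim=-1))  # residual update

class CoarseToFineComposer(nn.Module):
    """Compose the query coarse-to-fine: global level first, then entity, then structure."""
    def __init__(self, dim=128):
        super().__init__()
        self.levels = nn.ModuleList([LevelFusion(dim) for _ in range(3)])

    def forward(self, global_f, entity_f, struct_f, txt_f):
        q = self.levels[0](global_f, txt_f)
        q = self.levels[1](q + entity_f, txt_f)
        q = self.levels[2](q + struct_f, txt_f)
        return q                                   # composed query to match against targets

d = 128
composer = CoarseToFineComposer(d)
query = composer(torch.randn(4, d), torch.randn(4, d), torch.randn(4, d), torch.randn(4, d))
print(query.shape)   # torch.Size([4, 128])
```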
Citations: 3
Joint label refinement and contrastive learning with hybrid memory for Unsupervised Marine Object Re-Identification
Pub Date : 2021-12-01 DOI: 10.1145/3469877.3497695
Xiaorui Han, Zhiqi Chen, Ruixue Wang, Pengfei Zhao
Unsupervised object re-identification is a challenging task due to the absence of labels for the dataset. Many unsupervised object re-identification approaches combine clustering-based pseudo-label prediction with feature fine-tuning. These methods have achieved great success in the field of unsupervised object Re-ID. However, the inevitable label noise caused by the clustering procedure is ignored. Such noisy pseudo labels substantially hinder the model's capability to further improve feature representations. To this end, we propose a novel joint label refinement and contrastive learning framework with hybrid memory to alleviate this problem. First, to reduce the noise of clustering pseudo labels, we propose a novel noise refinement strategy. This strategy refines pseudo labels at the clustering phase and improves clustering quality by boosting label purity. In addition, we propose a hybrid memory bank. The hybrid memory dynamically generates prototype-level and un-clustered instance-level supervisory signals for learning feature representations. With all prototype-level and un-clustered instance-level supervision, the re-identification model is trained progressively. Our proposed unsupervised object Re-ID framework significantly reduces the influence of noisy labels and refines the learned features. Our method consistently achieves state-of-the-art performance on benchmark datasets.
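As an illustration of the prototype-level supervision a hybrid memory can provide, the snippet below contrasts embeddings against cluster centroids under clustering pseudo labels; the paper's noise refinement strategy and instance-level branch are not modeled here, and the temperature value is an assumption.

```python
import torch
import torch.nn.functional as F

def prototype_contrastive_loss(features, pseudo_labels, prototypes, temperature=0.05):
    """Contrast each feature against cluster prototypes kept in a memory bank.

    features:      (N, D) L2-normalized embeddings
    pseudo_labels: (N,)   cluster assignment per sample
    prototypes:    (C, D) L2-normalized cluster centroids
    """
    logits = features @ prototypes.t() / temperature   # similarity to every prototype
    return F.cross_entropy(logits, pseudo_labels)      # pull toward the assigned cluster

# toy usage: 8 samples, 4 clusters, 64-dim embeddings
feats = F.normalize(torch.randn(8, 64), dim=1)
protos = F.normalize(torch.randn(4, 64), dim=1)
labels = torch.randint(0, 4, (8,))
print(prototype_contrastive_loss(feats, labels, protos))
```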
Citations: 1
Efficient Proposal Generation with U-shaped Network for Temporal Sentence Grounding
Pub Date : 2021-12-01 DOI: 10.1145/3469877.3490606
Ludan Ruan, Qin Jin
Temporal Sentence Grounding aims to localize the relevant temporal region in a given video according to the query sentence. It is a challenging task due to the semantic gap between different modalities and the diversity of event durations. Proposal generation plays an important role in previous mainstream methods. However, previous proposal generation methods apply the same feature extraction without considering the diversity of event durations. In this paper, we propose a novel temporal sentence grounding model with a U-shaped Network for efficient proposal generation (UN-TSG), which utilizes a U-shaped structure to encode proposals of different lengths hierarchically. Experiments on two benchmark datasets demonstrate that, with a more efficient proposal generation method, our model achieves state-of-the-art grounding performance at higher speed and lower computation cost.
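A minimal 1D U-shaped encoder-decoder of the kind the abstract describes, assuming PyTorch: each temporal resolution could serve proposals of a different length scale. The channel sizes and the single score head are illustrative choices, not the authors' architecture.

```python
import torch
import torch.nn as nn

class UNet1D(nn.Module):
    """U-shaped 1D encoder-decoder over clip features with skip connections."""
    def __init__(self, dim=64):
        super().__init__()
        self.down1 = nn.Conv1d(dim, dim, 3, stride=2, padding=1)
        self.down2 = nn.Conv1d(dim, dim, 3, stride=2, padding=1)
        self.up1 = nn.ConvTranspose1d(dim, dim, 4, stride=2, padding=1)
        self.up2 = nn.ConvTranspose1d(dim, dim, 4, stride=2, padding=1)
        self.score = nn.Conv1d(dim, 1, 1)          # per-position proposal score

    def forward(self, x):                          # x: (B, dim, T) clip features
        d1 = torch.relu(self.down1(x))             # T/2 resolution
        d2 = torch.relu(self.down2(d1))            # T/4 resolution
        u1 = torch.relu(self.up1(d2)) + d1         # decode with skip connection
        u2 = torch.relu(self.up2(u1)) + x          # back to full resolution
        return self.score(u2).squeeze(1)           # (B, T) proposal scores

net = UNet1D()
scores = net(torch.randn(2, 64, 32))
print(scores.shape)   # torch.Size([2, 32])
```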
Citations: 0
Visible-Infrared Cross-Modal Person Re-identification based on Positive Feedback
Pub Date : 2021-12-01 DOI: 10.1145/3469877.3497693
Lingyi Lu, Xin Xu
Visible-infrared person re-identification (VI-ReID) is a challenging cross-modality person retrieval task that has attracted increasing attention. Compared to traditional person ReID, which focuses on person images in a single RGB modality, VI-ReID suffers from additional cross-modality discrepancy due to the different imaging processes of spectrum cameras. Several effective attempts have been made in recent years to narrow the cross-modality gap and improve re-identification performance, but they rarely study the key problem of optimizing the search results with relevance feedback. In this paper, we present the idea of cross-modality visible-infrared person re-identification combined with human positive feedback. This method allows the user to quickly optimize the search performance by selecting strong positive samples during the re-identification process. We validate the effectiveness of our method on a public dataset, SYSU-MM01, and the results confirm that the proposed method achieves superior performance compared to current state-of-the-art methods.
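A simple sketch of query refinement from user-confirmed positives: the query embedding is pulled toward the mean of the selected gallery embeddings and the gallery is re-ranked. The mixing weight and cosine scoring are assumptions for illustration, not the paper's exact feedback mechanism.

```python
import torch
import torch.nn.functional as F

def rerank_with_feedback(query, gallery, positive_ids, alpha=0.5):
    """Re-rank the gallery after the user marks strong positives.

    query:        (D,)   original query embedding
    gallery:      (N, D) gallery embeddings
    positive_ids: indices of gallery items the user confirmed as the same person
    """
    pos_center = gallery[positive_ids].mean(dim=0)                  # centroid of confirmed positives
    new_query = F.normalize(alpha * query + (1 - alpha) * pos_center, dim=0)
    scores = F.normalize(gallery, dim=1) @ new_query                # cosine similarity
    return scores.argsort(descending=True)                          # new ranking

g = F.normalize(torch.randn(10, 32), dim=1)
q = F.normalize(torch.randn(32), dim=0)
print(rerank_with_feedback(q, g, positive_ids=[2, 5]))
```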
Citations: 2
BRUSH: Label Reconstructing and Similarity Preserving Hashing for Cross-modal Retrieval
Pub Date : 2021-12-01 DOI: 10.1145/3469877.3490589
P. Zhang, Pengfei Zhao, Xin Luo, Xin-Shun Xu
The hashing technique has recently attracted much attention in the information retrieval community due to its high efficiency in terms of storage and query processing. For cross-modal retrieval tasks, existing supervised hashing models either treat the semantic labels as the ground truth and formalize the problem as a classification task, or further add a similarity matrix as supervisory signals to pursue high-quality hash codes representing the coupled data. However, these approaches cannot ensure that the learnt binary codes preserve well the semantics and similarity relationships contained in the supervised information. Moreover, the resulting sophisticated discrete optimization problem is usually addressed by continuous relaxation or a bit-wise solver, which leads to a large quantization error and inefficient computation. To relieve these issues, in this paper, we present a two-step supervised discrete hashing method, i.e., laBel ReconstrUcting and Similarity preserving Hashing (BRUSH). We formulate it as an asymmetric pairwise similarity-preserving problem by using two latent semantic embeddings derived from decomposing semantics and reconstructing semantics, respectively. Meanwhile, the unified binary codes are jointly generated from both embeddings with an affinity guarantee, such that the discriminative power of the obtained hash codes is significantly enhanced while semantics are preserved well. In addition, by adopting a two-step hash learning strategy, our method simplifies the hashing function and binary code learning procedure, thus improving flexibility and efficiency. The resulting discrete optimization problem is also elegantly solved by the proposed alternating algorithm without any relaxation. Extensive experiments on benchmarks demonstrate that BRUSH outperforms state-of-the-art methods in terms of efficiency and effectiveness.
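To make the similarity-preserving objective concrete, the sketch below relaxes binary codes with tanh and pushes cross-modal code inner products toward a {-1, +1} label-similarity target; BRUSH's label-reconstruction embeddings and discrete alternating solver are not reproduced here, and all sizes are illustrative.

```python
import torch
import torch.nn as nn

class HashEncoder(nn.Module):
    """Maps modality features to K-bit codes; tanh relaxes the binary constraint at train time."""
    def __init__(self, in_dim, n_bits):
        super().__init__()
        self.proj = nn.Linear(in_dim, n_bits)

    def forward(self, x):
        return torch.tanh(self.proj(x))

def similarity_preserving_loss(code_img, code_txt, sim, n_bits):
    """Pairwise loss: scaled inner products of cross-modal codes should match
    the semantic similarity matrix sim (1 if a pair shares a label, else 0)."""
    inner = code_img @ code_txt.t() / n_bits
    target = 2 * sim - 1                      # map {0, 1} similarity to {-1, +1}
    return ((inner - target) ** 2).mean()

# toy usage: 8 image/text pairs, 16-bit codes, labels drawn from 4 classes
n_bits = 16
img_enc, txt_enc = HashEncoder(128, n_bits), HashEncoder(64, n_bits)
img_f, txt_f = torch.randn(8, 128), torch.randn(8, 64)
sim = (torch.randint(0, 4, (8, 1)) == torch.randint(0, 4, (1, 8))).float()
loss = similarity_preserving_loss(img_enc(img_f), txt_enc(txt_f), sim, n_bits)
codes = torch.sign(img_enc(img_f))            # discrete codes at inference time
```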
Citations: 0