
ACM Multimedia Asia: Latest Publications

A Local-Global Commutative Preserving Functional Map for Shape Correspondence
Pub Date: 2021-12-01 | DOI: 10.1145/3469877.3490593
Qianxing Li, Shaofan Wang, Dehui Kong, Baocai Yin
Existing non-rigid shape matching methods suffer from two main disadvantages: (a) local details and global features of shapes cannot be explored carefully, and (b) a satisfactory trade-off between matching accuracy and computational efficiency can hardly be achieved. To address these issues, we propose a local-global commutative preserving functional map (LGCP) for shape correspondence. The core of LGCP involves an intra-segment geometric submodel and a local-global commutative preserving submodel, which accomplish the segment-to-segment and point-to-point matching tasks, respectively. The first submodel consists of an ICP similarity term and two geometric similarity terms that guarantee the correct correspondence of segments of the two shapes, while the second submodel guarantees the bijectivity of the correspondence on both the shape level and the segment level. Experimental results on both segment-to-segment and point-to-point matching show that LGCP not only generates quite accurate matching results but also exhibits satisfactory portability and high efficiency.
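For orientation, commutativity-preserving functional maps build on the classical functional-map estimation, which couples descriptor preservation with commutativity against the shapes' Laplace-Beltrami operators. The sketch below is that generic textbook objective only; the actual LGCP energy, with its ICP similarity and geometric similarity terms, is not reproduced here.

```latex
% Generic commutativity-regularized functional map (context only; not the LGCP energy):
%   F_M, F_N             descriptor coefficients in the Laplace-Beltrami eigenbases of the two shapes
%   \Lambda_M, \Lambda_N diagonal matrices of the corresponding eigenvalues
C^{*} \;=\; \arg\min_{C}\;
    \left\lVert C\,F_{M} - F_{N} \right\rVert_F^{2}
    \;+\; \alpha \left\lVert C\,\Lambda_{M} - \Lambda_{N}\,C \right\rVert_F^{2}
```

A dense point-to-point map is then typically recovered by nearest-neighbor search between the spectral embeddings aligned by C.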
Citations: 0
Color Image Denoising via Tensor Robust PCA with Nonconvex and Nonlocal Regularization
Pub Date: 2021-12-01 | DOI: 10.1145/3469877.3493592
Xiaoyu Geng, Q. Guo, Cai-ming Zhang
Tensor robust principal component analysis (TRPCA) is an important algorithm for color image denoising; it treats the whole image as a tensor and shrinks all singular values equally. In this paper, to improve the denoising performance of TRPCA, we propose a variant of the TRPCA model. Specifically, we first introduce a nonconvex TRPCA (N-TRPCA) model which shrinks large singular values more and small singular values less, so that the physical meanings of different singular values can be preserved. To take advantage of the structural redundancy of an image, we further group similar patches into a tensor according to a nonlocal prior, and then apply the N-TRPCA model to this tensor. The denoised image is obtained by aggregating all processed tensors. Experimental results demonstrate the superiority of the proposed denoising method over state-of-the-art approaches.
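At the matrix level, the key operation behind such models is a weighted shrinkage of singular values rather than a uniform one. The sketch below is a minimal illustration under an assumed weight schedule; the actual N-TRPCA model works on tensors via a tensor SVD with its own nonconvex penalty, which is not reproduced here.

```python
import numpy as np

def weighted_svt(X, tau, weight_fn):
    """Shrink the singular values of X by data-dependent amounts.

    A per-value weight is what lets a nonconvex penalty treat large and
    small singular values differently instead of shrinking them all equally.
    """
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    s_shrunk = np.maximum(s - tau * weight_fn(s), 0.0)
    return (U * s_shrunk) @ Vt

# Assumed weight schedule for illustration only, following the abstract's
# description (larger values receive larger thresholds).
patch_group = np.random.randn(64, 49)
denoised = weighted_svt(patch_group, tau=0.5,
                        weight_fn=lambda s: s / (s.max() + 1e-8))
```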
Citations: 0
Making Video Recognition Models Robust to Common Corruptions With Supervised Contrastive Learning
Pub Date: 2021-12-01 | DOI: 10.1145/3469877.3497692
Tomu Hirata, Yusuke Mukuta, Tatsuya Harada
The video understanding capability of video recognition models has been significantly improved by the development of deep learning techniques and the availability of various video datasets. However, video recognition models are still vulnerable to invisible perturbations, which limits the use of deep video recognition models in the real world. We present a new benchmark for the robustness of action recognition classifiers to general corruptions, and show that a supervised contrastive learning framework is effective in obtaining discriminative and stable video representations and makes deep video recognition models robust to general input corruptions. Experiments on the action recognition task for corrupted videos show the high robustness of the proposed method on the UCF101 and HMDB51 datasets under various common corruptions.
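For reference, the supervised contrastive loss that such frameworks typically build on can be sketched as follows; the temperature value and batch construction are assumptions, and this is the generic loss rather than the paper's full training recipe.

```python
import torch
import torch.nn.functional as F

def supervised_contrastive_loss(features, labels, temperature=0.1):
    """Generic supervised contrastive loss over a batch of clip embeddings.

    features: (N, D) clip embeddings, labels: (N,) action class ids.
    Positives for an anchor are all other clips that share its label.
    """
    z = F.normalize(features, dim=1)
    n = z.size(0)
    self_mask = torch.eye(n, dtype=torch.bool, device=z.device)
    pos_mask = labels.unsqueeze(0).eq(labels.unsqueeze(1)) & ~self_mask

    sim = (z @ z.t() / temperature).masked_fill(self_mask, float('-inf'))
    log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)
    log_prob = log_prob.masked_fill(self_mask, 0.0)   # avoid -inf * 0 below

    per_anchor = -(log_prob * pos_mask.float()).sum(1) / pos_mask.sum(1).clamp(min=1)
    return per_anchor[pos_mask.any(1)].mean()         # anchors with >= 1 positive
```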
Citations: 2
Focusing Attention across Multiple Images for Multimodal Event Detection
Pub Date: 2021-12-01 | DOI: 10.1145/3469877.3495642
Yangyang Li, Jun Li, Hao Jin, Liang Peng
Multimodal social event detection has attracted tremendous research attention in recent years, because it provides a comprehensive and complementary understanding of social events and is important to public security and administration. Most existing works focus on the fusion of multimodal information, especially the fusion of a single image with text. Such single image-text pair processing breaks the correlations between images of the same post and may affect the accuracy of event detection. In this work, we propose to focus attention across multiple images for multimodal event detection, which is also more reasonable for tweets with short text and multiple images. Towards this end, we elaborate a novel Multi-Image Focusing Network (MIFN) to connect text content with visual aspects in multiple images. Our MIFN consists of a feature extractor, a multi-focal network, and an event classifier. The multi-focal network applies focal attention across all the images and fuses the most related regions with the text as a multimodal representation. The event classifier finally predicts the social event class based on the multimodal representations. To evaluate the effectiveness of the proposed approach, we conduct extensive experiments on a commonly used disaster dataset. The experimental results demonstrate that, in both the humanitarian event detection task and its hurricane-disaster variant, the proposed MIFN outperforms all baselines. The ablation studies also demonstrate its ability to filter irrelevant regions across images, which improves the accuracy of multimodal event detection.
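To make the fusion step concrete, a minimal text-guided attention over region features pooled from all images of a post could look like the sketch below; the single-head dot-product form and all dimensions are assumptions, not the actual MIFN architecture.

```python
import torch
import torch.nn as nn

class MultiImageFocalAttention(nn.Module):
    """Illustrative sketch: focus text-guided attention over regions pooled
    from several images of one post, then fuse the attended regions with the
    text query as a simple multimodal representation."""

    def __init__(self, text_dim=768, region_dim=2048, hidden_dim=512):
        super().__init__()
        self.q = nn.Linear(text_dim, hidden_dim)
        self.k = nn.Linear(region_dim, hidden_dim)
        self.v = nn.Linear(region_dim, hidden_dim)

    def forward(self, text_feat, region_feats):
        # text_feat: (B, text_dim); region_feats: (B, n_images * n_regions, region_dim)
        q = self.q(text_feat).unsqueeze(1)                  # (B, 1, H)
        k, v = self.k(region_feats), self.v(region_feats)   # (B, R, H)
        attn = torch.softmax((q * k).sum(-1) / k.size(-1) ** 0.5, dim=-1)  # (B, R)
        visual = (attn.unsqueeze(-1) * v).sum(1)            # attended regions, (B, H)
        return torch.cat([visual, q.squeeze(1)], dim=-1)    # multimodal vector, (B, 2H)
```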
Citations: 0
S2TD: A Tree-Structured Decoder for Image Paragraph Captioning
Pub Date: 2021-12-01 | DOI: 10.1145/3469877.3490585
Yihui Shi, Yun Liu, Fangxiang Feng, Ruifan Li, Zhanyu Ma, Xiaojie Wang
Image paragraph captioning, the task of generating a paragraph description for a given image, usually requires mining and organizing linguistic counterparts from abundant visual clues. Limited by a sequential decoding perspective, previous methods have difficulty organizing the visual clues holistically or capturing the structural nature of linguistic descriptions. In this paper, we propose a novel tree-structured visual paragraph decoder network, called Splitting to Tree Decoder (S2TD), to address this problem. The key idea is to model the paragraph decoding process as a top-down binary tree expansion. S2TD consists of three modules: a split module, a score module, and a word-level RNN. The split module iteratively splits ancestral visual representations into two parts through a gating mechanism. To determine the tree topology, the score module uses cosine similarity to evaluate node splits. A novel tree structure loss is proposed to enable end-to-end learning. After the tree expansion, the word-level RNN decodes leaf nodes into sentences, forming a coherent paragraph. Extensive experiments are conducted on the Stanford benchmark dataset. The experimental results show the promising performance of the proposed S2TD.
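A single top-down split step of this kind, where a gate divides the parent visual vector into two children and a cosine score rates the split, can be pictured with the sketch below; the layer sizes and the exact use of the score are assumptions rather than the S2TD implementation.

```python
import torch
import torch.nn as nn

class GatedSplit(nn.Module):
    """Minimal sketch of one top-down split step of a tree decoder."""

    def __init__(self, dim=1024):
        super().__init__()
        self.gate = nn.Sequential(nn.Linear(dim, dim), nn.Sigmoid())
        self.score = nn.Linear(dim, dim)

    def forward(self, parent):
        g = self.gate(parent)                       # soft assignment of content
        left, right = g * parent, (1.0 - g) * parent
        # Cosine similarity between the two children as a simplified split
        # score: dissimilar children suggest the node covers distinct topics.
        s = torch.cosine_similarity(self.score(left), self.score(right), dim=-1)
        return left, right, s
```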
Citations: 6
Few-shot Egocentric Multimodal Activity Recognition
Pub Date: 2021-12-01 | DOI: 10.1145/3469877.3490603
Jinxing Pan, Xiaoshan Yang, Yi Huang, Changsheng Xu
Activity recognition based on egocentric multimodal data collected by wearable devices has become increasingly popular recently. However, conventional activity recognition methods face the dilemma of lacking large-scale labeled egocentric multimodal datasets due to the high cost of data collection. In this paper, we propose a new task of few-shot egocentric multimodal activity recognition, which poses at least two significant challenges. On the one hand, it is difficult to extract effective features from the multimodal data sequences of video and sensor signals due to the scarcity of samples. On the other hand, because of the complexity of the multimodal data, robustly recognizing novel activity classes from very few labeled samples becomes an even more critical challenge. To resolve these challenges, we propose a two-stream graph network, which consists of a heterogeneous graph-based multimodal association module and a knowledge-aware activity classifier module. The former uses a heterogeneous graph network to comprehensively capture the dynamic and complementary information contained in the multimodal data stream. The latter learns robust activity classifiers through knowledge propagation among the classifier parameters of different classes. In addition, we adopt an episodic training strategy to improve the generalization ability of the proposed few-shot activity recognition model. Experiments on two public datasets show that the proposed model achieves better performance than baseline models.
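The episodic training strategy mentioned above amounts to repeatedly sampling small N-way K-shot classification tasks; a minimal sketch of episode construction is shown below, with arbitrary way/shot/query counts and the graph network itself omitted.

```python
import random
from collections import defaultdict

def sample_episode(samples, n_way=5, k_shot=1, q_queries=5):
    """Build one N-way K-shot episode from (multimodal_clip, label) pairs.

    Assumes at least n_way classes have k_shot + q_queries samples each.
    """
    by_class = defaultdict(list)
    for clip, label in samples:
        by_class[label].append(clip)

    eligible = [c for c, v in by_class.items() if len(v) >= k_shot + q_queries]
    classes = random.sample(eligible, n_way)

    support, query = [], []
    for episode_label, c in enumerate(classes):
        clips = random.sample(by_class[c], k_shot + q_queries)
        support += [(x, episode_label) for x in clips[:k_shot]]
        query += [(x, episode_label) for x in clips[k_shot:]]
    return support, query
```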
Citations: 1
Deep Multiple Length Hashing via Multi-task Learning
Pub Date: 2021-12-01 | DOI: 10.1145/3469877.3493591
Letian Wang, Xiushan Nie, Quan Zhou, Yang Shi, Xingbo Liu
Hashing can compress heterogeneous high-dimensional data into compact binary codes. Most existing hashing methods first predetermine a fixed length for the hash code and then train the model based on this fixed length. However, when the task requirements change, these methods need to retrain the model for a new hash code length, which increases the time cost. To address this issue, we propose a deep supervised hashing method, called deep multiple length hashing (DMLH), which can learn hash codes of multiple lengths simultaneously based on a multi-task learning network. DMLH can effectively exploit the relationships among these codes via a multi-task network with hard parameter sharing. Specifically, in DMLH, the multiple hash codes of different lengths are regarded as different views of the same sample. Furthermore, we introduce a mutual information loss to mine the associations among hash codes of different lengths. Extensive experiments have indicated that DMLH outperforms most existing models, verifying its effectiveness.
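A hard-parameter-sharing multi-task hashing network of this kind can be sketched as a shared backbone with one head per code length, as below; the backbone, input dimension, and particular code lengths are assumptions, and the mutual information loss is not reproduced.

```python
import torch
import torch.nn as nn

class MultiLengthHashNet(nn.Module):
    """Sketch of multi-length hashing: one shared backbone, one head per
    target code length, trained jointly as a multi-task network."""

    def __init__(self, in_dim=4096, code_lengths=(16, 32, 64)):
        super().__init__()
        self.backbone = nn.Sequential(nn.Linear(in_dim, 1024), nn.ReLU())
        self.heads = nn.ModuleList([nn.Linear(1024, L) for L in code_lengths])

    def forward(self, x):
        h = self.backbone(x)
        # tanh gives relaxed codes for training; sign() binarizes at test time.
        return [torch.tanh(head(h)) for head in self.heads]

codes = MultiLengthHashNet()(torch.randn(8, 4096))
binary = [c.sign() for c in codes]   # three code sets: 16, 32 and 64 bits
```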
Citations: 1
Semantic Enhanced Cross-modal GAN for Zero-shot Learning
Pub Date: 2021-12-01 | DOI: 10.1145/3469877.3490581
Haotian Sun, Jiwei Wei, Yang Yang, Xing Xu
The goal of zero-shot learning (ZSL) is to recognize categories that are not seen during the training process. The traditional approach is to learn an embedding space and map visual features and semantic features into this common space. However, this approach inevitably encounters the bias problem, i.e., unseen instances are often incorrectly recognized as seen classes. Another paradigm has therefore been proposed that uses generative models to hallucinate the features of unseen samples. However, generative models often suffer from instability issues, making it impractical for them to generate fine-grained features of unseen samples and thus resulting in very limited improvement. To resolve this, a Semantic Enhanced Cross-modal GAN (SECM GAN) is proposed, which imposes a cross-modal association to improve the semantic and discriminative properties of the generated features. Specifically, we first train a cross-modal embedding model called the Semantic Enhanced Cross-modal Model (SECM), which is constrained by discrimination and semantics. Then we train our generative model, SECM GAN, based on a Generative Adversarial Network (GAN), in which the generator generates cross-modal features and the discriminator distinguishes true cross-modal features from generated ones. We deploy SECM as a weak constraint on the GAN, which reduces the reliance on the GAN. We conduct extensive experiments on three widely used ZSL datasets to demonstrate the superiority of our framework.
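In feature-generating ZSL pipelines of this kind, a conditional generator maps class semantics plus noise to visual features while a discriminator scores (feature, semantics) pairs; the sketch below shows that generic pattern with assumed dimensions, not the SECM GAN architecture or its SECM constraint.

```python
import torch
import torch.nn as nn

class FeatureGenerator(nn.Module):
    """Sketch: hallucinate visual features from class attributes plus noise."""

    def __init__(self, attr_dim=85, noise_dim=128, feat_dim=2048):
        super().__init__()
        self.noise_dim = noise_dim
        self.net = nn.Sequential(
            nn.Linear(attr_dim + noise_dim, 1024), nn.LeakyReLU(0.2),
            nn.Linear(1024, feat_dim), nn.ReLU())

    def forward(self, attributes):
        z = torch.randn(attributes.size(0), self.noise_dim, device=attributes.device)
        return self.net(torch.cat([attributes, z], dim=1))

class FeatureDiscriminator(nn.Module):
    """Scores (visual feature, class attributes) pairs as real or generated."""

    def __init__(self, attr_dim=85, feat_dim=2048):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(attr_dim + feat_dim, 1024), nn.LeakyReLU(0.2),
            nn.Linear(1024, 1))

    def forward(self, features, attributes):
        return self.net(torch.cat([features, attributes], dim=1))
```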
Citations: 0
Private-Share: A Secure and Privacy-Preserving De-Centralized Framework for Large Scale Data Sharing
Pub Date: 2021-12-01 | DOI: 10.1145/3469877.3493588
Arun Zachariah, Maha M AlRasheed
The various data and privacy regulations introduced around the globe require data to be stored in a secure and privacy-preserving fashion, and non-compliance with these regulations comes with major consequences. This has led to the formation of huge data silos within organizations, making data analysis difficult and increasing the risk of a data breach. Isolating data also prevents collaborative research. To address this, we present Private-Share, a framework that enables secure sharing of large-scale data. To achieve this goal, Private-Share leverages recent advances in blockchain technology, specifically the InterPlanetary File System (IPFS) and Ethereum.
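The general pattern behind such designs is to keep the data itself off-chain in content-addressed storage and anchor only its identifier and access policy on the ledger. The sketch below illustrates that flow with an in-memory dictionary standing in for the smart contract and a SHA-256 digest standing in for the IPFS content identifier; it uses no real IPFS or Ethereum API and is not the Private-Share implementation.

```python
import hashlib
import time

registry = {}   # stand-in for the on-chain record of shared datasets

def share(payload: bytes, owner: str, allowed) -> str:
    """Register a content-addressed payload with its access policy."""
    cid = hashlib.sha256(payload).hexdigest()   # content address of the data
    registry[cid] = {"owner": owner, "allowed": set(allowed),
                     "timestamp": time.time()}
    return cid

def can_access(cid: str, user: str) -> bool:
    entry = registry.get(cid)
    return bool(entry) and (user == entry["owner"] or user in entry["allowed"])

cid = share(b"study results", owner="lab-A", allowed={"lab-B"})
print(cid, can_access(cid, "lab-B"), can_access(cid, "lab-C"))
```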
Citations: 0
Pose-aware Outfit Transfer between Unpaired in-the-wild Fashion Images
Pub Date: 2021-12-01 | DOI: 10.1145/3469877.3490569
Donnaphat Trakulwaranont, Marc A. Kastner, S. Satoh
Virtual try-on systems have become popular for visualizing outfits, owing to the importance of individual fashion in many communities. The objective of such a system is to transfer a piece of clothing to another person while preserving its detail and characteristics. Generating a realistic in-the-wild image requires visual optimization of the clothing, the background, and the target person, which keeps this task very challenging. In this paper, we develop a method that generates realistic try-on images from unpaired images in in-the-wild datasets. Our proposed method starts by generating a mock-up paired image using geometric transfer. Then, the target's pose information is adjusted using a modified pose-attention module. We combine a reconstruction loss and a content loss to preserve the detail and style of the transferred clothing, the background, and the target person. We evaluate the approach on the Fashionpedia dataset and show promising performance over a baseline approach.
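The combined objective described above can be illustrated as a pixel-level reconstruction term plus a feature-level content term; the sketch below uses pretrained VGG16 features as the content extractor and a 0.1 weighting, both of which are assumptions rather than the paper's configuration.

```python
import torch
import torch.nn.functional as F
from torchvision.models import vgg16

class ReconContentLoss(torch.nn.Module):
    """Sketch of a combined reconstruction + content loss for try-on images.

    Expects ImageNet-normalized image tensors of shape (B, 3, H, W).
    """

    def __init__(self, content_weight=0.1):
        super().__init__()
        # Frozen VGG16 features up to relu3_3 as a generic content extractor.
        self.features = vgg16(weights="DEFAULT").features[:16].eval()
        for p in self.features.parameters():
            p.requires_grad_(False)
        self.content_weight = content_weight

    def forward(self, generated, target):
        recon = F.l1_loss(generated, target)                 # pixel-level detail
        content = F.l1_loss(self.features(generated),
                            self.features(target))           # semantic content/style
        return recon + self.content_weight * content
```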
Citations: 1