
Latest Publications from Proceedings of the ACM Multimedia Asia

Multi-Scale Invertible Network for Image Super-Resolution
Pub Date : 2019-12-15 DOI: 10.1145/3338533.3366576
Zhuangzi Li, Shanshan Li, N. Zhang, Lei Wang, Ziyu Xue
Deep convolutional neural network (CNN) based image super-resolution approaches have achieved significant success in recent years. However, due to the information-discarding nature of CNNs, they inevitably suffer from information loss during the feature embedding process, in which the extracted intermediate features cannot effectively represent or reconstruct the input. As a result, the super-resolved image deviates substantially in structure from its low-resolution version, leading to inaccurate representations in some local details. In this study, we address this problem by designing an end-to-end invertible architecture that can reversely represent low-resolution images at any feature embedding level. Specifically, we propose a novel image super-resolution method, named multi-scale invertible network (MSIN), to keep information lossless and introduce multi-scale learning in a unified framework. In MSIN, a novel multi-scale invertible stack is proposed, which adopts four parallel branches to capture features at different scales and keeps balanced information interaction via branch shifting. In addition, we employ global and hierarchical feature fusion to learn elaborate and comprehensive feature representations, further benefiting the quality of the final image reconstruction. We show the reversibility of the proposed MSIN, and extensive experiments on benchmark datasets demonstrate the state-of-the-art performance of our method.
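For intuition, here is a minimal PyTorch sketch of one four-branch invertible block: additive coupling keeps the transform exactly reversible, and a cyclic branch shift stands in for the paper's branch shifting. The kernel sizes, depths, and fusion stages of the actual MSIN stack are not given in the abstract, so everything below is an illustrative assumption.

```python
# Sketch of a four-branch invertible block, assuming additive coupling
# and a cyclic branch shift; not the paper's exact MSIN design.
import torch
import torch.nn as nn

class MultiScaleInvertibleBlock(nn.Module):
    def __init__(self, channels):
        super().__init__()
        assert channels % 4 == 0
        c = channels // 4
        # One conv per branch at a different scale (hypothetical choice).
        self.f = nn.ModuleList([
            nn.Conv2d(3 * c, c, kernel_size=k, padding=k // 2)
            for k in (1, 3, 5, 7)
        ])

    def forward(self, x):
        xs = list(torch.chunk(x, 4, dim=1))
        for i in range(4):
            others = torch.cat([xs[j] for j in range(4) if j != i], dim=1)
            xs[i] = xs[i] + self.f[i](others)   # additive coupling: invertible
        xs = xs[1:] + xs[:1]                    # branch shifting (cyclic)
        return torch.cat(xs, dim=1)

    def inverse(self, y):
        ys = list(torch.chunk(y, 4, dim=1))
        ys = ys[-1:] + ys[:-1]                  # undo the shift
        for i in reversed(range(4)):
            others = torch.cat([ys[j] for j in range(4) if j != i], dim=1)
            ys[i] = ys[i] - self.f[i](others)   # exact inverse of the coupling
        return torch.cat(ys, dim=1)
```

Calling `block.inverse(block(x))` reproduces `x` up to floating-point error, which is the reversibility property the paper exploits.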
Citations: 5
An Efficient Parameter Optimization Algorithm and Its Application to Image De-noising
Pub Date : 2019-12-15 DOI: 10.1145/3338533.3366573
Yinhao Liu, Xiaofeng Huang, Mengting Fan, Haibing Yin
Prevailing image enhancement algorithms deliver a flexible tradeoff between image quality and implementation complexity, usually achieved by adjusting multiple algorithm parameters, i.e., multi-parameter optimization. Traditional exhaustive search over the whole solution space can solve this optimization problem, but it suffers from high search complexity caused by the huge number of multi-parameter combinations. To resolve this problem, an Energy Efficiency Ratio Model (EERM) based algorithm is proposed, inspired by gradient descent in deep learning. To verify its effectiveness, the proposed algorithm is applied to an image de-noising framework based on non-local means (NLM) plus iteration. Experimental results show that the optimal parameter combination found by our algorithm achieves quality comparable to that of the exhaustive-search-based method. Specifically, the proposed method reduces complexity by 86.7% with only 0.05 dB quality degradation.
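As a rough illustration of gradient-inspired multi-parameter search, the sketch below runs a greedy coordinate descent over discrete parameter grids against a black-box `evaluate` returning (quality, complexity). The scalarized objective and the tradeoff weight `lam` are stand-ins; the abstract does not give the actual EERM formulation.

```python
# Greedy coordinate descent over discrete parameter grids, probing one
# parameter at a time like a discrete gradient step. Hypothetical stand-in
# for the paper's EERM-based search.
def search(grids, evaluate, lam=0.01, iters=50):
    idx = {k: 0 for k in grids}                  # start at the first grid point

    def score(i):
        q, c = evaluate({k: grids[k][i[k]] for k in grids})
        return q - lam * c                       # quality vs. complexity tradeoff

    best = score(idx)
    for _ in range(iters):
        improved = False
        for k in grids:
            for step in (-1, 1):                 # probe both neighbors
                j = idx[k] + step
                if 0 <= j < len(grids[k]):
                    cand = dict(idx, **{k: j})
                    s = score(cand)
                    if s > best:
                        best, idx, improved = s, cand, True
        if not improved:                         # local optimum: stop early
            break
    return {k: grids[k][idx[k]] for k in grids}, best

# Hypothetical usage with a toy de-noiser model:
#   grids = {"h": [2, 4, 8, 16], "iters": [1, 2, 3]}
#   evaluate = lambda p: (30 - 8.0 / p["h"], p["h"] * p["iters"])
#   params, _ = search(grids, evaluate)
```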
Citations: 0
Adaptive Bilinear Pooling for Fine-grained Representation Learning
Pub Date : 2019-12-15 DOI: 10.1145/3338533.3366567
Shaobo Min, Hongtao Xie, Youliang Tian, Hantao Yao, Yongdong Zhang
Fine-grained representation learning aims to generate discriminative descriptions for fine-grained visual objects. Recently, bilinear feature interaction has proven effective in generating powerful high-order representations with spatially invariant information. However, existing methods apply a fixed feature interaction strategy to all samples, ignoring the image and region heterogeneity in a dataset. To this end, we propose a generalized feature interaction method, named Adaptive Bilinear Pooling (ABP), which can adaptively infer a suitable pooling strategy for a given sample based on its image content. Specifically, ABP consists of two learning strategies: p-order learning (P-net) and spatial attention learning (S-net). The p-order learning predicts an optimal exponential coefficient, rather than a fixed order number, to extract moderate visual information from an image. The spatial attention learning infers a weighted score that measures the importance of each local region, which compacts the image representations. To make ABP compatible with kernelized bilinear feature interaction, a crossed two-branch structure is utilized to combine the P-net and S-net. This structure facilitates complementary information exchange between the two visual branches. Experiments on three widely used benchmarks, covering fine-grained object classification and action recognition, demonstrate the effectiveness of the proposed method.
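A minimal sketch of the two ingredients follows, assuming a signed-power implementation of p-order scaling and a softmax spatial attention. In the paper, P-net and S-net are learned subnetworks that predict these quantities per image; they are simplified here to a single learnable exponent and a 1×1 convolution.

```python
# Sketch of adaptive bilinear pooling: learnable p-order scaling,
# spatial attention weighting, then outer-product (second-order) pooling.
# Simplified stand-in for ABP's P-net/S-net.
import torch
import torch.nn as nn

class AdaptiveBilinearPooling(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.p = nn.Parameter(torch.ones(1))      # stand-in for P-net output
        self.attn = nn.Conv2d(channels, 1, 1)     # stand-in for S-net

    def forward(self, x):                         # x: (B, C, H, W)
        # p-order scaling: signed power keeps gradients stable around zero.
        x = torch.sign(x) * torch.abs(x).clamp_min(1e-6) ** self.p
        # Spatial attention: weight each location's contribution.
        a = torch.softmax(self.attn(x).flatten(2), dim=-1)    # (B, 1, HW)
        xf = x.flatten(2)                                     # (B, C, HW)
        # Attention-weighted bilinear pooling over locations.
        y = torch.bmm(xf * a, xf.transpose(1, 2)).flatten(1)  # (B, C*C)
        return y / (y.norm(dim=1, keepdim=True) + 1e-6)       # l2-normalize
```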
Citations: 3
Stop Hiding Behind Windshield: A Windshield Image Enhancer Based on a Two-way Generative Adversarial Network
Pub Date : 2019-12-15 DOI: 10.1145/3338533.3366559
Chi-Rung Chang, K. Lung, Yi-Chung Chen, Zhi-Kai Huang, Hong-Han Shuai, Wen-Huang Cheng
Windshield images captured by surveillance cameras are usually difficult to see through due to severe image degradation such as reflection, motion blur, low light, haze, and noise. Such degradation hinders the ability to identify and track people. In this paper, we address this challenging windshield image enhancement task with a novel deep learning model based on a two-way generative adversarial network, called the Two-way Individual Normalization Perceptual Adversarial Network (TWIN-PAN). TWIN-PAN is an unpaired learning network that does not require pairs of degraded and corresponding ground-truth images for training. Also, unlike existing image restoration algorithms, which address only one specific type of degradation at a time, TWIN-PAN can restore the image from various types of degradation. To restore the content inside the extremely degraded windshield and ensure the semantic consistency of the image, we introduce a cyclic perceptual loss and combine it with the cycle-consistency loss. Moreover, to generate better restorations, we introduce individual instance normalization layers for the generators, which help each generator adapt to its own input distribution. Furthermore, we collect a large high-quality windshield image dataset (WIE-Dataset) to train our network and to validate the robustness of our method in restoring degraded windshield images. Experimental results on human detection, vehicle re-identification, and a user study demonstrate that the proposed method is effective for windshield image restoration.
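The two-way objective can be sketched as follows, assuming `G` (degraded→clean), `F` (clean→degraded), and a frozen feature extractor `feat` for the perceptual term; all three are placeholders, as the abstract does not specify the generator architectures or loss weights.

```python
# Sketch of the unpaired two-way objective: cycle-consistency L1 plus a
# cyclic perceptual term in a frozen feature space (e.g., VGG features).
# G, F, and feat are placeholder callables.
import torch.nn.functional as nnf

def cycle_losses(G, F, feat, x_degraded, y_clean):
    rec_x = F(G(x_degraded))       # x -> G(x) -> F(G(x)) should return to x
    rec_y = G(F(y_clean))          # y -> F(y) -> G(F(y)) should return to y
    cycle = nnf.l1_loss(rec_x, x_degraded) + nnf.l1_loss(rec_y, y_clean)
    # Cyclic perceptual loss: also match the cycle reconstructions in
    # feature space to keep the image semantics consistent.
    perceptual = nnf.l1_loss(feat(rec_x), feat(x_degraded)) \
               + nnf.l1_loss(feat(rec_y), feat(y_clean))
    return cycle, perceptual
```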
Citations: 0
Domain Specific and Idiom Adaptive Video Summarization
Pub Date : 2019-12-15 DOI: 10.1145/3338533.3366603
Yi Dong, Chang Liu, Zhiqi Shen, Zhanning Gao, Pan Wang, Changgong Zhang, Peiran Ren, Xuansong Xie, Han Yu, Qingming Huang
As short videos become an increasingly popular form of storytelling, there is a growing demand for video summarization that conveys information concisely with a subset of video frames. Existing efforts use criteria such as interestingness and diversity to pick appropriate segments, but they lack a mechanism to infuse insights from cinematography and persuasion into the process. As a result, summarization results sometimes deviate from the original video. In addition, exploring the vast design space to create customized video summaries is costly for video producers. To address these challenges, we propose a domain-specific and idiom-adaptive video summarization approach. Specifically, our approach first segments the input video and extracts high-level information from each segment. These labels represent a collection of idioms and summarization metrics as submodular components, which users can combine to create personalized summary styles in a variety of ways. To identify the importance of the idioms and metrics in different domains, we leverage max-margin learning. Experimental results validate the effectiveness of our approach. We also plan to release a dataset containing over 600 videos with expert annotations to benefit further research in this area.
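To make the submodular-combination idea concrete, here is a hedged sketch: each idiom or metric is a set function, weights come from the user or from max-margin learning, and a greedy pass picks segments under a length budget. The component functions and weights are hypothetical; the paper's actual idiom library is not described in the abstract.

```python
# Greedy maximization of a weighted sum of submodular components.
# For monotone submodular objectives, greedy gives a (1 - 1/e) guarantee.
def greedy_summary(segments, components, weights, budget):
    """segments: iterable of ids; components: list of set-functions f(S)."""
    def objective(S):
        return sum(w * f(S) for w, f in zip(weights, components))

    S = set()
    while len(S) < budget:
        gains = {s: objective(S | {s}) - objective(S)
                 for s in segments if s not in S}
        if not gains:
            break
        best = max(gains, key=gains.get)     # largest marginal gain
        if gains[best] <= 0:
            break
        S.add(best)
    return S

# Hypothetical usage with two toy components over segment ids:
#   coverage  = lambda S: len({topic[s] for s in S})
#   diversity = lambda S: len(S)
#   greedy_summary(range(20), [coverage, diversity], [0.7, 0.3], budget=5)
```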
Citations: 0
Manifold Alignment with Multi-graph Embedding
Pub Date : 2019-12-15 DOI: 10.1145/3338533.3366588
Chang-Bin Huang, Timothy Apasiba Abeo, Xiang-jun Shen
In this paper, a novel manifold alignment approach via multi-graph embedding (MA-MGE) is proposed. Unlike traditional manifold alignment algorithms, which use a single graph to describe the latent manifold structure of each dataset, our approach utilizes multiple graphs to model multiple local manifolds in multi-view data alignment. A composite manifold representation with complete and more useful information is therefore obtained from each dataset through dynamic reconstruction of multiple graphs. Experimental results on the Protein and Face-10 datasets demonstrate that the mapping coordinates of the proposed method provide better alignment performance than state-of-the-art methods such as semi-supervised manifold alignment (SS-MA), manifold alignment using Procrustes analysis (PAMA), and manifold alignment without correspondence (UNMA).
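A minimal sketch of the multi-graph half of the idea: several kNN graphs with different neighborhood sizes are built over one dataset, their normalized Laplacians are combined, and a spectral embedding is read off. The fixed weights `alpha` stand in for the paper's dynamic graph reconstruction, and the cross-dataset alignment terms are omitted.

```python
# Combine the normalized Laplacians of several kNN graphs and take a
# spectral embedding; a simplified stand-in for multi-graph embedding.
import numpy as np
from scipy.linalg import eigh
from scipy.sparse.csgraph import laplacian
from sklearn.neighbors import kneighbors_graph

def multi_graph_embedding(X, ks=(5, 10, 20), alpha=None, dim=2):
    alpha = alpha if alpha is not None else [1.0 / len(ks)] * len(ks)
    L = np.zeros((len(X), len(X)))
    for a, k in zip(alpha, ks):
        W = kneighbors_graph(X, k, mode="connectivity").toarray()
        W = np.maximum(W, W.T)                 # symmetrize the kNN graph
        L += a * laplacian(W, normed=True)
    vals, vecs = eigh(L)                       # eigenvectors sorted ascending
    return vecs[:, 1:dim + 1]                  # skip the trivial eigenvector
```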
Citations: 0
RSC-DGS: Fusion of RGB and NIR Images Using Robust Spectral Consistency and Dynamic Gradient Sparsity
Pub Date : 2019-12-15 DOI: 10.1145/3338533.3368261
Shengtao Yu, Cheolkon Jung, Kailong Zhou, Chen Su
Color (RGB) images captured under low-light conditions contain heavy noise and lose textures. Since near-infrared (NIR) images remain robust to noise and retain clear textures even in low light, they can be used to enhance low-light RGB images by image fusion. In this paper, we propose fusion of RGB and NIR images using robust spectral consistency (RSC) and dynamic gradient sparsity (DGS), called RSC-DGS. We build the RSC model on a robust error function to remove noise while preserving color/spectral consistency. We construct the DGS model on vectorial total variation minimization that uses the NIR image as the reference image. The DGS model transfers the clear textures of the NIR image to the fusion result and preserves the cross-channel interdependency of the RGB image. We use the alternating direction method of multipliers (ADMM) to solve the proposed RSC-DGS fusion efficiently. Experimental results confirm that the proposed method effectively preserves color/spectral consistency and textures in the fusion results while removing noise with high computational efficiency.
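For a feel of the ADMM machinery, the sketch below solves a heavily simplified single-channel surrogate: a quadratic data term plus an L1 penalty pulling the fused gradients toward the NIR gradients (the gradient-sparsity idea), with the u-update done in the Fourier domain under periodic boundaries. The paper's robust spectral-consistency term and vectorial TV are not reproduced here.

```python
# Simplified per-channel ADMM: min_u 0.5||u - rgb||^2 + lam*||grad(u) - grad(nir)||_1
import numpy as np

def grad(u):                        # forward differences, periodic boundary
    return np.roll(u, -1, 0) - u, np.roll(u, -1, 1) - u

def div(px, py):                    # div = -grad^T under the same boundary
    return px - np.roll(px, 1, 0) + py - np.roll(py, 1, 1)

def soft(x, t):                     # soft-thresholding (prox of the l1 norm)
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def fuse_channel(rgb, nir, lam=0.05, rho=1.0, iters=100):
    gx, gy = grad(nir)              # reference gradients from the NIR image
    zx, zy = np.zeros_like(rgb), np.zeros_like(rgb)
    dx, dy = np.zeros_like(rgb), np.zeros_like(rgb)
    h, w = rgb.shape
    # Fourier symbol of grad^T grad (a periodic Laplacian).
    lap = np.abs(np.fft.fft2([[-1.0, 1.0]], s=(h, w))) ** 2 \
        + np.abs(np.fft.fft2([[-1.0], [1.0]], s=(h, w))) ** 2
    u = rgb.copy()
    for _ in range(iters):
        vx, vy = gx + zx - dx, gy + zy - dy
        # u-step: (I + rho * grad^T grad) u = rgb + rho * grad^T v, via FFT.
        rhs = np.fft.fft2(rgb - rho * div(vx, vy))
        u = np.real(np.fft.ifft2(rhs / (1.0 + rho * lap)))
        ux, uy = grad(u)
        zx = soft(ux - gx + dx, lam / rho)   # z-step: gradient-sparsity prox
        zy = soft(uy - gy + dy, lam / rho)
        dx += ux - gx - zx                   # dual ascent
        dy += uy - gy - zy
    return u
```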
Citations: 1
Dense Attention Network for Facial Expression Recognition in the Wild
Pub Date : 2019-12-15 DOI: 10.1145/3338533.3366568
Cong Wang, K. Lu, Jian Xue, Yanfu Yan
Recognizing facial expressions is important for human-computer interaction systems and other applications. A number of facial expression datasets have been published in recent decades, driving improvements in emotion classification algorithms. However, recognizing realistic expressions in the wild remains challenging because of uncontrolled lighting, brightness, pose, occlusion, and so on. In this paper, we propose an attention-based module that helps the network focus on emotion-related locations. Furthermore, we build two network structures, DenseCANet and DenseSANet, by applying these attention modules to a DenseNet backbone. These two networks and the original DenseNet are then trained on the in-the-wild dataset AffectNet and the lab-controlled dataset CK+. Experimental results show that DenseSANet improves performance on both datasets compared with state-of-the-art methods.
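Since the abstract does not detail the modules, the sketch below assumes their common forms: an SE-style channel attention (as DenseCANet might use) and a pooled-feature spatial attention (for DenseSANet), either of which can be dropped in after a DenseNet block.

```python
# Assumed attention modules: SE-style channel attention and a
# single-conv spatial attention; placement inside DenseNet is not
# specified in the abstract.
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels), nn.Sigmoid())

    def forward(self, x):                       # x: (B, C, H, W)
        w = self.fc(x.mean(dim=(2, 3)))         # squeeze -> excitation
        return x * w[:, :, None, None]          # reweight channels

class SpatialAttention(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size=7, padding=3)

    def forward(self, x):
        # Pool along channels, then predict a per-location mask so the
        # network can focus on emotion-related facial regions.
        s = torch.cat([x.mean(1, keepdim=True),
                       x.amax(1, keepdim=True)], dim=1)
        return x * torch.sigmoid(self.conv(s))
```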
Citations: 3
Attention-Aware Feature Pyramid Ordinal Hashing for Image Retrieval
Pub Date : 2019-12-15 DOI: 10.1145/3338533.3366598
Xie Sun, Lu Jin, Zechao Li
Owing to the effectiveness of representation learning, deep hashing methods have attracted increasing attention in image retrieval. However, most existing deep hashing methods merely encode the raw information of the last layer for hash learning, which results in two deficiencies: (1) useful information from the preceding layers is not fully exploited; and (2) the local salient information of the image is neglected. To this end, we propose a novel deep hashing method, called Attention-Aware Feature Pyramid Ordinal Hashing (AFPH), which explores both visual structure information and semantic information from different convolutional layers. Specifically, two feature pyramids based on spatial and channel attention are constructed to capture local salient structure at multiple scales. Moreover, a multi-scale feature fusion strategy is proposed to aggregate the feature maps from multi-level pyramidal layers into a discriminative feature for ranking-based hashing. Experimental results on two widely used image retrieval datasets demonstrate the superiority of our method.
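A minimal sketch of the fusion-then-hash skeleton, assuming hypothetical channel widths and global average pooling; the attention-weighted pyramids and the ordinal ranking loss central to AFPH are not reconstructed here.

```python
# Fuse multi-level feature maps and map them to relaxed binary codes.
# Channel sizes and the 128-d projection width are illustrative choices.
import torch
import torch.nn as nn

class FusionHash(nn.Module):
    def __init__(self, in_channels=(256, 512, 1024), bits=48):
        super().__init__()
        self.proj = nn.ModuleList(nn.Conv2d(c, 128, 1) for c in in_channels)
        self.hash = nn.Linear(128 * len(in_channels), bits)

    def forward(self, feats):               # feats: list of (B, C_i, H_i, W_i)
        pooled = [self.proj[i](f).mean(dim=(2, 3))   # project + global avg pool
                  for i, f in enumerate(feats)]
        h = self.hash(torch.cat(pooled, dim=1))
        return torch.tanh(h)                # relax binary codes to (-1, 1)

# Hypothetical usage: codes = FusionHash()([c3, c4, c5]); bits = codes.sign()
```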
Citations: 4
NRQQA
Pub Date : 2019-12-15 DOI: 10.1145/3338533.3366563
Shengju Yu, Tiansong Li, Xiaoyu Xu, Hao Tao, Li Yu, Yixuan Wang
Image stitching technology has been widely used in immersive applications such as 3D modeling, VR, and AR, where the quality of the stitching results is crucial. At present, objective quality assessment methods for stitched images mainly depend on the availability of ground truth (i.e., they are full-reference), yet in most cases ground truth is unavailable. In this paper, a no-reference quality assessment metric specifically designed for stitched images is proposed. We first find the parts of the stitched image corresponding to each source image. Then, isolated points and the outer points generated by spherical projection are eliminated. After that, we use the bounding rectangle of the stitching seams to locate the overlapping regions in the stitched image. Finally, the assessment of the overlapping regions is taken as the final score. Extensive experiments show that our scores are consistent with human vision; the proposed metric is effective even for nuances that human eyes cannot distinguish.
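To illustrate the overlap-localization step, here is a hedged OpenCV sketch: feature matches between a source image and the stitched result are filtered with RANSAC (discarding isolated outlier points), the bounding rectangle of the inliers approximates the overlapping region, and a score is computed on that region. ORB, RANSAC, and the toy contrast score are all stand-ins for the paper's unnamed components.

```python
# Locate the overlap of a source image inside the stitched result via
# feature matching, then score that region. All components are assumed.
import cv2
import numpy as np

def overlap_score(src_gray, stitched_gray):
    orb = cv2.ORB_create(1000)
    k1, d1 = orb.detectAndCompute(src_gray, None)
    k2, d2 = orb.detectAndCompute(stitched_gray, None)
    matches = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True).match(d1, d2)
    src_pts = np.float32([k1[m.queryIdx].pt for m in matches])
    dst_pts = np.float32([k2[m.trainIdx].pt for m in matches])
    # RANSAC homography rejects isolated/outlier points; the bounding
    # rectangle of the inlier matches approximates the overlap region.
    _, mask = cv2.findHomography(src_pts, dst_pts, cv2.RANSAC, 5.0)
    inliers = dst_pts[mask.ravel() == 1]
    x, y, w, h = cv2.boundingRect(inliers)
    region = stitched_gray[y:y + h, x:x + w].astype(np.float64)
    return region.std() / (region.mean() + 1e-6)  # toy contrast proxy score
```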
{"title":"NRQQA","authors":"Shengju Yu, Tiansong Li, Xiaoyu Xu, Hao Tao, Li Yu, Yixuan Wang","doi":"10.1145/3338533.3366563","DOIUrl":"https://doi.org/10.1145/3338533.3366563","url":null,"abstract":"Image stitching technology has been widely used in immersive applications, such as 3D modeling, VR and AR. The quality of stitching results is crucial. At present, the objective quality assessment methods of stitched images are mainly based on the availability of ground truth (i.e., Full-Reference). However, in most cases, ground truth is unavailable. In this paper, a no-reference quality assessment metric specifically designed for stitched images is proposed. We first find out the corresponding parts of source images in the stitched image. Then, the isolated points and the outer points generated by spherical projection are eliminated. After that, we take advantage of the bounding rectangle of stitching seams to locate the position of overlapping regions in the stitched image. Finally, the assessment of overlapping regions is taken as the final scoring result. Extensive experiments have shown that our scores are consistent with human vision. Even for the nuances that cannot be distinguished by human eyes, our proposed metric is also effective.","PeriodicalId":273086,"journal":{"name":"Proceedings of the ACM Multimedia Asia","volume":"70 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-12-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126192781","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 2