
Latest publications: Proceedings of the 2nd ACM International Conference on Multimedia in Asia

Similar scene retrieval in soccer videos with weak annotations by multimodal use of bidirectional LSTM
Pub Date : 2021-03-07 DOI: 10.1145/3444685.3446280
T. Haruyama, Sho Takahashi, Takahiro Ogawa, M. Haseyama
This paper presents a novel method to retrieve similar scenes in soccer videos with weak annotations via multimodal use of bidirectional long short-term memory (BiLSTM). The significant increase in the number of different types of soccer videos with the development of technology provides valuable assets for effective coaching, but it also increases the workload of players and training staff. We tackle this problem with a nontraditional combination of pre-trained models for feature extraction and BiLSTMs for feature transformation. By using the pre-trained models, no training data is required for feature extraction. Effective feature transformation for similarity calculation is then performed by applying a BiLSTM trained with weak annotations. This transformation allows soccer video context to be captured with high accuracy from less annotation work. In this paper, we achieve accurate retrieval of similar scenes by multimodal use of this BiLSTM-based feature transformer, which is trainable with little human effort. The effectiveness of our method was verified by comparative experiments with state-of-the-art methods on an actual soccer video dataset.
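The abstract describes the pipeline only at a high level. The sketch below illustrates the general idea in PyTorch: transform pre-extracted frame features with a BiLSTM and rank candidate scenes by cosine similarity. The layer sizes, pooling choice, and function names are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BiLSTMTransformer(nn.Module):
    """Transforms a sequence of pre-extracted frame features into a single
    clip embedding used for similarity search (sizes are hypothetical)."""
    def __init__(self, feat_dim=2048, hidden_dim=256):
        super().__init__()
        self.bilstm = nn.LSTM(feat_dim, hidden_dim, batch_first=True,
                              bidirectional=True)

    def forward(self, x):                 # x: (batch, frames, feat_dim)
        out, _ = self.bilstm(x)           # (batch, frames, 2 * hidden_dim)
        return out.mean(dim=1)            # average-pool over time

def retrieve(query_feats, scene_feats, model):
    """Rank candidate scenes by cosine similarity to the query clip."""
    with torch.no_grad():
        q = F.normalize(model(query_feats), dim=-1)   # (1, d)
        s = F.normalize(model(scene_feats), dim=-1)   # (N, d)
    scores = (s @ q.t()).squeeze(1)                   # (N,)
    return scores.argsort(descending=True)            # indices, best match first
```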
Citations: 4
Low-quality watermarked face inpainting with discriminative residual learning
Pub Date : 2021-03-07 DOI: 10.1145/3444685.3446261
Zheng He, Xueli Wei, Kangli Zeng, Zhen Han, Qin Zou, Zhongyuan Wang
Most existing image inpainting methods assume that the location of the repair area (watermark) is known, but this assumption does not always hold. In addition, watermarked faces in practice are stored in a compressed, low-quality form, and the resulting compression distortion makes repair considerably harder. To address these issues, this paper proposes a low-quality watermarked face inpainting method based on joint residual learning with a cooperative discriminant network. We first employ residual-learning-based global inpainting and facial-feature-based local inpainting to render clean and clear faces under unknown watermark positions. Because the repair process may distort the genuine face, we further propose a discriminative constraint network to maintain the fidelity of repaired faces. Experimentally, the average PSNR of inpainted face images is increased by 4.16 dB, the average SSIM by 0.08, and the TPR in face verification is improved by 16.96% at an FPR of 10%.
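To make the residual-learning idea concrete, here is a generic DnCNN-style residual branch: the network predicts the degradation (watermark plus compression error) and subtracts it from the input. The architecture, sizes, and naming are assumptions for illustration; the paper's actual global/local networks and discriminant branch are not reproduced.

```python
import torch
import torch.nn as nn

class ResidualInpainter(nn.Module):
    """Global residual learning: the network estimates the degradation map,
    which is subtracted from the degraded input (sizes are illustrative)."""
    def __init__(self, channels=3, width=64, depth=8):
        super().__init__()
        layers = [nn.Conv2d(channels, width, 3, padding=1), nn.ReLU(inplace=True)]
        for _ in range(depth - 2):
            layers += [nn.Conv2d(width, width, 3, padding=1), nn.ReLU(inplace=True)]
        layers += [nn.Conv2d(width, channels, 3, padding=1)]
        self.body = nn.Sequential(*layers)

    def forward(self, degraded):             # degraded: (B, 3, H, W) in [0, 1]
        residual = self.body(degraded)       # estimated watermark + compression error
        return torch.clamp(degraded - residual, 0.0, 1.0)
```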
Citations: 1
Two-stage structure aware image inpainting based on generative adversarial networks
Pub Date : 2021-03-07 DOI: 10.1145/3444685.3446260
Jin Wang, Xi Zhang, Chen Wang, Qing Zhu, Baocai Yin
In recent years, image inpainting technology based on deep learning has made remarkable progress and can complete complex inpainting tasks better than traditional methods. However, most existing methods cannot generate a reasonable structure and fine texture details at the same time. To solve this problem, we propose a two-stage, structure-aware image inpainting method based on Generative Adversarial Networks, which divides the inpainting process into two sub-tasks, namely image structure generation and image content generation. In the former stage, the network generates the structural information of the missing area; in the latter stage, the network uses this structural information as a prior and combines it with the existing texture and color information to complete the image. Extensive experiments are conducted to evaluate the performance of our proposed method on the Places2, CelebA and Paris StreetView datasets. The experimental results show that the proposed method outperforms other state-of-the-art methods both qualitatively and quantitatively.
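A minimal sketch of how a structure-then-content pipeline can be wired together at inference time, assuming two pre-trained generators with the hypothetical interfaces shown in the comments; the real networks, inputs, and adversarial training losses are not reproduced here.

```python
import torch

def two_stage_inpaint(image, mask, structure_net, content_net):
    """image: (B, 3, H, W) in [0, 1]; mask: (B, 1, H, W), 1 inside the hole.
    Stage 1 predicts a structure map for the hole; stage 2 fills the hole
    conditioned on that structure. Both networks are assumed pre-trained
    callables: structure_net(masked, mask) and
    content_net(masked, mask, structure) -> completed image."""
    masked = image * (1.0 - mask)                         # zero out the missing region
    with torch.no_grad():
        structure = structure_net(masked, mask)           # stage 1: structure prior
        completed = content_net(masked, mask, structure)  # stage 2: content
    # keep known pixels from the input, take generated pixels inside the hole
    return image * (1.0 - mask) + completed * mask
```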
Citations: 0
Attention feature matching for weakly-supervised video relocalization
Pub Date : 2021-03-07 DOI: 10.1145/3444685.3446317
Haoyu Tang, Jihua Zhu, Zan Gao, Tao Zhuo, Zhiyong Cheng
Localizing the desired video clip for a given query in an untrimmed video has been a hot research topic in multimedia understanding. Recently, a new task named video relocalization, in which the query is itself a video clip, has been proposed. Some methods have been developed for this task; however, they often require dense annotations of the temporal boundaries inside long videos for training. A more practical solution is the weakly-supervised approach, which only needs the matching information between the query and the video. Motivated by this, we propose a weakly-supervised video relocalization approach based on attention-based feature matching. Specifically, it recognizes the target video clip by finding the clip whose frames are the most relevant to the query clip frames, based on the matching results of the frame embeddings. In addition, an attention module is introduced to identify the frames containing rich semantic correlations in the query video. Extensive experiments on the ActivityNet dataset demonstrate that our method consistently outperforms several weakly-supervised methods and even achieves performance competitive with supervised baselines.
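A compact sketch of attention-weighted frame matching between a query clip and a candidate clip: frame embeddings are compared pairwise, and attention weights over the query frames re-weight the matching score. The exact formulation in the paper may differ; the names, shapes, and the use of max-pooled frame similarity are assumptions.

```python
import torch
import torch.nn.functional as F

def clip_matching_score(query_emb, cand_emb, query_attn_logits):
    """query_emb: (Tq, d) frame embeddings of the query clip.
    cand_emb:  (Tc, d) frame embeddings of a candidate clip.
    query_attn_logits: (Tq,) unnormalized importance of each query frame."""
    q = F.normalize(query_emb, dim=-1)
    c = F.normalize(cand_emb, dim=-1)
    sim = q @ c.t()                         # (Tq, Tc) cosine similarities
    best_match = sim.max(dim=1).values      # best candidate frame per query frame
    attn = torch.softmax(query_attn_logits, dim=0)
    return (attn * best_match).sum()        # attention-weighted matching score

# rank candidates: scores = [clip_matching_score(q, c, a) for c in candidate_clips]
```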
Citations: 4
Graph-based motion prediction for abnormal action detection
Pub Date : 2021-03-07 DOI: 10.1145/3444685.3446316
Yao Tang, Lin Zhao, Zhaoliang Yao, Chen Gong, Jian Yang
Abnormal action detection is the most noteworthy part of anomaly detection, which tries to identify unusual human behaviors in videos. Previous methods typically utilize future frame prediction to detect frames deviating from the normal scenario. While this strategy achieves good anomaly detection accuracy, critical information such as the cause and location of the abnormality cannot be acquired. This paper proposes human motion prediction for abnormal action detection. We employ sequences of human poses to represent human motion, and detect irregular behavior by comparing the predicted pose with the actual pose detected in the frame. Hence the proposed method is able to explain why an action is regarded as an irregularity and to locate where the anomaly happens. Moreover, pose sequences are robust to noise, complex backgrounds and small targets in videos. Since posture information is non-Euclidean data, a graph convolutional network is adopted for future pose prediction, which leads not only to greater expressive power but also to stronger generalization capability. Experiments are conducted both on the widely used anomaly detection dataset ShanghaiTech and on our newly proposed dataset NJUST-Anomaly, which mainly contains irregular behaviors occurring on a campus. Our dataset expands the existing datasets with more abnormal actions of public concern in social security, happening in more complex scenes and dynamic backgrounds. Experimental results on both datasets demonstrate the superiority of our method over state-of-the-art methods. The source code and the NJUST-Anomaly dataset will be made public at https://github.com/datangzhengqing/MP-GCN.
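The detection rule can be summarized as scoring a frame by the discrepancy between the predicted and observed poses. The sketch below assumes a pre-trained pose predictor and uses mean joint error as the score; the GCN predictor itself and the paper's exact scoring and thresholding are not reproduced.

```python
import torch

def pose_anomaly_score(past_poses, observed_pose, predictor):
    """past_poses: (T, J, 2) history of 2D joints for one person.
    observed_pose: (J, 2) pose detected in the current frame.
    predictor: pre-trained model mapping a (1, T, J, 2) history to the next pose."""
    with torch.no_grad():
        predicted = predictor(past_poses.unsqueeze(0)).squeeze(0)       # (J, 2)
    joint_error = torch.linalg.norm(predicted - observed_pose, dim=-1)  # (J,)
    return joint_error.mean().item()   # large error -> likely abnormal action

# a frame is flagged as abnormal when the score of any person exceeds a threshold
```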
Citations: 5
Multiplicative angular margin loss for text-based person search
Pub Date : 2021-03-07 DOI: 10.1145/3444685.3446314
Peng Zhang, Deqiang Ouyang, Feiyu Chen, Jie Shao
Text-based person search aims at retrieving the most relevant pedestrian images from a database in response to a query in the form of a natural language description. Existing algorithms mainly focus on embedding textual and visual features into a common semantic space so that the similarity score of features from different modalities can be computed directly. Softmax loss is widely adopted to classify textual and visual features into the correct category in the joint embedding space. However, softmax loss can only help classify features; it does not increase intra-class compactness or inter-class discrepancy. To this end, we propose a multiplicative angular margin (MAM) loss to learn angularly discriminative features for each identity. The multiplicative angular margin loss penalizes the angle between a feature vector and its corresponding classifier vector to learn more discriminative features. Moreover, to focus more on informative image-text pairs, we propose a pairwise similarity weighting (PSW) loss to assign higher weights to informative pairs. Extensive experimental evaluations of the proposed losses have been conducted on the CUHK-PEDES dataset. The results show the superiority of our proposed method. Code is available at https://github.com/pengzhanguestc/MAM_loss.
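The authors' code is linked above; the sketch here is written independently of that repository and only shows a simplified generic multiplicative angular margin classification loss in the spirit the abstract describes, replacing the target logit cos(theta) with cos(m * theta). The margin m and scale s are illustrative hyper-parameters, not the paper's values.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiplicativeAngularMarginLoss(nn.Module):
    """Simplified multiplicative angular margin loss: for the ground-truth
    class, cos(theta) is replaced by cos(m * theta) before scaled softmax
    cross-entropy (m, s are illustrative)."""
    def __init__(self, feat_dim, num_classes, m=2.0, s=16.0):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(num_classes, feat_dim))
        self.m, self.s = m, s

    def forward(self, features, labels):
        w = F.normalize(self.weight, dim=1)
        x = F.normalize(features, dim=1)
        cos = torch.clamp(x @ w.t(), -1.0 + 1e-7, 1.0 - 1e-7)   # (B, C)
        theta = torch.acos(cos)
        target_logit = torch.cos(self.m * theta)                # cos(m * theta)
        one_hot = F.one_hot(labels, cos.size(1)).to(cos.dtype)
        logits = self.s * (one_hot * target_logit + (1.0 - one_hot) * cos)
        return F.cross_entropy(logits, labels)
```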
Citations: 1
Defense for adversarial videos by self-adaptive JPEG compression and optical texture
Pub Date : 2021-03-07 DOI: 10.1145/3444685.3446308
Yupeng Cheng, Xingxing Wei, H. Fu, Shang-Wei Lin, Weisi Lin
Despite their demonstrated effectiveness in various computer vision tasks, Deep Neural Networks (DNNs) are known to be vulnerable to adversarial examples. Adversarial attacks on DNNs in the image domain, as well as the corresponding defenses, have been intensively studied, and some recent works have started to explore adversarial attacks on DNNs in the video domain. However, the corresponding defenses are rarely studied. In this paper, we propose a new two-stage framework for defending against video adversarial attacks. It contains two main components, namely a self-adaptive Joint Photographic Experts Group (JPEG) compression defense and an optical-texture-based defense (OTD). In the self-adaptive JPEG compression defense, we adaptively choose an appropriate JPEG quality based on an estimate of the moving foreground object, such that the JPEG compression suppresses most of the impact of the adversarial noise without losing too much video quality. In OTD, we generate an "optical texture" containing high-frequency information based on the optical flow map, and use it to edit the Y channel (in YCrCb color space) of the input frames, thus further reducing the influence of the adversarial perturbation. Experimental results on a benchmark dataset demonstrate the effectiveness of our framework in recovering classification performance on perturbed videos.
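A rough sketch of the self-adaptive JPEG idea: estimate how much of the frame is moving foreground and pick a JPEG quality accordingly before re-encoding. The simple frame-differencing foreground estimate and the linear quality mapping below are stand-ins for the paper's estimator, not its actual method.

```python
import io
import numpy as np
from PIL import Image

def adaptive_jpeg_defense(prev_frame, frame, q_low=40, q_high=80, thresh=25):
    """prev_frame, frame: HxWx3 uint8 arrays. More moving foreground leads to
    a higher JPEG quality (to preserve content); static scenes are compressed
    harder (to remove more adversarial noise)."""
    diff = np.abs(frame.astype(np.int16) - prev_frame.astype(np.int16)).max(axis=2)
    moving_ratio = float((diff > thresh).mean())            # crude foreground estimate
    quality = int(q_low + (q_high - q_low) * moving_ratio)  # map ratio to JPEG quality
    buf = io.BytesIO()
    Image.fromarray(frame).save(buf, format="JPEG", quality=quality)
    buf.seek(0)
    return np.array(Image.open(buf).convert("RGB"))         # defended frame
```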
Citations: 4
Determining image age with rank-consistent ordinal classification and object-centered ensemble
Pub Date : 2021-03-07 DOI: 10.1145/3444685.3446326
Shota Ashida, A. Jatowt, A. Doucet, Masatoshi Yoshikawa
A significant number of old photographs, including ones posted online, do not contain the date at which they were taken, or this information needs to be verified. Many such pictures are either scanned analog photographs or photographs taken with a digital camera with incorrect settings. Estimating the date of such pictures is useful for enhancing data quality and consistency, improving information retrieval, and other related applications. In this study, we propose a novel approach for automatically estimating the shooting dates of photographs based on a rank-consistent ordinal classification method for neural networks. We also introduce an ensemble approach that involves object segmentation. We conclude that enforcing rank consistency in the ordinal classification, as well as combining models trained on segmented objects, improves the results of the age determination task.
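Rank-consistent ordinal classification is commonly implemented CORAL-style: K-1 binary "is the date later than threshold k" outputs that share one weight vector and differ only in their biases, which yields rank-consistent probability estimates. The sketch below shows such a head and loss under that assumption; the paper's backbone, date bins, and object-centered ensemble are omitted.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CoralHead(nn.Module):
    """CORAL-style ordinal head: one shared projection and K-1 biases."""
    def __init__(self, in_dim, num_ranks):
        super().__init__()
        self.fc = nn.Linear(in_dim, 1, bias=False)       # shared weight vector
        self.biases = nn.Parameter(torch.zeros(num_ranks - 1))

    def forward(self, feats):                            # feats: (B, in_dim)
        return self.fc(feats) + self.biases              # logits: (B, K-1)

def coral_loss(logits, levels):
    """levels: (B, K-1) binary targets, levels[i, k] = 1 if true rank > k."""
    return -(levels * F.logsigmoid(logits)
             + (1 - levels) * F.logsigmoid(-logits)).sum(dim=1).mean()

def predict_rank(logits):
    return (torch.sigmoid(logits) > 0.5).sum(dim=1)      # predicted ordinal index
```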
Citations: 0
Relationship graph learning network for visual relationship detection
Pub Date : 2021-03-07 DOI: 10.1145/3444685.3446312
Yanan Li, Jun Yu, Yibing Zhan, Zhi Chen
Visual relationship detection aims to predict the relationships between detected object pairs. It is widely believed that the correlations between image components (i.e., objects and the relationships between objects) are significant considerations when predicting objects' relationships. However, most current visual relationship detection methods only exploit the correlations among objects, while the correlations among objects' relationships remain underexplored. This paper proposes a relationship graph learning network (RGLN) to explore the correlations among objects' relationships for visual relationship detection. Specifically, RGLN obtains image objects using an object detector; then, every pair of objects constitutes a relationship proposal. All relationship proposals construct a relationship graph, in which the proposals are treated as nodes. Accordingly, RGLN designs bi-stream graph attention subnetworks to detect relationship proposals, in which one graph attention subnetwork analyzes correlations among relationships based on visual and spatial information, and the other analyzes correlations based on semantic and spatial information. Besides, RGLN exploits a relationship selection subnetwork to ignore the redundant information of object pairs with no relationships. We conduct extensive experiments on two public datasets: the VRD and VG datasets. The experimental results, compared with the state-of-the-art, demonstrate the competitiveness of RGLN.
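To make the "relationship proposals as graph nodes" idea concrete, the sketch below pairs detected objects into proposals whose node features concatenate both object features and a crude relative-geometry term. The feature construction is a simplified assumption; the bi-stream graph attention and selection subnetworks are not reproduced.

```python
import itertools
import torch

def build_relationship_proposals(obj_feats, obj_boxes):
    """obj_feats: (N, d) detector features; obj_boxes: (N, 4) boxes (x1, y1, x2, y2).
    Every ordered object pair becomes one relationship proposal (a graph node)
    whose feature concatenates both object features and their box offset."""
    proposals, pairs = [], []
    for i, j in itertools.permutations(range(obj_feats.size(0)), 2):
        spatial = obj_boxes[j] - obj_boxes[i]          # crude relative geometry
        proposals.append(torch.cat([obj_feats[i], obj_feats[j], spatial]))
        pairs.append((i, j))
    # nodes of a fully connected proposal graph, plus the object indices per node
    return torch.stack(proposals), pairs
```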
Citations: 2
Table detection and cell segmentation in online handwritten documents with graph attention networks
Pub Date : 2021-03-07 DOI: 10.1145/3444685.3446295
Ying-Jian Liu, Heng Zhang, Xiao-Long Yun, Jun-Yu Ye, Cheng-Lin Liu
In this paper, we propose a multi-task learning approach for table detection and cell segmentation in free-form online documents using densely connected graph attention networks. Each online document is regarded as a graph, where nodes represent strokes and edges represent the relationships between strokes. We then propose a graph attention network model to classify nodes and edges simultaneously. According to the node classification results, tables can be detected in each document. By combining the node and edge classification results, the cells in each table can be segmented. To improve information flow in the network and enable efficient reuse of features among layers, dense connectivity among layers is used. Our proposed model has been experimentally validated on the online handwritten document dataset IAMOnDo and achieved encouraging results.
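A minimal, self-contained single-head graph attention layer of the kind used for stroke-node classification. The dense inter-layer connectivity and the edge classifier described in the abstract are not reproduced; shapes and the dense adjacency representation are assumptions for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SimpleGATLayer(nn.Module):
    """Single-head graph attention over stroke nodes.
    x: (N, in_dim) stroke features; adj: (N, N) 0/1 adjacency with self-loops."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.lin = nn.Linear(in_dim, out_dim, bias=False)
        self.att = nn.Linear(2 * out_dim, 1, bias=False)

    def forward(self, x, adj):
        h = self.lin(x)                                        # (N, out_dim)
        n = h.size(0)
        pair = torch.cat([h.unsqueeze(1).expand(n, n, -1),     # sender features
                          h.unsqueeze(0).expand(n, n, -1)],    # receiver features
                         dim=-1)
        e = F.leaky_relu(self.att(pair).squeeze(-1), 0.2)      # (N, N) raw scores
        e = e.masked_fill(adj == 0, float('-inf'))             # keep graph edges only
        alpha = torch.softmax(e, dim=1)                        # attention weights
        return F.elu(alpha @ h)                                # aggregated node features

# stack such layers (densely reusing their outputs), then apply a per-node linear
# classifier for table / non-table stroke labels.
```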
Citations: 0