
Latest Publications in ACM Multimedia Asia

Towards Discriminative Visual Search via Semantically Cycle-consistent Hashing Networks
Pub Date : 2021-12-01 DOI: 10.1145/3469877.3490583
Zheng Zhang, Jianning Wang, Guangming Lu
Deep hashing has shown great potential in large-scale visual similarity search due to its preferable storage and computation efficiency. Typically, deep hashing encodes visual features into compact binary codes by preserving representative semantic visual features. Works in this area mainly focus on building the relationship between the visual space and the objective hash space, but seldom study the triadic cross-domain semantic knowledge transfer among the visual, semantic and hashing spaces, leading to a serious semantic ignorance problem during space transformation. In this paper, we propose a novel deep tripartite semantically interactive hashing framework, dubbed Semantically Cycle-consistent Hashing Networks (SCHN), for discriminative hash code learning. In particular, we construct a flexible semantic space and a transitive latent space which, in conjunction with the visual space, jointly deduce the privileged discriminative hash space. Specifically, the semantic space is conceived to strengthen the flexibility and completeness of categories in feature inference. Moreover, the transitive latent space is formulated to explore the shared semantic interactivity embedded in visual and semantic features. Our SCHN, for the first time, establishes the cyclic principle of deep semantic-preserving hashing by adaptive semantic parsing across different spaces in visual similarity search. In addition, the entire learning framework is jointly optimized in an end-to-end manner. Extensive experiments performed on diverse large-scale datasets demonstrate the superiority of our method over other state-of-the-art deep hashing algorithms.
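Below is a minimal PyTorch sketch of the generic deep-hashing recipe this abstract builds on: visual features are encoded into K-bit codes through a tanh relaxation and binarized with sign() at retrieval time, with a toy cycle-consistency term mapping hash codes back to a semantic space. The `cycle_loss`, the linear decoder, and all dimensions are illustrative assumptions, not the authors' SCHN formulation.

```python
import torch
import torch.nn as nn

class HashEncoder(nn.Module):
    def __init__(self, feat_dim=2048, n_bits=64):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(feat_dim, 512), nn.ReLU(),
            nn.Linear(512, n_bits),
        )

    def forward(self, x):
        # Relaxed codes in (-1, 1); sign() binarizes them at retrieval time.
        return torch.tanh(self.fc(x))

def binarize(codes):
    return torch.sign(codes)  # {-1, +1} binary codes

# Hypothetical cycle-consistency term (NOT the authors' loss): decode hash
# codes back into the semantic space and penalize reconstruction error.
def cycle_loss(hash_codes, sem_emb, decoder):
    return nn.functional.mse_loss(decoder(hash_codes), sem_emb)

encoder = HashEncoder()
decoder = nn.Linear(64, 300)        # hash space -> semantic space (assumed dims)
feats = torch.randn(8, 2048)        # e.g. backbone CNN features
sem = torch.randn(8, 300)           # e.g. class word embeddings
codes = encoder(feats)
loss = cycle_loss(codes, sem, decoder)
loss.backward()
print(binarize(codes.detach()).shape)   # torch.Size([8, 64])
```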
Citations: 2
Delay-sensitive and Priority-aware Transmission Control for Real-time Multimedia Communications
Pub Date : 2021-12-01 DOI: 10.1145/3469877.3493597
Ximing Wu, Lei Zhang, Yingfeng Wu, Haobin Zhou, Laizhong Cui
Today’s multimedia applications usually organize their contents into data blocks with different deadlines and priorities. Meeting or missing the deadline for different data blocks can improve or hurt the user experience to different degrees. To optimize real-time multimedia communications, the transmission control scheme needs to make two challenging decisions: the proper sending rate and the best data block to send under dynamic network conditions. In this paper, we propose a delay-sensitive and priority-aware transmission control scheme with two modules, namely rate control and block selection. The rate control module constantly monitors the network condition and adjusts the sending rate accordingly. The block selection module classifies the blocks based on whether they are estimated to be delivered before their deadlines and then ranks them according to their effective priority scores. Extensive simulation results demonstrate the superiority of our proposed scheme over other representative baseline approaches.
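A minimal sketch of the block-selection idea follows: blocks that cannot be delivered before their deadline at the current sending rate are filtered out, and the remaining block with the highest effective priority score is sent. The scoring function (priority weighted by deadline urgency) is an assumed form for illustration; the paper's exact classification and ranking rules are not given here.

```python
from dataclasses import dataclass

@dataclass
class Block:
    block_id: int
    size_bytes: int
    deadline_s: float   # seconds from now
    priority: float     # higher = more important to user experience

def select_block(blocks, send_rate_bps, queued_bytes=0):
    deliverable = []
    for b in blocks:
        # Estimated time to drain the queue plus this block at the current rate.
        eta = (queued_bytes + b.size_bytes) * 8 / send_rate_bps
        if eta <= b.deadline_s:                           # arrives in time
            score = b.priority / max(b.deadline_s, 1e-6)  # urgency-weighted (assumed)
            deliverable.append((score, b))
    if not deliverable:
        return None
    return max(deliverable, key=lambda t: t[0])[1]

blocks = [Block(1, 40_000, 0.10, 1.0), Block(2, 40_000, 0.50, 3.0)]
best = select_block(blocks, send_rate_bps=4_000_000)
print(best.block_id if best else "none deliverable")      # -> 1
```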
Citations: 0
Latent Pattern Sensing: Deepfake Video Detection via Predictive Representation Learning
Pub Date : 2021-12-01 DOI: 10.1145/3469877.3490586
Shiming Ge, Fanzhao Lin, Chenyu Li, Daichi Zhang, Jiyong Tan, Weiping Wang, Dan Zeng
Increasingly advanced deepfake approaches have made the detection of deepfake videos very challenging. We observe that deepfake videos often exhibit appearance-level temporal inconsistencies in some facial components between frames, resulting in discriminable spatiotemporal latent patterns among semantic-level feature maps. Inspired by this finding, we propose a predictive representation learning approach termed Latent Pattern Sensing to capture these semantic change characteristics for deepfake video detection. The approach cascades a CNN-based encoder, a ConvGRU-based aggregator and a single-layer binary classifier. The encoder and aggregator are pre-trained in a self-supervised manner to form representative spatiotemporal context features. Finally, the classifier is trained to classify the context features, distinguishing fake videos from real ones. In this manner, the extracted features can describe the latent patterns of videos across frames spatially and temporally in a unified way, leading to an effective deepfake video detector. Extensive experiments prove our approach’s effectiveness, e.g., surpassing 10 state-of-the-art methods by at least 7.92% AUC on the challenging Celeb-DF(v2) benchmark.
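A minimal PyTorch sketch of the cascade described above follows: a per-frame CNN encoder, a recurrent aggregator over time, and a single-layer binary classifier. A plain GRU over pooled frame features stands in for the paper's ConvGRU aggregator, and the self-supervised pre-training stage is omitted; shapes and layer sizes are illustrative assumptions.

```python
import torch
import torch.nn as nn

class DeepfakeDetector(nn.Module):
    def __init__(self, hidden=256):
        super().__init__()
        self.encoder = nn.Sequential(        # tiny CNN stand-in for the encoder
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.aggregator = nn.GRU(64, hidden, batch_first=True)  # ConvGRU simplified
        self.classifier = nn.Linear(hidden, 1)                  # single-layer head

    def forward(self, clip):                  # clip: (B, T, 3, H, W)
        b, t = clip.shape[:2]
        f = self.encoder(clip.flatten(0, 1)).view(b, t, -1)  # per-frame features
        _, h = self.aggregator(f)             # h: (num_layers, B, hidden)
        return self.classifier(h[-1])         # real/fake logit per clip

model = DeepfakeDetector()
logit = model(torch.randn(2, 8, 3, 64, 64))   # 2 clips of 8 frames each
print(torch.sigmoid(logit).shape)             # torch.Size([2, 1])
```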
Citations: 5
Zero-shot Recognition with Image Attributes Generation using Hierarchical Coupled Dictionary Learning
Pub Date : 2021-12-01 DOI: 10.1145/3469877.3490613
Shuang Li, Lichun Wang, Shaofan Wang, Dehui Kong, Baocai Yin
Zero-shot learning (ZSL) aims to recognize images from unseen (novel) classes using training images from seen classes. The attributes of each class are exploited as auxiliary semantic information. Most recent ZSL approaches focus on learning visual-semantic embeddings to transfer knowledge from the seen classes to the unseen classes. However, few works study whether the class-level auxiliary semantic information is extensive enough for the ZSL task. To tackle this problem, we propose a hierarchical coupled dictionary learning (HCDL) approach to hierarchically align the visual-semantic structures at both the class level and the image level. First, the class-level coupled dictionary is trained to establish a basic connection between the visual space and the semantic space. Then, image attributes are generated based on this basic connection. Finally, fine-grained information can be embedded by training the image-level coupled dictionary. Zero-shot recognition is performed in multiple spaces by searching for the nearest-neighbor class of the unseen image. Experiments on two widely used benchmark datasets show the effectiveness of the proposed approach.
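A minimal sketch of the final recognition step follows: a test image is embedded into the semantic (attribute) space and labeled with the nearest unseen class. A ridge-regression projection learned on seen classes stands in for the hierarchical coupled dictionaries, which are not reproduced here; all dimensions are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
X_seen = rng.normal(size=(100, 512))    # seen-class image features
A_seen = rng.normal(size=(100, 85))     # their class attribute vectors

# Ridge regression stand-in: W = (X^T X + lam I)^-1 X^T A
lam = 1.0
W = np.linalg.solve(X_seen.T @ X_seen + lam * np.eye(512), X_seen.T @ A_seen)

A_unseen = rng.normal(size=(10, 85))    # attribute prototypes of unseen classes
x_test = rng.normal(size=(1, 512))      # one test image feature
pred_attr = x_test @ W                  # project image into attribute space
dists = np.linalg.norm(A_unseen - pred_attr, axis=1)
print("predicted unseen class:", int(dists.argmin()))  # nearest-neighbor class
```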
Citations: 3
Dedark+Detection: A Hybrid Scheme for Object Detection under Low-light Surveillance
Pub Date : 2021-12-01 DOI: 10.1145/3469877.3497691
Xiaolei Luo, S. Xiang, Yingfeng Wang, Qiong Liu, You Yang, Kejun Wu
Object detection under low-light surveillance is a crucial problem that has received relatively little attention. In this paper, we propose a hybrid method, namely Dedark+Detection, that jointly uses enhancement and object detection to address this challenge. In this method, the low-light surveillance video is processed by the proposed de-dark method, converting it to its appearance under normal lighting conditions. This enhancement benefits the subsequent object detection stage. An object detection network is then trained on the enhanced dataset for practical applications under low-light surveillance. Experiments are performed on 18 low-light surveillance video test sequences, showing superior performance compared to the state of the art.
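A minimal sketch of the two-stage pipeline follows: enhance the low-light frame first, then run an object detector on the enhanced frame. Simple gamma correction stands in for the proposed de-dark method, and `detector` is any callable returning boxes; both are assumptions for illustration, not the paper's components.

```python
import numpy as np

def dedark(frame, gamma=0.45):
    """Brighten a low-light frame (uint8 HxWx3) via gamma correction (assumed)."""
    norm = frame.astype(np.float32) / 255.0
    return np.clip((norm ** gamma) * 255.0, 0, 255).astype(np.uint8)

def detect_pipeline(frame, detector):
    enhanced = dedark(frame)      # stage 1: de-dark the input
    return detector(enhanced)     # stage 2: detect on the enhanced frame

dark = np.random.randint(0, 40, size=(480, 640, 3), dtype=np.uint8)
# `detector` here is a placeholder; real use would wrap a trained network.
boxes = detect_pipeline(dark, detector=lambda img: [(10, 20, 80, 120, "car")])
print(boxes)
```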
Citations: 0
PBNet: Position-specific Text-to-image Generation by Boundary
Pub Date : 2021-12-01 DOI: 10.1145/3469877.3493594
Tian Tian, Li Liu, Huaxiang Zhang, Dongmei Liu
Most existing methods focus on improving the clarity and semantic consistency of the generated image with respect to a given text, but pay little attention to multiple controls over the generated content, such as the position of the object in the generated image. In this paper, we introduce a novel position-based generative network (PBNet) that can generate fine-grained images with the object at a specified location. PBNet combines an iterative structure with a generative adversarial network (GAN). A location information embedding module (LIEM) is proposed to combine the location information extracted from the boundary block image with the semantic information extracted from the text. In addition, a silhouette generation module (SGM) is proposed to train the generator to generate objects based on location information. Experimental results on the CUB dataset demonstrate that PBNet effectively controls the location of the object in the generated image.
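A minimal PyTorch sketch of the fusion idea behind LIEM follows: a boundary-block image (where the object should go) and a text embedding (what it should be) are encoded and combined into a single conditioning vector for the generator. The fusion-by-concatenation choice and all shapes are assumptions for illustration, not the paper's exact module.

```python
import torch
import torch.nn as nn

class LocationTextFusion(nn.Module):
    def __init__(self, text_dim=256, loc_dim=128):
        super().__init__()
        self.loc_enc = nn.Sequential(        # encodes a 1-channel boundary mask
            nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, loc_dim),
        )
        self.fuse = nn.Linear(text_dim + loc_dim, text_dim)

    def forward(self, boundary_mask, text_emb):
        loc = self.loc_enc(boundary_mask)
        # Concatenate "what" (text) with "where" (location) into one condition.
        return self.fuse(torch.cat([text_emb, loc], dim=1))

fusion = LocationTextFusion()
cond = fusion(torch.rand(4, 1, 64, 64), torch.randn(4, 256))
print(cond.shape)   # torch.Size([4, 256]) -> fed to the generator
```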
Citations: 0
Inter-modality Discordance for Multimodal Fake News Detection
Pub Date : 2021-12-01 DOI: 10.1145/3469877.3490614
Shivangi Singhal, Mudit Dhawan, R. Shah, P. Kumaraguru
The paradigm shift in the consumption of news via online platforms has cultivated the growth of digital journalism. Contrary to traditional media, lowering entry barriers and enabling everyone to be part of content creation have disabled the concept of centralized gatekeeping in digital journalism. This in turn has triggered the production of fake news. Current studies have made significant efforts towards multimodal fake news detection, with less emphasis on exploring the discordance between the different modalities present in a news article. We hypothesize that fabrication of either modality will lead to dissonance between the modalities, resulting in misrepresented, misinterpreted and misleading news. In this paper, we inspect the authenticity of news from online media outlets by exploiting the relationship (discordance) between textual and multiple visual cues. We develop an inter-modality discordance based fake news detection framework to achieve this goal. The modality-specific discriminative features are learned by employing the cross-entropy loss and a modified contrastive loss that explores the inter-modality discordance. To the best of our knowledge, this is the first work that leverages information from different components of the news article (i.e., headline, body, and multiple images) for multimodal fake news detection. We conduct extensive experiments on real-world datasets to show that our approach outperforms the state of the art by an average F1-score of 6.3%.
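A minimal sketch of the discordance idea follows: text and image embeddings in a shared space are pulled together for real news and pushed apart for fake news. The margin form below is a generic contrastive loss, an assumed stand-in for the paper's modified version.

```python
import torch
import torch.nn.functional as F

def discordance_loss(text_emb, img_emb, is_fake, margin=0.5):
    sim = F.cosine_similarity(text_emb, img_emb)             # (B,)
    real_term = (1 - is_fake) * (1 - sim)                    # real: pull sim -> 1
    fake_term = is_fake * torch.clamp(sim - margin, min=0)   # fake: push sim below margin
    return (real_term + fake_term).mean()

text = torch.randn(4, 128, requires_grad=True)   # e.g. headline/body embeddings
image = torch.randn(4, 128)                      # e.g. image embeddings
labels = torch.tensor([0.0, 1.0, 1.0, 0.0])      # 1 = fake
loss = discordance_loss(text, image, labels)
loss.backward()
print(float(loss))
```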
Citations: 8
BAND: A Benchmark Dataset for Bangla News Audio Classification
Pub Date : 2021-12-01 DOI: 10.1145/3469877.3490575
Md. Rafi Ur Rashid, Mahim Mahbub, Muhammad Abdullah Adnan
Despite being the sixth most widely spoken language in the world, Bangla has barely received any attention in the domain of audio-visual news classification. In this work, we collect, annotate, and prepare a comprehensive news audio dataset in Bangla, comprising 5120 news clips with around 820 hours of total duration. We also conduct practical experiments to obtain a human baseline for the news audio classification task. We then mirror one of the human approaches by performing news classification directly on the audio features, using various state-of-the-art classifiers and a few transfer learning models. To the best of our knowledge, this is the first work to develop a benchmark dataset for news audio classification in Bangla.
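A minimal sketch of a classical audio-classification baseline of the kind alluded to above follows: extract MFCC statistics from each clip and train an off-the-shelf classifier. The librosa-MFCC-plus-SVM choice and the synthetic clips are illustrative assumptions; the paper's exact classifiers and transfer learning models are not reproduced.

```python
import numpy as np
import librosa
from sklearn.svm import SVC

def clip_features(y, sr=16000, n_mfcc=20):
    """Fixed-size feature vector: per-coefficient MFCC mean and std."""
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)   # (n_mfcc, T)
    return np.concatenate([mfcc.mean(axis=1), mfcc.std(axis=1)])

# Synthetic stand-ins for news clips; real use would load each clip with
# librosa.load(path, sr=16000, mono=True).
rng = np.random.default_rng(0)
clips = [rng.normal(size=16000 * 2).astype(np.float32) for _ in range(4)]
labels = ["sports", "politics", "sports", "politics"]   # hypothetical classes

X = np.stack([clip_features(y) for y in clips])
clf = SVC(kernel="rbf").fit(X, labels)
print(clf.predict(X[:1]))
```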
Citations: 1
Color Image Denoising via Tensor Robust PCA with Nonconvex and Nonlocal Regularization
Pub Date : 2021-12-01 DOI: 10.1145/3469877.3493592
Xiaoyu Geng, Q. Guo, Cai-ming Zhang
Tensor robust principal component analysis (TRPCA) is an important algorithm for color image denoising that treats the whole image as a tensor and shrinks all singular values equally. In this paper, to improve the denoising performance of TRPCA, we propose a variant of the TRPCA model. Specifically, we first introduce a nonconvex TRPCA (N-TRPCA) model which can shrink large singular values more and small singular values less, so that the physical meanings of different singular values can be preserved. To take advantage of the structural redundancy of an image, we further group similar patches into a tensor according to a nonlocal prior, and then apply the N-TRPCA model to this tensor. The denoised image can be obtained by aggregating all processed tensors. Experimental results demonstrate the superiority of the proposed denoising method over state-of-the-art approaches.
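A minimal sketch of the core operation behind such nonconvex variants follows: weighted singular-value thresholding, where each singular value receives its own shrinkage amount instead of one uniform threshold. The weight schedule below follows the abstract's description (larger values shrunk more); the exact nonconvex penalty, the t-SVD tensor machinery, and the nonlocal patch grouping are all omitted from this illustration.

```python
import numpy as np

def weighted_svt(M, tau, weights_fn):
    """Shrink each singular value s_i by tau * w_i, clipping at zero."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    w = weights_fn(s)
    s_shrunk = np.maximum(s - tau * w, 0.0)
    return U @ (s_shrunk[:, None] * Vt), s, s_shrunk

rng = np.random.default_rng(0)
M = rng.normal(size=(6, 6))
# Weights proportional to magnitude, matching the abstract's description.
_, s, s_shrunk = weighted_svt(M, tau=0.5, weights_fn=lambda s: s / s.max())
for before, after in zip(s, s_shrunk):
    print(f"{before:6.3f} -> {after:6.3f} (shrunk by {before - after:.3f})")
```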
Citations: 0
Making Video Recognition Models Robust to Common Corruptions With Supervised Contrastive Learning
Pub Date : 2021-12-01 DOI: 10.1145/3469877.3497692
Tomu Hirata, Yusuke Mukuta, Tatsuya Harada
The video understanding capability of video recognition models has been significantly improved by the development of deep learning techniques and the availability of various video datasets. However, video recognition models are still vulnerable to invisible perturbations, which limits their use in the real world. We present a new benchmark for the robustness of action recognition classifiers to common corruptions, and show that a supervised contrastive learning framework is effective in obtaining discriminative and stable video representations, making deep video recognition models robust to general input corruptions. Experiments on the action recognition task for corrupted videos show the high robustness of the proposed method on the UCF101 and HMDB51 datasets under various common corruptions.
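A minimal PyTorch sketch of the supervised contrastive (SupCon) loss underpinning the framework follows: clips sharing an action label are pulled together in embedding space and all others pushed apart. This is the generic SupCon formulation, not the paper's full training recipe.

```python
import torch
import torch.nn.functional as F

def supcon_loss(embeddings, labels, temperature=0.1):
    z = F.normalize(embeddings, dim=1)              # unit-norm embeddings
    sim = z @ z.T / temperature                     # (B, B) similarity logits
    B = z.size(0)
    self_mask = torch.eye(B, dtype=torch.bool, device=z.device)
    pos_mask = (labels[:, None] == labels[None, :]) & ~self_mask

    sim = sim.masked_fill(self_mask, -1e9)          # exclude self-pairs
    log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)

    pos_counts = pos_mask.sum(1).clamp(min=1)       # avoid division by zero
    per_anchor = -(log_prob.masked_fill(~pos_mask, 0.0)).sum(1) / pos_counts
    return per_anchor[pos_mask.any(1)].mean()       # anchors with >=1 positive

feats = torch.randn(8, 128, requires_grad=True)     # e.g. video-encoder outputs
labels = torch.tensor([0, 0, 1, 1, 2, 2, 3, 3])     # action labels
loss = supcon_loss(feats, labels)
loss.backward()
print(float(loss))
```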
Citations: 2