
Latest Publications from ACM Multimedia Asia

A Reinforcement Learning-Based Reward Mechanism for Molecule Generation that Introduces Activity Information
Pub Date: 2021-12-01 | DOI: 10.1145/3469877.3497700
Hao Liu, Jinmeng Yan, Yuandong Zhou
In this paper, we propose an activity prediction method for molecule generation based on the framework of reinforcement learning. The method serves as a scoring module for the molecule generation process. By introducing information about known active molecules for a specific set of target conformations, it overcomes the limitation of traditional molecular optimization strategies, which rely only on computable properties, and thereby improves the quality of the generated molecules. The prediction method uses fusion features that combine traditional computable properties of molecules, such as atomic number, with the binding properties of the molecule to the target. Furthermore, this paper designs an ultra-large-scale parallel computing method for molecular docking, which greatly improves the performance of the molecular docking [1] scoring process and makes high-quality docking computation for predicting molecular activity feasible. The final experimental results show that a molecule generation model using the prediction method can generate molecules of which nearly twenty percent are active, demonstrating that the proposed method effectively improves the performance of molecule generation.
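As a sketch of how such a reward mechanism could be wired up, the snippet below fuses a docking-derived activity term with a computable-property term. The weighting, the energy normalization, and both stub functions (`docking_energy`, `property_score`) are illustrative assumptions, not the authors' formulation.

```python
# Minimal sketch of a composite RL reward: docking-based activity fused
# with computable molecular properties. All constants and stubs are assumed.

def docking_energy(smiles: str) -> float:
    # Stand-in for the parallel docking module; in practice this would call
    # a docking engine and return a binding energy in kcal/mol.
    return -9.0

def property_score(smiles: str) -> float:
    # Stand-in for computable-property scoring (e.g. drug-likeness),
    # normalized to [0, 1].
    return 0.7

def reward(smiles: str, w_activity: float = 0.5) -> float:
    """Score one generated molecule for the RL loop."""
    # Map binding energy to [0, 1]; -12 kcal/mol or stronger saturates at 1.
    activity = min(max(-docking_energy(smiles) / 12.0, 0.0), 1.0)
    return w_activity * activity + (1.0 - w_activity) * property_score(smiles)

print(reward("CCO"))  # 0.725 with the stub values above
```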
Citations: 0
Latent Pattern Sensing: Deepfake Video Detection via Predictive Representation Learning
Pub Date: 2021-12-01 | DOI: 10.1145/3469877.3490586
Shiming Ge, Fanzhao Lin, Chenyu Li, Daichi Zhang, Jiyong Tan, Weiping Wang, Dan Zeng
Increasingly advanced deepfake approaches have made the detection of deepfake videos very challenging. We observe that deepfake videos often exhibit appearance-level temporal inconsistencies in some facial components between frames, resulting in discriminable spatiotemporal latent patterns among semantic-level feature maps. Inspired by this finding, we propose a predictive representation learning approach termed Latent Pattern Sensing to capture these semantic change characteristics for deepfake video detection. The approach cascades a CNN-based encoder, a ConvGRU-based aggregator, and a single-layer binary classifier. The encoder and aggregator are pre-trained in a self-supervised manner to form representative spatiotemporal context features. Finally, the classifier is trained to classify the context features, distinguishing fake videos from real ones. In this manner, the extracted features describe the latent patterns of videos across frames both spatially and temporally in a unified way, leading to an effective deepfake video detector. Extensive experiments prove the approach's effectiveness, e.g., surpassing 10 state-of-the-art methods by at least 7.92% AUC on the challenging Celeb-DF(v2) benchmark.
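The cascade reads naturally as a per-frame encoder feeding a recurrent spatial aggregator. Below is a minimal PyTorch sketch of that structure, assuming toy channel sizes and a two-layer encoder, and omitting the self-supervised pre-training stage; it is not the authors' released architecture.

```python
import torch
import torch.nn as nn

class ConvGRUCell(nn.Module):
    """Minimal ConvGRU cell: GRU gating realized with convolutions so the
    hidden state keeps its spatial layout."""
    def __init__(self, ch):
        super().__init__()
        self.gates = nn.Conv2d(2 * ch, 2 * ch, 3, padding=1)  # update, reset
        self.cand = nn.Conv2d(2 * ch, ch, 3, padding=1)       # candidate state

    def forward(self, x, h):
        z, r = torch.sigmoid(self.gates(torch.cat([x, h], 1))).chunk(2, 1)
        n = torch.tanh(self.cand(torch.cat([x, r * h], 1)))
        return (1 - z) * h + z * n

class LatentPatternSensor(nn.Module):
    """Encoder -> ConvGRU aggregator -> single-layer binary classifier."""
    def __init__(self, ch=64):
        super().__init__()
        self.encoder = nn.Sequential(                 # toy per-frame encoder
            nn.Conv2d(3, ch, 3, 2, 1), nn.ReLU(),
            nn.Conv2d(ch, ch, 3, 2, 1), nn.ReLU())
        self.aggregator = ConvGRUCell(ch)
        self.classifier = nn.Linear(ch, 1)            # real-vs-fake logit

    def forward(self, frames):                        # (B, T, 3, H, W)
        h = None
        for t in range(frames.shape[1]):
            f = self.encoder(frames[:, t])
            h = torch.zeros_like(f) if h is None else h
            h = self.aggregator(f, h)                 # spatiotemporal context
        return self.classifier(h.mean(dim=(2, 3)))    # global average pool

logits = LatentPatternSensor()(torch.randn(2, 8, 3, 64, 64))  # -> (2, 1)
```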
Citations: 5
Towards Discriminative Visual Search via Semantically Cycle-consistent Hashing Networks
Pub Date: 2021-12-01 | DOI: 10.1145/3469877.3490583
Zheng Zhang, Jianning Wang, Guangming Lu
Deep hashing has shown great potential in large-scale visual similarity search due to its favorable storage and computation efficiency. Typically, deep hashing encodes visual features into compact binary codes by preserving representative semantic visual features. Works in this area mainly focus on building the relationship between the visual space and the objective hash space, while they seldom study the triadic cross-domain semantic knowledge transfer among visual, semantic, and hashing spaces, leading to a serious semantic ignorance problem during space transformation. In this paper, we propose a novel deep tripartite semantically interactive hashing framework, dubbed Semantically Cycle-consistent Hashing Networks (SCHN), for discriminative hash code learning. In particular, we construct a flexible semantic space and a transitive latent space, in conjunction with the visual space, to jointly deduce the privileged discriminative hash space. Specifically, the semantic space is conceived to strengthen the flexibility and completeness of categories in feature inference. Moreover, the transitive latent space is formulated to explore the shared semantic interactivity embedded in visual and semantic features. Our SCHN, for the first time, establishes the cyclic principle of deep semantic-preserving hashing via adaptive semantic parsing across different spaces in visual similarity search. In addition, the entire learning framework is jointly optimized in an end-to-end manner. Extensive experiments performed on diverse large-scale datasets demonstrate the superiority of our method over other state-of-the-art deep hashing algorithms.
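For readers unfamiliar with the encoding step the abstract builds on, the fragment below shows the standard deep-hashing head: a tanh relaxation during training and sign() binarization at retrieval time. This illustrates generic deep hashing only; the SCHN cycle-consistency objective is not reproduced, and the dimensions are assumptions.

```python
import torch
import torch.nn as nn

class HashHead(nn.Module):
    """Generic deep-hashing head: tanh relaxation while training,
    sign() for compact binary codes at retrieval time."""
    def __init__(self, feat_dim: int = 512, code_len: int = 64):
        super().__init__()
        self.fc = nn.Linear(feat_dim, code_len)

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        return torch.tanh(self.fc(feats))     # relaxed codes in (-1, 1)

    @torch.no_grad()
    def binarize(self, feats: torch.Tensor) -> torch.Tensor:
        return torch.sign(self.fc(feats))     # binary codes in {-1, +1}

codes = HashHead().binarize(torch.randn(4, 512))  # 4 items -> 64-bit codes
```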
Citations: 2
Dedark+Detection: A Hybrid Scheme for Object Detection under Low-light Surveillance
Pub Date: 2021-12-01 | DOI: 10.1145/3469877.3497691
Xiaolei Luo, S. Xiang, Yingfeng Wang, Qiong Liu, You Yang, Kejun Wu
Object detection under low-light surveillance is a crucial problem on which little effort has been made. In this paper, we propose a hybrid method, namely Dedark+Detection, that jointly uses enhancement and object detection to address the above challenge. In this method, the low-light surveillance video is first processed by the proposed de-dark method, so that the video is converted to its appearance under normal lighting conditions. This enhancement brings benefits to the subsequent object detection stage. After that, an object detection network is trained on the enhanced dataset for practical applications under low-light surveillance. Experiments are performed on 18 low-light surveillance video test sequences, and superior performance is observed compared to the state of the art.
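The two-stage scheme can be sketched in a few lines. In the snippet below a simple gamma correction stands in for the learned de-dark module, purely for illustration; the paper's actual enhancement network and detector are not reproduced here.

```python
import torch

def gamma_enhance(frame: torch.Tensor, gamma: float = 0.4) -> torch.Tensor:
    # Crude stand-in for the learned de-dark module: gamma < 1 brightens
    # dark regions while compressing highlights. Expects values in [0, 1].
    return frame.clamp(0.0, 1.0) ** gamma

def dedark_then_detect(frame: torch.Tensor, detector):
    # Stage 1: restore a normal-light appearance.
    # Stage 2: run a detector trained on enhanced data.
    enhanced = gamma_enhance(frame)
    return detector(enhanced.unsqueeze(0))
```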
Citations: 0
Zero-shot Recognition with Image Attributes Generation using Hierarchical Coupled Dictionary Learning
Pub Date: 2021-12-01 | DOI: 10.1145/3469877.3490613
Shuang Li, Lichun Wang, Shaofan Wang, Dehui Kong, Baocai Yin
Zero-shot learning (ZSL) aims to recognize images from unseen (novel) classes using training images from seen classes. The attributes of each class are exploited as auxiliary semantic information. Recently, most ZSL approaches have focused on learning visual-semantic embeddings to transfer knowledge from the seen classes to the unseen classes. However, few works study whether class-level auxiliary semantic information is extensive enough for the ZSL task. To tackle this problem, we propose a hierarchical coupled dictionary learning (HCDL) approach to hierarchically align the visual-semantic structures at both the class level and the image level. Firstly, the class-level coupled dictionary is trained to establish a basic connection between the visual space and the semantic space. Then, image attributes are generated based on this basic connection. Finally, fine-grained information can be embedded by training the image-level coupled dictionary. Zero-shot recognition is performed in multiple spaces by searching for the nearest neighbor class of the unseen image. Experiments on two widely used benchmark datasets show the effectiveness of the proposed approach.
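To make the coupled-dictionary idea concrete, here is a toy NumPy sketch in which visual features V and semantic attributes S share one code matrix A, linking the two spaces through dictionaries D_v and D_s. It uses plain alternating ridge updates and omits the sparsity penalty and the class/image hierarchy of the full HCDL method.

```python
import numpy as np

def coupled_dictionaries(V, S, k=32, iters=50, lam=1e-2, seed=0):
    """V: (d_v, n) visual features; S: (d_s, n) class attributes.
    Minimizes ||V - Dv A||^2 + ||S - Ds A||^2 with ridge regularization."""
    rng = np.random.default_rng(seed)
    A = rng.standard_normal((k, V.shape[1]))        # shared codes
    I = lam * np.eye(k)
    for _ in range(iters):
        Dv = V @ A.T @ np.linalg.inv(A @ A.T + I)   # visual dictionary
        Ds = S @ A.T @ np.linalg.inv(A @ A.T + I)   # semantic dictionary
        # update shared codes using both reconstructions at once
        M = Dv.T @ Dv + Ds.T @ Ds + I
        A = np.linalg.solve(M, Dv.T @ V + Ds.T @ S)
    return Dv, Ds, A

V, S = np.random.rand(128, 200), np.random.rand(85, 200)  # toy data
Dv, Ds, A = coupled_dictionaries(V, S)
```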
Citations: 3
PBNet: Position-specific Text-to-image Generation by Boundary
Pub Date: 2021-12-01 | DOI: 10.1145/3469877.3493594
Tian Tian, Li Liu, Huaxiang Zhang, Dongmei Liu
Most existing methods focus on improving the clarity and semantic consistency of the image with respect to a given text, but pay little attention to controlling multiple aspects of the generated image content, such as the position of the object in the generated image. In this paper, we introduce a novel position-based generative network (PBNet) that can generate fine-grained images with the object at a specified location. PBNet combines an iterative structure with a generative adversarial network (GAN). A location information embedding module (LIEM) is proposed to combine the location information extracted from the boundary block image with the semantic information extracted from the text. In addition, a silhouette generation module (SGM) is proposed to train the generator to generate objects based on location information. Experimental results on the CUB dataset demonstrate that PBNet effectively controls the location of the object in the generated image.
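One plausible reading of the location-embedding step is sketched below: a boundary-block mask is encoded into a spatial map and fused with the broadcast sentence embedding. Layer shapes, channel counts, and the module name `LocationTextFusion` are assumptions for illustration, not the paper's LIEM specification.

```python
import torch
import torch.nn as nn

class LocationTextFusion(nn.Module):
    """Sketch of fusing a boundary-block location map with text semantics."""
    def __init__(self, text_dim: int = 256, ch: int = 64):
        super().__init__()
        self.loc_enc = nn.Sequential(            # encode the boundary mask
            nn.Conv2d(1, ch, 3, padding=1), nn.ReLU(),
            nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU())
        self.fuse = nn.Conv2d(ch + text_dim, ch, 1)

    def forward(self, boundary, text):           # (B,1,H,W), (B,text_dim)
        loc = self.loc_enc(boundary)
        # broadcast the sentence embedding over every spatial position
        txt = text[:, :, None, None].expand(-1, -1, *loc.shape[2:])
        return self.fuse(torch.cat([loc, txt], dim=1))  # location-aware map

out = LocationTextFusion()(torch.rand(2, 1, 32, 32), torch.rand(2, 256))
```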
Citations: 0
Adaptive Cross-stitch Graph Convolutional Networks
Pub Date: 2021-12-01 | DOI: 10.1145/3469877.3495643
Zehui Hu, Zidong Su, Yangding Li, Junbo Ma
Graph convolutional networks (GCNs) have been widely used for processing graph and network data. However, recent research shows that existing graph convolutional networks have issues when integrating node features and topology structure. To remedy this weakness, we propose a new GCN architecture. Firstly, the proposed architecture introduces cross-stitch networks into the GCN via improved cross-stitch units. Cross-stitch networks spread information/knowledge between node features and topology structure, and obtain a consistent learned representation by integrating information from node features and topology structure simultaneously. Therefore, the proposed model can capture diverse channel information through multiple channels. Secondly, an attention mechanism is used to further extract the most relevant information between channel embeddings. Experiments on six benchmark datasets show that our method outperforms all comparison methods on different evaluation indicators.
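The cross-stitch unit itself is a small, well-defined operation (Misra et al., 2016): a learnable 2x2 mix of two parallel representations. A minimal sketch follows, assuming one branch carries feature-driven embeddings and the other topology-driven embeddings; the paper's "improved" variant and attention step are not reproduced.

```python
import torch
import torch.nn as nn

class CrossStitchUnit(nn.Module):
    """Classic cross-stitch: a learnable 2x2 mix of two parallel branches,
    placed between the layers of a feature-channel and a topology-channel GCN."""
    def __init__(self):
        super().__init__()
        # initialized near identity so each branch starts mostly on its own
        self.alpha = nn.Parameter(torch.tensor([[0.9, 0.1], [0.1, 0.9]]))

    def forward(self, h_feat, h_topo):
        mixed_f = self.alpha[0, 0] * h_feat + self.alpha[0, 1] * h_topo
        mixed_t = self.alpha[1, 0] * h_feat + self.alpha[1, 1] * h_topo
        return mixed_f, mixed_t

hf, ht = torch.rand(100, 16), torch.rand(100, 16)   # 100 nodes, 16-dim
mixed_f, mixed_t = CrossStitchUnit()(hf, ht)
```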
Citations: 2
BAND: A Benchmark Dataset for Bangla News Audio Classification
Pub Date: 2021-12-01 | DOI: 10.1145/3469877.3490575
Md. Rafi Ur Rashid, Mahim Mahbub, Muhammad Abdullah Adnan
Despite being the sixth most widely spoken language in the world, Bangla has barely received any attention in the domain of audio-visual news classification. In this work, we collect, annotate, and prepare a comprehensive news audio dataset in Bangla, comprising 5120 news clips with around 820 hours of total duration. We also conduct practical experiments to obtain a human baseline for the news audio classification task. We then mirror one of the human approaches by performing news classification directly on the audio features, using various state-of-the-art classifiers and a few transfer learning models. To the best of our knowledge, this is the first work to develop a benchmark dataset for news audio classification in Bangla.
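To give a sense of what "classification directly on the audio features" can look like, here is one plausible baseline: mean-pooled MFCCs fed to a linear SVM. The feature choice, pooling, and classifier are illustrative assumptions rather than the paper's reported pipeline, and the file paths and labels are placeholders.

```python
# A simple clip-level audio classification baseline: MFCC features + LinearSVC.
import librosa
import numpy as np
from sklearn.svm import LinearSVC

def clip_features(path: str, sr: int = 16000, n_mfcc: int = 40) -> np.ndarray:
    y, _ = librosa.load(path, sr=sr, mono=True)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)  # (n_mfcc, frames)
    return mfcc.mean(axis=1)                                # clip-level vector

def train_baseline(paths, labels):
    X = np.stack([clip_features(p) for p in paths])
    return LinearSVC().fit(X, labels)

# clf = train_baseline(["clip1.wav", "clip2.wav"], ["sports", "politics"])
```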
Citations: 1
Inter-modality Discordance for Multimodal Fake News Detection
Pub Date: 2021-12-01 | DOI: 10.1145/3469877.3490614
Shivangi Singhal, Mudit Dhawan, R. Shah, P. Kumaraguru
The paradigm shift in the consumption of news via online platforms has cultivated the growth of digital journalism. Contrary to traditional media, lowered entry barriers and the ability for everyone to take part in content creation have disabled the concept of centralized gatekeeping in digital journalism. This in turn has triggered the production of fake news. Current studies have made significant efforts towards multimodal fake news detection, with less emphasis on exploring the discordance between the different modalities present in a news article. We hypothesize that fabrication of either modality will lead to dissonance between the modalities, resulting in misrepresented, misinterpreted, and misleading news. In this paper, we inspect the authenticity of news coming from online media outlets by exploiting the relationship (discordance) between textual and multiple visual cues. We develop an inter-modality discordance based fake news detection framework to achieve this goal. Modality-specific discriminative features are learned using the cross-entropy loss and a modified version of contrastive loss that explores the inter-modality discordance. To the best of our knowledge, this is the first work that leverages information from different components of a news article (i.e., headline, body, and multiple images) for multimodal fake news detection. We conduct extensive experiments on real-world datasets to show that our approach outperforms the state of the art by an average F1-score of 6.3%.
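The vanilla contrastive loss that the abstract's modified version builds on can be written compactly: consistent text-image pairs are pulled together, discordant pairs pushed beyond a margin. The margin value and the cosine-distance choice below are assumptions; the authors' exact variant is not reproduced.

```python
import torch
import torch.nn.functional as F

def intermodal_contrastive(text_emb, img_emb, same_source, margin=0.5):
    """Standard margin-based contrastive loss between the textual and visual
    embeddings of an article. same_source: float tensor, 1.0 if the pair is
    consistent (e.g. from a real article), 0.0 if discordant."""
    d = 1.0 - F.cosine_similarity(text_emb, img_emb)       # cosine distance
    pos = same_source * d.pow(2)                           # pull together
    neg = (1 - same_source) * F.relu(margin - d).pow(2)    # push past margin
    return (pos + neg).mean()

loss = intermodal_contrastive(torch.randn(8, 256), torch.randn(8, 256),
                              torch.randint(0, 2, (8,)).float())
```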
Citations: 8
Source-Style Transferred Mean Teacher for Source-data Free Object Detection
Pub Date: 2021-12-01 | DOI: 10.1145/3469877.3490584
Dan Zhang, Mao Ye, Lin Xiong, Shuaifeng Li, Xue Li
Unsupervised cross-domain object detection transfers a detection model trained on a source domain to a target domain whose data distribution differs from the source. Conventional domain adaptation detection protocols need source domain data during adaptation. However, for reasons such as data security, privacy, and storage, the source data cannot be accessed in many practical applications. In this paper, we focus on source-data free domain adaptive object detection, which uses a pre-trained source model instead of the source data for cross-domain adaptation. Due to the lack of source data, we cannot directly align the domain distributions. To address this, we propose the Source style transferred Mean Teacher (SMT) for source-data free object detection. The batch normalization layers in the pre-trained model contain the style information and data distribution of the unobserved source data. We therefore use the batch normalization statistics from the pre-trained source model to transfer target domain features into source-like style features, making full use of the knowledge in the pre-trained source model. Meanwhile, we use the consistency regularization of the Mean Teacher to further distill knowledge from the source domain to the target domain. Furthermore, we found that adding perturbations associated with the target domain distribution increases the model's robustness to domain-specific information, thus helping the learned model generalize to the target domain. Experiments on multiple domain adaptation object detection benchmarks verify that our method achieves state-of-the-art performance.
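The batch-normalization trick lends itself to a short sketch: whiten target features with their own batch statistics, then re-color them with the source model's stored running statistics (an AdaIN-style transfer). This is a plausible reading of the mechanism described above, not the authors' exact module.

```python
import torch
import torch.nn as nn

def source_style_transfer(feat: torch.Tensor, src_bn: nn.BatchNorm2d):
    """Restyle target features (B, C, H, W) with source BN statistics:
    whiten with the current target batch stats, then re-color with the
    running mean/variance stored in the pre-trained source model."""
    mu_t = feat.mean(dim=(0, 2, 3), keepdim=True)
    var_t = feat.var(dim=(0, 2, 3), keepdim=True, unbiased=False)
    normed = (feat - mu_t) / torch.sqrt(var_t + src_bn.eps)
    mu_s = src_bn.running_mean.view(1, -1, 1, 1)
    sigma_s = torch.sqrt(src_bn.running_var.view(1, -1, 1, 1) + src_bn.eps)
    return normed * sigma_s + mu_s                 # source-like style features

styled = source_style_transfer(torch.randn(4, 64, 32, 32),
                               nn.BatchNorm2d(64).eval())
```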
Citations: 7