
Information Processing & Management: Latest Articles

Few-shot multi-hop reasoning via reinforcement learning and path search strategy over temporal knowledge graphs
IF 7.4; Zone 1 (Management); Q1 (Computer Science, Information Systems); Pub Date: 2024-12-01; DOI: 10.1016/j.ipm.2024.104001
Luyi Bai, Han Zhang, Xuanxuan An, Lin Zhu
Multi-hop reasoning on knowledge graphs is an important way to complete the knowledge graph. However, existing multi-hop reasoning methods often perform poorly in few-shot scenarios and primarily focus on static knowledge graphs, neglecting to model the dynamic changes of events over time in Temporal Knowledge Graphs (TKGs). Therefore, in this paper, we consider the few-shot multi-hop reasoning task on TKGs and propose a few-shot multi-hop reasoning model for TKGs (TFSM), which uses a reinforcement learning framework to improve model interpretability and introduces the one-hop neighbors of the task entity to account for the impact of previous events on the representation of the current task entity. To reduce the cost of searching complex nodes, our model adopts a path search strategy and prunes the search space by considering the correlation between existing paths and the current state. Compared to the baseline method, our model achieved 5-shot Few-shot Temporal Knowledge Graph (FTKG) performance improvements of 1.0% to 18.9% on ICEWS18-few, 0.6% to 22.9% on ICEWS14-few, and 0.7% to 10.5% on GDELT-few. Extensive experiments show that TFSM outperforms existing models on most metrics on these commonly used benchmark datasets. Furthermore, ablation experiments demonstrate the effectiveness of each part of our model. In addition, we demonstrate the interpretability of the model by performing path analysis with the path search-based strategy.
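The pruning step mentioned in the abstract (keeping only paths that correlate with the current state) can be illustrated with a minimal sketch. The abstract does not say how correlation is measured, so the cosine similarity, the toy embeddings, and the names `cosine` and `prune_paths` below are all assumptions made for illustration, not the TFSM implementation.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def prune_paths(state, candidates, keep):
    """Keep the `keep` candidate paths whose embedding is most similar to the state.

    `candidates` is a list of (name, embedding) pairs; both the pair layout and
    the use of cosine similarity as the "correlation" are hypothetical choices.
    """
    ranked = sorted(candidates, key=lambda c: cosine(state, c[1]), reverse=True)
    return [name for name, _ in ranked[:keep]]


# Toy example: a 2-dimensional state and three candidate path embeddings.
state = [1.0, 0.0]
candidates = [
    ("path_a", [0.9, 0.1]),
    ("path_b", [0.0, 1.0]),
    ("path_c", [0.5, 0.5]),
]
kept = prune_paths(state, candidates, keep=2)
print(kept)  # ['path_a', 'path_c']
```

In a reinforcement learning agent, a filter of this kind would typically run at each hop so that the policy only scores the retained candidates.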
Information Processing & Management, vol. 62, no. 3, Article 104001.
Citations: 0
Basis is also explanation: Interpretable Legal Judgment Reasoning prompted by multi-source knowledge
IF 7.4; Zone 1 (Management); Q1 (Computer Science, Information Systems); Pub Date: 2024-11-29; DOI: 10.1016/j.ipm.2024.103996
Shangyuan Li, Shiman Zhao, Zhuoran Zhang, Zihao Fang, Wei Chen, Tengjiao Wang
The task of Legal Judgment Prediction (LJP) aims to forecast case outcomes by analyzing fact descriptions, playing a pivotal role in enhancing judicial system efficiency and fairness. Existing LJP methods primarily focus on improving representations of fact descriptions to enhance judgment performance. However, these methods typically depend on superficial case information and neglect the underlying legal basis, resulting in a lack of in-depth reasoning and interpretability when judging long-tail or confusing cases. Recognizing that the basis for judgments in real-world legal contexts encompasses both factual logic and related legal knowledge, we introduce an interpretable legal judgment reasoning framework prompted by multi-source knowledge. The essence of this framework is to transform the implicit factual logic of cases and external legal knowledge into an explicit basis for judgment, aiming to enhance not only the accuracy of judgment predictions but also the interpretability of the reasoning process. Specifically, we design a chain prompt reasoning module that guides a large language model to elucidate the factual logic basis through incremental reasoning, aligning the model's prior knowledge with task-oriented knowledge in the process. To match this fact-based information with the legal knowledge basis, we propose a contrastive knowledge fusing module that injects external statute knowledge into the fact description embedding. While encoding the external knowledge base, it pushes similar pieces of knowledge apart in the semantic space without manual annotation, thus improving judgment prediction for long-tail and confusing cases. Experimental results on two real datasets indicate that our framework significantly outperforms existing LJP baseline methods in accuracy and interpretability, achieving new state-of-the-art performance. In addition, tests on specially constructed long-tail and confusing case datasets demonstrate that the proposed framework generalizes better when predicting these complex cases.
Information Processing & Management, vol. 62, no. 3, Article 103996.
Citations: 0
Adaptive CLIP for open-domain 3D model retrieval
IF 7.4; Zone 1 (Management); Q1 (Computer Science, Information Systems); Pub Date: 2024-11-29; DOI: 10.1016/j.ipm.2024.103989
Dan Song, Zekai Qiang, Chumeng Zhang, Lanjun Wang, Qiong Liu, You Yang, An-An Liu
To make 3D model retrieval more practical, we adopt a single real image as the query sample for retrieving 3D models. However, 2D images and 3D models differ significantly in lighting conditions, textures, and backgrounds, which poses a great challenge for accurate retrieval. Existing work on 3D model retrieval mainly focuses on the closed-domain setting, while the open-domain condition, where the category relationship between the query image and the 3D model is unknown, is more in line with the needs of real scenarios. CLIP shows significant promise in comprehending open-world visual concepts, facilitating effective zero-shot image recognition. Building on this large pre-trained multimodal model, we introduce Adaptive Open-domain Semantic Nearest-neighbor Contrast (AOSNC), a method for learning and aligning multi-modal text, image, and 3D model features. To solve the issues of inconsistent cross-domain categories and difficult sample correlation in the open domain, we construct a cross-modal bridge using CLIP. This model utilizes textual features to bridge the gap between 2D images and 3D model views. Additionally, we design an adaptive network layer to address the limitations of the pre-trained model on 3D model views and to enhance cross-modal alignment. We propose a mutual nearest-neighbor semantic alignment loss to address the challenge of aligning features from disparate modalities (text, images, and 3D models). This loss function enhances cross-modal learning by effectively associating and distinguishing features, improving retrieval accuracy. We conducted comprehensive experiments using the image-based 3D model retrieval dataset MI3DOR and the cross-domain 3D model retrieval dataset NTU-PSB to validate the proposed method. Our results show significant improvements in several evaluation metrics, underscoring the efficacy of our method in improving cross-modal feature alignment and retrieval performance.
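The mutual nearest-neighbor idea behind the alignment loss can be sketched in a few lines: an image feature and a 3D-view feature are paired only when each is the other's closest match. The abstract does not give the loss itself, so this shows only the pairing step, with a plain dot product as the similarity and toy 2-dimensional features; the names `nearest` and `mutual_nearest_pairs` are hypothetical.

```python
def dot(u, v):
    """Dot-product similarity between two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def nearest(query, pool):
    """Index of the vector in `pool` most similar to `query`."""
    return max(range(len(pool)), key=lambda j: dot(query, pool[j]))


def mutual_nearest_pairs(images, views):
    """Return (image_index, view_index) pairs that are each other's nearest neighbor."""
    pairs = []
    for i, img in enumerate(images):
        j = nearest(img, views)
        # Keep the pair only if the match also holds in the reverse direction.
        if nearest(views[j], images) == i:
            pairs.append((i, j))
    return pairs


# Toy features: three image embeddings and two 3D-view embeddings.
images = [[1.0, 0.0], [0.0, 1.0], [0.6, 0.8]]
views = [[0.9, 0.1], [0.1, 0.9]]
pairs = mutual_nearest_pairs(images, views)
print(pairs)  # [(0, 0), (1, 1)]
```

The third image prefers the second view, but that view prefers the second image, so the third image stays unpaired. A contrastive loss would then treat the surviving pairs as positives.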
Information Processing & Management, vol. 62, no. 2, Article 103989.
Citations: 0
DCIB: Dual contrastive information bottleneck for knowledge-aware recommendation
IF 7.4; Zone 1 (Management); Q1 (Computer Science, Information Systems); Pub Date: 2024-11-29; DOI: 10.1016/j.ipm.2024.103980
Qiang Guo, Jialong Hai, Zhongchuan Sun, Bin Wu, Yangdong Ye
Knowledge-aware recommendations effectively enhance model performance by integrating rich external information from the knowledge graphs. Graph contrastive learning methods have recently demonstrated superior results in such recommendations. However, they still face two limitations: (1) the disruption of intrinsic semantic structures caused by stochastic or predefined augmentations for constructing contrastive views, and (2) the neglect of the extrinsic semantic gap arising from the different semantic information in the user-item bipartite graph and the knowledge graph during their incorporation. To address these issues, we propose a novel Dual Contrastive Information Bottleneck (DCIB) method for the knowledge-aware recommendation, which can well preserve the intrinsic semantic structures and bridge the semantic gap to obtain complementary conducive information for learning enhanced representations. Specifically, DCIB implements contrastive learning with the information bottleneck principle (CIB) upon a collaborative view and a knowledge view. View-specific CIB is formalized to suppress the noise and distill high-quality information within each view using a devised learnable denoising module. Cross-view CIB is developed to bridge the semantic gap and fully leverage the different semantics of both views, thereby obtaining complementary information to enrich the representations. Extensive experimental results on the Last.FM, Book-Crossing, and MovieLens-1M show that DCIB outperforms existing state-of-the-art methods. Specifically, in terms of the NDCG@10 metric, DCIB obtains performance improvements of 5.78%, 7.67%, and 5.67% over the second-best methods across the three benchmarks, respectively.
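For readers unfamiliar with the NDCG@10 metric cited above, the following is a minimal sketch of the standard definition with a logarithmic position discount. It is not taken from the paper; the function names are illustrative, and the ideal ranking is computed from the same relevance list, which assumes every relevant item appears somewhere in that list.

```python
import math


def dcg_at_k(relevances, k):
    """Discounted cumulative gain over the top-k items; position 1 has discount log2(2) = 1."""
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances[:k]))


def ndcg_at_k(relevances, k=10):
    """DCG normalized by the DCG of the ideal (descending) ordering of the same list."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0


# Binary relevance of a ranked list: 1 means the item was actually interacted with.
perfect = ndcg_at_k([1, 0, 0])
second_place = ndcg_at_k([0, 1, 0])
print(perfect)       # 1.0
print(second_place)  # 1 / log2(3), roughly 0.63
```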
Information Processing & Management, vol. 62, no. 2, Article 103980.
Citations: 0
Advancing rule learning in knowledge graphs with structure-aware graph transformer
IF 7.4; Zone 1 (Management); Q1 (Computer Science, Information Systems); Pub Date: 2024-11-29; DOI: 10.1016/j.ipm.2024.103976
Kang Xu, Miqi Chen, Yifan Feng, Zhenjiang Dong
In knowledge graphs (KGs), logic rules offer interpretable explanations for predictions and are essential for reasoning on downstream tasks, such as question answering. However, a key challenge remains unresolved: how to effectively encode and utilize the structural features around the head entity to generate the most applicable rules. This paper proposes a structure-aware graph transformer for rule learning, namely Structure-Aware Rule Learning (SARL), which leverages both local and global structural information of the subgraph around the head entity to generate the most suitable rule path. SARL employs a generalized attention mechanism combined with replaceable feature extractors to aggregate local structural information of entities. It then incorporates global structural and relational information to further model the subgraph structure. Finally, a rule decoder utilizes the comprehensive subgraph representation to generate the most appropriate rules. Comprehensive experiments on four real-world knowledge graph datasets reveal that SARL significantly enhances performance and surpasses existing methods in the link prediction task on large-scale KGs, with Hits@1 improvements of 6.5% on UMLS and 4.5% on FB15K-237.
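The Hits@1 figures quoted above follow the usual link prediction convention: the fraction of test queries whose correct entity is ranked within the top k candidates. A minimal sketch, assuming the 1-based rank of the correct answer has already been computed for each query (the `ranks` list here is made-up toy data, not results from the paper):

```python
def hits_at_k(ranks, k=1):
    """Fraction of queries whose correct answer has rank <= k (ranks are 1-based)."""
    return sum(1 for r in ranks if r <= k) / len(ranks)


# Toy ranks of the correct tail entity for four test triples.
ranks = [1, 3, 1, 2]
print(hits_at_k(ranks, k=1))  # 0.5
print(hits_at_k(ranks, k=3))  # 1.0
```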
Information Processing & Management, vol. 62, no. 2, Article 103976.
Citations: 0
Extracting key insights from earnings call transcript via information-theoretic contrastive learning
IF 7.4; Zone 1 (Management); Q1 (Computer Science, Information Systems); Pub Date: 2024-11-29; DOI: 10.1016/j.ipm.2024.103998
Yanlong Huang, Wenxin Tai, Fan Zhou, Qiang Gao, Ting Zhong, Kunpeng Zhang
Earnings conference calls provide critical insights into a company’s financial health, future outlook, and strategic direction. Traditionally, analysts manually analyze these lengthy transcripts to extract key information, a process that is both time-consuming and prone to bias and error. To address this, text mining tools, particularly extractive summarization, are increasingly being used to automatically extract key insights, aiming to standardize the analysis process and improve efficiency. Extractive summarization automates the selection of the most informative sentences, offering a promising solution for transcript analysis. However, existing extractive summarization techniques face several challenges, such as the lack of labeled training data, difficulties in incorporating domain-specific knowledge, and inefficiencies in handling large-scale datasets. In this work, we introduce ECT-SKIE, an information-theoretic, self-supervised approach for extracting key insights from earnings call transcripts. We leverage variational information bottleneck theory to extract insights in parallel, significantly accelerating the process. In addition, we propose a structure-aware contrastive learning strategy that enables model training without the need for labeled data. We further develop a novel container-based key sentence extractor to alleviate sentence redundancy. Using a large-scale dataset of U.S. market earnings call transcripts, we evaluate our method against nine representative baselines across three downstream tasks. Experimental results show that ECT-SKIE can consistently extract high-quality key sentences. The code is publicly available at: https://github.com/MongoTap/ECT-SKIE.
Information Processing & Management, vol. 62, no. 3, Article 103998.
Citations: 0
Let long-term interests talk: An disentangled learning model for recommendation based on short-term interests generation
IF 7.4; Zone 1 (Management); Q1 (Computer Science, Information Systems); Pub Date: 2024-11-28; DOI: 10.1016/j.ipm.2024.103997
Sirui Duan, Mengya Ouyang, Rong Wang, Qian Li, Yunpeng Xiao
In e-commerce recommendation systems, users’ long-term and short-term interests jointly influence product selection. However, behavioral conformity tends to be more prominent in short-term sequences, and the entanglement of true preference with popularity-driven conformity obscures the user’s real interests. To address this issue, we propose a sequential recommendation model called DFRec that disentangles short-term interests from popularity bias. By leveraging long-term interest trends, the model promotes the separation of short-term interests from popularity-driven deviations, thereby reducing the impact of popularity interference in short-term sequences. First, we propose a Disentangled Frequency Attention Network (DFAN) to address the entanglement between real sequence features and conformity data in users’ short-term behavioral sequences. The approach derives disentangled representations of the user’s short-term interest and conformity on the basis of long-term interest trends. Second, to capture users’ real long-term interest characteristics, we use a Learnable Filter (LF) to filter out noise frequencies in long-term sequences. The method decouples the horizontal and vertical directions of the sequence and filters out the noise in both directions. Finally, because the relative importance of the two kinds of interest is dynamic, we propose a joint learning framework with dual embeddings to balance and fuse them. Experimental results on three public datasets demonstrate that our model effectively captures dynamic user interests and outperforms six baseline models.
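Frequency-domain filtering of a sequence, the general idea behind a learnable filter, can be sketched as: transform the sequence, scale each frequency component by a weight, and transform back. The sketch below is an assumption-laden illustration, not the paper's LF module: it works on a single 1-D sequence along the time axis only, uses a plain discrete Fourier transform, and fixes the weights by hand where a model would learn them. The names `dft`, `idft`, and `apply_filter` are hypothetical.

```python
import cmath


def dft(x):
    """Discrete Fourier transform of a real or complex sequence."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def idft(spectrum):
    """Inverse transform, returning the real part of each sample."""
    n = len(spectrum)
    return [
        sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)).real / n
        for t in range(n)
    ]


def apply_filter(x, weights):
    """Scale each frequency component of `x` by the matching entry of `weights`.

    For a real-valued result the weights should be symmetric
    (weights[k] == weights[n - k]); in a trained model they would be parameters.
    """
    return idft([w * s for w, s in zip(weights, dft(x))])


sequence = [1.0, 2.0, 3.0, 4.0]
# Keeping only the zero-frequency component leaves the sequence mean at every step.
smoothed = apply_filter(sequence, [1.0, 0.0, 0.0, 0.0])
# All-ones weights leave the sequence unchanged (up to floating-point error).
unchanged = apply_filter(sequence, [1.0, 1.0, 1.0, 1.0])
print([round(v, 6) for v in smoothed])
print([round(v, 6) for v in unchanged])
```

Production code would normally use a fast Fourier transform from a numerical library instead of this quadratic-time transform.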
Information Processing & Management, Volume 62, Issue 2, Article 103997
Citations: 0
Trust driven On-Demand scheme for client deployment in Federated Learning
IF 7.4; Zone 1 (Management); Q1 Computer Science, Information Systems; Pub Date: 2024-11-28; DOI: 10.1016/j.ipm.2024.103991
Mario Chahoud, Azzam Mourad, Hadi Otrok, Jamal Bentahar, Mohsen Guizani
Containerization technology plays a crucial role in Federated Learning (FL) setups, expanding the pool of potential clients and ensuring the availability of specific subsets for each learning iteration. However, doubts arise about the trustworthiness of devices deployed as clients in FL scenarios, especially when container deployment processes are involved. Addressing these challenges is important, particularly in managing potentially malicious clients capable of disrupting the learning process or compromising the entire model. In our research, we integrate a trust element into the client selection and model deployment processes within our system architecture, a feature lacking in the initial client selection and deployment mechanism of the On-Demand architecture. We introduce a trust mechanism, named "Trusted-On-Demand-FL", which establishes a relationship of trust between the server and the pool of eligible clients. Using Docker in our deployment strategy enables us to monitor and validate participant actions effectively, ensuring strict adherence to agreed-upon protocols while strengthening defenses against unauthorized data access or tampering. Our simulations rely on continuous user behavior datasets and deploy an optimization model powered by a genetic algorithm to efficiently select clients for participation. By assigning trust values to individual clients, dynamically adjusting these values, and penalizing malicious clients through decreased trust scores, our proposed framework identifies and isolates harmful clients. This approach not only reduces disruptions to regular rounds but also minimizes instances of round dismissal, consequently enhancing both system stability and security.
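The trust bookkeeping described in the abstract can be sketched in a few lines. This is only an illustration under stated assumptions: the abstract does not give the update rule, so the additive reward and penalty, the [0, 1] range, the 0.5 eligibility threshold, and the names `update_trust` and `eligible_clients` are all hypothetical. The genetic-algorithm client selection is not shown.

```python
def update_trust(trust, behaved_well, reward=0.05, penalty=0.3):
    """Return the new trust score after one round, clamped to [0, 1].

    Assumption: good behavior earns a small reward, while detected
    misbehavior costs a larger penalty, so trust is lost faster than gained.
    """
    new_trust = trust + reward if behaved_well else trust - penalty
    return max(0.0, min(1.0, new_trust))


def eligible_clients(trust_by_client, threshold=0.5):
    """Clients whose trust is at or above the (assumed) threshold stay in the pool."""
    return sorted(c for c, t in trust_by_client.items() if t >= threshold)


# Hypothetical round: client "b" is flagged as malicious and drops out of the pool.
trust = {"a": 0.6, "b": 0.6}
trust["a"] = update_trust(trust["a"], behaved_well=True)
trust["b"] = update_trust(trust["b"], behaved_well=False)
pool = eligible_clients(trust)
```

Making the penalty larger than the reward is a design choice that isolates a misbehaving client after few incidents while requiring many clean rounds to regain eligibility.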
Information Processing & Management, Volume 62, Issue 2, Article 103991
Citations: 0
Integration of public libraries and cultural tourism in China: An analysis of library attractiveness components based on tourist review mining
IF 7.4; Zone 1 (Management); Q1 Computer Science, Information Systems; Pub Date: 2024-11-28; DOI: 10.1016/j.ipm.2024.104000
Tingting Jiang, Yanrun Xu, Yao Li, Yikun Xia
The integration of public libraries and tourism is an emerging trend in China that aims to foster the sustainable development of libraries. However, an accurate understanding of the attractiveness and performance of library tourism is still lacking. Targeting all the national first-tier public libraries in China, this study collected a total of 70,301 online reviews provided by library tourists from popular online travel platforms. Text mining on 41,255 valid reviews, based on topic modeling and sentiment analysis, revealed seven primary components of library tourism attractiveness. Chinese public libraries demonstrated a satisfactory overall performance as tourist attractions ($\bar{a} = 0.732609$), though variations were observed across components: performance was excellent ($\bar{a}_i > 0.8$) in environment & atmosphere, architectural & interior design, and location & transportation; adequate ($0.6 < \bar{a}_i < 0.8$) in online popularity, library collections, and cultural events; and just acceptable ($\bar{a}_i > 0.5$) in personnel & services. A thematic analysis of 883 negative opinions extracted from the reviews further identified 22 major challenges negatively impacting the performance of each component. Additionally, an asymmetric impact-performance analysis recognized architectural & interior design as the basic component and location & transportation as the linear component, suggesting that visually aesthetic and conveniently located public libraries hold the highest potential for tourism. This study establishes a mixed-methods analytical framework and provides empirical evidence about the success of library tourism in China. In addition, it offers valuable insights for the global development of this burgeoning tourism trend.
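The performance bands quoted in the abstract can be written as a small lookup. This is a sketch, not the study's procedure: the function name `performance_band` is hypothetical, and the treatment of scores exactly on a boundary or at or below 0.5 is an assumption, since the abstract does not specify it.

```python
def performance_band(score):
    """Map a mean component score to the band names used in the abstract.

    Assumption: boundaries are exclusive on the lower side, and scores at or
    below 0.5 fall outside the reported bands.
    """
    if score > 0.8:
        return "excellent"
    if score > 0.6:
        return "adequate"
    if score > 0.5:
        return "just acceptable"
    return "below the reported bands"
```

Applied to the overall mean of 0.732609, this lookup returns "adequate", which is consistent with the abstract's description of the overall performance as satisfactory.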
Information Processing & Management, Volume 62, Issue 2, Article 104000
Citations: 0
Bias-guided margin loss for robust Visual Question Answering
IF 7.4; Zone 1 (Management); Q1 Computer Science, Information Systems; Pub Date: 2024-11-27; DOI: 10.1016/j.ipm.2024.103988
Yanhan Sun, Jiangtao Qi, Zhenfang Zhu, Kefeng Li, Liang Zhao, Lei Lv
Visual Question Answering (VQA) suffers from the language prior issue, where models tend to rely on dataset biases to answer questions while ignoring the image information. Existing studies have sought to mitigate language bias by using extra question-only models or by balancing the dataset. However, these works fail to identify the bias comprehensively, even though some methods use a margin loss to separate the biased answer embeddings. In this paper, we propose a bias-guided debiasing architecture with margin loss, named BGML, which uses a bias model to guide the margin loss so that it explicitly locates the biases of different question types in the answer space. This distinction of bias prompts the model to avoid the adverse effects of language priors. Additionally, we encourage the bias model to learn biases comprehensively by integrating adversarial training, knowledge distillation, and contrastive learning. The experimental results show that BGML achieves state-of-the-art results with 62.28% on VQA-CP v2, while retaining competitive results with 60.84% on VQA v2.
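As a rough illustration of what a margin loss guided by a bias model might look like, the sketch below subtracts a margin from the ground-truth logit before a softmax cross-entropy. It is not the BGML formulation, which the abstract does not spell out. The function names, the additive form of the margin, and the rule that the margin grows as the bias model's probability for the ground-truth answer shrinks are all assumptions made for illustration.

```python
import math


def margin_cross_entropy(logits, target, margin):
    """Softmax cross-entropy with an additive margin subtracted from the target logit."""
    adjusted = list(logits)
    adjusted[target] -= margin
    peak = max(adjusted)  # subtract the maximum for numerical stability
    log_norm = peak + math.log(sum(math.exp(v - peak) for v in adjusted))
    return log_norm - adjusted[target]


def bias_guided_margin(bias_probs, target, scale=1.0):
    """Hypothetical rule: answers the bias model finds unlikely get a larger margin."""
    return scale * (1.0 - bias_probs[target])


# Hypothetical example: the bias model strongly favors answer 0,
# but the ground truth is answer 1.
logits = [2.0, 1.0, 0.5]
bias_probs = [0.8, 0.1, 0.1]
margin = bias_guided_margin(bias_probs, target=1)
loss = margin_cross_entropy(logits, target=1, margin=margin)
```

With a margin of zero the function reduces to ordinary cross-entropy, so any positive margin strictly increases the loss for the same logits and asks the model for a larger gap on that answer.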
Information Processing & Management, Volume 62, Issue 2, Article 103988
Citations: 0