
Latest publications in Information Processing & Management

Model-aware privacy-preserving with start trigger method for person re-identification
IF 7.4 | CAS Tier 1 (Management) | Q1 (Engineering) | Pub Date: 2024-06-22 | DOI: 10.1016/j.ipm.2024.103819
Tongzhen Si, Penglei Li, Xiaohui Yang, Linkun Fan, Fazhi He

Person Re-identification (ReID) searches for the same pedestrian across non-overlapping cameras, enabling pedestrian localization and retrieval. However, the process involves much sensitive pedestrian information and raises serious privacy problems. Conventional methods mainly remove identity-related features from pedestrian images to alleviate the privacy issue; unfortunately, these strategies cause pedestrian information loss and poor data utility. In this paper, we propose a novel Model-Aware Privacy-Preserving with Start Trigger (MPST) method, which not only hides personal identity from third parties but also provides accurate pedestrian localization for authorized organizations. The core idea is that authorized organizations obtain a start trigger that activates the ReID model's ability to search for target pedestrians, while third parties (e.g., hackers) cannot employ the ReID model to complete the pedestrian matching task without the start trigger. To this end, we develop a universal adversarial algorithm to learn such a start trigger for the person ReID system. We further design a model-aware training strategy that jointly uses the start trigger and original pedestrian samples so that the deep model learns to perceive the issued instruction. As a result, we effectively install an activation switch that changes the ReID model's state, deciding whether the deep model can search for pedestrians or not. Extensive experiments demonstrate that the proposed MPST is effective for pedestrian identity anonymization. Our method achieves superior performance for authorized organizations and meets the privacy protection goal.
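The trigger-gated activation described above can be pictured in a few lines. The sketch below is a toy illustration under assumed names and mechanics (a fresh random projection stands in for the deactivated model state), not the paper's actual algorithm:

```python
import numpy as np

rng = np.random.default_rng(0)
TRIGGER = rng.normal(size=16)  # stands in for the learned universal start trigger

def reid_embed(image_feat, trigger=None):
    """Toy ReID embedder: only the correct start trigger 'activates' it."""
    if trigger is not None and np.allclose(trigger, TRIGGER):
        # activated state: identity-preserving, normalized embedding
        return image_feat / np.linalg.norm(image_feat)
    # deactivated state: a fresh random projection per call destroys
    # identity cues, so the same pedestrian no longer matches itself
    proj = rng.normal(size=(16, 16))
    out = proj @ image_feat
    return out / np.linalg.norm(out)
```

With the trigger, two embeddings of the same pedestrian feature coincide (cosine similarity 1); without it, repeated queries land in unrelated directions, so matching fails for unauthorized parties.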

Citations: 0
Candidate-Heuristic In-Context Learning: A new framework for enhancing medical visual question answering with LLMs
IF 7.4 | CAS Tier 1 (Management) | Q1 (Engineering) | Pub Date: 2024-06-21 | DOI: 10.1016/j.ipm.2024.103805
Xiao Liang, Di Wang, Haodi Zhong, Quan Wang, Ronghan Li, Rui Jia, Bo Wan

Medical Visual Question Answering (MedVQA) is designed to answer natural language questions related to medical images. Existing methods, which largely adopt the cross-modal pre-training and fine-tuning paradigm, face accuracy limitations due to data scarcity and insufficient incorporation of extensive medical knowledge. Drawing inspiration from the Knowledge-Based Visual Question Answering (KB-VQA) domain, which leverages Large Language Models (LLMs) and external knowledge bases, we introduce the Candidate-Heuristic In-Context Learning (CH-ICL) framework, a novel approach that leverages LLMs augmented with external knowledge to directly enhance existing MedVQA models. Specifically, we collect a pathology terminology dictionary from a public digital pathology library as an external knowledge base and use it to train a knowledge scope discriminator, which helps identify the knowledge scope required to answer a question. Then, we employ existing MedVQA models to provide reliable answer candidates along with their confidence scores. Finally, the knowledge scope and candidates, combined with retrieved in-context exemplars, are aggregated into prompts that heuristically guide LLMs in answer generation. Experimental results on the PathVQA, VQA-RAD, and SLAKE public benchmarks show state-of-the-art performance, with improvements of 1.91%, 1.88%, and 2.17%, respectively, over the baseline. Code and dataset are available at https://github.com/ecoxial2007/CH-ICL.
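As a rough illustration of the prompt-aggregation step, the sketch below assembles the knowledge scope, candidate answers with confidences, and retrieved exemplars into a single prompt string. The template and field names are assumptions, not the paper's exact format:

```python
def build_prompt(question, scope, candidates, exemplars):
    """Aggregate knowledge scope, candidates, and exemplars into one prompt."""
    cand_lines = "\n".join(f"- {a} (confidence {c:.2f})" for a, c in candidates)
    ex_lines = "\n".join(f"Q: {q}\nA: {a}" for q, a in exemplars)
    return (
        f"Relevant knowledge scope: {scope}\n\n"
        f"In-context exemplars:\n{ex_lines}\n\n"
        f"Question about the image: {question}\n"
        f"Candidate answers from the MedVQA model:\n{cand_lines}\n"
        "Choose the best answer."
    )

prompt = build_prompt(
    "What tissue is shown?",
    "histology of epithelial tissue",
    [("epithelium", 0.71), ("muscle", 0.18)],
    [("What stain is used?", "H&E")],
)
```

The resulting string would then be sent to the LLM, which picks (or refines) one of the candidates.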

Citations: 0
KGRED: Knowledge-graph-based rule discovery for weakly supervised data labeling
IF 8.6 | CAS Tier 1 (Management) | Q1 (Engineering) | Pub Date: 2024-06-19 | DOI: 10.1016/j.ipm.2024.103816
Wenjun Hou, Liang Hong, Ziyi Zhu

In weakly supervised learning, labeling rules can automatically label data to train models. However, due to insufficient prior knowledge, rule discovery often suffers from semantic drift: since misclassified rules are generated from wrongly matched sentences, the sentences matched by rules shift from the target labels to other labels. It is worth noting that rules do not exist in isolation. The multi-dimensional semantic associations among rules can impose semantic constraints on rule generation and enrich the semantic information of rules for rule matching. Therefore, we propose a Knowledge-Graph-based RulE Discovery method (KGRED), which leverages the multi-dimensional semantic associations among rules to alleviate semantic drift in rule discovery. Specifically, to reduce misclassified rules, we design a label-aware rule generation approach that attentively propagates prior knowledge from seed rules to candidate rules based on the rule KG. To reduce wrongly matched sentences, we present a cross-attention-based semantic matching mechanism that refines the semantic information of sentences while enriching that of rules. Moreover, we propose an inconsistency-directed active learning strategy to verify rules that perform inconsistently in rule generation and matching. Experiments on two public datasets show that KGRED achieves at least a 5.1% gain in F1 score over state-of-the-art methods.
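Propagating label scores from seed rules to candidate rules over the rule KG might look roughly like the sketch below, where a plain personalized-propagation scheme stands in for the paper's attentive mechanism; all names and the update rule are illustrative assumptions:

```python
import numpy as np

def propagate(adj, seed_scores, steps=2, alpha=0.5):
    """Diffuse seed-rule label scores to candidate rules along KG edges.

    adj: (n, n) rule-KG adjacency matrix; seed_scores: (n, k) label scores,
    one-hot rows for seed rules, zero rows for candidates.
    """
    deg = adj.sum(axis=1, keepdims=True)
    deg[deg == 0] = 1.0
    P = adj / deg                              # row-normalized transition matrix
    scores = seed_scores.astype(float)
    for _ in range(steps):
        # keep anchoring on the seeds while spreading mass to neighbors
        scores = alpha * seed_scores + (1 - alpha) * P @ scores
    return scores
```

Candidate rules connected to a seed accumulate that seed's label score, while the seed itself stays the strongest evidence for its label.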

Citations: 0
Are LLMs good at structured outputs? A benchmark for evaluating structured output capabilities in LLMs
IF 8.6 | CAS Tier 1 (Management) | Q1 (Engineering) | Pub Date: 2024-06-18 | DOI: 10.1016/j.ipm.2024.103809
Yu Liu, Duantengchuan Li, Kaili Wang, Zhuoran Xiong, Fobo Shi, Jian Wang, Bing Li, Bo Hang

Existing benchmarks for Large Language Models (LLMs) mostly focus on general or specific domain capabilities, overlooking structured output capabilities. We introduce SoEval, a benchmark for assessing LLMs’ ability to generate structured outputs like JSON, XML, and lists. SoEval contains 3.7K entries in Chinese and English, covering 13 types of structured output tasks across 20 subjects. In experiments, we found that while current mainstream LLMs have deficiencies in structured output, GPT-4 outperforms them in this aspect. GPT-4 achieved an average score of 0.4 on SoEval, representing a 24% enhancement over the next best-performing model. At the same time, the performance of current mainstream models on English tasks is also better than on Chinese tasks. We also report the performance of mainstream large models on different structured output types and task subjects. The benchmark construction code and SoEval dataset are open-sourced at https://github.com/MoranCoder95/SoEval.
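A benchmark of this kind needs a checker for each structured-output type. A minimal sketch for the JSON case is below; the partial-credit scoring rule is an assumption for illustration, not SoEval's actual metric:

```python
import json

def score_json_output(reply, required_keys):
    """Score a model reply: valid JSON object, fraction of required keys present."""
    try:
        obj = json.loads(reply)
    except json.JSONDecodeError:
        return 0.0                      # not parseable as JSON at all
    if not isinstance(obj, dict):
        return 0.0                      # parseable, but not the requested object
    present = sum(k in obj for k in required_keys)
    return present / len(required_keys)
```

Averaging such per-task scores over entries would yield a benchmark score in [0, 1], the same scale as the 0.4 average reported for GPT-4.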

Citations: 0
SUMEX: A hybrid framework for Semantic textUal siMilarity and EXplanation generation
IF 8.6 | CAS Tier 1 (Management) | Q1 (Engineering) | Pub Date: 2024-06-18 | DOI: 10.1016/j.ipm.2024.103771
Sumaira Saeed, Quratulain Rajput, Sajjad Haider

Measuring semantic similarity between two pieces of text is a well-known problem in Natural Language Processing (NLP). It has many applications, such as finding similar medical notes of patients to accelerate diagnosis, plagiarism detection, and document clustering. Most state-of-the-art models are based on machine/deep learning and lack sufficient explanations for their results, limiting their adoption in critical domains like healthcare. This paper presents a hybrid framework, SUMEX (Semantic textUal siMilarity and EXplanation generation), that uniquely combines an ontology with a state-of-the-art embedding-based model for semantic textual similarity. The primary strength of the framework is that it explains its results in human-understandable natural language, which is vital in critical domains such as healthcare. Experiments have been conducted on two datasets of clinical notes using four embeddings: ScispaCy, BioWord2Vec, ClinicalBERT, and a customized Word2Vec trained on clinical notes. The SUMEX framework outperforms the embedding-based model on the benchmark ClinicalSTS dataset, improving average precision scores by 7% and reducing the false-positive rate by 23%. On the Patients Similarity Dataset, SUMEX improved the average top-five and top-three precision scores by 14% and 10%, respectively. SUMEX also generates natural-language explanations for its results, whose quality was evaluated by domain experts. The results show that the generated explanations are of good quality, with scores of 90% and 93% for Completeness and Correctness, respectively. In addition, ChatGPT was also used for similarity scoring and explanation generation; the experiments show that the SUMEX framework performed better than ChatGPT.
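A hybrid score in this spirit can be sketched as an embedding cosine blended with ontology-concept overlap, plus a templated explanation. The 50/50 weighting, Jaccard overlap, and explanation wording are assumptions for illustration, not SUMEX's actual design:

```python
import numpy as np

def hybrid_similarity(emb_a, emb_b, concepts_a, concepts_b, w=0.5):
    """Blend embedding cosine with ontology-concept overlap; explain the result."""
    cos = float(emb_a @ emb_b / (np.linalg.norm(emb_a) * np.linalg.norm(emb_b)))
    union = concepts_a | concepts_b
    shared = concepts_a & concepts_b
    jac = len(shared) / len(union) if union else 0.0   # concept-overlap score
    score = w * cos + (1 - w) * jac
    explanation = (
        f"The texts share the ontology concepts {sorted(shared)}; "
        f"embedding similarity {cos:.2f} and concept overlap {jac:.2f} "
        f"give a combined score of {score:.2f}."
    )
    return score, explanation
```

The explanation string is what makes the result auditable by a clinician: it names the shared concepts rather than just reporting a number.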

Citations: 0
Investigating the causal effects of affiliation diversity on the disruption of papers in Artificial Intelligence
IF 8.6 | CAS Tier 1 (Management) | Q1 (Engineering) | Pub Date: 2024-06-17 | DOI: 10.1016/j.ipm.2024.103806
Xuli Tang, Xin Li, Ming Yi

Growing multiple-affiliation collaboration in Artificial Intelligence (AI) can help solve complex integrated problems, but does it affect the disruptiveness of AI research? Scholars have discussed related topics in other fields, but those studies did not specifically target AI and relied primarily on correlational methods, which may not support causal conclusions. Analyzing around 0.6 million AI collaborative papers published between 1950 and 2019, with 872,727 authors and 9,258 affiliations, this study tests the causal effect of multiple-affiliation collaboration on disruption in AI using descriptive analysis and a causal inference method, Propensity Score Matching (PSM). We propose an improved affiliation diversity indicator that measures the distribution of affiliation differences in multiple-affiliation collaboration by taking disparity into account. Our results show that affiliation diversity exerts a negative causal effect on the disruption of papers in AI: (a) On average, AI papers whose authors span diverse affiliation types or countries are less disruptive than those with a single type or country. (b) Affiliation diversity causally reduces the disruption of papers in AI by 2.006%∼5.891%; that is, AI papers with high affiliation diversity are significantly less disruptive, by 2.006% to 5.891%, than those without. We cross-validate the findings using five comparison experiments and five other matching methods. This study provides a comprehensive understanding of the effect of multiple-affiliation collaboration on AI disruption.
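The matching step of PSM can be sketched as nearest-neighbor matching on propensity scores, with the treatment effect estimated as the mean matched-pair outcome difference. Propensity scores are taken as precomputed here (a full pipeline would first fit, e.g., a logistic model of treatment on covariates), and all names are illustrative:

```python
import numpy as np

def psm_effect(treated_ps, treated_y, control_ps, control_y):
    """Estimate the average treatment effect on the treated via 1-NN matching.

    treated_ps / control_ps: propensity scores; treated_y / control_y: outcomes
    (e.g., disruption scores of high- vs. low-affiliation-diversity papers).
    """
    control_ps = np.asarray(control_ps)
    control_y = np.asarray(control_y)
    diffs = []
    for ps, y in zip(treated_ps, treated_y):
        j = int(np.argmin(np.abs(control_ps - ps)))  # closest-score control paper
        diffs.append(y - control_y[j])
    return float(np.mean(diffs))
```

A negative estimate here corresponds to the paper's finding: treated (high-diversity) papers are less disruptive than their matched controls.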

Citations: 0
Entity-centric multi-domain transformer for improving generalization in fake news detection
IF 8.6 | CAS Tier 1 (Management) | Q1 (Engineering) | Pub Date: 2024-06-14 | DOI: 10.1016/j.ipm.2024.103807
Parisa Bazmi, Masoud Asadpour, Azadeh Shakery, Abbas Maazallahi

Fake news has become a significant concern in recent times, particularly during the COVID-19 pandemic, as spreading false information can pose serious public health risks. Although many models have been suggested to detect fake news, they are often limited in their ability to extend to emerging domains since they are designed for a single domain. Previous studies on multi-domain fake news detection have focused on developing models that perform well on multiple domains, but these models often cannot generalize to new unseen domains, which limits their effectiveness. To overcome this limitation, we propose the Entity-centric Multi-domain Transformer (EMT) model. EMT uses entities in the news as key components in learning domain-invariant and domain-specific news representations, which addresses the challenges of domain shift and incomplete domain labeling in multi-domain fake news detection. It incorporates entity background information from external knowledge sources to enhance fine-grained news domain representation. EMT consists of a Domain-Invariant (DI) encoder, a Domain-Specific (DS) encoder, and a Cross-Domain Transformer (CT) that facilitates investigation of domain relationships and knowledge interaction with input news, enabling effective generalization. We evaluate EMT's performance in multi-domain fake news detection across three settings: supervised multi-domain, zero-shot on a new unseen domain, and limited samples from a new domain. EMT demonstrates greater stability than state-of-the-art models when dealing with domain changes and varying training data. Specifically, in the zero-shot setting on new unseen domains, EMT achieves an F1 score of approximately 72%. The results highlight the effectiveness of EMT's entity-centric approach and its potential for real-world applications, as it adapts to various training settings and outperforms existing models in handling limited labeled data and previously unseen domains.
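One simple way to picture how a domain-invariant representation could be combined with domain-specific ones for an unseen domain is a soft mixture over known domains: attention-style weights decide how much each domain-specific encoder contributes. This sketch is an illustrative assumption, not EMT's actual architecture:

```python
import numpy as np

def combine(di_vec, ds_vecs, domain_logits):
    """Add a softmax-weighted mixture of domain-specific vectors to the
    domain-invariant vector; an unseen domain gets a soft blend rather than
    a single hard domain assignment."""
    w = np.exp(domain_logits - domain_logits.max())
    w = w / w.sum()                               # softmax over known domains
    return di_vec + (w[:, None] * ds_vecs).sum(axis=0)
```

Because the mixture is soft, a news item from a domain never seen in training still receives a usable representation instead of falling back to a single mismatched domain head.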

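EMT's fusion of a domain-invariant representation with domain-specific ones can be pictured with a minimal numpy sketch. The soft domain weighting, all names, and all shapes below are our illustrative assumptions, not the paper's actual architecture:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def fuse_representations(news_vec, di_vec, ds_vecs):
    """Blend a domain-invariant (DI) vector with domain-specific (DS)
    vectors, softly weighted by the news item's similarity to each
    known domain -- an illustrative stand-in for EMT's cross-domain
    transformer, not its implementation."""
    weights = softmax(ds_vecs @ news_vec)   # soft domain assignment
    ds_mixed = weights @ ds_vecs            # weighted DS component
    return np.concatenate([di_vec, ds_mixed])

rng = np.random.default_rng(0)
news = rng.standard_normal(8)               # encoded news item
di = rng.standard_normal(8)                 # DI encoder output
ds = rng.standard_normal((3, 8))            # DS outputs for 3 known domains
rep = fuse_representations(news, di, ds)
```

A news item from a new, unseen domain simply receives soft weights over the known domains rather than a hard (and unavailable) domain label, which is one intuition for the zero-shot transfer the abstract reports.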
Citations: 0
TBC-MI: Suppressing noise labels by maximizing cleaning samples for robust image classification
IF 8.6 CAS Tier 1 (Management) Q1 Engineering Pub Date: 2024-06-12 DOI: 10.1016/j.ipm.2024.103801
Yanhong Li, Zhiqing Guo, Liejun Wang, Lianghui Xu

In classification tasks with noisy labels, eliminating the interference of noisy-label samples in the dataset is the key to improving network performance. However, the distributions of some noisy and clean samples overlap, making them difficult to distinguish. Clean-label samples within the overlapping region often contain highly representative feature information, which is extremely valuable for deep learning. We propose a new method called twin binary classification-mixed input (TBC-MI) to tackle this challenge. Specifically, TBC-MI utilizes a twin classification network to partition the samples, converting the original complex classification problem into a binary one. It filters clean-label samples from hard-label regions using a simple multilayer binary classification network. TBC-MI uses noise from the dataset in the dividing process to better reflect real-world scenarios. After maximizing the number of clean-label samples, TBC-MI adopts a hybrid online-and-offline input method to expand the subsequent input form of the samples. The proposed method is verified on the CIFAR-10 and CIFAR-100 datasets with artificially synthesized noise, and on the Clothing1M, ANIMAL-10N, CIFAR-10N, and CHAOYANG datasets with real-world noise. Extensive experiments show that our method achieves the best test accuracy on most datasets, with improvements of up to 2% over previous methods for learning with noisy labels.
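The clean/noisy split described above — seed a binary classifier on easy samples, then let it label the hard overlapping region — can be sketched in toy form. The per-sample-loss feature, the quantile seeding, and all names are our assumptions, not the paper's method:

```python
import numpy as np

def filter_clean(losses, warm_quantiles=(0.2, 0.8), steps=500, lr=1.0):
    """Toy clean/noisy split: per-sample loss is the only feature,
    the 'easy' extremes seed a tiny 1-D logistic classifier, and the
    classifier then labels the ambiguous middle region."""
    lo, hi = np.quantile(losses, warm_quantiles)
    seed_idx = np.where((losses <= lo) | (losses >= hi))[0]
    seed_x = losses[seed_idx]
    seed_y = (seed_x <= lo).astype(float)      # 1 = clean, 0 = noisy
    w, b = 0.0, 0.0                            # logistic regression params
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(w * seed_x + b)))
        g = p - seed_y
        w -= lr * np.mean(g * seed_x)
        b -= lr * np.mean(g)
    p_all = 1.0 / (1.0 + np.exp(-(w * losses + b)))
    return p_all > 0.5                         # predicted-clean mask

rng = np.random.default_rng(1)
losses = np.concatenate([rng.normal(0.3, 0.1, 80),    # mostly clean
                         rng.normal(1.2, 0.2, 20)])   # mostly noisy
mask = filter_clean(losses)
```

With well-separated loss clusters the mask recovers roughly the low-loss group; in a real pipeline the filtering would run on features produced by the twin network rather than on raw losses.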

Citations: 0
Text-enhanced knowledge graph representation learning with local structure
IF 8.6 CAS Tier 1 (Management) Q1 Engineering Pub Date: 2024-06-11 DOI: 10.1016/j.ipm.2024.103797
Zhifei Li , Yue Jian , Zengcan Xue , Yumin Zheng , Miao Zhang , Yan Zhang , Xiaoju Hou , Xiaoguang Wang

Knowledge graph representation learning entails transforming entities and relationships within a knowledge graph into vectors to enhance downstream tasks. The rise of pre-trained language models has recently promoted text-based approaches for knowledge graph representation learning. However, these methods often lack structural information about the knowledge graph, raising the challenge of integrating graph structure knowledge into text-based methodologies. To tackle this issue, we introduce a text-enhanced model with local structure (TEGS) that embeds local graph structure details from the knowledge graph into the text encoder. TEGS integrates k-hop neighbor entity information into the text encoder and employs a decoupled attention mechanism to blend relative position encoding and text semantics. This strategy augments learnable content through graph structure information and mitigates the impact of semantic ambiguity via the decoupled attention mechanism. Experimental findings demonstrate TEGS's effectiveness at fusing graph structure information, resulting in state-of-the-art performance across three datasets in link prediction tasks. In terms of Hit@1, compared to previous text-based models, our model demonstrates improvements of 2.1% on WN18RR, 2.4% on FB15k-237, and 2.7% on the NELL-One dataset. Our code is publicly available at https://github.com/HubuKG/TEGS.
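The k-hop neighborhood that TEGS feeds to the text encoder can be gathered with a simple breadth-first search over the triples. The function below is a sketch under our own naming, not the paper's code:

```python
from collections import deque

def k_hop_neighbors(triples, entity, k=2):
    """Collect entities within k hops of `entity`, treating the
    (head, relation, tail) triple graph as undirected."""
    adj = {}
    for h, _, t in triples:
        adj.setdefault(h, set()).add(t)
        adj.setdefault(t, set()).add(h)
    seen, frontier = {entity}, deque([(entity, 0)])
    while frontier:
        node, depth = frontier.popleft()
        if depth == k:          # do not expand past k hops
            continue
        for nxt in adj.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, depth + 1))
    seen.discard(entity)
    return seen

triples = [("A", "r1", "B"), ("B", "r2", "C"), ("C", "r3", "D")]
hop2 = k_hop_neighbors(triples, "A", k=2)
```

The collected neighbor entities (here `{"B", "C"}` for two hops from `"A"`) would then be serialized into the encoder's input sequence alongside the entity's own textual description.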

Citations: 0
Estimation-based optimizations for the semantic compression of RDF knowledge bases
IF 8.6 CAS Tier 1 (Management) Q1 Engineering Pub Date: 2024-06-08 DOI: 10.1016/j.ipm.2024.103799
Ruoyu Wang , Raymond Wong , Daniel Sun

Structured knowledge bases are critical for the interpretability of AI techniques. RDF knowledge bases (KBs), the dominant representation of structured knowledge, are expanding rapidly to increase their knowledge coverage, enhancing the capability of knowledge reasoning while placing heavy burdens on downstream applications. Recent studies employ semantic compression to detect and remove knowledge redundancies via semantic models and use the induced model for further applications, such as knowledge completion and error detection. However, semantic models that are sufficiently expressive for semantic compression cannot be efficiently induced, especially for large-scale KBs, due to the hardness of logic induction. In this article, we present estimation-based optimizations for the semantic compression of RDF KBs from the perspectives of the input and intermediate data involved in the induction of first-order logic rules. A negative sampling technique selects a representative subset of all negative tuples under the closed-world assumption, reducing the cost of evaluating the quality of a logic rule used for knowledge inference. The number of logic inference operations used during a compression procedure is reduced by a statistical estimation technique that prunes logic rules of low quality. The evaluation results show that the two techniques are feasible for the purpose of semantic compression and accelerate the compression algorithm by up to 47x compared to the state-of-the-art system.
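The estimation idea — judge a rule from a sample rather than from every tuple it derives — can be sketched as follows. Under the closed-world assumption, any derived tuple absent from the KB counts as false; the function names and the uniform-sampling choice are our assumptions, not the paper's algorithm:

```python
import random

def estimated_precision(inferred, kb, sample_size=50, seed=0):
    """Estimate a rule's precision by checking only a sample of the
    tuples it infers against the KB (closed-world: absent = false)."""
    inferred = list(inferred)
    rng = random.Random(seed)
    sample = (inferred if len(inferred) <= sample_size
              else rng.sample(inferred, sample_size))
    hits = sum(t in kb for t in sample)   # tuples the KB confirms
    return hits / len(sample)

kb = {("alice", "knows", "bob"), ("bob", "knows", "carol")}
rule_output = [("alice", "knows", "bob"), ("alice", "knows", "carol")]
prec = estimated_precision(rule_output, kb)   # 1 of 2 tuples holds
```

A rule whose estimated precision falls below a threshold can then be pruned before any full logic-inference pass, which is the cost saving the abstract describes.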

Citations: 0
Journal: Information Processing & Management