CMVC+: A Multi-View Clustering Framework for Open Knowledge Base Canonicalization Via Contrastive Learning

IEEE Transactions on Knowledge and Data Engineering · IF 10.4 · Zone 2, Computer Science · JCR Q1, Computer Science, Artificial Intelligence · Pub Date: 2025-02-18 · DOI: 10.1109/TKDE.2025.3543423
Yang Yang;Wei Shen;Junfeng Shu;Yinan Liu;Edward Curry;Guoliang Li
Vol. 37, No. 5, pp. 2296-2310. https://ieeexplore.ieee.org/document/10891880/
Citations: 0

Abstract

Open information extraction (OIE) methods extract plenty of OIE triples <noun phrase, relation phrase, noun phrase> from unstructured text, which compose large open knowledge bases (OKBs). Noun phrases and relation phrases in such OKBs are not canonicalized, which leads to scattered and redundant facts. It is found that two views of knowledge (i.e., a fact view based on the fact triple and a context view based on the fact triple's source context) provide complementary information that is vital to the task of OKB canonicalization, which clusters synonymous noun phrases and relation phrases into the same group and assigns them unique identifiers. In order to leverage these two views of knowledge jointly, we propose CMVC+, a novel unsupervised framework for canonicalizing OKBs without the need for manually annotated labels. Specifically, we propose a multi-view CHF K-Means clustering algorithm to mutually reinforce the clustering of view-specific embeddings learned from each view by considering the clustering quality in a fine-grained manner. Furthermore, we propose a novel contrastive learning module to refine the learned view-specific embeddings and further enhance the canonicalization performance. We demonstrate the superiority of our framework through extensive experiments on multiple real-world OKB data sets against state-of-the-art methods.
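
To make the canonicalization task itself concrete, the following is a minimal sketch and not the CMVC+ algorithm. It assumes each noun phrase already has one embedding per view, and it swaps in a much simpler technique than the one the abstract describes: the cosine similarities from the two views are averaged, and phrases above a threshold are merged with union-find. The function name `canonicalize`, the `threshold` value, the `E0`-style identifiers, and the hand-made toy embeddings are all hypothetical and chosen for illustration only.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def canonicalize(phrases, fact_view, context_view, threshold=0.9):
    """Group phrases whose averaged two-view similarity reaches the threshold.

    Illustrative stand-in only: the averaging rule and the threshold are
    assumptions, not the multi-view clustering described in the abstract.
    """
    parent = {p: p for p in phrases}

    def find(p):
        # Follow parent links to the cluster representative (with path halving).
        while parent[p] != p:
            parent[p] = parent[parent[p]]
            p = parent[p]
        return p

    for a, b in combinations(phrases, 2):
        sim = 0.5 * (
            cosine(fact_view[a], fact_view[b])
            + cosine(context_view[a], context_view[b])
        )
        if sim >= threshold:
            parent[find(a)] = find(b)  # merge the two clusters

    # Assign one identifier per cluster, in order of first appearance.
    cluster_ids = {}
    result = {}
    for p in phrases:
        root = find(p)
        cluster_ids.setdefault(root, f"E{len(cluster_ids)}")
        result[p] = cluster_ids[root]
    return result


# Hand-made toy embeddings (hypothetical data, two dimensions per view).
phrases = ["NYC", "New York City", "Barack Obama", "Obama"]
fact_view = {
    "NYC": [1.0, 0.0],
    "New York City": [0.9, 0.1],
    "Barack Obama": [0.0, 1.0],
    "Obama": [0.1, 0.9],
}
context_view = {
    "NYC": [0.8, 0.2],
    "New York City": [1.0, 0.0],
    "Barack Obama": [0.2, 0.8],
    "Obama": [0.0, 1.0],
}

canonical_ids = canonicalize(phrases, fact_view, context_view)
for phrase in phrases:
    print(phrase, "->", canonical_ids[phrase])
```

On this toy input the two city mentions share one identifier and the two person mentions share another. A fixed threshold is used here only because it keeps the sketch short; the framework in the abstract instead clusters view-specific embeddings jointly and refines them with contrastive learning.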
Source journal
IEEE Transactions on Knowledge and Data Engineering (Engineering & Technology - Engineering: Electrical & Electronic)
CiteScore: 11.70
Self-citation rate: 3.40%
Articles published: 515
Review time: 6 months
Journal description: The IEEE Transactions on Knowledge and Data Engineering encompasses knowledge and data engineering aspects within computer science, artificial intelligence, electrical engineering, computer engineering, and related fields. It provides an interdisciplinary platform for disseminating new developments in knowledge and data engineering and explores the practicality of these concepts in both hardware and software. Specific areas covered include knowledge-based and expert systems, AI techniques for knowledge and data management, tools and methodologies, distributed processing, real-time systems, architectures, data management practices, database design, query languages, security, fault tolerance, statistical databases, algorithms, performance evaluation, and applications.