Which is better? Taxonomy induction with learning the optimal structure via contrastive learning

IF 7.6 | Zone 1, Computer Science | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Knowledge-Based Systems | Pub Date: 2024-11-25 | Epub Date: 2024-09-03 | DOI: 10.1016/j.knosys.2024.112405

Yuan Meng, Songlin Zhai, Zhihua Chai, Yuxin Zhang, Tianxing Wu, Guilin Qi, Wei Song

Knowledge-Based Systems, Volume 304, Article 112405 (journal article). URL: https://www.sciencedirect.com/science/article/pii/S0950705124010396

Citations: 0

Abstract



A taxonomy represents a hierarchically structured knowledge graph that forms the infrastructure for various downstream applications, including recommender systems, web search, and question answering. The exploration of automated induction from text corpora has yielded notable taxonomies such as CN-probase, CN-DBpedia, and Zhishi.schema. Despite these efforts, existing taxonomies still face two critical issues that result in sub-optimal hierarchical structures. On the one hand, commonly observed taxonomies exhibit a coarse-grained and “flat” structure, stemming from a noticeable lack of diversity in both nodes and edges. This limitation primarily originates from the biased and homogeneous data distribution. On the other hand, the semantic granularity among “siblings” within these taxonomies remains inconsistent, presenting a challenge in accurately and comprehensively identifying hierarchical relations. To address these issues, this study introduces a novel taxonomy induction framework composed of three meticulously designed components. Initially, we established a seed schema by leveraging statistical information from external data sources as distant supervision to append nodes and edges containing “generic semantics”, thereby rectifying biased data distributions. Subsequently, a clustering algorithm is employed to group the nodes based on their similarities, followed by a refinement operation of the hierarchical relations among these nodes. Building on this seed schema, we propose a fine-grained contrastive learning method in the expansion module to strengthen the utilization of taxonomic structures, consequently boosting the precision of query-anchor matching. Finally, we meticulously scrutinized the hierarchical relations between each query and its siblings to ensure the integrity of the constructed taxonomy. Extensive experiments on real-world datasets validated the efficacy of our proposed framework for constructing well-structured taxonomies.
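The abstract does not state the exact contrastive objective behind its query-anchor matching. As a hypothetical illustration only, the sketch below uses the standard InfoNCE loss: a query embedding is scored against its true parent (the positive anchor) and against other candidate anchors (negatives), and a helper ranks anchors by similarity. The function names, the use of cosine similarity, and the `temperature` default are assumptions, not the authors' method.

```python
import math
from typing import Dict, List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def info_nce_loss(
    query: Sequence[float],
    positive: Sequence[float],
    negatives: List[Sequence[float]],
    temperature: float = 0.1,  # hypothetical default, not taken from the paper
) -> float:
    """Standard InfoNCE: -log( exp(s_pos / t) / sum_j exp(s_j / t) ).

    `positive` is the embedding of the query's true parent anchor;
    `negatives` are embeddings of other candidate anchors (hypothetical setup).
    """
    logits = [cosine(query, positive) / temperature]
    logits += [cosine(query, n) / temperature for n in negatives]
    # Log-sum-exp with max subtraction for numerical stability.
    m = max(logits)
    log_denom = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_denom - logits[0]


def rank_anchors(query: Sequence[float], anchors: Dict[str, Sequence[float]]) -> List[str]:
    """Return anchor names sorted by cosine similarity to the query, best match first."""
    return sorted(anchors, key=lambda name: cosine(query, anchors[name]), reverse=True)


if __name__ == "__main__":
    # Toy 2-D embeddings, purely illustrative.
    q = [1.0, 0.0]
    anchors = {"animal": [0.9, 0.1], "vehicle": [0.0, 1.0], "plant": [0.5, 0.5]}
    print(rank_anchors(q, anchors))  # → ['animal', 'plant', 'vehicle']
    print(info_nce_loss(q, anchors["animal"], [anchors["vehicle"], anchors["plant"]]))
```

Minimizing this loss pulls a query toward its true parent and pushes it away from competing anchors. This is the general mechanism that contrastive query-anchor matching relies on. How the paper makes it "fine-grained" (for example, which anchors serve as negatives) is not specified in the abstract.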


Source journal

Knowledge-Based Systems (Engineering & Technology; Computer Science: Artificial Intelligence)

CiteScore: 14.80
Self-citation rate: 12.50%
Articles published: 1245
Review time: 7.8 months

About the journal: Knowledge-Based Systems, an international and interdisciplinary journal in artificial intelligence, publishes original, innovative, and creative research results in the field. It focuses on systems based on knowledge-based and other artificial intelligence techniques. The journal aims to support human prediction and decision-making through data science and computation techniques, provide a balanced coverage of theory and practical study, and encourage the development and implementation of knowledge-based intelligence models, methods, systems, and software tools. Applications in business, government, education, engineering, and healthcare are emphasized.