Which is better? Taxonomy induction with learning the optimal structure via contrastive learning
Journal: Knowledge-Based Systems (impact factor 7.2; JCR Q1, Computer Science, Artificial Intelligence)
Published: 2024-09-03
DOI: 10.1016/j.knosys.2024.112405
URL: https://www.sciencedirect.com/science/article/pii/S0950705124010396
A taxonomy is a hierarchically structured knowledge graph that underpins downstream applications such as recommender systems, web search, and question answering. Automated induction from text corpora has yielded notable taxonomies such as CN-probase, CN-DBpedia, and Zhishi.schema. Despite these efforts, existing taxonomies still face two critical issues that result in sub-optimal hierarchical structures. On the one hand, they tend to be coarse-grained and “flat” because their nodes and edges lack diversity, a limitation that originates primarily from biased and homogeneous data distributions. On the other hand, the semantic granularity among “siblings” within these taxonomies is inconsistent, which makes it difficult to identify hierarchical relations accurately and comprehensively. To address these issues, this study introduces a novel taxonomy induction framework composed of three components. Initially, we establish a seed schema, using statistical information from external data sources as distant supervision to append nodes and edges that carry “generic semantics”, thereby rectifying biased data distributions. Subsequently, a clustering algorithm groups the nodes by similarity, after which the hierarchical relations among these nodes are refined. Building on this seed schema, we propose a fine-grained contrastive learning method in the expansion module that makes fuller use of taxonomic structure and thereby improves the precision of query-anchor matching. Finally, we scrutinize the hierarchical relations between each query and its siblings to ensure the integrity of the constructed taxonomy. Extensive experiments on real-world datasets validate the efficacy of the proposed framework for constructing well-structured taxonomies.
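The abstract does not specify the contrastive objective used in the expansion module. As a rough illustration of what contrastive query-anchor matching can look like, the sketch below assumes an InfoNCE-style loss over cosine similarities, with the query's true parent as the positive anchor and other nodes (including a sibling, as a hard negative) as negatives. The embeddings, node names, temperature value, and function names are all hypothetical and are not taken from the paper.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(query, positive, negatives, temperature=0.1):
    """InfoNCE loss: negative log-softmax score of the positive anchor.

    Assumed objective for illustration only; the paper's actual loss
    may differ.
    """
    logits = [cosine(query, positive) / temperature]
    logits += [cosine(query, neg) / temperature for neg in negatives]
    top = max(logits)  # subtract the max for numerical stability
    log_denominator = top + math.log(sum(math.exp(x - top) for x in logits))
    return log_denominator - logits[0]


def best_anchor(query, anchors):
    """Return the name of the anchor most similar to the query."""
    return max(anchors, key=lambda name: cosine(query, anchors[name]))


# Toy, hand-made embeddings (hypothetical): the query stands for a new
# concept such as "pear" that should attach under "fruit".
anchors = {
    "fruit": [1.0, 0.1, 0.0],
    "vehicle": [0.0, 1.0, 0.2],
    "apple": [0.9, 0.0, 0.4],  # a sibling of the query, used as a hard negative
}
query = [0.95, 0.05, 0.1]

loss_correct = info_nce(query, anchors["fruit"],
                        [anchors["vehicle"], anchors["apple"]])
loss_wrong = info_nce(query, anchors["vehicle"],
                      [anchors["fruit"], anchors["apple"]])
predicted_parent = best_anchor(query, anchors)
```

In this toy setup the loss is lower when the true parent is treated as the positive than when an unrelated node is, which is the signal a trained encoder would be optimized to produce. Including siblings among the negatives is one way to push the model toward distinguishing a parent from nodes at the query's own level.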
About the journal:
Knowledge-Based Systems, an international and interdisciplinary journal in artificial intelligence, publishes original, innovative, and creative research results in the field. It focuses on systems built on knowledge-based and other artificial intelligence techniques. The journal aims to support human prediction and decision-making through data science and computation techniques, to provide balanced coverage of theory and practical study, and to encourage the development and implementation of knowledge-based intelligence models, methods, systems, and software tools. Applications in business, government, education, engineering, and healthcare are emphasized.