Deeply integrating unsupervised semantics and syntax into heterogeneous graphs for inductive text classification
Authors: Yue Gao, Xiangling Fu, Xien Liu, Ji Wu
Journal: Complex & Intelligent Systems (JCR Q1, Computer Science, Artificial Intelligence; impact factor 5.0)
Published: 2023-09-28
DOI: 10.1007/s40747-023-01228-8 (https://doi.org/10.1007/s40747-023-01228-8)
Citations: 0
Abstract
Graph-based neural networks and unsupervised pre-trained models are both cutting-edge text representation methods, given their outstanding ability to capture global information and contextualized information, respectively. However, both methods face obstacles to further performance improvement. On the one hand, graph-based neural networks lack knowledge orientation to guide textual interpretation during global information interaction. On the other hand, unsupervised pre-trained models contain rich semantic and syntactic knowledge that is not sufficiently induced and expressed. How to effectively integrate graph-based global information with unsupervised contextualized semantic and syntactic information to achieve better text representation therefore remains an important open problem. In this paper, we propose a representation method that deeply integrates Unsupervised Semantics and Syntax into heterogeneous Graphs (USS-Graph) for inductive text classification. By constructing a heterogeneous graph whose edges and nodes are generated entirely from the knowledge in unsupervised pre-trained models, USS-Graph harmonizes the two perspectives of information under a bidirectionally weighted graph structure, thereby realizing the intra-fusion of graph-based global information and unsupervised contextualized semantic and syntactic information. Based on USS-Graph, we also propose a series of optimization measures to further improve knowledge integration and representation performance. Extensive experiments on benchmark datasets show that USS-Graph consistently achieves state-of-the-art performance on inductive text classification tasks. Additionally, extended experiments analyze in depth the characteristics of USS-Graph and the effectiveness of the proposed optimization measures for further knowledge integration and information complementation.
About the journal:
Complex & Intelligent Systems aims to provide a forum for presenting and discussing novel approaches, tools and techniques meant for attaining a cross-fertilization between the broad fields of complex systems, computational simulation, and intelligent analytics and visualization. The transdisciplinary research that the journal focuses on will expand the boundaries of our understanding by investigating the principles and processes that underlie many of the most profound problems facing society today.