A cross-domain transfer learning model for author name disambiguation on heterogeneous graph with pretrained language model
Authors: Zhenyuan Huang, Hui Zhang, Chengqian Hao, Haijun Yang, Harris Wu
Journal: Knowledge-Based Systems, volume 305, Article 112624
Publication date: 2024-10-18
DOI: 10.1016/j.knosys.2024.112624
URL: https://www.sciencedirect.com/science/article/pii/S0950705124012589
Impact factor: 7.2 (JCR Q1, Computer Science, Artificial Intelligence)
Citations: 0
Abstract
Author names in scientific literature are often ambiguous, complicating the accurate retrieval of academic information. Furthermore, many author names are shared by multiple scholars, making it challenging to construct academic search engine knowledge bases. These issues highlight the need for effective author name disambiguation. Existing methods have limitations in handling text content and heterogeneous graph node representations and often require extensive annotated training data. This study introduces an academic heterogeneous graph embedding neural network, HGNN-S, which leverages a pretrained semantic language model to integrate semantic information from texts, heterogeneous attribute relationships, and heterogeneous neighbor data. Trained on a small amount of single-domain annotated data, HGNN-S can disambiguate names across multiple domains. Experimental results demonstrate that our model outperforms current state-of-the-art methods and enhances search performance on the China National Platform, Kejso.
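To make the task concrete, the following is a minimal sketch of name disambiguation framed as clustering: papers that carry the same author name are grouped so that each group corresponds to one real person. It is not HGNN-S. The paper embeddings, the `disambiguate` function, and the 0.8 similarity threshold are all hypothetical stand-ins; in the approach described above, the vectors would come from a model that combines text semantics with heterogeneous graph structure.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def disambiguate(embeddings, threshold=0.8):
    """Group paper ids whose vectors are similar enough to share an author.

    embeddings: dict mapping a paper id to its vector (hypothetical input).
    threshold: assumed similarity cut-off, chosen only for illustration.
    Returns a sorted list of sorted id lists, one list per inferred author.
    """
    parent = {pid: pid for pid in embeddings}

    def find(x):
        # Follow parent links to the cluster root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Merge every pair of papers whose similarity reaches the threshold.
    for a, b in combinations(embeddings, 2):
        if cosine(embeddings[a], embeddings[b]) >= threshold:
            parent[find(a)] = find(b)

    clusters = {}
    for pid in embeddings:
        clusters.setdefault(find(pid), []).append(pid)
    return sorted(sorted(group) for group in clusters.values())


# Three papers signed with the same ambiguous name (toy vectors).
papers = {
    "p1": [0.9, 0.1, 0.0],
    "p2": [0.8, 0.2, 0.1],
    "p3": [0.0, 0.1, 0.9],
}
clusters = disambiguate(papers)
print(clusters)
```

In this toy input, `p1` and `p2` point in nearly the same direction and are merged, while `p3` stays in its own group, so the single name resolves to two inferred authors.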
About the journal:
Knowledge-Based Systems is an international, interdisciplinary journal in artificial intelligence that publishes original, innovative, and creative research results in the field. It focuses on systems built on knowledge-based and other artificial intelligence techniques. The journal aims to support human prediction and decision-making through data science and computation techniques, to provide balanced coverage of theory and practical study, and to encourage the development and implementation of knowledge-based intelligence models, methods, systems, and software tools. It emphasizes applications in business, government, education, engineering, and healthcare.