Automatic author name disambiguation by differentiable feature selection

Journal of Information Science (IF 1.8; Zone 4, Management; JCR Q3, Computer Science, Information Systems) · Pub Date: 2023-09-19 · DOI: 10.1177/01655515231193859
ZhiJian Fang, Yue Zhuo, Jinying Xu, Zhechong Tang, Zijie Jia, HuaXiong Zhang

Abstract

Author name disambiguation (AND) is the task of resolving ambiguity in bibliographic databases, where distinct real-world authors may share the same name or the same author may appear under different names. The aim of AND is to partition the name-ambiguous entities (articles) among their corresponding authors. Existing AND algorithms mainly focus on designing similarity metrics between two ambiguous articles. However, most previous methods select and process entity features empirically and then use those features to predict similarity with data-driven models. In this article, we are motivated by two natural questions: Which features are most useful for splitting name-ambiguous entities? Can they be determined automatically by an optimisation approach rather than by heuristic feature engineering? We therefore propose a novel end-to-end differentiable feature selection algorithm that automatically searches for the optimal features for the AND task (AAND). AAND optimises the discrete feature selection with the differentiable Gumbel-Softmax, which enables joint learning of the feature selection policy and the similarity prediction model. The experiments are conducted on a benchmark data set, S2AND, which harmonises eight different AND data sets. The results show that our proposal outperforms advanced AND methods and feature selection algorithms. Insights into AND features are also given.
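
As a rough illustration of the Gumbel-Softmax relaxation named in the abstract, the sketch below draws a soft "keep" weight for each candidate feature and applies it inside a toy similarity score. It is not the authors' AAND implementation: the feature names, the two-way keep/drop parameterisation, the linear scorer, and the fixed logits are all assumptions made for the example. It uses only the Python standard library and therefore computes no gradients; in practice the same sampling step would sit inside an automatic differentiation framework so that the selection logits and the similarity model can be trained together.

```python
import math
import random


def sample_gumbel(rng):
    """Draw one Gumbel(0, 1) sample via inverse transform sampling."""
    u = rng.random()
    # Clamp away from 0 and 1 so both logarithms stay finite.
    u = min(max(u, 1e-12), 1.0 - 1e-12)
    return -math.log(-math.log(u))


def gumbel_softmax(logits, tau, rng):
    """Relaxed one-hot sample: softmax((logits + Gumbel noise) / tau)."""
    noisy = [(logit + sample_gumbel(rng)) / tau for logit in logits]
    m = max(noisy)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in noisy]
    total = sum(exps)
    return [e / total for e in exps]


def relaxed_feature_mask(select_logits, tau, rng):
    """Return one soft keep-weight in [0, 1] per feature.

    Assumption: each feature has a 2-way choice [keep, drop] whose "drop"
    logit is fixed at 0, so select_logits[i] is the log-odds of keeping i.
    """
    return [gumbel_softmax([logit, 0.0], tau, rng)[0] for logit in select_logits]


def masked_similarity(pair_features, mask, weights):
    """Toy linear similarity over mask-weighted pairwise features."""
    return sum(f * m * w for f, m, w in zip(pair_features, mask, weights))


rng = random.Random(0)
# Hypothetical pairwise features for two articles sharing an author name.
feature_names = ["name_match", "coauthor_overlap", "venue_match"]
select_logits = [4.0, 0.0, -4.0]  # fixed here; learned in a real system
mask = relaxed_feature_mask(select_logits, tau=0.5, rng=rng)
score = masked_similarity([1.0, 0.3, 0.0], mask, [1.0, 1.0, 1.0])
```

Lowering `tau` pushes each sampled weight towards 0 or 1, which approximates a hard keep/drop decision, while higher values give smoother weights.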
Source journal
Journal of Information Science (Engineering and Technology: Computer Science, Information Systems)
CiteScore: 6.80
Self-citation rate: 8.30%
Articles published: 121
Review time: 4 months
Journal description: The Journal of Information Science is a peer-reviewed international journal of high repute covering topics of interest to all those researching and working in the sciences of information and knowledge management. The Editors welcome material on any aspect of information science theory, policy, application or practice that will advance thinking in the field.