Authorship recognition and disambiguation of scientific papers using a neural networks approach

S. Schifano, Tommaso Sgarbanti, L. Tomassetti
DOI: 10.22323/1.327.0007 (https://doi.org/10.22323/1.327.0007)
Published in: Proceedings of International Symposium on Grids and Clouds 2018 in conjunction with Frontiers in Computational Drug Discovery, PoS(ISGC 2018 & FCDD)
Publication date: 2018-03-01
Citations: 2

Abstract

Authorship recognition and author-name disambiguation are major issues affecting the quality and reliability of bibliographic records retrieved from digital libraries such as Web of Science, Scopus, Google Scholar and many others. So far, these problems have been addressed mainly with text-pattern-recognition methods tailored to specific datasets, and these methods have a high error rate.

In this paper, we propose a different approach that uses neural networks to learn features automatically in order to solve authorship recognition and author-name disambiguation. For each author, the network learns the set of co-authors, and from this information it recovers the authorship of papers. In addition, the network can be trained on other features, such as author affiliations, keywords, projects and research areas.

The network has been developed with the TensorFlow framework and run on recent Nvidia GPUs and multi-core Intel CPUs. Test datasets have been selected from records of the Scopus digital library for several groups of authors working in computer science, environmental science and physics. The proposed method achieves accuracy above 99% in authorship recognition and is able to disambiguate homonyms effectively.

We have considered several network parameters, such as training-set size and batch size, number of layers and hidden units, weight initialization and back-propagation algorithms, and have also analyzed their impact on the accuracy of the results. This approach can be easily extended to any dataset and any provider of bibliographic records.
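The idea of recovering authorship from co-author sets can be illustrated with a minimal sketch. This is not the authors' network: the abstract describes a TensorFlow model with hidden units, whereas the example below is a single softmax layer written with the Python standard library only, chosen so that it runs anywhere. The records, the co-author names, the two identity labels for a shared name "J. Smith", and the helper names `encode`, `train` and `predict` are all hypothetical.

```python
import math

# Hypothetical toy records: (co-authors listed on a paper, identity label).
# Labels 0 and 1 stand for two different people who share the name "J. Smith".
PAPERS = [
    (["A. Rossi", "B. Bianchi"], 0),
    (["B. Bianchi", "C. Verdi"], 0),
    (["D. Neri", "E. Gallo"], 1),
    (["E. Gallo", "F. Conti"], 1),
]

# Vocabulary: one input unit per known co-author.
VOCAB = sorted({name for coauthors, _ in PAPERS for name in coauthors})
INDEX = {name: i for i, name in enumerate(VOCAB)}
NUM_CLASSES = 2


def encode(coauthors):
    """Multi-hot vector: 1.0 where a known co-author appears on the paper."""
    vec = [0.0] * len(VOCAB)
    for name in coauthors:
        if name in INDEX:  # names outside the vocabulary are ignored
            vec[INDEX[name]] = 1.0
    return vec


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def scores_for(weights, x):
    """One linear score per identity (no bias term in this sketch)."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def train(papers, epochs=200, lr=0.5):
    """Fit a single softmax layer with plain per-sample gradient descent."""
    weights = [[0.0] * len(VOCAB) for _ in range(NUM_CLASSES)]
    for _ in range(epochs):
        for coauthors, label in papers:
            x = encode(coauthors)
            probs = softmax(scores_for(weights, x))
            for k in range(NUM_CLASSES):
                # Cross-entropy gradient w.r.t. the class-k score: p_k - y_k.
                err = probs[k] - (1.0 if k == label else 0.0)
                for i, xi in enumerate(x):
                    weights[k][i] -= lr * err * xi
    return weights


def predict(weights, coauthors):
    """Return the identity label with the highest probability."""
    probs = softmax(scores_for(weights, encode(coauthors)))
    return probs.index(max(probs))


WEIGHTS = train(PAPERS)
```

Calling `predict(WEIGHTS, ["A. Rossi", "C. Verdi"])` assigns an unseen paper to identity 0, because both co-authors were only ever seen with that identity. Further features mentioned in the abstract, such as affiliations or keywords, could be appended to the same input vector under the same encoding.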