Extraction and Evaluation of Knowledge Entities from Scientific Documents

Chengzhi Zhang, Philipp Mayr, Wei Lu, Yi Zhang
Journal of Data and Information Science (Warsaw, Poland), vol. 6, no. 1, pp. 1-5. Published 2021-06-01.
DOI: 10.2478/jdis-2021-0025 (https://doi.org/10.2478/jdis-2021-0025)
Citations: 4

Abstract

As a core resource of scientific knowledge, academic documents are frequently used by scholars, especially newcomers to a given field. In the era of big data, scientific documents such as academic articles, patents, technical reports, and webpages are booming. The rapid daily growth of scientific documents indicates that a large amount of knowledge is being proposed, improved, and used (Zhang et al., 2021). In scientific documents, knowledge entities (KEs) refer to the knowledge mentioned or cited by authors, such as algorithms, models, theories, datasets and software, diseases, drugs, and genes, reflecting rich resources in diverse problem-solving scenarios (Brack et al., 2020; Ding et al., 2013; Hou et al., 2019; Li et al., 2020). The advancement, improvement, and application of KEs in academic research have played a crucial role in promoting the development of different disciplines. Extracting various KEs from scientific documents makes it possible to determine whether such KEs are emerging or typical in a specific field, and helps scholars gain a comprehensive understanding of these KEs and even of the entire research field (Wang & Zhang, 2020). KE extraction is also useful for multiple downstream tasks in information extraction, text mining, natural language processing, information retrieval, digital library research, and so on (Zhang et al., 2021). Particularly for researchers in artificial intelligence (AI), information science, and other related disciplines, discovering methods from large-scale academic literature and evaluating their performance and influence have become increasingly necessary and meaningful (Hou et al., 2020). There are four kinds of methods for KE extraction from scientific documents: manual annotation-based (Chu & Ke, 2017; Tateisi et al., 2014; Zadeh & Schumann, 2016), rule-based (Kondo et al., 2009), statistics-based (Heffernan & Teufel, 2018; Névéol, Wilbur, & Lu, 2011; Okamoto, Shan, & Orihara, 2017), and