{"title":"科学文献中知识实体的提取与评价","authors":"Chengzhi Zhang, Philipp Mayr, Wei Lu, Yi Zhang","doi":"10.2478/jdis-2021-0025","DOIUrl":null,"url":null,"abstract":"As a core resource of scientific knowledge, academic documents have been frequently used by scholars, especially newcomers to a given field. In the era of big data, scientific documents such as academic articles, patents, technical reports, and webpages are booming. The rapid daily growth of scientific documents indicates that a large amount of knowledge is proposed, improved, and used (Zhang et al., 2021). In scientific documents, knowledge entities (KEs) refer to the knowledge mentioned or cited by authors, such as algorithms, models, theories, datasets and software, diseases, drugs, and genes, reflecting rich resources in diverse problemsolving scenarios (Brack et al., 2020; Ding et al., 2013; Hou et al., 2019; Li et al. 2020). The advancement, improvement, and application of KEs in academic research have played a crucial role in promoting the development of different disciplines. Extracting various KEs from scientific documents can determine whether such KEs are emerging or typical in a specific field, and help scholars gain a comprehensive understanding of these KEs and even the entire research field (Wang & Zhang, 2020). KE extraction is also useful for multiple downstream tasks in information extraction, text mining, natural language processing, information retrieval, digital library research, and so on (Zhang et al., 2021). Particularly for researchers in artificial intelligence (AI), information science, and other related disciplines, discovering methods from large-scale academic literature, and evaluating their performance and influence have become increasingly necessary and meaningful (Hou et al., 2020). There are four kinds of methods of KE extraction in scientific documents. They are manual annotation-based (Chu & Ke, 2017; Tateisi et al., 2014; Zadeh & Schumann, 2016), rule-based (Kondo et al., 2009), statistics-based (Heffernan & Teufel, 2018; Névéol, Wilbur, & Lu, 2011; Okamoto, Shan, & Orihara, 2017), and","PeriodicalId":92237,"journal":{"name":"Journal of data and information science (Warsaw, Poland)","volume":"6 1","pages":"1 - 5"},"PeriodicalIF":0.0000,"publicationDate":"2021-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"4","resultStr":"{\"title\":\"Extraction and Evaluation of Knowledge Entities from Scientific Documents\",\"authors\":\"Chengzhi Zhang, Philipp Mayr, Wei Lu, Yi Zhang\",\"doi\":\"10.2478/jdis-2021-0025\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"As a core resource of scientific knowledge, academic documents have been frequently used by scholars, especially newcomers to a given field. In the era of big data, scientific documents such as academic articles, patents, technical reports, and webpages are booming. The rapid daily growth of scientific documents indicates that a large amount of knowledge is proposed, improved, and used (Zhang et al., 2021). In scientific documents, knowledge entities (KEs) refer to the knowledge mentioned or cited by authors, such as algorithms, models, theories, datasets and software, diseases, drugs, and genes, reflecting rich resources in diverse problemsolving scenarios (Brack et al., 2020; Ding et al., 2013; Hou et al., 2019; Li et al. 2020). The advancement, improvement, and application of KEs in academic research have played a crucial role in promoting the development of different disciplines. Extracting various KEs from scientific documents can determine whether such KEs are emerging or typical in a specific field, and help scholars gain a comprehensive understanding of these KEs and even the entire research field (Wang & Zhang, 2020). KE extraction is also useful for multiple downstream tasks in information extraction, text mining, natural language processing, information retrieval, digital library research, and so on (Zhang et al., 2021). Particularly for researchers in artificial intelligence (AI), information science, and other related disciplines, discovering methods from large-scale academic literature, and evaluating their performance and influence have become increasingly necessary and meaningful (Hou et al., 2020). There are four kinds of methods of KE extraction in scientific documents. They are manual annotation-based (Chu & Ke, 2017; Tateisi et al., 2014; Zadeh & Schumann, 2016), rule-based (Kondo et al., 2009), statistics-based (Heffernan & Teufel, 2018; Névéol, Wilbur, & Lu, 2011; Okamoto, Shan, & Orihara, 2017), and\",\"PeriodicalId\":92237,\"journal\":{\"name\":\"Journal of data and information science (Warsaw, Poland)\",\"volume\":\"6 1\",\"pages\":\"1 - 5\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2021-06-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"4\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Journal of data and information science (Warsaw, Poland)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.2478/jdis-2021-0025\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of data and information science (Warsaw, Poland)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.2478/jdis-2021-0025","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 4
摘要
学术文献作为科学知识的核心资源,经常被学者,特别是新进入某一领域的学者所使用。在大数据时代,学术文章、专利、技术报告、网页等科学文献蓬勃发展。科学文献的快速增长表明大量的知识被提出、改进和使用(Zhang et al., 2021)。在科学文献中,知识实体(knowledge entities, ke)是指作者提及或引用的知识,如算法、模型、理论、数据集和软件、疾病、药物、基因等,反映了不同问题解决场景下的丰富资源(Brack et al., 2020;丁等人,2013;侯等人,2019;Li et al. 2020)。KEs在学术研究中的发展、完善和应用,对不同学科的发展起到了至关重要的推动作用。从科学文献中提取各种ke,可以判断这些ke在特定领域是新兴的还是典型的,有助于学者对这些ke乃至整个研究领域有一个全面的了解(Wang & Zhang, 2020)。KE提取还可用于信息提取、文本挖掘、自然语言处理、信息检索、数字图书馆研究等多个下游任务(Zhang et al., 2021)。特别是对于人工智能(AI)、信息科学和其他相关学科的研究人员来说,从大规模的学术文献中发现方法并评估其性能和影响力变得越来越必要和有意义(Hou et al., 2020)。科学文献中KE的提取方法有四种。它们是基于手工注释的(Chu & Ke, 2017;Tateisi et al., 2014;Zadeh & Schumann, 2016),基于规则的(Kondo等人,2009),基于统计的(Heffernan & Teufel, 2018;nsamuzi, Wilbur, & Lu, 2011;Okamoto, Shan, & Orihara, 2017),和
Extraction and Evaluation of Knowledge Entities from Scientific Documents
As a core resource of scientific knowledge, academic documents have been frequently used by scholars, especially newcomers to a given field. In the era of big data, scientific documents such as academic articles, patents, technical reports, and webpages are booming. The rapid daily growth of scientific documents indicates that a large amount of knowledge is proposed, improved, and used (Zhang et al., 2021). In scientific documents, knowledge entities (KEs) refer to the knowledge mentioned or cited by authors, such as algorithms, models, theories, datasets and software, diseases, drugs, and genes, reflecting rich resources in diverse problemsolving scenarios (Brack et al., 2020; Ding et al., 2013; Hou et al., 2019; Li et al. 2020). The advancement, improvement, and application of KEs in academic research have played a crucial role in promoting the development of different disciplines. Extracting various KEs from scientific documents can determine whether such KEs are emerging or typical in a specific field, and help scholars gain a comprehensive understanding of these KEs and even the entire research field (Wang & Zhang, 2020). KE extraction is also useful for multiple downstream tasks in information extraction, text mining, natural language processing, information retrieval, digital library research, and so on (Zhang et al., 2021). Particularly for researchers in artificial intelligence (AI), information science, and other related disciplines, discovering methods from large-scale academic literature, and evaluating their performance and influence have become increasingly necessary and meaningful (Hou et al., 2020). There are four kinds of methods of KE extraction in scientific documents. They are manual annotation-based (Chu & Ke, 2017; Tateisi et al., 2014; Zadeh & Schumann, 2016), rule-based (Kondo et al., 2009), statistics-based (Heffernan & Teufel, 2018; Névéol, Wilbur, & Lu, 2011; Okamoto, Shan, & Orihara, 2017), and