
Journal of data and information science (Warsaw, Poland): latest articles

Extracting and Measuring Uncertain Biomedical Knowledge from Scientific Statements
Pub Date : 2021-12-05 DOI: 10.2478/jdis-2022-0008
Xin Guo, Yuming Chen, Jian Du, Erdan Dong
Abstract
Purpose: Given the information overload of scientific literature, there is an increasing need for computable biomedical knowledge buried in free text. This study aimed to develop a novel approach to extracting and measuring uncertain biomedical knowledge from scientific statements.
Design/methodology/approach: Taking cardiovascular research publications in China as a sample, we extracted subject–predicate–object triples (SPO triples) as knowledge units and unknown/hedging/conflicting uncertainties as the knowledge context. We introduced information entropy (IE) as a potential metric to quantify the uncertainty of the epistemic status of scientific knowledge represented at the level of subject-object pairs (SO pairs).
Findings: The results indicated extraordinary growth in cardiovascular publications in China but only modest growth in novel SPO triples. After evaluating the uncertainty of biomedical knowledge with IE, we identified the top 10 SO pairs with the highest IE, which implied pluralism of epistemic status. Visual presentation of the SO pairs overlaid with uncertainty provided a comprehensive overview of clusters of biomedical knowledge and contending topics in cardiovascular research.
Research limitations: The current methods did not distinguish the specificity and probabilities of uncertainty cue words. The number of sentences surrounding a given triple may also influence the value of IE.
Practical implications: Our approach identified major uncertain knowledge areas such as diagnostic biomarkers, genetic polymorphism and co-existing risk factors related to cardiovascular diseases in China. We suggest prioritizing these areas; new hypotheses need to be verified, while disputes, conflicts, and contradictions need to be settled.
Originality/value: We provided a novel approach that combines natural language processing and computational linguistics with informetric methods to extract and measure uncertain knowledge from scientific statements.
Volume: 7 1 · Pages: 6-30 · Publication type: Journal Article
Citations: 0
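The information-entropy idea in the abstract above can be illustrated with a minimal sketch. It assumes that every statement about one SO pair has already been tagged with a single epistemic-status label, and that IE is the Shannon entropy of those labels; the label names and sample data are hypothetical, and the paper's exact formulation may differ.

```python
import math
from collections import Counter


def information_entropy(labels):
    """Shannon entropy (in bits) of the empirical distribution of labels."""
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    # Sum of p * log2(1/p) over the observed labels, with p = n / total.
    return sum((n / total) * math.log2(total / n) for n in counts.values())


# Hypothetical epistemic-status tags for statements about three SO pairs.
mixed = ["certain", "hedging", "conflicting", "hedging"]
uniform = ["certain", "hedging", "conflicting", "unknown"]
settled = ["certain"] * 4

ie_mixed = information_entropy(mixed)
ie_uniform = information_entropy(uniform)
ie_settled = information_entropy(settled)
```

Under these assumptions, an SO pair whose statements all share one status has zero entropy, while a pair whose statements are spread evenly across statuses reaches the maximum, which is the sense in which high IE signals contested knowledge.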
Feature and Tendency of Technology Transfer in Z-Park Patent Cooperation Network: From the Perspective of Global Optimal Path
Pub Date : 2021-11-01 DOI: 10.2478/jdis-2021-0034
Jun Guan, Jingying Xu, Yuanqing Han, Dawei Wang, Lizhi Xing
Abstract
Purpose: This study aims to provide a new framework for analyzing the path of technology diffusion in the innovation network at the regional and industrial levels, which is conducive to the integration of innovation resources, the coordinated development of innovative subjects, and the improvement of innovation abilities.
Design/methodology/approach: Based on the Z-Park patent cooperation data, we establish an Inter-Enterprise Technology Transfer Network (IETTN) model and apply the concept of Pivotability to describe the key links of technology diffusion and quantify the importance of innovative partnerships. By measuring the topological structural characteristics at the levels of the branch park and the technosphere, this paper demonstrates how technology spreads and promotes overall innovation activities within the innovation network.
Findings: The results indicate that: (1) The patent cooperation network of the Z-Park is heterogeneous, and the connections between innovative subjects are distributed extremely unevenly. (2) Haidian park has the highest pivotability in the IETTN model, yet the related inter-enterprise patent cooperation is mainly concentrated within the park, failing to facilitate technology diffusion across multiple branch parks. (3) Fields such as “electronics and information” and “advanced manufacturing” are prominent in cross-technosphere cooperation, while fields such as “new energy” and “environmental protection technology” can better promote industrial integration.
Research limitations: Only joint patent applications are taken into account in establishing the patent cooperation network. Other factors that influence the mechanism of technology diffusion in the innovation network, such as financial capital, market competition, and personnel mobility, need to be further studied.
Practical implications: The findings of this paper will provide useful information and suggestions for the administration and policy-making of high-tech parks.
Originality/value: The value of this paper is to build a bridge between the massive amount of patent data and the nature of technology diffusion, and to develop a set of tools to analyze the nonlinear relations between innovative subjects.
Volume: 6 1 · Pages: 111-138 · Publication type: Journal Article
Citations: 3
Scientific Value Weights more than Being Open or Toll Access: An analysis of the OA advantage in Nature and Science
Pub Date : 2021-09-26 DOI: 10.2478/jdis-2021-0033
Howell Y. Wang, Shelia X. Wei, Cong Cao, Xianwen Wang, F. Y. Ye
Abstract
Purpose: We attempt to find out whether open access (OA) or toll access (TA) really affects the dissemination of scientific discoveries.
Design/methodology/approach: We design two indicators, hot-degree and R-index, to indicate the OA or TA advantage of a topic. First, according to the OA classification of the Web of Science (WoS), we collect data from the WoS by downloading OA and TA articles, letters, and reviews published in Nature and Science during 2010–2019. These papers are divided into three broad disciplines, namely biomedicine, physics, and others. Then, taking each discipline in a journal and using classical Latent Dirichlet Allocation (LDA) to cluster the OA and TA papers into 100 topics each, we apply the Pearson correlation coefficient to match the OA and TA topics, and calculate the hot-degree and R-index of every OA-TA topic pair. Finally, the characteristics of the discipline can be presented. In the qualitative comparison, we choose some high-quality papers that belong to Nature remarkable papers or Science breakthroughs, and analyze the relations between OA/TA and citation numbers.
Findings: The results show that the OA hot-degree is significantly greater than the TA hot-degree in biomedicine, but significantly less in physics. Based on the R-index, an OA advantage exists in biomedicine and a TA advantage in physics. Therefore, the dissemination of average scientific discoveries in all fields is not necessarily affected by OA or TA. However, OA promotes the spread of important scientific discoveries in high-quality papers.
Research limitations: We lost some citations by ignoring other open sources such as arXiv and bioRxiv. Another limitation is that Nature employs some strong measures to promote access to subscription-based articles, which blurs the boundary between OA and TA.
Practical implications: The hot-degree index is useful for selecting hot topics in a set of publications. The findings comprehensively reflect the differences between OA and TA in different disciplines, which is a useful reference when researchers choose whether to publish OA or TA.
Originality/value: We propose a new method, including two indicators, to explore and measure OA or TA advantages.
Volume: 6 1 · Pages: 62-75 · Publication type: Journal Article
Citations: 0
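The topic-matching step described in the abstract above (pairing OA and TA topics by Pearson correlation) might look roughly like the following sketch. It assumes each LDA topic is available as a vector of word weights over a shared vocabulary and that every OA topic is simply paired with its most correlated TA topic; the function names and toy vectors are hypothetical, and the hot-degree and R-index are not reproduced because the abstract does not define them.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # correlation is undefined for a constant vector
    return cov / math.sqrt(var_x * var_y)


def match_topics(oa_topics, ta_topics):
    """For each OA topic, return (index of best TA topic, correlation)."""
    matches = []
    for oa in oa_topics:
        scores = [pearson(oa, ta) for ta in ta_topics]
        best = max(range(len(scores)), key=scores.__getitem__)
        matches.append((best, scores[best]))
    return matches


# Hypothetical topic-word weights over a shared 4-word vocabulary.
oa_topics = [[0.7, 0.1, 0.1, 0.1], [0.1, 0.1, 0.2, 0.6]]
ta_topics = [[0.1, 0.2, 0.1, 0.6], [0.6, 0.2, 0.1, 0.1]]

pairs = match_topics(oa_topics, ta_topics)
```

This one-sided greedy pairing lets two OA topics map to the same TA topic; a one-to-one assignment would need an extra step, and the abstract does not say which variant the authors used.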
A Topic Detection Method Based on Word-attention Networks
Pub Date : 2021-08-18 DOI: 10.2478/jdis-2021-0032
Zhengwen Xie
Abstract
Purpose: We propose a method to represent scientific papers by a complex network, which combines the approaches of neural and complex networks.
Design/methodology/approach: Its novelty is representing a paper by a word branch, which carries the sequential structure of words in sentences. The branches are generated by the attention mechanism in deep learning models. We connect those branches at the positions of their common words to generate networks, called word-attention networks, and then detect their communities, which are defined as topics.
Findings: The detected topics carry the sequential structure of words in sentences, represent the intra- and inter-sentential dependencies among words, and reveal the roles that words play in them through network indexes.
Research limitations: The parameter settings of our method may depend on the data at hand, so human experience is needed to find proper settings.
Practical implications: Our method is applied to the papers of PNAS, where the discipline designations provided by authors are used as the gold labels of papers’ topics.
Originality/value: This empirical study shows that the proposed method outperforms Latent Dirichlet Allocation and is more stable.
Volume: 6 1 · Pages: 139-163 · Publication type: Journal Article
Citations: 1
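The network-construction step in the abstract above, joining word branches at their common words, can be sketched as follows. This is only an illustration under strong assumptions: the branches are hand-written word sequences rather than outputs of an attention mechanism, and plain connected components stand in for the community detection that the paper performs; all names and data are hypothetical.

```python
from collections import defaultdict


def build_network(branches):
    """Undirected graph: consecutive words in a branch are linked, and
    branches that share a word are joined at that word's node."""
    graph = defaultdict(set)
    for branch in branches:
        for word in branch:
            graph[word]  # make sure single-word branches still get a node
        for left, right in zip(branch, branch[1:]):
            graph[left].add(right)
            graph[right].add(left)
    return graph


def connected_components(graph):
    """Connected components, a crude stand-in for detected communities."""
    seen, components = set(), []
    for start in graph:
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            component.add(node)
            stack.extend(graph[node] - seen)
        components.append(component)
    return components


# Hypothetical word branches; the first two share the word "cancer".
branches = [
    ["gene", "expression", "cancer"],
    ["cancer", "therapy"],
    ["quantum", "entanglement"],
]
topics = connected_components(build_network(branches))
```

Here the two branches that share "cancer" merge into one group and the third stays separate. A real community-detection algorithm would also split densely connected graphs into several topics, which connected components cannot do.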
Does Success Breed Success? A Study on the Correlation between Impact Factor and Quantity in Chinese Academic Journals
Pub Date : 2021-08-18 DOI: 10.2478/jdis-2021-0031
Kun-Fu Chen, Xian-tong Ren, Guo-liang Yang, Ailifeire Abudouguli
Abstract
Purpose: This paper studies the relationship between the impact factor (IF) and the number of journal papers in the Chinese publishing system.
Design/methodology/approach: The method proposed by Huang (2016) is used to analyze the data of Chinese journals in this study.
Findings: Based on the analysis, we find the following. (1) The average impact factor (AIF) of journals in all disciplines maintained a growth trend from 2007 to 2017. Both before and after removing outlier journals that may garner publication fees, the IF and its growth rate for most social sciences disciplines are larger than those of most natural sciences disciplines, and the number of journal papers in social sciences disciplines decreased while that in natural sciences disciplines increased from 2007 to 2017. (2) The removal of outlier journals has a greater impact on the relationship between the IF and the number of journal papers in some disciplines, such as Geosciences, because there may be journals that publish many papers to garner publication fees. (3) The success-breeds-success (SBS) principle is applicable to Chinese journals in natural sciences disciplines but not to Chinese journals in social sciences disciplines, and in Economics and in Education & Educational Research the relationship is the reverse of the SBS principle. (4) Based on interviews and surveys, the difference in the relationship between the IF and the number of journal papers between Chinese natural sciences disciplines and Chinese social sciences disciplines may be due to the influence of the international publishing system. Chinese natural sciences journals are losing their academic power, while Chinese social sciences journals, which are less influenced by the international publishing system, are in fierce competition.
Research limitations: More implications could be found through long-term tracking and comparison of the international publishing system with the Chinese publishing system.
Practical implications: It is suggested that researchers from different countries study natural sciences and social sciences journals in their own languages and observe the influence of the international publishing system.
Originality/value: This paper presents an overview of the relationship between the IF and the number of journal papers in the Chinese publishing system from 2007 to 2017, provides insights into the relationship in different disciplines in the Chinese publishing system, and points out the similarities and differences between the Chinese publishing system and the international publishing system.
Volume: 6 1 · Pages: 90-110 · Publication type: Journal Article
Citations: 0
How Has Covid-19 Affected Published Academic Research? A Content Analysis of Journal Articles Mentioning the Virus
Pub Date : 2021-08-09 DOI: 10.2478/jdis-2021-0030
M. Thelwall, Saheeda Thelwall
Abstract
Purpose: Methods to tackle Covid-19 have been developed by a wave of biomedical research but the pandemic has also influenced many aspects of society, generating a need for research into its consequences, and potentially changing the way existing topics are investigated. This article investigates the nature of this influence on the wider academic research mission.
Design/methodology/approach: This article reports an inductive content analysis of 500 randomly selected journal articles mentioning Covid-19, as recorded by the Dimensions scholarly database on 19 March 2021. Covid-19 mentions were coded for the influence of the disease on the research.
Findings: Whilst two thirds of these articles were about biomedicine (e.g. treatments, vaccines, virology), or health services in response to Covid-19, others covered the pandemic economy, society, safety, or education. In addition, some articles were not about the pandemic but stated that Covid-19 had increased or decreased the value of the reported research or changed the context in which it was conducted.
Research limitations: The findings relate only to Covid-19 influences declared in published journal articles.
Practical implications: Research managers and funders should consider whether their current procedures are effective in supporting researchers to address the evolving demands of pandemic societies, particularly in terms of timeliness.
Originality/value: The results show that although health research dominates the academic response to Covid-19, it is more widely disrupting academic research with new demands and challenges.
Volume: 6 1 · Pages: 1-12 · Publication type: Journal Article
Citations: 0
Substantiality: A Construct Indicating Research Excellence to Measure University Research Performance
Pub Date : 2021-07-25 DOI: 10.2478/jdis-2021-0029
Masashi Shirabe, A. Koizumi
Abstract
Purpose: The adequacy of the research performance of universities and research institutes has often been evaluated and understood along two axes: “quantity” (i.e. size or volume) and “quality” (i.e. what we define here as a measure of excellence that is considered theoretically independent of size or volume, such as clarity in diamond grading). This article introduces a third construct of research performance, named “substantiality” (“ATSUMI” in Japanese), and demonstrates its importance in evaluating and understanding research universities.
Design/methodology/approach: We take a two-step approach to demonstrating the effectiveness of the proposed construct, showing that (1) some characteristics of research universities are not well captured by indicators based on the conventional constructs (“quantity” and “quality”), and (2) “substantiality” indicators can capture them. Furthermore, a simple statistical analysis suggests that “substantiality” indicators are linked to the reputation reported in university reputation rankings, which reveals an additional benefit of the construct.
Findings: We propose a new construct named “substantiality” for measuring research performance. We show that indicators based on “substantiality” can capture important characteristics of research institutes. “Substantiality” indicators demonstrate “predictive power” for research reputation.
Research limitations: The concept of “substantiality” originated in the game of Go (“IGO”); the ease or difficulty of accepting the concept is therefore culturally dependent. In other words, while it is easily accepted by people from Japan and other East Asian countries and regions, researchers from other cultural regions might find it difficult to accept.
Practical implications: There is no simple solution to the challenge of evaluating research universities’ research performance. It is vital to combine different types of indicators to understand the excellence of research institutes. Substantiality indicators could be part of such a combination.
Originality/value: The authors propose a new construct named substantiality for measuring research performance. They show that indicators based on this construct can capture the important characteristics of research institutes.
Journal of data and information science (Warsaw, Poland), vol. 6, pp. 76-89. https://doi.org/10.2478/jdis-2021-0029
Citations: 2
New Indicators of the Technological Impact of Scientific Production
Pub Date : 2021-06-24 DOI: 10.2478/jdis-2021-0028
V. Guerrero-Bote, H. Moed, F. M. Anegón
Abstract
Purpose: Building upon pioneering work by Francis Narin and others, a new methodological approach to assessing the technological impact of scientific research is presented.
Design/methodology/approach: The approach is based on the analysis of citations made in patent families included in the PATSTAT database to scientific papers indexed in Scopus.
Findings: An advanced citation matching procedure is applied to the data in order to construct two indicators of technological impact. On the citing (patent) side, the country/region in which protection is sought and a patent family's propensity to cite scientific papers are taken into account. On the cited (paper) side, a relative citation rate is defined for patent citations to papers, similar to the paper-to-paper relative citation rate used in classical bibliometrics.
Research limitations: The results are limited by the available data, in our case Scopus and PATSTAT, and especially by the lack of standardization of references in patents. This required a matching procedure that is neither trivial nor exact.
Practical implications: Results at the country/region, document type, and publication age levels are presented. The country/region-level results in particular reveal features that have remained hidden in analyses of straight counts. Especially notable is that the rankings of some Asian countries/regions move upwards when the proposed normalized indicator of technological impact is applied, compared with straight counts of patent citations to those countries/regions’ published papers.
Originality/value: In our opinion, the level of sophistication of the indicators proposed in the current paper is unparalleled in the scientific literature, and provides a solid basis for the assessment of the technological impact of scientific research in countries/regions and institutions.
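The idea of a relative citation rate on the cited (paper) side can be illustrated with a minimal sketch. The abstract does not give the exact formula, so this assumes the classical bibliometric form: a paper's patent citation count divided by the mean count of papers in the same reference group. The function name `relative_patent_citation_rates`, the `group` key (field, year, document type), and the sample data are all hypothetical.

```python
from collections import defaultdict


def relative_patent_citation_rates(papers):
    """Divide each paper's patent citations by the mean of its reference group.

    Assumption: the reference group is (field, year, document type); the
    source abstract does not state the authors' actual normalization.
    """
    totals = defaultdict(int)
    counts = defaultdict(int)
    for p in papers:
        totals[p["group"]] += p["patent_cites"]
        counts[p["group"]] += 1

    rates = {}
    for p in papers:
        expected = totals[p["group"]] / counts[p["group"]]
        # A group with no patent citations at all has no meaningful baseline.
        rates[p["id"]] = p["patent_cites"] / expected if expected > 0 else 0.0
    return rates


# Hypothetical data: two fields with very different patent citation levels.
papers = [
    {"id": "A", "group": ("chemistry", 2015, "article"), "patent_cites": 6},
    {"id": "B", "group": ("chemistry", 2015, "article"), "patent_cites": 2},
    {"id": "C", "group": ("sociology", 2015, "article"), "patent_cites": 1},
    {"id": "D", "group": ("sociology", 2015, "article"), "patent_cites": 0},
]
rates = relative_patent_citation_rates(papers)
print(rates)
```

In this toy data, paper C has fewer patent citations than paper B in raw counts but a higher relative rate, because its field is cited by patents far less often. This is the kind of re-ranking that straight counts would hide.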
Journal of data and information science (Warsaw, Poland), vol. 6, pp. 36-61. https://doi.org/10.2478/jdis-2021-0028
Citations: 5
Embedding-based Detection and Extraction of Research Topics from Academic Documents Using Deep Clustering
Pub Date : 2021-06-01 DOI: 10.2478/jdis-2021-0024
Sahand Vahidnia, A. Abbasi, H. Abbass
Abstract
Purpose: Detecting research fields or topics and understanding their dynamics helps the scientific community make decisions about the establishment of scientific fields. It also supports better collaboration with governments and businesses. This study investigates the development of research fields over time by framing it as a topic detection problem.
Design/methodology/approach: We propose a modified deep clustering method to detect research trends from the abstracts and titles of academic documents. Document embedding approaches are used to transform documents into vector-based representations. The proposed method is evaluated against a benchmark dataset by comparing it with combinations of different embedding and clustering approaches and with classical topic modeling algorithms (i.e. LDA). A case study also explores the evolution of Artificial Intelligence (AI) by detecting the research topics or sub-fields in AI-related publications.
Findings: Clustering performance indicators show that the proposed method outperforms similar approaches on the benchmark dataset. Using the proposed method, we also show how the topics have evolved over the last 30 years, using a keyword extraction method to tag and label clusters and to show the context of the topics.
Research limitations: We noticed that it is not possible to generalize one solution for all downstream tasks. The solutions therefore need to be fine-tuned or optimized for each task and even for each dataset. In addition, the interpretation of cluster labels can be subjective and vary with readers’ opinions. It is also very difficult to evaluate the labeling techniques, which further limits the explanation of the clusters.
Practical implications: The case study shows, in a real-world example, how the proposed method enables researchers and reviewers of academic research to detect, summarize, analyze, and visualize research topics from decades of academic documents. By establishing and explaining the topics, this helps the scientific community and related organizations analyze fields quickly and effectively.
Originality/value: In this study, we introduce a modified and tuned deep embedding clustering method coupled with Doc2Vec representations for topic extraction. We also use a concept extraction method as a labeling approach. The effectiveness of the method has been evaluated in a case study of AI publications, in which we analyze AI topics over the past three decades.
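The overall shape of the pipeline described above (embed documents, cluster the vectors, label each cluster with keywords) can be sketched in a few lines. This is a deliberately simplified stand-in, not the authors' method: normalized term-frequency vectors replace Doc2Vec, plain k-means replaces deep embedded clustering, and the most frequent terms replace the keyword extraction step. The function names and the four toy documents are hypothetical.

```python
import math
from collections import Counter


def tf_vector(text, vocab):
    """Unit-length term-frequency vector (a stand-in for a Doc2Vec embedding)."""
    counts = Counter(text.lower().split())
    vec = [counts[w] for w in vocab]
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, k, iters=10):
    """Plain k-means with a deterministic start: the first k vectors."""
    centroids = [list(v) for v in vectors[:k]]
    labels = [0] * len(vectors)
    for _ in range(iters):
        # Assign every document to its nearest centroid.
        labels = [min(range(k), key=lambda c: sq_dist(v, centroids[c]))
                  for v in vectors]
        # Move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def top_terms(docs, labels, cluster, n=2):
    """Label a cluster with its n most frequent terms (ties broken alphabetically)."""
    counts = Counter(w for d, lab in zip(docs, labels) if lab == cluster
                     for w in d.lower().split())
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [w for w, _ in ranked[:n]]


# Hypothetical mini-corpus of "abstracts".
docs = [
    "neural network training deep learning",
    "expert system rule knowledge base",
    "deep neural network image learning",
    "knowledge base rule reasoning system",
]
vocab = sorted({w for d in docs for w in d.lower().split()})
vectors = [tf_vector(d, vocab) for d in docs]
labels = kmeans(vectors, k=2)
for c in range(2):
    print(c, top_terms(docs, labels, c))
```

A real study would swap in learned embeddings and a tuned clustering model. The sketch only shows how the three stages connect and why cluster labels depend on the chosen keyword ranking.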
Journal of data and information science (Warsaw, Poland), vol. 6, pp. 99-122. https://doi.org/10.2478/jdis-2021-0024
Citations: 3
Extraction and Evaluation of Knowledge Entities from Scientific Documents
Pub Date : 2021-06-01 DOI: 10.2478/jdis-2021-0025
Chengzhi Zhang, Philipp Mayr, Wei Lu, Yi Zhang
As a core resource of scientific knowledge, academic documents are frequently used by scholars, especially newcomers to a given field. In the era of big data, scientific documents such as academic articles, patents, technical reports, and webpages are booming. The rapid daily growth of scientific documents indicates that a large amount of knowledge is being proposed, improved, and used (Zhang et al., 2021). In scientific documents, knowledge entities (KEs) are the pieces of knowledge mentioned or cited by authors, such as algorithms, models, theories, datasets and software, diseases, drugs, and genes, reflecting rich resources in diverse problem-solving scenarios (Brack et al., 2020; Ding et al., 2013; Hou et al., 2019; Li et al., 2020). The advancement, improvement, and application of KEs in academic research have played a crucial role in promoting the development of different disciplines. Extracting KEs from scientific documents can show whether they are emerging or typical in a specific field, and helps scholars gain a comprehensive understanding of these KEs and even of the entire research field (Wang & Zhang, 2020). KE extraction is also useful for multiple downstream tasks in information extraction, text mining, natural language processing, information retrieval, digital library research, and so on (Zhang et al., 2021). Particularly for researchers in artificial intelligence (AI), information science, and other related disciplines, discovering methods in large-scale academic literature and evaluating their performance and influence have become increasingly necessary and meaningful (Hou et al., 2020). There are four kinds of methods for KE extraction from scientific documents: manual annotation-based (Chu & Ke, 2017; Tateisi et al., 2014; Zadeh & Schumann, 2016), rule-based (Kondo et al., 2009), statistics-based (Heffernan & Teufel, 2018; Névéol, Wilbur, & Lu, 2011; Okamoto, Shan, & Orihara, 2017), and
Journal of data and information science (Warsaw, Poland), vol. 6, pp. 1-5. https://doi.org/10.2478/jdis-2021-0025
Citations: 4