Abstract
Purpose: Given the information overload of scientific literature, there is an increasing need for computable biomedical knowledge buried in free text. This study aimed to develop a novel approach to extracting and measuring uncertain biomedical knowledge from scientific statements.
Design/methodology/approach: Taking cardiovascular research publications in China as a sample, we extracted subject–predicate–object triples (SPO triples) as knowledge units and unknown/hedging/conflicting uncertainties as the knowledge context. We introduced information entropy (IE) as a potential metric to quantify the uncertainty of the epistemic status of scientific knowledge represented at the level of subject–object pairs (SO pairs).
Findings: The results indicated an extraordinary growth of cardiovascular publications in China but only a modest growth of novel SPO triples. After evaluating the uncertainty of biomedical knowledge with IE, we identified the top 10 SO pairs with the highest IE, which implied pluralism of epistemic status. Visual presentation of the SO pairs overlaid with uncertainty provided a comprehensive overview of clusters of biomedical knowledge and contending topics in cardiovascular research.
Research limitations: The current methods did not distinguish the specificity and probabilities of uncertainty cue words. The number of sentences surrounding a given triple may also influence the value of IE.
Practical implications: Our approach identified major uncertain knowledge areas such as diagnostic biomarkers, genetic polymorphism, and co-existing risk factors related to cardiovascular diseases in China. These areas should be prioritized; new hypotheses need to be verified, while disputes, conflicts, and contradictions need to be settled.
Originality/value: We provided a novel approach that combines natural language processing and computational linguistics with informetric methods to extract and measure uncertain knowledge from scientific statements.
{"title":"Extracting and Measuring Uncertain Biomedical Knowledge from Scientific Statements","authors":"Xin Guo, Yuming Chen, Jian Du, Erdan Dong","doi":"10.2478/jdis-2022-0008","DOIUrl":"https://doi.org/10.2478/jdis-2022-0008","PeriodicalId":92237,"journal":{"name":"Journal of data and information science (Warsaw, Poland)","volume":"7 1","pages":"6 - 30"},"PeriodicalIF":0.0,"publicationDate":"2021-12-05","publicationTypes":"Journal Article","isOpenAccess":false,"platform":"Semanticscholar","paperid":"47105659","RegionNum":0,"Total":0}
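The entropy measure named in the abstract above can be illustrated with a minimal sketch. It assumes that every sentence mentioning an SO pair has already been labelled with one epistemic status, and it computes Shannon entropy in bits over those labels. The label names, the log base, and the function name `epistemic_entropy` are illustrative assumptions; the abstract does not give the paper's exact formulation.

```python
import math
from collections import Counter


def epistemic_entropy(statuses):
    """Shannon entropy (bits) of the epistemic-status labels of one SO pair.

    `statuses` is a list with one label per sentence that mentions the pair.
    The labels themselves are hypothetical placeholders.
    """
    counts = Counter(statuses)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    # H = sum over labels of p * log2(1 / p)
    return sum((n / total) * math.log2(total / n) for n in counts.values())


# Hypothetical SO pair whose sentences disagree as much as possible.
contested = ["certain", "unknown", "hedging", "conflicting"]
# Hypothetical SO pair on which every sentence agrees.
settled = ["certain"] * 5

print(epistemic_entropy(contested))  # 2.0
print(epistemic_entropy(settled))    # 0.0
```

Under these assumptions, a pair whose sentences are spread evenly over many statuses scores high, and a pair with a single agreed status scores zero, which matches the abstract's reading of high IE as epistemic pluralism.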
Jun Guan, Jingying Xu, Yuanqing Han, Dawei Wang, Lizhi Xing
Abstract
Purpose: This study aims to provide a new framework for analyzing the path of technology diffusion in the innovation network at the regional and industrial levels, which is conducive to the integration of innovation resources, the coordinated development of innovative subjects, and the improvement of innovation abilities.
Design/methodology/approach: Based on the Z-Park patent cooperation data, we establish an Inter-Enterprise Technology Transfer Network (IETTN) model and apply the concept of pivotability to describe the key links of technology diffusion and quantify the importance of innovative partnerships. By measuring the topological structural characteristics at the levels of the branch park and the technosphere, this paper demonstrates how technology spreads and promotes overall innovation activities within the innovation network.
Findings: The results indicate that: (1) The patent cooperation network of the Z-Park is heterogeneous, and the connections between the innovative subjects are distributed extremely unevenly. (2) Haidian Park has the highest pivotability in the IETTN model, yet the related inter-enterprise patent cooperation is mainly concentrated internally, failing to facilitate technology diffusion across multiple branch parks. (3) Fields such as “electronics and information” and “advanced manufacturing” are prominent in cross-technosphere cooperation, while fields such as “new energy” and “environmental protection technology” can better promote industrial integration.
Research limitations: Only joint patent applications are taken into account in establishing the patent cooperation network. Other factors that influence the mechanism of technology diffusion in the innovation network, such as financial capital, market competition, and personnel mobility, need further study.
Practical implications: The findings of this paper will provide useful information and suggestions for the administration and policy-making of high-tech parks.
Originality/value: The value of this paper is to build a bridge between the massive amount of patent data and the nature of technology diffusion, and to develop a set of tools to analyze the nonlinear relations between innovative subjects.
{"title":"Feature and Tendency of Technology Transfer in Z-Park Patent Cooperation Network: From the Perspective of Global Optimal Path","authors":"Jun Guan, Jingying Xu, Yuanqing Han, Dawei Wang, Lizhi Xing","doi":"10.2478/jdis-2021-0034","DOIUrl":"https://doi.org/10.2478/jdis-2021-0034","PeriodicalId":92237,"journal":{"name":"Journal of data and information science (Warsaw, Poland)","volume":"6 1","pages":"111 - 138"},"PeriodicalIF":0.0,"publicationDate":"2021-11-01","publicationTypes":"Journal Article","isOpenAccess":false,"platform":"Semanticscholar","paperid":"48133895","RegionNum":0,"Total":0}
Howell Y. Wang, Shelia X. Wei, Cong Cao, Xianwen Wang, F. Y. Ye
Abstract
Purpose: We attempt to find out whether open access (OA) or toll access (TA) really affects the dissemination of scientific discoveries.
Design/methodology/approach: We design two indicators, hot-degree and R-index, to indicate a topic's OA or TA advantage. First, according to the OA classification of the Web of Science (WoS), we collect data from the WoS by downloading OA and TA articles, letters, and reviews published in Nature and Science during 2010–2019. These papers are divided into three broad disciplines, namely biomedicine, physics, and others. Then, taking one discipline in one journal and using classical Latent Dirichlet Allocation (LDA) to extract 100 topics from the OA and TA papers respectively, we apply the Pearson correlation coefficient to match the OA and TA topics and calculate the hot-degree and R-index of every OA-TA topic pair. Finally, the characteristics of the discipline can be presented. In a qualitative comparison, we choose some high-quality papers that belong to Nature remarkable papers or Science breakthroughs and analyze the relations between OA/TA and citation numbers.
Findings: The results show that the OA hot-degree is significantly greater than the TA hot-degree in biomedicine, but significantly smaller in physics. Based on the R-index, OA advantages exist in biomedicine and TA advantages exist in physics. Therefore, the dissemination of average scientific discoveries in all fields is not necessarily affected by OA or TA. However, OA promotes the spread of important scientific discoveries in high-quality papers.
Research limitations: We lost some citations by ignoring other open sources such as arXiv and bioRxiv. Another limitation is that Nature employs some strong measures to promote access to subscription-based articles, which blurs the boundary between OA and TA.
Practical implications: The hot-degree index is useful for selecting hot topics in a set of publications. The findings comprehensively reflect the differences between OA and TA across disciplines, which is a useful reference for researchers choosing between OA and TA publishing.
Originality/value: We propose a new method, including two indicators, to explore and measure OA or TA advantages.
{"title":"Scientific Value Weights more than Being Open or Toll Access: An analysis of the OA advantage in Nature and Science","authors":"Howell Y. Wang, Shelia X. Wei, Cong Cao, Xianwen Wang, F. Y. Ye","doi":"10.2478/jdis-2021-0033","DOIUrl":"https://doi.org/10.2478/jdis-2021-0033","PeriodicalId":92237,"journal":{"name":"Journal of data and information science (Warsaw, Poland)","volume":"6 1","pages":"62 - 75"},"PeriodicalIF":0.0,"publicationDate":"2021-09-26","publicationTypes":"Journal Article","isOpenAccess":false,"platform":"Semanticscholar","paperid":"44006494","RegionNum":0,"Total":0}
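The topic-matching step described in the abstract above (Pearson correlation between OA and TA topics) can be sketched as follows. This is a minimal illustration, assuming each topic is already available as a word-probability vector over a shared vocabulary, as an LDA model would produce; the LDA fitting itself is not reproduced. The function names, the best-match rule, and the toy vectors are assumptions, and the hot-degree and R-index are left out because the abstract does not define them.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant vector has no defined correlation
    return cov / math.sqrt(var_x * var_y)


def match_topics(oa_topics, ta_topics):
    """Pair every OA topic with the TA topic that correlates with it most.

    Returns {oa_index: (ta_index, correlation)}.
    """
    matches = {}
    for i, oa in enumerate(oa_topics):
        scores = [pearson(oa, ta) for ta in ta_topics]
        best = max(range(len(scores)), key=scores.__getitem__)
        matches[i] = (best, scores[best])
    return matches


# Hypothetical topic-word distributions over a four-word vocabulary.
oa_topics = [[0.7, 0.1, 0.1, 0.1], [0.1, 0.1, 0.1, 0.7]]
ta_topics = [[0.1, 0.2, 0.1, 0.6], [0.6, 0.2, 0.1, 0.1]]

for oa_index, (ta_index, r) in match_topics(oa_topics, ta_topics).items():
    print(f"OA topic {oa_index} matches TA topic {ta_index} (r = {r:.2f})")
```

Taking the single best-correlated partner is only one possible matching rule; a one-to-one assignment or a correlation threshold would be equally plausible readings of the abstract.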
Abstract
Purpose: We proposed a method to represent scientific papers by a complex network, which combines the approaches of neural and complex networks.
Design/methodology/approach: Its novelty is representing a paper by a word branch, which carries the sequential structure of words in sentences. The branches are generated by the attention mechanism in deep learning models. We connected those branches at the positions of their common words to generate networks, called word-attention networks, and then detected their communities, which we define as topics.
Findings: The detected topics can carry the sequential structure of words in sentences, represent the intra- and inter-sentential dependencies among words, and reveal the roles that words play in them through network indexes.
Research limitations: The parameter settings of our method may depend on the data at hand. Thus, finding proper settings requires human experience.
Practical implications: Our method is applied to the papers of PNAS, where the discipline designations provided by authors are used as the gold-standard labels of the papers' topics.
Originality/value: This empirical study shows that the proposed method outperforms Latent Dirichlet Allocation and is more stable.
{"title":"A Topic Detection Method Based on Word-attention Networks","authors":"Zhengwen Xie","doi":"10.2478/jdis-2021-0032","DOIUrl":"https://doi.org/10.2478/jdis-2021-0032","PeriodicalId":92237,"journal":{"name":"Journal of data and information science (Warsaw, Poland)","volume":"6 1","pages":"139 - 163"},"PeriodicalIF":0.0,"publicationDate":"2021-08-18","publicationTypes":"Journal Article","isOpenAccess":false,"platform":"Semanticscholar","paperid":"46157299","RegionNum":0,"Total":0}
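The network-building step in the abstract above, where word branches are connected at their common words, might look roughly like the sketch below. It assumes each branch is already given as an ordered list of words and that consecutive words in a branch are linked by a directed edge; both are illustrative assumptions, as is the function name `build_word_network`. The attention model that generates the branches and the community detection that yields topics are not reproduced here.

```python
def build_word_network(branches):
    """Merge word branches into one directed graph.

    Each distinct word becomes a single node, so branches that share a word
    are joined at that word. Returns {word: sorted list of successor words}.
    """
    graph = {}
    for branch in branches:
        for word in branch:
            graph.setdefault(word, set())
        # Link consecutive words to keep the order of words in the branch.
        for src, dst in zip(branch, branch[1:]):
            graph[src].add(dst)
    return {word: sorted(successors) for word, successors in graph.items()}


# Two hypothetical branches that share the word "network".
branches = [
    ["attention", "network", "topic"],
    ["complex", "network", "community"],
]
network = build_word_network(branches)
print(network["network"])  # ['community', 'topic']
```

A community-detection algorithm could then be run on the merged graph to obtain the groups of words that the abstract calls topics.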
Abstract
Purpose: This paper studies the relationship between the impact factor (IF) and the number of journal papers in the Chinese publishing system.
Design/methodology/approach: The method proposed by Huang (2016) is used to analyze the data of Chinese journals in this study.
Findings: Based on the analysis, we find the following. (1) The average impact factor (AIF) of journals in all disciplines maintained a growth trend from 2007 to 2017. Whether before or after removing outlier journals that may garner publication fees, the IF and its growth rate for most social sciences disciplines are larger than those of most natural sciences disciplines, and the number of journal papers in social sciences disciplines decreased while that in natural sciences disciplines increased from 2007 to 2017. (2) The removal of outlier journals has a greater impact on the relationship between the IF and the number of journal papers in some disciplines, such as Geosciences, because there may be journals that publish many papers to garner publication fees. (3) The success-breeds-success (SBS) principle applies to Chinese journals in natural sciences disciplines but not to Chinese journals in social sciences disciplines, and the relationship is the reverse of the SBS principle in Economics and in Education & Educational Research. (4) Based on interviews and surveys, the difference in the relationship between the IF and the number of journal papers for Chinese natural sciences disciplines and Chinese social sciences disciplines may be due to the influence of the international publishing system. Chinese natural sciences journals are losing their academic power, while Chinese social sciences journals, which are less influenced by the international publishing system, are in fierce competition.
Research limitation: More implications could be found through long-term tracking and comparison of the international publishing system with the Chinese publishing system.
Practical implications: It is suggested that researchers from different countries study natural sciences and social sciences journals in their own languages and observe the influence of the international publishing system.
Originality/value: This paper presents an overview of the relationship between the IF and the number of journal papers in the Chinese publishing system from 2007 to 2017, provides insights into the relationship in different disciplines, and points out the similarities and differences between the Chinese publishing system and the international publishing system.
{"title":"Does Success Breed Success? A Study on the Correlation between Impact Factor and Quantity in Chinese Academic Journals","authors":"Kun-Fu Chen, Xian-tong Ren, Guo-liang Yang, Ailifeire Abudouguli","doi":"10.2478/jdis-2021-0031","DOIUrl":"https://doi.org/10.2478/jdis-2021-0031","PeriodicalId":92237,"journal":{"name":"Journal of data and information science (Warsaw, Poland)","volume":"6 1","pages":"90 - 110"},"PeriodicalIF":0.0,"publicationDate":"2021-08-18","publicationTypes":"Journal Article","isOpenAccess":false,"platform":"Semanticscholar","paperid":"45477999","RegionNum":0,"Total":0}
Abstract
Purpose: Methods to tackle Covid-19 have been developed by a wave of biomedical research, but the pandemic has also influenced many aspects of society, generating a need for research into its consequences and potentially changing the way existing topics are investigated. This article investigates the nature of this influence on the wider academic research mission.
Design/methodology/approach: This article reports an inductive content analysis of 500 randomly selected journal articles mentioning Covid-19, as recorded by the Dimensions scholarly database on 19 March 2021. Covid-19 mentions were coded for the influence of the disease on the research.
Findings: Whilst two-thirds of these articles were about biomedicine (e.g. treatments, vaccines, virology) or health services in response to Covid-19, others covered the pandemic economy, society, safety, or education. In addition, some articles were not about the pandemic but stated that Covid-19 had increased or decreased the value of the reported research or changed the context in which it was conducted.
Research limitations: The findings relate only to Covid-19 influences declared in published journal articles.
Practical implications: Research managers and funders should consider whether their current procedures are effective in supporting researchers to address the evolving demands of pandemic societies, particularly in terms of timeliness.
Originality/value: The results show that although health research dominates the academic response to Covid-19, the pandemic is more widely disrupting academic research with new demands and challenges.
{"title":"How Has Covid-19 Affected Published Academic Research? A Content Analysis of Journal Articles Mentioning the Virus","authors":"M. Thelwall, Saheeda Thelwall","doi":"10.2478/jdis-2021-0030","DOIUrl":"https://doi.org/10.2478/jdis-2021-0030","PeriodicalId":92237,"journal":{"name":"Journal of data and information science (Warsaw, Poland)","volume":"6 1","pages":"1 - 12"},"PeriodicalIF":0.0,"publicationDate":"2021-08-09","publicationTypes":"Journal Article","isOpenAccess":false,"platform":"Semanticscholar","paperid":"49218241","RegionNum":0,"Total":0}
Abstract
Purpose: The adequacy of the research performance of universities or research institutes has often been evaluated and understood along two axes: “quantity” (i.e. size or volume) and “quality” (i.e. what we define here as a measure of excellence that is considered theoretically independent of size or volume, such as clarity in diamond grading). The purpose of this article is, however, to introduce a third construct named “substantiality” (“ATSUMI” in Japanese) of research performance and to demonstrate its importance in evaluating and understanding research universities.
Design/methodology/approach: We take a two-step approach to demonstrate the effectiveness of the proposed construct by showing that (1) some characteristics of research universities are not well captured by indicators based on the conventional constructs (“quantity” and “quality”), and (2) the “substantiality” indicators can capture them. Furthermore, by suggesting through simple statistical analysis that “substantiality” indicators appear linked to the reputation reported in university reputation rankings, we reveal additional benefits of the construct.
Findings: We propose a new construct named “substantiality” for measuring research performance. We show that indicators based on “substantiality” can capture important characteristics of research institutes. “Substantiality” indicators demonstrate “predictive power” for research reputation.
Research limitations: The concept of “substantiality” originated from the game of Go (igo); therefore, the ease or difficulty of accepting the concept is culturally dependent. In other words, while it is easily accepted by people from Japan and other East Asian countries and regions, it might be difficult for researchers from other cultural regions to accept it.
Practical implications: There is no simple solution to the challenge of evaluating the research performance of research universities. It is vital to combine different types of indicators to understand the excellence of research institutes. Substantiality indicators could be part of such a combination of indicators.
Originality/value: The authors propose a new construct named substantiality for measuring research performance. They show that indicators based on this construct can capture the important characteristics of research institutes.
{"title":"Substantiality: A Construct Indicating Research Excellence to Measure University Research Performance","authors":"Masashi Shirabe, A. Koizumi","doi":"10.2478/jdis-2021-0029","DOIUrl":"https://doi.org/10.2478/jdis-2021-0029","url":null,"PeriodicalId":92237,"journal":{"name":"Journal of data and information science (Warsaw, Poland)","volume":"6 1","pages":"76 - 89"},"PeriodicalIF":0.0,"publicationDate":"2021-07-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"43071142","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Abstract
Purpose: Building upon pioneering work by Francis Narin and others, a new methodological approach to assessing the technological impact of scientific research is presented.
Design/methodology/approach: It is based on the analysis of citations made in patent families included in the PATSTAT database to scientific papers indexed in Scopus.
Findings: An advanced citation matching procedure is applied to the data in order to construct two indicators of technological impact. On the citing (patent) side, the country/region in which protection is sought and a patent family's propensity to cite scientific papers are taken into account. On the cited (paper) side, a relative citation rate is defined for patent citations to papers that is similar to the scientific paper-to-paper citation rate in classical bibliometrics.
Research limitations: The results are limited by the available data, in our case Scopus and PATSTAT, and especially by the lack of standardization of references in patents. This required a matching procedure that is neither trivial nor exact.
Practical implications: Results at the country/region, document type, and publication age levels are presented. The country/region-level results in particular reveal features that have remained hidden in analyses of straight counts. Especially notable is that the rankings of some Asian countries/regions move upwards when the proposed normalized indicator of technological impact is applied, compared with straight counts of patent citations to those countries/regions’ published papers.
Originality/value: In our opinion, the level of sophistication of the indicators proposed in the current paper is unparalleled in the scientific literature, and it provides a solid basis for the assessment of the technological impact of scientific research in countries/regions and institutions.
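The cited-side indicator can be illustrated with a minimal sketch. It assumes, by analogy with classical bibliometric normalization, that a paper's patent-citation count is divided by the mean count of a reference set of papers sharing its publication year and document type. The abstract does not specify the reference set, so that grouping, the function name, and the sample records below are hypothetical.

```python
from collections import defaultdict


def relative_patent_citation_rates(papers):
    """Divide each paper's patent-citation count by the mean of its reference set.

    Assumption (not taken from the paper): the reference set is all papers
    with the same publication year and document type.
    """
    # (year, doc_type) -> [sum of patent citations, number of papers]
    totals = defaultdict(lambda: [0, 0])
    for p in papers:
        key = (p["year"], p["doc_type"])
        totals[key][0] += p["patent_citations"]
        totals[key][1] += 1

    rates = {}
    for p in papers:
        cite_sum, n = totals[(p["year"], p["doc_type"])]
        mean = cite_sum / n
        # A reference set with no patent citations has no defined rate.
        rates[p["id"]] = p["patent_citations"] / mean if mean > 0 else None
    return rates


# Hypothetical records, for illustration only.
papers = [
    {"id": "A", "year": 2015, "doc_type": "article", "patent_citations": 6},
    {"id": "B", "year": 2015, "doc_type": "article", "patent_citations": 2},
    {"id": "C", "year": 2015, "doc_type": "review", "patent_citations": 4},
]
rates = relative_patent_citation_rates(papers)
```

A rate above 1 means the paper is cited by patents more often than the average paper in its reference set. Aggregating such rates by country/region, rather than summing raw counts, is the kind of normalization that can change rankings.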
{"title":"New Indicators of the Technological Impact of Scientific Production","authors":"V. Guerrero-Bote, H. Moed, F. M. Anegón","doi":"10.2478/jdis-2021-0028","DOIUrl":"https://doi.org/10.2478/jdis-2021-0028","url":null,"PeriodicalId":92237,"journal":{"name":"Journal of data and information science (Warsaw, Poland)","volume":"6 1","pages":"36 - 61"},"PeriodicalIF":0.0,"publicationDate":"2021-06-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"43503165","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Abstract
Purpose: Detecting research fields or topics and understanding their dynamics help the scientific community in its decisions regarding the establishment of scientific fields. This also helps in collaborating better with governments and businesses. This study aims to investigate the development of research fields over time, translating it into a topic detection problem.
Design/methodology/approach: To achieve the objectives, we propose a modified deep clustering method to detect research trends from the abstracts and titles of academic documents. Document embedding approaches are utilized to transform documents into vector-based representations. The proposed method is evaluated by comparing it with combinations of different embedding and clustering approaches and with the classical topic modeling algorithms (i.e., LDA) against a benchmark dataset. A case study is also conducted to explore the evolution of Artificial Intelligence (AI) by detecting the research topics or sub-fields in related AI publications.
Findings: Evaluation with clustering performance indicators shows that the proposed method outperforms similar approaches on the benchmark dataset. Using the proposed method, we also show how the topics have evolved over the past 30 years, taking advantage of a keyword extraction method for cluster tagging and labeling to demonstrate the context of the topics.
Research limitations: We noticed that it is not possible to generalize one solution for all downstream tasks. Hence, the solutions must be fine-tuned or optimized for each task and even each dataset. In addition, interpretation of cluster labels can be subjective and vary based on the readers’ opinions. It is also very difficult to evaluate the labeling techniques, which further limits the explanation of the clusters.
Practical implications: As demonstrated in the case study, we show with a real-world example how the proposed method would enable researchers and reviewers of academic research to detect, summarize, analyze, and visualize research topics from decades of academic documents. This helps the scientific community and all related organizations analyze the fields quickly and effectively, by establishing and explaining the topics.
Originality/value: In this study, we introduce a modified and tuned deep embedding clustering coupled with Doc2Vec representations for topic extraction. We also use a concept extraction method as a labeling approach in this study. The effectiveness of the method has been evaluated in a case study of AI publications, where we analyze the AI topics during the past three decades.
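The general embed-then-cluster pipeline behind this kind of topic detection can be sketched in a few lines. This is not the authors' method: to stay dependency-free, the sketch swaps in normalized bag-of-words vectors for Doc2Vec embeddings and plain k-means for deep embedding clustering. The function names, the fixed seed documents, and the toy corpus are all hypothetical.

```python
import math
from collections import Counter


def bow_vectors(docs):
    """Stand-in for document embedding: unit-length bag-of-words vectors."""
    vocab = sorted({w for d in docs for w in d.lower().split()})
    vectors = []
    for d in docs:
        counts = Counter(d.lower().split())
        vec = [counts[w] for w in vocab]
        norm = math.sqrt(sum(x * x for x in vec)) or 1.0
        vectors.append([x / norm for x in vec])
    return vectors


def kmeans(vectors, seeds, iterations=10):
    """Plain k-means; `seeds` are indices of the documents used as initial centroids."""
    centroids = [list(vectors[i]) for i in seeds]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: each document joins its nearest centroid.
        labels = [
            min(range(len(centroids)), key=lambda c: math.dist(v, centroids[c]))
            for v in vectors
        ]
        # Update step: each centroid moves to the mean of its members.
        for c in range(len(centroids)):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Toy corpus, for illustration only.
docs = [
    "neural network training",
    "deep neural network",
    "expert system rules",
    "rules for expert system",
]
labels = kmeans(bow_vectors(docs), seeds=[0, 2])
```

Each resulting cluster plays the role of a candidate topic. In the study's setting, a keyword or concept extraction step would then label each cluster.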
{"title":"Embedding-based Detection and Extraction of Research Topics from Academic Documents Using Deep Clustering","authors":"Sahand Vahidnia, A. Abbasi, H. Abbass","doi":"10.2478/jdis-2021-0024","DOIUrl":"https://doi.org/10.2478/jdis-2021-0024","url":null,"PeriodicalId":92237,"journal":{"name":"Journal of data and information science (Warsaw, Poland)","volume":"6 1","pages":"99 - 122"},"PeriodicalIF":0.0,"publicationDate":"2021-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"45320814","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
As a core resource of scientific knowledge, academic documents have been frequently used by scholars, especially newcomers to a given field. In the era of big data, scientific documents such as academic articles, patents, technical reports, and webpages are booming. The rapid daily growth of scientific documents indicates that a large amount of knowledge is proposed, improved, and used (Zhang et al., 2021). In scientific documents, knowledge entities (KEs) refer to the knowledge mentioned or cited by authors, such as algorithms, models, theories, datasets and software, diseases, drugs, and genes, reflecting rich resources in diverse problem-solving scenarios (Brack et al., 2020; Ding et al., 2013; Hou et al., 2019; Li et al., 2020). The advancement, improvement, and application of KEs in academic research have played a crucial role in promoting the development of different disciplines. Extracting various KEs from scientific documents can determine whether such KEs are emerging or typical in a specific field, and help scholars gain a comprehensive understanding of these KEs and even the entire research field (Wang & Zhang, 2020). KE extraction is also useful for multiple downstream tasks in information extraction, text mining, natural language processing, information retrieval, digital library research, and so on (Zhang et al., 2021). Particularly for researchers in artificial intelligence (AI), information science, and other related disciplines, discovering methods from large-scale academic literature and evaluating their performance and influence have become increasingly necessary and meaningful (Hou et al., 2020). There are four kinds of methods for KE extraction from scientific documents. They are manual annotation-based (Chu & Ke, 2017; Tateisi et al., 2014; Zadeh & Schumann, 2016), rule-based (Kondo et al., 2009), statistics-based (Heffernan & Teufel, 2018; Névéol, Wilbur, & Lu, 2011; Okamoto, Shan, & Orihara, 2017), and
{"title":"Extraction and Evaluation of Knowledge Entities from Scientific Documents","authors":"Chengzhi Zhang, Philipp Mayr, Wei Lu, Yi Zhang","doi":"10.2478/jdis-2021-0025","DOIUrl":"https://doi.org/10.2478/jdis-2021-0025","url":null,"PeriodicalId":92237,"journal":{"name":"Journal of data and information science (Warsaw, Poland)","volume":"6 1","pages":"1 - 5"},"PeriodicalIF":0.0,"publicationDate":"2021-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"48775327","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}