Purpose Many science, technology and innovation (STI) resources carry several different labels. To assign such labels automatically to an instance of interest, many approaches that perform well on benchmark datasets have been proposed in the literature for the multi-label classification task. Furthermore, several open-source tools implementing these approaches have been developed. However, the characteristics of real-world multi-label patent and publication datasets are not completely in line with those of benchmark ones. Therefore, the main purpose of this paper is to evaluate seven multi-label classification methods comprehensively on real-world datasets. Design/methodology/approach Three real-world datasets (Biological-Sciences, Health-Sciences, and USPTO) are constructed from the SciGraph and USPTO databases. Seven multi-label classification methods with tuned parameters (dependency-LDA, MLkNN, LabelPowerset, RAkEL, TextCNN, TextRNN, and TextRCNN) are compared on these three datasets. To evaluate performance, the study adopts three classification-based metrics: Macro-F1, Micro-F1, and Hamming Loss. Findings The TextCNN and TextRCNN models show clear superiority on small-scale datasets with a more complex hierarchical label structure and a more balanced document-label distribution in terms of Macro-F1, Micro-F1, and Hamming Loss. The MLkNN method works better on the larger-scale dataset with a more unbalanced document-label distribution. Research limitations The three real-world datasets differ in the following aspects: statement, data quality, and purposes. Additionally, open-source tools designed for multi-label classification have intrinsic differences in their approaches to data processing and feature selection, which in turn affect the performance of a multi-label classification approach. In the near future, we will enhance experimental precision and reinforce the validity of the conclusions by controlling variables more rigorously through expanded parameter settings. Practical implications The Macro-F1 and Micro-F1 scores observed on real-world datasets typically fall short of those achieved on benchmark datasets, underscoring the complexity of real-world multi-label classification tasks. Approaches leveraging deep learning techniques offer promising solutions by accommodating the hierarchical relationships and interdependencies among labels. With ongoing enhancements in deep learning algorithms and large-scale models, the efficacy of multi-label classification is expected to improve significantly, reaching a level of practical utility in the foreseeable future. Originality/value (1) Seven multi-label classification methods are comprehensively compared on three real-world datasets. (2) The TextCNN and TextRCNN models perform better on small-scale datasets with more compl
Shuo Xu, Yuefu Zhang, Xin An, Sainan Pi. "Performance evaluation of seven multi-label classification methods on real-world patent and publication datasets." Journal of Data and Information Science. Published 2024-05-27. DOI: 10.2478/jdis-2024-0014 (https://doi.org/10.2478/jdis-2024-0014)
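The three metrics named in the abstract above (Macro-F1, Micro-F1, and Hamming Loss) can be illustrated with a minimal sketch. It uses the standard textbook definitions, not the authors' evaluation code; the function name `multilabel_metrics` and the toy label sets are hypothetical.

```python
from typing import List, Set, Tuple

def multilabel_metrics(y_true: List[Set[str]],
                       y_pred: List[Set[str]],
                       labels: List[str]) -> Tuple[float, float, float]:
    """Return (macro_f1, micro_f1, hamming_loss) for multi-label predictions."""
    tp = {l: 0 for l in labels}
    fp = {l: 0 for l in labels}
    fn = {l: 0 for l in labels}
    for truth, pred in zip(y_true, y_pred):
        for l in labels:
            if l in pred and l in truth:
                tp[l] += 1
            elif l in pred:
                fp[l] += 1
            elif l in truth:
                fn[l] += 1

    def f1(t: int, p: int, n: int) -> float:
        denom = 2 * t + p + n
        return 2 * t / denom if denom else 0.0

    # Macro-F1: unweighted mean of per-label F1 (rare labels count equally).
    macro = sum(f1(tp[l], fp[l], fn[l]) for l in labels) / len(labels)
    # Micro-F1: F1 over pooled counts (frequent labels dominate).
    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    # Hamming Loss: share of (document, label) decisions that are wrong.
    hamming = (sum(fp.values()) + sum(fn.values())) / (len(y_true) * len(labels))
    return macro, micro, hamming

# Hypothetical toy data: three documents, three labels.
labels = ["A", "B", "C"]
y_true = [{"A", "B"}, {"C"}, {"A"}]
y_pred = [{"A"}, {"C", "B"}, {"A"}]
macro, micro, hamming = multilabel_metrics(y_true, y_pred, labels)
```

Because Macro-F1 weights every label equally while Micro-F1 pools all decisions, the two can diverge sharply when the document-label distribution is unbalanced, which is the situation the abstract contrasts across datasets.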
Purpose Assess whether ChatGPT 4.0 is accurate enough to perform research evaluations on journal articles to automate this time-consuming task. Design/methodology/approach Test the extent to which ChatGPT-4 can assess the quality of journal articles using a case study of the published scoring guidelines of the UK Research Excellence Framework (REF) 2021 to create a research evaluation ChatGPT. This was applied to 51 of my own articles and compared against my own quality judgements. Findings ChatGPT-4 can produce plausible document summaries and quality evaluation rationales that match the REF criteria. Its overall scores have weak correlations with my self-evaluation scores of the same documents (averaging r=0.281 over 15 iterations, with 8 being statistically significantly different from 0). In contrast, the average scores from the 15 iterations produced a statistically significant positive correlation of 0.509. Thus, averaging scores from multiple ChatGPT-4 rounds seems more effective than individual scores. The positive correlation may be due to ChatGPT being able to extract the author’s significance, rigour, and originality claims from inside each paper. If my weakest articles are removed, then the correlation with average scores (r=0.200) falls below statistical significance, suggesting that ChatGPT struggles to make fine-grained evaluations. Research limitations The data is self-evaluations of a convenience sample of articles from one academic in one field. Practical implications Overall, ChatGPT does not yet seem to be accurate enough to be trusted for any formal or informal research quality evaluation tasks. Research evaluators, including journal editors, should therefore take steps to control its use. Originality/value This is the first published attempt at post-publication expert review accuracy testing for ChatGPT.
Mike Thelwall. "Can ChatGPT evaluate research quality?" Journal of Data and Information Science. Published 2024-04-30. DOI: 10.2478/jdis-2024-0013 (https://doi.org/10.2478/jdis-2024-0013)
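The comparison described in the abstract above, correlating each individual round of scores with the human scores and then correlating the per-article average across rounds, can be sketched as follows. This is only an illustration of the procedure under stated assumptions: the scores are invented, there are 3 rounds and 4 articles rather than 15 and 51, and `pearson` is a plain textbook implementation rather than the author's code.

```python
from math import sqrt
from statistics import mean
from typing import List, Sequence

def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical data: one human score per article, one row per model round.
human_scores: List[float] = [1, 2, 3, 4]
model_rounds: List[List[float]] = [
    [2, 2, 3, 3],
    [1, 3, 2, 4],
    [2, 1, 4, 3],
]

# Correlation of each single round with the human scores.
per_round_r = [pearson(human_scores, scores) for scores in model_rounds]
mean_single_r = mean(per_round_r)

# Average the rounds per article first, then correlate once.
avg_scores = [mean(column) for column in zip(*model_rounds)]
r_of_average = pearson(human_scores, avg_scores)
```

Averaging before correlating reduces round-to-round noise in the model scores, which is one plausible reading of why the abstract reports a higher correlation for averaged scores than for individual rounds.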
Purpose The notable increase in retracted papers has attracted considerable attention from diverse stakeholders. Various sources now offer information related to research integrity, including concerns voiced on social media, disclosed lists of paper mills, and retraction notices accessible through journal websites. However, despite the availability of such resources, there is no unified platform that consolidates this information, which hinders efficient searching and cross-referencing. Thus, it is imperative to develop a comprehensive platform for retracted papers and related concerns. This article aims to introduce “Amend,” a platform designed to integrate information on research integrity from diverse sources. Design/methodology/approach The Amend platform consolidates concerns and lists of problematic articles sourced from social media platforms (e.g., PubPeer, For Better Science), retraction notices from journal websites, and citation databases (e.g., Web of Science, CrossRef). Moreover, Amend includes investigation and punishment announcements released by administrative agencies (e.g., NSFC, MOE, MOST, CAS). Each related paper is marked and can be traced back to its information source via a provided link. Furthermore, the Amend database incorporates various attributes of retracted articles, including citation topics, funding details, open access status, and more. The reasons for retraction are identified and classified as either academic misconduct or honest errors, with detailed subcategories provided for further clarity. Findings Within the Amend platform, a total of 32,515 retracted papers indexed in SCI, SSCI, and ESCI between 1980 and 2023 were identified. Of these, 26,620 (81.87%) were associated with academic misconduct. The retraction rate stands at 6.64 per 10,000 articles. 
Notably, the retraction rate for non-gold open access articles significantly differs from that for gold open access articles, with this disparity progressively widening over the years. Furthermore, the reasons for retractions have shifted from traditional individual behaviors like falsification, fabrication, plagiarism, and duplication to more organized large-scale fraudulent practices, including Paper Mills, Fake Peer-review, and Artificial Intelligence Generated Content (AIGC). Research limitations The Amend platform may not fully capture all retracted and concerning papers, thereby impacting its comprehensiveness. Additionally, inaccuracies in retraction notices may lead to errors in tagged reasons. Practical implications Amend provides an integrated platform for stakeholders to enhance monitoring, analysis, and research on academic misconduct issues. Ultimately, the Amend database can contribute to upholding scientific integrity. Originality/value This study introduces a globally integrated platform for retracted and concerning papers, along with a preliminary analysis of the evolutionary trends in retracted papers.
Menghui Li, Fuyou Chen, Sichao Tong, Liying Yang, Zhesi Shen. "Amend: an integrated platform of retracted papers and concerned papers." Journal of Data and Information Science. Published 2024-03-21. DOI: 10.2478/jdis-2024-0012 (https://doi.org/10.2478/jdis-2024-0012)
Yizhan Li, Lu Dong, Xiaoxiao Fan, Ren Wei, Shijie Guo, Wenzhen Ma, Zexia Li
Research data infrastructures form the cornerstone in both cyber and physical spaces, driving the progression of the data-intensive scientific research paradigm. This opinion paper presents an overview of global research data infrastructure, drawing insights from national roadmaps and strategic documents related to research data infrastructure. It emphasizes the pivotal role of research data infrastructures by delineating four new missions aimed at positioning them at the core of the current scientific research and communication ecosystem. The four new missions of research data infrastructures are: (1) as a pioneer, to transcend disciplinary borders and address complex, cutting-edge scientific and social challenges with problem- and data-oriented insights; (2) as an architect, to establish a digital, intelligent, flexible research and knowledge services environment; (3) as a platform, to foster high-end academic communication; (4) as a coordinator, to balance scientific openness with ethical needs.
Yizhan Li, Lu Dong, Xiaoxiao Fan, Ren Wei, Shijie Guo, Wenzhen Ma, Zexia Li. "New roles of research data infrastructure in research paradigm evolution." Journal of Data and Information Science. Published 2024-03-05. DOI: 10.2478/jdis-2024-0011 (https://doi.org/10.2478/jdis-2024-0011)
Purpose The goal of this study is to analyze the relationship between funded and unfunded papers and their citations in both basic and applied sciences. Design/methodology/approach A power law model is used to analyze the relationship between research funding and citations of papers, based on 831,337 documents recorded in the Web of Science database. Findings The results reveal general characteristics of the diffusion of science in research fields: a) funded articles receive more citations than unfunded papers in journals; b) funded articles exhibit super-linear growth in citations, surpassing the increase seen in unfunded articles. This finding reveals a higher diffusion of scientific knowledge in funded articles. Moreover, c) funded articles in both basic and applied sciences show a similar expected change in citations, about 1.23%, when the number of funded papers in journals increases by 1%. This result suggests, for the first time, that the funding effect in scientific research is an invariant driver, irrespective of whether the science is basic or applied. Originality/value This evidence suggests empirical laws of funding for scientific citations that explain the importance of robust funding mechanisms for achieving impactful research outcomes in science and society. The findings also highlight that funding for scientific research is a critical driving force in supporting citations and the dissemination of scientific knowledge in recorded documents in both basic and applied sciences. Practical implications This comprehensive result provides a holistic view of the relationship between funding and citation performance in science. It can guide policymakers and R&D managers in designing science policies that direct funding to research, promoting scientific development and a wider diffusion of results for the progress of human society.
Mario Coccia, Saeed Roshani. "General laws of funding for scientific citations: how citations change in funded and unfunded research between basic and applied sciences." Journal of Data and Information Science. Published 2024-02-26. DOI: 10.2478/jdis-2024-0005 (https://doi.org/10.2478/jdis-2024-0005)
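The "about 1.23%" figure in the abstract above reads naturally as the exponent of a power law. The sketch below assumes the common form citations = a * papers^b, fitted by ordinary least squares on log-transformed values; the abstract does not give the exact specification, so the functional form, the function name `fit_power_law`, and the synthetic data are all assumptions made for illustration.

```python
from math import exp, log
from typing import Sequence, Tuple

def fit_power_law(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Fit y = a * x**b by least squares on (log x, log y); return (a, b)."""
    lx = [log(v) for v in x]
    ly = [log(v) for v in y]
    n = len(lx)
    mx, my = sum(lx) / n, sum(ly) / n
    b = (sum((u - mx) * (v - my) for u, v in zip(lx, ly))
         / sum((u - mx) ** 2 for u in lx))
    a = exp(my - b * mx)
    return a, b

# Synthetic data generated from an exact power law with exponent 1.23.
papers = [10, 50, 100, 500, 1000]
citations = [2.0 * p ** 1.23 for p in papers]

a, b = fit_power_law(papers, citations)

# Expected percentage change in citations when papers increase by 1%.
pct_change = (1.01 ** b - 1) * 100
```

Under this form the exponent b is an elasticity: a 1% increase in the number of papers is associated with roughly a b% change in citations, and b > 1 corresponds to the super-linear growth the abstract describes.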
Purpose The goal of this study is a comparative analysis of the relation between funding (a main driver of scientific research) and citations in papers by Nobel Laureates in physics, chemistry and medicine over 2019-2020, and of the same relation in these research fields as a whole. Design/Methodology/Approach This study uses a power law model to explore the relationship between research funding and citations of related papers. It analyzes 3,539 documents by Nobel Laureates in physics, chemistry and medicine and a broader dataset of 183,016 documents in the fields of physics, medicine, and chemistry recorded in the Web of Science database. Findings Results reveal that in chemistry and medicine, Nobel Laureates' papers reporting funded research receive more citations than their papers reporting unfunded research; conversely, in physics the highly cited papers of Nobel Laureates report unfunded research. By contrast, when the overall publication and citation data in physics, chemistry and medicine are analyzed, all papers based on funded research show higher citations than unfunded ones. Originality/Value Results clarify the driving role of research funding in science diffusion, systematized in two general properties: a) articles reporting funded research receive more citations than unfunded studies in physics, chemistry and medicine, generating a high Matthew effect (a higher growth of citations as the number of papers increases); b) research funding increases the citations of articles in fields oriented to applied research (e.g., chemistry and medicine) more than in fields oriented towards basic research (e.g., physics). Practical Implications The results explain some characteristics of scientific development and diffusion, highlighting the critical role of research funding in fostering citations and the expansion of scientific knowledge. This finding can support the decision-making of policymakers and R&D managers in allocating financial resources more effectively in science policies, so as to generate a higher positive scientific and societal impact.
Mario Coccia, Saeed Roshani. "Research funding and citations in papers of Nobel Laureates in Physics, Chemistry and Medicine, 2019-2020." Journal of Data and Information Science. Published 2024-02-19. DOI: 10.2478/jdis-2024-0006 (https://doi.org/10.2478/jdis-2024-0006)
Purpose Accurately assigning the document type of review articles in citation index databases like Web of Science (WoS) and Scopus is important. This study aims to investigate the document type assignment of review articles in Web of Science, Scopus and publishers’ websites on a large scale. Design/methodology/approach A total of 27,616 papers from 160 journals in 10 review journal series indexed in SCI are analyzed. The document types of these papers, as labeled on journals’ websites and as assigned by WoS and Scopus, are retrieved and compared to determine the assignment accuracy and identify possible reasons for wrong assignments. For the document types labeled on the websites, we further differentiate between explicit reviews and implicit reviews based on whether the website directly indicates that a paper is a review. Findings Overall, WoS and Scopus performed similarly, with an average precision of about 99% and recall of about 80%. However, there were some differences between WoS and Scopus across different journal series and within the same journal series. The assignment accuracy of WoS and Scopus for implicit reviews dropped significantly, especially for Scopus. Research limitations The document types we used as the gold standard were based on the journal websites’ labeling, which was not manually validated one by one. We only studied the labeling performance for review articles published during 2017-2018 in review journals. Whether this conclusion can be extended to review articles published in non-review journals and to the current situation remains unclear. Practical implications This study provides a reference for the accuracy of document type assignment of review articles in WoS and Scopus, and the identified patterns in the assignment of implicit reviews may help improve labeling on websites, WoS and Scopus. Originality/value This study investigated the accuracy of document type assignment for reviews and identified some patterns of wrong assignments.
{"title":"An explorative study on document type assignment of review articles in Web of Science, Scopus and journals’ websites","authors":"Manman Zhu, Xinyue Lu, Fuyou Chen, Liying Yang, Zhesi Shen","doi":"10.2478/jdis-2024-0003","DOIUrl":"https://doi.org/10.2478/jdis-2024-0003","url":null,"abstract":"Purpose Accurately assigning the document type of review articles in citation index databases like Web of Science (WoS) and Scopus is important. This study aims to investigate the document type assignment of review articles in Web of Science, Scopus and publishers’ websites on a large scale. Design/methodology/approach A total of 27,616 papers from 160 journals in 10 review journal series indexed in SCI are analyzed. The document types of these papers, as labeled on journals’ websites and as assigned by WoS and Scopus, are retrieved and compared to determine the assignment accuracy and identify possible reasons for wrong assignments. For the document types labeled on the websites, we further differentiate between explicit reviews and implicit reviews based on whether the website directly indicates that a paper is a review. Findings Overall, WoS and Scopus performed similarly, with an average precision of about 99% and recall of about 80%. However, there were some differences between WoS and Scopus across different journal series and within the same journal series. The assignment accuracy of WoS and Scopus for implicit reviews dropped significantly, especially for Scopus. Research limitations The document types we used as the gold standard were based on the journal websites’ labeling, which was not manually validated one by one. We only studied the labeling performance for review articles published during 2017-2018 in review journals. Whether this conclusion can be extended to review articles published in non-review journals and to the current situation remains unclear. 
Practical implications This study provides a reference for the accuracy of document type assignment of review articles in WoS and Scopus, and the identified patterns in the assignment of implicit reviews may help improve labeling on websites, WoS and Scopus. Originality/value This study investigated the accuracy of document type assignment for reviews and identified some patterns of wrong assignments.","PeriodicalId":44622,"journal":{"name":"Journal of Data and Information Science","volume":"180 1","pages":""},"PeriodicalIF":1.5,"publicationDate":"2024-02-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139923549","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"管理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
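The precision and recall figures reported in the record above can be read through a short sketch. It assumes the usual set-based definitions, with the journal websites' labels as the gold standard and a database's "review" assignments as the prediction; the function name `precision_recall` and the paper identifiers are hypothetical and are not taken from the study.

```python
def precision_recall(gold, assigned):
    """Return (precision, recall) for two sets of paper ids labeled as reviews."""
    true_positives = len(gold & assigned)
    # Precision: share of database-assigned reviews that the gold standard confirms.
    precision = true_positives / len(assigned) if assigned else 0.0
    # Recall: share of gold-standard reviews that the database also labels as reviews.
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Made-up example: the database misses one of five gold-standard reviews
# and assigns no false reviews.
gold_reviews = {"p1", "p2", "p3", "p4", "p5"}
database_reviews = {"p1", "p2", "p3", "p4"}
precision, recall = precision_recall(gold_reviews, database_reviews)
```

A pattern of high precision with lower recall, as in this toy case, means that papers a database calls reviews are almost always reviews, while a noticeable share of actual reviews is filed under another document type.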
Purpose To contribute to the study of networks and graphs. Design/methodology/approach We apply standard mathematical thinking. Findings We show that the distance distribution in an undirected network Lorenz majorizes the one of a chain. As a consequence, the average and median distances in any such network are smaller than or equal to those of a chain. Research limitations We restricted our investigations to undirected, unweighted networks. Practical implications We are convinced that these results are useful in the study of small worlds and the so-called six degrees of separation property. Originality/value To the best of our knowledge our research contains new network results, especially those related to frequencies of distances.
{"title":"Extended Lorenz majorization and frequencies of distances in an undirected network","authors":"Leo Egghe","doi":"10.2478/jdis-2024-0007","DOIUrl":"https://doi.org/10.2478/jdis-2024-0007","url":null,"abstract":"Purpose To contribute to the study of networks and graphs. Design/methodology/approach We apply standard mathematical thinking. Findings We show that the distance distribution in an undirected network Lorenz majorizes the one of a chain. As a consequence, the average and median distances in any such network are smaller than or equal to those of a chain. Research limitations We restricted our investigations to undirected, unweighted networks. Practical implications We are convinced that these results are useful in the study of small worlds and the so-called six degrees of separation property. Originality/value To the best of our knowledge our research contains new network results, especially those related to frequencies of distances.","PeriodicalId":44622,"journal":{"name":"Journal of Data and Information Science","volume":"51 1","pages":""},"PeriodicalIF":1.5,"publicationDate":"2024-02-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139756514","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"管理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
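The comparison between a network and a chain described in the record above can be illustrated with a small sketch. It assumes a connected, undirected, unweighted graph given as an adjacency dictionary with comparable node ids, counts how many node pairs lie at each shortest-path distance, and compares the mean distance with that of a chain on the same number of nodes. The helper names and the star-shaped example graph are assumptions for illustration; the sketch checks only the average-distance consequence, not Lorenz majorization itself.

```python
from collections import Counter, deque


def distance_frequencies(adj):
    """Count unordered node pairs at each shortest-path distance (BFS per node)."""
    freq = Counter()
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        # Count each unordered pair once.
        for target, d in dist.items():
            if source < target:
                freq[d] += 1
    return freq


def mean_distance(freq):
    """Average shortest-path distance over all counted pairs."""
    total_pairs = sum(freq.values())
    return sum(d * count for d, count in freq.items()) / total_pairs


n = 5
# Chain 0-1-2-3-4 and a star centred on node 0, both on five nodes.
chain = {i: [j for j in (i - 1, i + 1) if 0 <= j < n] for i in range(n)}
star = {0: [1, 2, 3, 4], 1: [0], 2: [0], 3: [0], 4: [0]}

chain_freq = distance_frequencies(chain)
star_freq = distance_frequencies(star)
```

In this example the chain has pairs at distances 1 to 4, while the star has pairs only at distances 1 and 2, so its mean distance is the smaller of the two.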
Purpose This paper aims to address the limitations in existing research on the evolution of knowledge flow networks by proposing a meso-level institutional field knowledge flow network evolution model (IKM). The purpose is to simulate the construction process of a knowledge flow network using knowledge organizations as units and to investigate its effectiveness in replicating institutional field knowledge flow networks. Design/Methodology/Approach The IKM model enhances the preferential attachment and growth observed in scale-free BA networks, while incorporating three adjustment parameters to simulate the selection of connection targets and the types of nodes involved in the network evolution process. The PageRank algorithm is used to calculate the significance of nodes within the knowledge flow network. To compare its performance, the BA and DMS models are also employed for simulating the network. Pearson coefficient analysis is conducted on the simulated networks generated by the IKM, BA and DMS models, as well as on the actual network. Findings The research findings demonstrate that the IKM model outperforms the BA and DMS models in replicating the institutional field knowledge flow network. It provides comprehensive insights into the evolution mechanism of knowledge flow networks in the scientific research realm. The model also exhibits potential applicability to other knowledge networks that involve knowledge organizations as node units. Research Limitations This study has some limitations. Firstly, it primarily focuses on the evolution of knowledge flow networks within the field of physics, neglecting other fields. Additionally, the analysis is based on a specific set of data, which may limit the generalizability of the findings. Future research could address these limitations by exploring knowledge flow networks in diverse fields and utilizing broader datasets. 
Practical Implications The proposed IKM model offers practical implications for the construction and analysis of knowledge flow networks within institutions. It provides a valuable tool for understanding and managing knowledge exchange between knowledge organizations. The model can aid in optimizing knowledge flow and enhancing collaboration within organizations. Originality/value This research highlights the significance of meso-level studies in understanding knowledge organization and its impact on knowledge flow networks. The IKM model demonstrates its effectiveness in replicating institutional field knowledge flow networks and offers practical implications for knowledge management in institutions. Moreover, the model has the potential to be applied to other knowledge networks, which are formed by knowledge organizations as node units.
{"title":"A new evolutional model for institutional field knowledge flow network","authors":"Jinzhong Guo, Kai Wang, Xueqin Liao, Xiaoling Liu","doi":"10.2478/jdis-2024-0009","DOIUrl":"https://doi.org/10.2478/jdis-2024-0009","url":null,"abstract":"Purpose This paper aims to address the limitations in existing research on the evolution of knowledge flow networks by proposing a meso-level institutional field knowledge flow network evolution model (IKM). The purpose is to simulate the construction process of a knowledge flow network using knowledge organizations as units and to investigate its effectiveness in replicating institutional field knowledge flow networks. Design/Methodology/Approach The IKM model enhances the preferential attachment and growth observed in scale-free BA networks, while incorporating three adjustment parameters to simulate the selection of connection targets and the types of nodes involved in the network evolution process. The PageRank algorithm is used to calculate the significance of nodes within the knowledge flow network. To compare its performance, the BA and DMS models are also employed for simulating the network. Pearson coefficient analysis is conducted on the simulated networks generated by the IKM, BA and DMS models, as well as on the actual network. Findings The research findings demonstrate that the IKM model outperforms the BA and DMS models in replicating the institutional field knowledge flow network. It provides comprehensive insights into the evolution mechanism of knowledge flow networks in the scientific research realm. The model also exhibits potential applicability to other knowledge networks that involve knowledge organizations as node units. Research Limitations This study has some limitations. Firstly, it primarily focuses on the evolution of knowledge flow networks within the field of physics, neglecting other fields. 
Additionally, the analysis is based on a specific set of data, which may limit the generalizability of the findings. Future research could address these limitations by exploring knowledge flow networks in diverse fields and utilizing broader datasets. Practical Implications The proposed IKM model offers practical implications for the construction and analysis of knowledge flow networks within institutions. It provides a valuable tool for understanding and managing knowledge exchange between knowledge organizations. The model can aid in optimizing knowledge flow and enhancing collaboration within organizations. Originality/value This research highlights the significance of meso-level studies in understanding knowledge organization and its impact on knowledge flow networks. The IKM model demonstrates its effectiveness in replicating institutional field knowledge flow networks and offers practical implications for knowledge management in institutions. Moreover, the model has the potential to be applied to other knowledge networks, which are formed by knowledge organizations as node units.","PeriodicalId":44622,"journal":{"name":"Journal of Data and Information Science","volume":"51 1","pages":""},"PeriodicalIF":1.5,"publicationDate":"2024-02-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139756507","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"管理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
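The two building blocks named in the record above, growth with preferential attachment as in BA networks and node significance via PageRank, can be sketched as follows. This is only the plain BA baseline on an undirected graph with a standard power-iteration PageRank; the three adjustment parameters of the IKM model are not specified in the abstract and are not reproduced here. The function names, the seed graph, and the parameter values are assumptions for illustration.

```python
import random


def barabasi_albert(n, m, seed=0):
    """Grow an undirected edge list: each new node attaches to m distinct
    existing nodes chosen with probability proportional to their degree."""
    rng = random.Random(seed)
    # Seed graph: a complete graph on m + 1 nodes.
    edges = [(i, j) for i in range(m + 1) for j in range(i)]
    # Degree-weighted pool: every node appears once per incident edge.
    pool = [node for edge in edges for node in edge]
    for new in range(m + 1, n):
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(pool))
        for target in targets:
            edges.append((new, target))
            pool.extend((new, target))
    return edges


def pagerank(edges, damping=0.85, iterations=100):
    """Power-iteration PageRank treating every edge as bidirectional."""
    neighbours = {}
    for u, v in edges:
        neighbours.setdefault(u, []).append(v)
        neighbours.setdefault(v, []).append(u)
    n = len(neighbours)
    rank = {node: 1.0 / n for node in neighbours}
    for _ in range(iterations):
        new_rank = {node: (1.0 - damping) / n for node in neighbours}
        for node, nbrs in neighbours.items():
            # Each node passes an equal share of its damped rank to its neighbours.
            share = damping * rank[node] / len(nbrs)
            for nb in nbrs:
                new_rank[nb] += share
        rank = new_rank
    return rank


edges = barabasi_albert(n=50, m=2)
rank = pagerank(edges)
```

A real knowledge flow network would normally be directed (knowledge flows from a cited organization to a citing one), so treating edges as undirected here is a simplification, and comparing simulated and observed networks with a Pearson coefficient would be a separate step.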
Purpose Interdisciplinary fields have become the driving force of modern science and a significant source of scientific innovation. However, there is still a paucity of analysis about the essential characteristics of disciplines’ cross-disciplinary impact. Design/methodology/approach In this study, we define the cross-disciplinary impact of a discipline as its impact on other disciplines, and draw on the three-dimensional variety-balance-disparity framework to characterize the structure of cross-disciplinary impact. The variety of cross-disciplinary impact of a discipline was defined as the proportion of its publications with high cross-disciplinary impact, and the balance and disparity of cross-disciplinary impact were measured as well. To demonstrate the cross-disciplinary impact of the disciplines in science, we chose Microsoft Academic Graph (MAG) as the data source, and investigated the relationship between disciplines’ cross-disciplinary impact and their positions in the Hierarchy of Science (HOS). Findings Analytical results show that there is a significant correlation between the ranking of cross-disciplinary impact and the HOS structure, and that a discipline exerts a greater cross-disciplinary impact on its neighboring disciplines. Several bibliometric features that measure the hardness of a discipline, including the number of references, the number of cited disciplines, the citation distribution, and the Price index, have a significant positive effect on the variety of cross-disciplinary impact. The number of references, the number of cited disciplines, and the citation distribution have significant positive and negative effects on balance and disparity, respectively. It is concluded that the less hard the discipline, the greater its cross-disciplinary impact, the higher the balance, and the lower the disparity of that impact. Research limitations In the empirical analysis of HOS, we only included five broad disciplines. 
This study also has some biases caused by the data source and applied regression models. Practical implications This study contributes to the formulation of discipline-specific policies and promotes the growth of interdisciplinary research, as well as offering fresh insights for predicting the cross-disciplinary impact of disciplines. Originality/value This study provides a new perspective to properly understand the mechanisms of cross-disciplinary impact and disciplinary integration.
{"title":"Characterizing structure of cross-disciplinary impact of global disciplines: A perspective of the Hierarchy of Science","authors":"Ruolan Liu, Jin Mao, Gang Li, Yujie Cao","doi":"10.2478/jdis-2024-0008","DOIUrl":"https://doi.org/10.2478/jdis-2024-0008","url":null,"abstract":"Purpose Interdisciplinary fields have become the driving force of modern science and a significant source of scientific innovation. However, there is still a paucity of analysis about the essential characteristics of disciplines’ cross-disciplinary impact. Design/methodology/approach In this study, we define the cross-disciplinary impact of a discipline as its impact on other disciplines, and draw on the three-dimensional variety-balance-disparity framework to characterize the structure of cross-disciplinary impact. The variety of cross-disciplinary impact of a discipline was defined as the proportion of its publications with high cross-disciplinary impact, and the balance and disparity of cross-disciplinary impact were measured as well. To demonstrate the cross-disciplinary impact of the disciplines in science, we chose Microsoft Academic Graph (MAG) as the data source, and investigated the relationship between disciplines’ cross-disciplinary impact and their positions in the Hierarchy of Science (HOS). Findings Analytical results show that there is a significant correlation between the ranking of cross-disciplinary impact and the HOS structure, and that a discipline exerts a greater cross-disciplinary impact on its neighboring disciplines. Several bibliometric features that measure the hardness of a discipline, including the number of references, the number of cited disciplines, the citation distribution, and the Price index, have a significant positive effect on the variety of cross-disciplinary impact. The number of references, the number of cited disciplines, and the citation distribution have significant positive and negative effects on balance and disparity, respectively. 
It is concluded that the less hard the discipline, the greater its cross-disciplinary impact, the higher the balance, and the lower the disparity of that impact. Research limitations In the empirical analysis of HOS, we only included five broad disciplines. This study also has some biases caused by the data source and applied regression models. Practical implications This study contributes to the formulation of discipline-specific policies and promotes the growth of interdisciplinary research, as well as offering fresh insights for predicting the cross-disciplinary impact of disciplines. Originality/value This study provides a new perspective to properly understand the mechanisms of cross-disciplinary impact and disciplinary integration.","PeriodicalId":44622,"journal":{"name":"Journal of Data and Information Science","volume":"46 1","pages":""},"PeriodicalIF":1.5,"publicationDate":"2024-02-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139756519","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"管理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}