Abstract Purpose With the availability of large-scale scholarly datasets, scientists from various domains hope to understand the underlying mechanisms behind science, forming a vibrant area of inquiry in the emerging “science of science” field. As results from the science of science often have strong policy implications, understanding the causal relationships between variables becomes especially important. However, Regression Discontinuity Design (RDD), the most credible quasi-experimental method among causal inference methods and a highly valuable tool in the empirical toolkit, has not been fully exploited in the science of science. In this paper, we provide a systematic survey of the RDD method and its practical applications in the science of science. Design/methodology/approach First, we introduce the basic assumptions, mathematical notation, and the two types of RDD, i.e., sharp and fuzzy RDD. Second, we use the Web of Science and the Microsoft Academic Graph datasets to study the evolution and citation patterns of RDD papers. Moreover, we provide a systematic survey of the applications of RDD methodologies in various scientific domains, as well as in the science of science. Finally, we present a case study that estimates the effect of Head Start Funding Proposals on child mortality. Findings RDD was almost neglected for 30 years after it was first introduced in 1960. Afterward, scientists used mathematical and economic tools to develop the RDD methodology. After 2010, RDD methods were widely applied in various domains, including medicine, psychology, political science and environmental science. However, we also note that the RDD method has not been well developed in science of science research. Research limitations This work uses a keyword search to obtain RDD papers, which may miss some related work. Additionally, our work does not aim to develop rigorous mathematical and technical details of RDD but rather focuses on its intuitions and applications. Practical implications This work proposes how to use the RDD method in science of science research. Originality/value This work systematically introduces RDD and calls for greater awareness of the method in the science of science.
{"title":"Regression discontinuity design and its applications to Science of Science: A survey","authors":"Mei Li, Yang Zhang, Yang Wang","doi":"10.2478/jdis-2023-0008","DOIUrl":"https://doi.org/10.2478/jdis-2023-0008","journal":{"name":"Journal of data and information science (Warsaw, Poland)","volume":"8 1","pages":"43 - 65"},"publicationDate":"2023-04-01","publicationTypes":"Journal Article"}
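To make the sharp RDD idea in the abstract above concrete, the following is a minimal sketch, not the authors' procedure. It assumes a simulated running variable with a cutoff at 0, a treatment assigned deterministically at the cutoff, and a simple local linear fit on each side within a fixed bandwidth. The function names, the bandwidth, and the simulated effect size of 2.0 are all illustrative assumptions.

```python
import random


def ols_line(xs, ys):
    """Return (intercept, slope) of the least-squares line through (xs, ys)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def sharp_rdd_estimate(running, outcome, cutoff, bandwidth):
    """Estimate the jump in the outcome at the cutoff (sharp RDD).

    Fits one line to observations just below the cutoff and another to
    observations just above it, then compares the two fitted values at
    the cutoff itself. The running variable is centred on the cutoff so
    that each intercept is the fitted value at the cutoff.
    """
    left = [(r - cutoff, y) for r, y in zip(running, outcome)
            if cutoff - bandwidth <= r < cutoff]
    right = [(r - cutoff, y) for r, y in zip(running, outcome)
             if cutoff <= r <= cutoff + bandwidth]
    left_at_cutoff, _ = ols_line(*zip(*left))
    right_at_cutoff, _ = ols_line(*zip(*right))
    return right_at_cutoff - left_at_cutoff


def simulate(n=4000, true_effect=2.0, seed=0):
    """Simulated data (an assumption for illustration, not real data)."""
    rng = random.Random(seed)
    running = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    outcome = [
        1.0 + 0.8 * r + (true_effect if r >= 0 else 0.0) + rng.gauss(0.0, 0.5)
        for r in running
    ]
    return running, outcome


running, outcome = simulate()
estimate = sharp_rdd_estimate(running, outcome, cutoff=0.0, bandwidth=0.5)
print(f"estimated jump at the cutoff: {estimate:.2f}")
```

In practice the bandwidth and kernel are chosen by data-driven procedures and the estimate is reported with robust standard errors; the fixed bandwidth here only keeps the sketch short.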
Abstract Purpose This study examines the effects of using publication-based metrics for the initial screening of project leaders in a grant application process. The key questions are whether a formal policy shifts the allocation of funds toward researchers with a better publication record and how the previous academic performance of principal investigators is related to future project results. Design/methodology/approach We compared two competitions, before and after the policy raised the publication threshold for principal investigators. We analyzed 9,167 papers published by 332 winners in physics and the social sciences and humanities (SSH), and 11,253 publications resulting from the funded projects. Findings We found that among physicists, even in the first period, grants tended to be allocated to prolific authors publishing in high-quality journals. In contrast, the SSH grantees were less prolific in publishing internationally in both periods; however, in the second period, the selection process more often awarded grants to authors who were more productive in terms of both the quantity and quality of publications. There was no evidence that this better selection of grant recipients resulted in better publication records during grant realization. Originality This study contributes to the discussion of formal policies that rely on metrics for the evaluation of grant proposals. The Russian case shows that such a policy may have a profound effect on the supply side of applicants, especially in disciplines that are less suitable for metric-based evaluations. Despite the criticism of metrics, they might be a useful additional instrument in academic systems where professional expertise is corrupted and prevents allocation of funds to prolific researchers.
{"title":"Evaluating grant proposals: lessons from using metrics as screening device","authors":"K. Guba, Alexey Zheleznov, Elena Chechik","doi":"10.2478/jdis-2023-0010","DOIUrl":"https://doi.org/10.2478/jdis-2023-0010","journal":{"name":"Journal of data and information science (Warsaw, Poland)","volume":"8 1","pages":"66 - 92"},"publicationDate":"2023-04-01","publicationTypes":"Journal Article"}
{"title":"How a systems perspective can help us with the interdisciplinarity puzzle","authors":"J. Eykens","doi":"10.2478/jdis-2023-0005","DOIUrl":"https://doi.org/10.2478/jdis-2023-0005","journal":{"name":"Journal of data and information science (Warsaw, Poland)","volume":"8 1","pages":"2 - 8"},"publicationDate":"2023-02-01","publicationTypes":"Journal Article"}
Abstract Purpose In recent decades, with the availability of large-scale scientific corpus datasets, difference-in-difference (DID) has been increasingly used in science of science and bibliometrics studies. The DID method yields unbiased estimates provided that several assumptions hold, especially the common trend assumption. In this paper, we give a systematic demonstration of DID in the science of science and of potential ways to improve the accuracy of the DID method. Design/methodology/approach First, we review the statistical assumptions, the model specification, and the application procedures of the DID method. Second, to better satisfy the assumptions required before conducting a DID regression and to improve the accuracy of estimation, we introduce matching techniques as a pre-selection step for the DID design: control individuals are matched to treated individuals so that the two groups are equivalent on observed variables before the intervention. Lastly, we perform a case study to estimate the effect of prizewinning on the scientific performance of Nobel laureates, by comparing the yearly citation impact after the prizewinning year between Nobel laureates and the coauthors of their prizewinning work. Findings We introduce the procedures for conducting a DID estimation and demonstrate the effectiveness of using matching methods to improve the results. In the case study, we find no significant increase in citations for Nobel laureates compared to their prizewinning coauthors. Research limitations This study omits the rigorous mathematical derivation of DID and focuses instead on the practical aspects. Practical implications This work provides experimental practice and potential guidelines for using the DID method in science of science and bibliometrics studies. Originality/value This study offers insights into the use of econometric tools in the science of science.
{"title":"Practical operation and theoretical basis of difference-in-difference regression in science of science: The comparative trial on the scientific performance of Nobel laureates versus their coauthors","authors":"Yurui Huang, Chaolin Tian, Yifang Ma","doi":"10.2478/jdis-2023-0003","DOIUrl":"https://doi.org/10.2478/jdis-2023-0003","journal":{"name":"Journal of data and information science (Warsaw, Poland)","volume":"8 1","pages":"29 - 46"},"publicationDate":"2023-02-01","publicationTypes":"Journal Article"}
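The core DID calculation described in the abstract above can be illustrated with the simplest two-group, two-period case. The sketch below is an assumption-laden illustration rather than the authors' specification: the records are made-up numbers (not Nobel laureate data), the function name `did_estimate` is hypothetical, and no matching step or regression controls are included.

```python
from statistics import mean


def did_estimate(records):
    """Two-by-two difference-in-difference estimate.

    records: iterable of (treated, post, outcome) tuples, where `treated`
    and `post` are booleans. The estimate is the change in the treated
    group's mean outcome minus the change in the control group's mean
    outcome, which removes any trend common to both groups.
    """
    cells = {(t, p): [] for t in (False, True) for p in (False, True)}
    for treated, post, outcome in records:
        cells[(treated, post)].append(outcome)
    treated_change = mean(cells[(True, True)]) - mean(cells[(True, False)])
    control_change = mean(cells[(False, True)]) - mean(cells[(False, False)])
    return treated_change - control_change


# Made-up yearly citation values, for illustration only.
records = [
    (True, False, 10.0), (True, False, 12.0),   # treated, before
    (True, True, 15.0), (True, True, 17.0),     # treated, after
    (False, False, 8.0), (False, False, 10.0),  # control, before
    (False, True, 11.0), (False, True, 13.0),   # control, after
]
effect = did_estimate(records)
print(f"DID estimate: {effect}")
```

Here the treated group rises by 5 and the control group by 3, so the estimate attributes 2 to the treatment. That attribution is valid only if the two groups would have followed a common trend without the treatment.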
Abstract Regression is a widely used econometric tool in research. In observational studies, based on a number of assumptions, regression-based statistical control methods attempt to analyze the causation between treatment and outcome by adding control variables. However, this approach may not produce reliable estimates of causal effects. In addition to the shortcomings of the method, this lack of confidence is mainly related to ambiguous formulations in econometrics, such as the definition of selection bias, selection of core control variables, and method of testing for robustness. Within the framework of the causal models, we clarify the assumption of causal inference using regression-based statistical controls, as described in econometrics, and discuss how to select core control variables to satisfy this assumption and conduct robustness tests for regression estimates.
{"title":"Causal inference using regression-based statistical control: Confusion in Econometrics","authors":"Fan Chao, Guang Yu","doi":"10.2478/jdis-2023-0006","DOIUrl":"https://doi.org/10.2478/jdis-2023-0006","journal":{"name":"Journal of data and information science (Warsaw, Poland)","volume":"8 1","pages":"21 - 28"},"publicationDate":"2023-02-01","publicationTypes":"Journal Article"}
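Why the choice of control variables matters for regression-based statistical control can be seen in a small simulation. The sketch below is an illustration under stated assumptions, not the paper's framework: a single simulated confounder `z` drives both the treatment `x` and the outcome `y`, the true effect of `x` on `y` is set to 1.0, and the controlled estimate is obtained by partialling `z` out of both variables (the Frisch-Waugh approach). All names and coefficients are hypothetical.

```python
import random


def slope(xs, ys):
    """Least-squares slope of ys on xs (with an intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


def residuals(ys, xs):
    """Residuals of ys after a least-squares regression on xs."""
    b = slope(xs, ys)
    a = sum(ys) / len(ys) - b * sum(xs) / len(xs)
    return [y - (a + b * x) for x, y in zip(xs, ys)]


rng = random.Random(1)
n = 5000
z = [rng.gauss(0.0, 1.0) for _ in range(n)]           # confounder
x = [0.8 * zi + rng.gauss(0.0, 1.0) for zi in z]      # treatment depends on z
y = [1.0 * xi + 2.0 * zi + rng.gauss(0.0, 1.0)        # true effect of x is 1.0
     for xi, zi in zip(x, z)]

# Without controlling for z, the slope absorbs part of z's effect.
naive = slope(x, y)
# Controlling for z: regress the z-residual of y on the z-residual of x.
controlled = slope(residuals(x, z), residuals(y, z))

print(f"naive slope: {naive:.2f}, slope controlling for z: {controlled:.2f}")
```

The naive slope is biased upward because `z` raises both `x` and `y`. Conditioning on `z` recovers a value near the simulated effect, but only because `z` is the sole confounder in this toy setup.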
{"title":"Large language models and scientific publishing","authors":"R. Rousseau, Liying Yang, J. Bollen, Zhesi Shen","doi":"10.2478/jdis-2023-0007","DOIUrl":"https://doi.org/10.2478/jdis-2023-0007","journal":{"name":"Journal of data and information science (Warsaw, Poland)","volume":"8 1","pages":"1 - 1"},"publicationDate":"2023-02-01","publicationTypes":"Journal Article"}
Shelia X. Wei, Helena H. Zhang, Howell Y. Wang, F. Y. Ye
Abstract Purpose Building on the two typical features of a grey-rhino event, predictability and profound influence, we attempt to find a special pattern, called the grey-rhino, in eminent technologies via patent analysis. Design/methodology/approach We propose combining triadic patent families and the technology life cycle to define the grey-rhino model. First, we design the indicator rhino-index Rh = ST/SP and the descriptor sequence {Rh}, where ST and SP are the cumulative numbers of triadic patent families and of all patent families, respectively, for a specific technology. Second, according to the two typical features of the grey-rhino event, a grey-rhino is defined as a technology that meets both a qualitative and a quantitative condition. Qualitatively, the technology has a profound influence. Quantitatively, in the emerging stage, Rh ≥ Rae, where Rae is the average level of the proportion of triadic patent families. Finally, the model is verified on three datasets, namely Encyclopedia Britannica's list of the greatest inventions (EB technologies for short), MIT breakthrough technologies (MIT technologies) and Derwent Manual Code technologies (MAN technologies). Findings The results show that 64.71% of EB technologies and 50.00% of MIT technologies meet the quantitative standard of the grey-rhino model, but only 14.71% of MAN technologies do. This falling trend indicates that the quantitative standard of the grey-rhino model is reasonable. EB technologies and MIT technologies have a profound influence on society, which means they satisfy the qualitative standard of the grey-rhino model. Hence, 64.71% of EB technologies and 50.00% of MIT technologies are grey-rhinos. Among the 14.71% of MAN technologies meeting the quantitative standard, we make some qualitative judgments and deem U11-A01A, U12-A01A1A, and W01-A01A to be grey-rhino technologies. In addition, grey-rhinos and non-grey-rhinos differ in some respects. Rh values of grey-rhinos show a downward trend, while Rh values of non-grey-rhinos show the opposite trend. Rh values of grey-rhinos are relatively scattered in the early stage and gradually converge, but non-grey-rhinos do not have this feature. Research limitations There are four main limitations. First, if a technology satisfies the quantitative standard of the model, it is likely to be a grey-rhino, but expert judgment is still necessary. Second, the model does not explain why a technology will become eminent, which depends on its technical content. Third, we did not consider the China National Intellectual Property Administration (CNIPA) and the German Patent and Trademark Office (DPMA), which also play important roles in worldwide patents, so we hope to expand our study to the CNIPA and the DPMA. Fourth, we did not compare the rhino-index with other patent indicators. Practical implications If a technology meets the quantitative standard, this can be seen as an early warning signal that the technology may become a grey-rhino in the future, which can catch people's at
{"title":"Identifying grey-rhino in eminent technologies via patent analysis","authors":"Shelia X. Wei, Helena H. Zhang, Howell Y. Wang, F. Y. Ye","doi":"10.2478/jdis-2023-0002","DOIUrl":"https://doi.org/10.2478/jdis-2023-0002","journal":{"name":"Journal of data and information science (Warsaw, Poland)","volume":"8 1","pages":"47 - 71"},"publicationDate":"2023-01-11","publicationTypes":"Journal Article"}
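The rhino-index defined in the grey-rhino abstract above, Rh = ST/SP over cumulative patent-family counts, is straightforward to compute. The sketch below is only an illustration under several assumptions: the yearly counts are invented, the emerging stage is taken to be the first `emerging_years` entries, the threshold `rae` is passed in by the caller, and the condition is read as "Rh ≥ Rae in every emerging-stage year". The abstract does not specify these details, so each is a hypothetical choice.

```python
from itertools import accumulate


def rhino_sequence(triadic_per_year, all_per_year):
    """Return the sequence {Rh}, where Rh = ST / SP for each year.

    ST and SP are the cumulative numbers of triadic patent families and
    of all patent families up to and including that year. Years with no
    patent families so far are reported as 0.0.
    """
    st = accumulate(triadic_per_year)
    sp = accumulate(all_per_year)
    return [t / p if p else 0.0 for t, p in zip(st, sp)]


def meets_quantitative_condition(rh_sequence, rae, emerging_years):
    """Check Rh >= Rae throughout the (assumed) emerging stage."""
    return all(rh >= rae for rh in rh_sequence[:emerging_years])


# Invented yearly counts for one technology, for illustration only.
triadic = [2, 3, 5, 4]
total = [10, 20, 50, 120]

rh = rhino_sequence(triadic, total)
print([round(value, 3) for value in rh])
print(meets_quantitative_condition(rh, rae=0.10, emerging_years=3))
```

With these invented counts the sequence falls from 0.2 to 0.07, the kind of downward pattern the abstract associates with grey-rhinos. Whether a technology qualifies still depends on the chosen Rae and on the qualitative judgment the authors require.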
M. Thelwall, K. Kousha, Meiko Makita, Mahshid Abdoli, E. Stuart, Paul Wilson, Jonathan M. Levitt
Abstract Collaborative research causes problems for research assessments because of the difficulty in fairly crediting its authors. Whilst splitting the rewards for an article amongst its authors has the greatest surface-level fairness, many important evaluations assign full credit to each author, irrespective of team size. The underlying rationales for this are labour reduction and the need to incentivise collaborative work because it is necessary to solve many important societal problems. This article assesses whether full counting changes results compared to fractional counting in the case of the UK's Research Excellence Framework (REF) 2021. For this assessment, fractional counting reduces the number of journal articles to as little as 10% of the full counting value, depending on the Unit of Assessment (UoA). Despite this large difference, allocating an overall grade point average (GPA) based on full counting or fractional counting gives results with a median Pearson correlation within UoAs of 0.98. The largest changes are for Archaeology (r=0.84) and Physics (r=0.88). There is a weak tendency for higher scoring institutions to lose from fractional counting, with the loss being statistically significant in 5 of the 34 UoAs. Thus, whilst the apparent over-weighting of contributions to collaboratively authored outputs does not seem too problematic from a fairness perspective overall, it may be worth examining in the few UoAs in which it makes the most difference.
{"title":"Is big team research fair in national research assessments? The case of the UK Research Excellence Framework 2021","authors":"M. Thelwall, K. Kousha, Meiko Makita, Mahshid Abdoli, E. Stuart, Paul Wilson, Jonathan M. Levitt","doi":"10.2478/jdis-2023-0004","DOIUrl":"https://doi.org/10.2478/jdis-2023-0004","journal":{"name":"Journal of data and information science (Warsaw, Poland)","volume":"8 1","pages":"9 - 20"},"publicationDate":"2022-12-11","publicationTypes":"Journal Article"}
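The difference between full and fractional counting discussed in the REF abstract above can be shown on a toy example. This is a minimal sketch assuming author-level credit and equal shares among coauthors. The author labels and papers are invented, and the REF itself aggregates by institution and Unit of Assessment, which this sketch does not model.

```python
from collections import defaultdict


def credit_by_author(papers):
    """Return (full, fractional) credit per author.

    papers: list of author lists, one per paper.
    Full counting gives every author 1 credit per paper; fractional
    counting splits 1 credit equally among the paper's authors.
    """
    full = defaultdict(float)
    fractional = defaultdict(float)
    for authors in papers:
        share = 1.0 / len(authors)
        for author in authors:
            full[author] += 1.0
            fractional[author] += share
    return dict(full), dict(fractional)


# Invented papers, for illustration only.
papers = [
    ["A", "B"],
    ["A"],
    ["A", "B", "C", "D"],
]
full, fractional = credit_by_author(papers)
print(full)
print(fractional)
```

Under fractional counting the credits sum to the number of papers, whereas full counting inflates the total in proportion to team size. That inflation is why large-team fields lose the most when switching to fractional counting.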
Pub Date: 2022-11-29 DOI: 10.48550/arXiv.2211.16124
O. Mryglod, Serhii Nazarovets, S. Kozmenko
Abstract Purpose To supplement the quantitative portrait of Ukrainian Economics discipline with the results of gender and author ordering analysis at the level of individual authors, special methods of working with bibliographic data with a predominant share of non-English authors are used. The properties of gender mixing, the likelihood of male and female authors occupying the first position in the authorship list, as well as the arrangements of names are studied. Design/methodology/approach A data set containing bibliographic records related to Ukrainian journal publications in the field of Economics is constructed using Crossref metadata. Partial semi-automatic disambiguation of authors’ names is performed. First names, along with gender-specific ethnic surnames, are used for gender disambiguation required for further comparative gender analysis. Random reshuffling of data is used to determine the impact of gender correlations. To assess the level of alphabetization for our data set, both Latin and Cyrillic versions of names are taken into account. Findings The lack of well-structured metadata and the poor use of digital identifiers lead to numerous problems with automatization of bibliographic data pre-processing, especially in the case of publications by non-Western authors. The described stages for working with such specific data help to work at the level of authors and analyse, in particular, gender issues. Despite the larger number of female authors, gender equality is more likely to be reported at the individual level for the discipline of Ukrainian Economics. The tendencies towards collaborative or solo-publications and gender mixing patterns are found to be dependent on the journal: the differences for publications indexed in Scopus and/or Web of Science databases are found. It has also been found that Ukrainian Economics research is characterized by rather a non-alphabetical order of authors. 
Research limitations Only partial disambiguation of authors’ names is performed, in a semi-automatic way. Gender labels can be derived only for authors listed with full first names or gender-specific last names. Practical implications The typical features of the Ukrainian Economics discipline can be used for comparison with other countries and disciplines and to develop an informed assessment procedure at the national level. The proposed way of processing publication data can be adopted to enrich metadata for other research disciplines, especially in non-English-speaking countries. Originality/value To our knowledge, this is the first large-scale quantitative study of the Ukrainian Economics discipline. The results are valuable not only at the national level but also contribute to general knowledge about Economics research, gender issues, and the ordering of authors’ names. An example of the use of Crossref data is provided, although this data source is still less widely used owing to a number of drawbacks. Here, for the first time, attention is drawn to the explicit use of the features of Slavic names.
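The abstract above mentions random reshuffling of data as a way to gauge gender correlations but does not spell out the procedure. The following is a minimal sketch of one common way such a null model can be built, not the authors' actual method. It assumes each paper is a list of gender labels ("F" or "M"), measures the share of multi-author papers that are gender-mixed, and compares it with the same share after permuting labels across all authorship slots while keeping team sizes fixed. The function names, the mixing statistic, and the toy data are all hypothetical.

```python
import random
from statistics import mean, pstdev


def mixed_share(papers):
    """Share of multi-author papers that contain more than one gender label."""
    multi = [p for p in papers if len(p) > 1]
    if not multi:
        return 0.0
    return sum(1 for p in multi if len(set(p)) > 1) / len(multi)


def reshuffle_labels(papers, rng):
    """Permute gender labels over all authorship slots, keeping team sizes fixed."""
    labels = [g for p in papers for g in p]
    rng.shuffle(labels)
    shuffled, start = [], 0
    for p in papers:
        shuffled.append(labels[start:start + len(p)])
        start += len(p)
    return shuffled


def mixing_z_score(papers, n_iter=1000, seed=0):
    """Compare the observed mixed share with its distribution under reshuffling."""
    rng = random.Random(seed)
    observed = mixed_share(papers)
    null = [mixed_share(reshuffle_labels(papers, rng)) for _ in range(n_iter)]
    null_mean, null_sd = mean(null), pstdev(null)
    # A z-score below zero means fewer mixed teams than random label assignment gives.
    z = (observed - null_mean) / null_sd if null_sd > 0 else 0.0
    return observed, null_mean, z


# Toy data (invented for illustration): one list of gender labels per paper.
toy_papers = [["F", "F"], ["M", "M"], ["F", "F", "F"], ["M", "M"], ["F", "M"]]
observed, null_mean, z = mixing_z_score(toy_papers)
print(observed, null_mean, z)
```

Keeping team sizes and the overall gender counts fixed means that any gap between the observed and reshuffled values reflects how authors are grouped, not how many authors of each gender there are. Authors whose gender could not be determined would need separate handling, which this sketch omits.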
Title: Peculiarities of gender disambiguation and ordering of non-English authors’ names for Economic papers beyond core databases
Journal: Journal of data and information science (Warsaw, Poland). Volume: 8 1. Pages: 72-89.
Abstract Purpose The aim of our paper is to investigate the role of a mentor leading a research team in the overall scientific performance of an academic institution, and the possible risks of the mentor’s departure, with special attention to publication output. Design/methodology/approach Using SciVal subject area data, we composed a formula describing the level of vulnerability of a given university if it loses any of its leading mentors, and identified other risk factors by dividing mentors’ careers into separate stages. Findings The higher the field-weighted citation impact, the better the position universities reach in the rankings by subject; the vulnerability of institutions depends strongly on the mentors, especially in view of their contribution to the topic clusters. Research limitations The analysis covers the publication output of leading researchers working at four Hungarian universities; the scope of the analysis is worth extending. Practical implications Our analysis can offer university management an applicable systemic approach, as well as a data collection scheme, for formulating an inclusive and comprehensive research strategy that includes a reward system aimed at publications and further encourages national and international research cooperation. Originality/value The methodology and the principles of risk assessment laid down in our paper are not restricted to measuring the vulnerability of a limited group of academic institutions; they can be used to investigate the role of mentors or leading researchers at any university across the globe.
Title: Subject Area Risk Assessment of Four Hungarian Universities with a View to the QS University Rankings by Subject
P. Sasvári, Anna Urbanovics
Pub Date: 2022-11-01 DOI: 10.2478/jdis-2022-0023
Journal: Journal of data and information science (Warsaw, Poland). Volume: 7 1. Pages: 61-80.