Abstract In a recent letter to QSS, Kyle Siler (2021) made harsh comments against the decision of the editors to publish a controversial paper signed by Alessandro Strumia (2021) about gender differences in high-energy physics. My aim here is to point to the elements in Siler’s letter that are typical of a new tendency to replace rational and technical arguments with a series of moral statements and ex cathedra affirmations that are not supported by cogent arguments. Such an approach can only be detrimental to rational debates within the bibliometric research community.
{"title":"Towards a moralization of bibliometrics? A response to Kyle Siler","authors":"Y. Gingras","doi":"10.1162/qss_c_00178","DOIUrl":"https://doi.org/10.1162/qss_c_00178","url":null,"abstract":"Abstract In a recent letter to QSS, Kyle Siler (2021), made harsh comments against the decision of the editors to publish a controversial paper signed by Alessandro Strumia (2021) about gender differences in high-energy physics. My aim here is to point to the elements in Siler’s letter that are typical of a new tendency to replace rational and technical arguments with a series of moral statements and ex cathedra affirmations that are not supported by cogent arguments. Such an approach can only be detrimental to rational debates within the bibliometric research community.","PeriodicalId":34021,"journal":{"name":"Quantitative Science Studies","volume":"3 1","pages":"315-318"},"PeriodicalIF":6.4,"publicationDate":"2022-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"42164765","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Abstract The effectiveness of research units is assessed on the basis of their performance in relation to scientific, technological, and innovation production, the quality of their results, and their contribution to the solution of scientific and social problems. We examine the management practices employed in some Mexican National Laboratories to identify those practices that could explain their effectiveness in meeting their objectives. We draw on the results of other research proposing common elements shared by laboratories with outstanding performance and verify these elements directly in the field. Considering the inherent complexity of each field of knowledge and the sociospatial characteristics in which the laboratories operate, we report which management practices are relevant for their effectiveness, how they contribute to their consolidation as fundamental scientific and technological infrastructures, and how these can be translated into indicators that support the evaluation of their performance.
{"title":"The management of scientific and technological infrastructures: The case of the Mexican National Laboratories","authors":"Leonardo Munguía, E. Robles-Belmont, J. Escalante","doi":"10.1162/qss_a_00230","DOIUrl":"https://doi.org/10.1162/qss_a_00230","url":null,"abstract":"Abstract The effectiveness of research units is assessed on the basis of their performance in relation to scientific, technological, and innovation production, the quality of their results, and their contribution to the solution of scientific and social problems. We examine the management practices employed in some Mexican National Laboratories to identify those practices that could explain their effectiveness in meeting their objectives. The results of other research that propose common elements among laboratories with outstanding performance are used and verified directly in the field. Considering the inherent complexity of each field of knowledge and the sociospatial characteristics in which the laboratories operate, we report which management practices are relevant for their effectiveness, how they contribute to their consolidation as fundamental scientific and technological infrastructures, and how these can be translated into indicators that support the evaluation of their performance.","PeriodicalId":34021,"journal":{"name":"Quantitative Science Studies","volume":"4 1","pages":"246-261"},"PeriodicalIF":6.4,"publicationDate":"2022-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"42055891","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Abstract To analyze the outcomes of the funding they provide, it is essential for funding agencies to be able to trace the publications resulting from their funding. We study the open availability of funding data in Crossref, focusing on funding data for publications that report research related to COVID-19. We also present a comparison with the funding data available in two proprietary bibliometric databases: Scopus and Web of Science. Our analysis reveals limited coverage of funding data in Crossref. It also shows problems related to the quality of funding data, especially in Scopus. We offer recommendations for improving the open availability of funding data in Crossref.
{"title":"Funding COVID-19 research: Insights from an exploratory analysis using open data infrastructures","authors":"Alexis-Michel Mugabushaka, Nees Jan van Eck, L. Waltman","doi":"10.1162/qss_a_00212","DOIUrl":"https://doi.org/10.1162/qss_a_00212","url":null,"abstract":"Abstract To analyze the outcomes of the funding they provide, it is essential for funding agencies to be able to trace the publications resulting from their funding. We study the open availability of funding data in Crossref, focusing on funding data for publications that report research related to COVID-19. We also present a comparison with the funding data available in two proprietary bibliometric databases: Scopus and Web of Science. Our analysis reveals limited coverage of funding data in Crossref. It also shows problems related to the quality of funding data, especially in Scopus. We offer recommendations for improving the open availability of funding data in Crossref.","PeriodicalId":34021,"journal":{"name":"Quantitative Science Studies","volume":"3 1","pages":"560-582"},"PeriodicalIF":6.4,"publicationDate":"2022-02-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"45834216","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Abstract Clustering and community detection in networks are of broad interest and have been the subject of extensive research that spans several fields. We are interested in the relatively narrow question of detecting communities of scientific publications that are linked by citations. These publication communities can be used to identify scientists with shared interests who form communities of researchers. Building on the well-known k-core algorithm, we have developed a modular pipeline to find publication communities with center–periphery structure. Using a quantitative and qualitative approach, we evaluate community finding results on a citation network consisting of over 14 million publications relevant to the field of extracellular vesicles. We compare our approach to communities discovered by the widely used Leiden algorithm for community finding.
{"title":"Center–periphery structure in research communities","authors":"E. Wedell, Minhyuk Park, Dmitriy Korobskiy, T. Warnow, George Chacko","doi":"10.1162/qss_a_00184","DOIUrl":"https://doi.org/10.1162/qss_a_00184","url":null,"abstract":"Abstract Clustering and community detection in networks are of broad interest and have been the subject of extensive research that spans several fields. We are interested in the relatively narrow question of detecting communities of scientific publications that are linked by citations. These publication communities can be used to identify scientists with shared interests who form communities of researchers. Building on the well-known k-core algorithm, we have developed a modular pipeline to find publication communities with center–periphery structure. Using a quantitative and qualitative approach, we evaluate community finding results on a citation network consisting of over 14 million publications relevant to the field of extracellular vesicles. We compare our approach to communities discovered by the widely used Leiden algorithm for community finding.","PeriodicalId":34021,"journal":{"name":"Quantitative Science Studies","volume":"214 1","pages":"289-314"},"PeriodicalIF":6.4,"publicationDate":"2022-02-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"73980597","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Abstract Although several large knowledge graphs have been proposed in the scholarly field, such graphs are limited with respect to several data quality dimensions such as accuracy and coverage. In this article, we present methods for enhancing the Microsoft Academic Knowledge Graph (MAKG), a recently published large-scale knowledge graph containing metadata about scientific publications and associated authors, venues, and affiliations. Based on a qualitative analysis of the MAKG, we address three aspects. First, we adopt and evaluate unsupervised approaches for large-scale author name disambiguation. Second, we develop and evaluate methods for tagging publications by their discipline and by keywords, facilitating enhanced search and recommendation of publications and associated entities. Third, we compute and evaluate embeddings for all 239 million publications, 243 million authors, 49,000 journals, and 16,000 conference entities in the MAKG based on several state-of-the-art embedding techniques. Finally, we provide statistics for the updated MAKG. Our final MAKG is publicly available at https://makg.org and can be used for the search or recommendation of scholarly entities, as well as enhanced scientific impact quantification.
{"title":"The Microsoft Academic Knowledge Graph enhanced: Author name disambiguation, publication classification, and embeddings","authors":"Michael Färber, Lin Ao","doi":"10.1162/qss_a_00183","DOIUrl":"https://doi.org/10.1162/qss_a_00183","url":null,"abstract":"Abstract Although several large knowledge graphs have been proposed in the scholarly field, such graphs are limited with respect to several data quality dimensions such as accuracy and coverage. In this article, we present methods for enhancing the Microsoft Academic Knowledge Graph (MAKG), a recently published large-scale knowledge graph containing metadata about scientific publications and associated authors, venues, and affiliations. Based on a qualitative analysis of the MAKG, we address three aspects. First, we adopt and evaluate unsupervised approaches for large-scale author name disambiguation. Second, we develop and evaluate methods for tagging publications by their discipline and by keywords, facilitating enhanced search and recommendation of publications and associated entities. Third, we compute and evaluate embeddings for all 239 million publications, 243 million authors, 49,000 journals, and 16,000 conference entities in the MAKG based on several state-of-the-art embedding techniques. Finally, we provide statistics for the updated MAKG. Our final MAKG is publicly available at https://makg.org and can be used for the search or recommendation of scholarly entities, as well as enhanced scientific impact quantification.","PeriodicalId":34021,"journal":{"name":"Quantitative Science Studies","volume":"3 1","pages":"51-98"},"PeriodicalIF":6.4,"publicationDate":"2022-02-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"47031666","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Abstract Formal assessments of the quality of the research produced by departments and universities are now conducted by many countries to monitor achievements and allocate performance-related funding. These evaluations are hugely time-consuming if conducted by postpublication peer review and are simplistic if based on citations or journal impact factors. I investigate whether machine learning could help reduce the burden of peer review by using citations and metadata to learn how to score articles from a sample assessed by peer review. An experiment is used to underpin the discussion, attempting to predict journal citation thirds, as a proxy for article quality scores, for all Scopus narrow fields from 2014 to 2020. The results show that these proxy quality thirds can be predicted with above-baseline accuracy in all 326 narrow fields, with Gradient Boosting Classifier, Random Forest Classifier, or Multinomial Naïve Bayes being the most accurate in nearly all cases. Nevertheless, the results partly leverage journal writing styles and topics, which are unwanted for some practical applications and cause substantial shifts in average scores between countries and between institutions within a country. There may still be scope for using predicted scores for the subset of articles whose predictions carry the highest probability.
{"title":"Can the quality of published academic journal articles be assessed with machine learning?","authors":"M. Thelwall","doi":"10.1162/qss_a_00185","DOIUrl":"https://doi.org/10.1162/qss_a_00185","url":null,"abstract":"Abstract Formal assessments of the quality of the research produced by departments and universities are now conducted by many countries to monitor achievements and allocate performance-related funding. These evaluations are hugely time consuming if conducted by postpublication peer review and are simplistic if based on citations or journal impact factors. I investigate whether machine learning could help reduce the burden of peer review by using citations and metadata to learn how to score articles from a sample assessed by peer review. An experiment is used to underpin the discussion, attempting to predict journal citation thirds, as a proxy for article quality scores, for all Scopus narrow fields from 2014 to 2020. The results show that these proxy quality thirds can be predicted with above baseline accuracy in all 326 narrow fields, with Gradient Boosting Classifier, Random Forest Classifier, or Multinomial Naïve Bayes being the most accurate in nearly all cases. Nevertheless, the results partly leverage journal writing styles and topics, which are unwanted for some practical applications and cause substantial shifts in average scores between countries and between institutions within a country. There may be scope for predicting articles’ scores when the predictions have the highest probability.","PeriodicalId":34021,"journal":{"name":"Quantitative Science Studies","volume":"3 1","pages":"208-226"},"PeriodicalIF":6.4,"publicationDate":"2022-02-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"49489149","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Abstract Newton’s centuries-old wisdom of standing on the shoulders of giants raises a crucial yet underexplored question: Out of all the prior works cited by a discovery, which one is its giant? Here, we develop a discipline-independent method to identify the giant for any individual paper, allowing us to better understand the role and characteristics of giants in science. We find that across disciplines, about 95% of papers appear to stand on the shoulders of giants, yet the weight of scientific progress rests on relatively few shoulders. Defining a new measure, the giant index, we find that, while highly cited papers are more likely to be giants, among papers with the same citation counts the giant index sharply predicts a paper’s future impact and prize-winning probabilities. Giants tend to originate from both small and large teams, being either highly disruptive or highly developmental. Papers that did not have a giant tend to do poorly on average, yet interestingly, if such papers later became a giant for other papers, they tend to be home-run papers that are highly disruptive to science. Given the crucial importance of citation-based measures in science, the developed concept of giants may offer a useful dimension in assessing scientific impact that goes beyond sheer citation counts.
{"title":"See further upon the giants: Quantifying intellectual lineage in science","authors":"Woo Seong Jo, Lu Liu, Dashun Wang","doi":"10.1162/qss_a_00186","DOIUrl":"https://doi.org/10.1162/qss_a_00186","url":null,"abstract":"Abstract Newton’s centuries-old wisdom of standing on the shoulders of giants raises a crucial yet underexplored question: Out of all the prior works cited by a discovery, which one is its giant? Here, we develop a discipline-independent method to identify the giant for any individual paper, allowing us to better understand the role and characteristics of giants in science. We find that across disciplines, about 95% of papers appear to stand on the shoulders of giants, yet the weight of scientific progress rests on relatively few shoulders. Defining a new measure of giant index, we find that, while papers with high citations are more likely to be giants, for papers with the same citations, their giant index sharply predicts a paper’s future impact and prize-winning probabilities. Giants tend to originate from both small and large teams, being either highly disruptive or highly developmental. Papers that did not have a giant tend to do poorly on average, yet interestingly, if such papers later became a giant for other papers, they tend to be home-run papers that are highly disruptive to science. Given the crucial importance of citation-based measures in science, the developed concept of giants may offer a useful dimension in assessing scientific impact that goes beyond sheer citation counts.","PeriodicalId":34021,"journal":{"name":"Quantitative Science Studies","volume":"3 1","pages":"319-330"},"PeriodicalIF":6.4,"publicationDate":"2022-02-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"45165060","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Abstract A much-debated topic is the role of universities in the prosperity of cities and regions. Two major problems arise. First, what is a reliable measurement of prosperity? And second, what are the characteristics, particularly research performance, of a university that matter? I focus on this research question: Is there a significant relation between having a university and a city’s socioeconomic strength? And if so, what are the determining indicators of a university; for instance, how important is scientific collaboration? What is the role of scientific quality measured by citation impact? Does the size of a university, measured in number of publications or in number of students, matter? I compiled a database of city and university data: gross urban product and population data of nearly 200 German cities and 400 districts. University data are derived from the Leiden Ranking 2020 and supplemented with data on the number of students. The socioeconomic strength of a city is determined using the urban scaling methodology. My study shows a significant relation between the presence of a university in a city and its socioeconomic indicators, particularly for larger cities, and that this is especially the case for universities with higher values of their output, impact, and collaboration indicators.
{"title":"German cities with universities: Socioeconomic position and university performance","authors":"A. V. van Raan","doi":"10.1162/qss_a_00182","DOIUrl":"https://doi.org/10.1162/qss_a_00182","url":null,"abstract":"Abstract A much-debated topic is the role of universities in the prosperity of cities and regions. Two major problems arise. First, what is a reliable measurement of prosperity? And second, what are the characteristics, particularly research performance, of a university that matter? I focus on this research question: Is there a significant relation between having a university and a city’s socioeconomic strength? And if so, what are the determining indicators of a university; for instance, how important is scientific collaboration? What is the role of scientific quality measured by citation impact? Does the size of a university, measured in number of publications or in number of students matter? I compiled a database of city and university data: gross urban product and population data of nearly 200 German cities and 400 districts. University data are derived from the Leiden Ranking 2020 and supplemented with data on the number of students. The socioeconomic strength of a city is determined using the urban scaling methodology. My study shows a significant relation between the presence of a university in a city and its socioeconomic indicators, particularly for larger cities, and that this is especially the case for universities with higher values of their output, impact and collaboration indicators.","PeriodicalId":34021,"journal":{"name":"Quantitative Science Studies","volume":"3 1","pages":"265-288"},"PeriodicalIF":6.4,"publicationDate":"2022-02-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"43520853","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
We present the first database-wide study on the citation contexts of retracted papers, which covers 7,813 retracted papers indexed in PubMed, 169,434 citations collected from iCite, and 48,134 citation contexts identified from the XML version of the PubMed Central Open Access Subset. Compared with previous citation studies that focused on comparing citation counts using two time frames (i.e., preretraction and postretraction), our analyses show the longitudinal trends of citations to retracted papers in the past 60 years (1960–2020). Our temporal analyses show that retracted papers continued to be cited, but that old retracted papers stopped being cited as time progressed. Analysis of the text progression of pre- and postretraction citation contexts shows that retraction did not change the way the retracted papers were cited. Furthermore, among the 13,252 postretraction citation contexts, only 722 (5.4%) citation contexts acknowledged the retraction. In these 722 citation contexts, the retracted papers were most commonly cited as related work or as an example of problematic science. Our findings deepen the understanding of why retraction does not stop citation and demonstrate that the vast majority of postretraction citations in biomedicine do not document the retraction.
{"title":"Continued use of retracted papers: Temporal trends in citations and (lack of) awareness of retractions shown in citation contexts in biomedicine.","authors":"Tzu-Kun Hsiao, Jodi Schneider","doi":"10.1162/qss_a_00155","DOIUrl":"10.1162/qss_a_00155","url":null,"abstract":"<p><p>We present the first database-wide study on the citation contexts of retracted papers, which covers 7,813 retracted papers indexed in PubMed, 169,434 citations collected from iCite, and 48,134 citation contexts identified from the XML version of the PubMed Central Open Access Subset. Compared with previous citation studies that focused on comparing citation counts using two time frames (i.e., preretraction and postretraction), our analyses show the longitudinal trends of citations to retracted papers in the past 60 years (1960-2020). Our temporal analyses show that retracted papers continued to be cited, but that old retracted papers stopped being cited as time progressed. Analysis of the text progression of pre- and postretraction citation contexts shows that retraction did not change the way the retracted papers were cited. Furthermore, among the 13,252 postretraction citation contexts, only 722 (5.4%) citation contexts acknowledged the retraction. In these 722 citation contexts, the retracted papers were most commonly cited as related work or as an example of problematic science. Our findings deepen the understanding of why retraction does not stop citation and demonstrate that the vast majority of postretraction citations in biomedicine do not document the retraction.</p>","PeriodicalId":34021,"journal":{"name":"Quantitative Science Studies","volume":"2 4","pages":"1144-1169"},"PeriodicalIF":4.1,"publicationDate":"2022-02-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9520488/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"40391371","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Abstract We use several sources to collect and evaluate academic scientific publications on a countrywide scale, and we apply this approach to the case of France for the years 2015–2020, presenting a more detailed analysis focused on the reference year 2019. These sources are diverse: databases available by subscription (Scopus, Web of Science) or open to the scientific community (Microsoft Academic Graph), the national open archive HAL, and databases serving thematic communities (ADS and PubMed). We show the contribution of the different sources to the final corpus. These results are then compared to those obtained with another approach, that of the French Open Science Barometer for monitoring open access at the national level. We show that both approaches provide a convergent estimate of the open access rate. We also present and discuss the definitions of the concepts used, and list the main difficulties encountered in processing the data. The results of this study contribute to a better understanding of the respective contributions of the main databases and their complementarity in the broad framework of a countrywide corpus. They also shed light on the calculation of open access rates and thus contribute to a better understanding of current developments in the field of open science.
{"title":"Identifying scientific publications countrywide and measuring their open access: The case of the French Open Science Barometer (BSO)","authors":"Lauranne Chaignon, D. Egret","doi":"10.1162/qss_a_00179","DOIUrl":"https://doi.org/10.1162/qss_a_00179","url":null,"abstract":"Abstract We use several sources to collect and evaluate academic scientific publication on a country-wide scale, and we apply it to the case of France for the years 2015–2020, while presenting a more detailed analysis focused on the reference year 2019. These sources are diverse: databases available by subscription (Scopus, Web of Science) or open to the scientific community (Microsoft Academic Graph), the national open archive HAL, and databases serving thematic communities (ADS and PubMed). We show the contribution of the different sources to the final corpus. These results are then compared to those obtained with another approach, that of the French Open Science Barometer for monitoring open access at the national level. We show that both approaches provide a convergent estimate of the open access rate. We also present and discuss the definitions of the concepts used, and list the main difficulties encountered in processing the data. The results of this study contribute to a better understanding of the respective contributions of the main databases and their complementarity in the broad framework of a countrywide corpus. They also shed light on the calculation of open access rates and thus contribute to a better understanding of current developments in the field of open science.","PeriodicalId":34021,"journal":{"name":"Quantitative Science Studies","volume":"3 1","pages":"18-36"},"PeriodicalIF":6.4,"publicationDate":"2022-01-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"46750125","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}