
Quantitative Science Studies: Latest Articles

Overton: A bibliometric database of policy document citations
IF 6.4 Q1 INFORMATION SCIENCE & LIBRARY SCIENCE Pub Date : 2022-01-19 DOI: 10.1162/qss_a_00204
M. Szomszor, E. Adie
Abstract This paper presents an analysis of the Overton policy document database, describing the makeup of materials indexed and the nature in which they cite academic literature. We report on various aspects of the data, including growth, geographic spread, language representation, the range of policy source types included, and the availability of citation links in documents. Longitudinal analysis over established journal category schemes is used to reveal the scale and disciplinary focus of citations and determine the feasibility of developing field-normalized citation indicators. To corroborate the data indexed, we also examine how well self-reported funding outcomes collected by UK funders correspond to data indexed in the Overton database. Finally, to test the data in an experimental setting, we assess whether peer-review assessment of impact as measured by the UK Research Excellence Framework (REF) 2014 correlates with derived policy citation metrics. Our findings show that for some research topics, such as health, economics, social care, and the environment, Overton contains a core set of policy documents with sufficient citation linkage to academic literature to support various citation analyses that may be informative in research evaluation, impact assessment, and policy review.
Volume 3, pages 624-650
Citations: 19
A comparison of different methods of identifying publications related to the United Nations Sustainable Development Goals: Case study of SDG 13—Climate Action
IF 6.4 Q1 INFORMATION SCIENCE & LIBRARY SCIENCE Pub Date : 2022-01-06 DOI: 10.1162/qss_a_00215
P. Purnell
Abstract As sustainability becomes an increasing priority throughout global society, academic and research institutions are assessed on their contribution to relevant research publications. This study compares four methods of identifying research publications related to United Nations Sustainable Development Goal 13—Climate Action (SDG 13). The four methods (Elsevier, STRINGS, SIRIS, and Dimensions) have each developed search strings with the help of subject matter experts, which are then enhanced through distinct methods to produce a final set of publications. Our analysis showed that the methods produced comparable quantities of publications but with little overlap between them. We visualized some difference in topic focus between the methods and drew links with the search strategies used. Differences between publications retrieved are likely to come from subjective interpretation of the goals, keyword selection, operationalizing search strategies, AI enhancements, and selection of bibliographic database. Each of the elements warrants deeper investigation to understand their role in identifying SDG-related research. Before choosing any method to assess the research contribution to SDGs, end users of SDG data should carefully consider their interpretation of the goal and determine which of the available methods produces the closest data set. Meanwhile, data providers might customize their methods for varying interpretations of the SDGs.
Volume 3, pages 976-1002
Citations: 9
Covid-19 refereeing duration and impact in major medical journals
IF 6.4 Q1 INFORMATION SCIENCE & LIBRARY SCIENCE Pub Date : 2021-12-23 DOI: 10.1162/qss_a_00176
K. Kousha, M. Thelwall
Abstract Two partly conflicting academic pressures from the seriousness of the Covid-19 pandemic are the need for faster peer review of Covid-19 health-related research and greater scrutiny of its findings. This paper investigates whether decreases in peer review durations for Covid-19 articles were universal across 97 major medical journals, as well as Nature, Science, and Cell. The results suggest that on average, Covid-19 articles submitted during 2020 were reviewed 1.7–2.1 times faster than non-Covid-19 articles submitted during 2017–2020. Nevertheless, while the review speed of Covid-19 research was particularly fast during the first 5 months (1.9–3.4 times faster) of the pandemic (January–May 2020), this speed advantage was no longer evident for articles submitted in November–December 2020. Faster peer review was also associated with higher citation impact for Covid-19 articles in the same journals, suggesting it did not usually compromise the scholarly impact of important Covid-19 research. Overall, then, it seems that core medical and general journals responded quickly but carefully to the pandemic, although the situation returned closer to normal within a year.
Volume 3, pages 1-17
Citations: 3
Scopus 1900–2020: Growth in articles, abstracts, countries, fields, and journals
IF 6.4 Q1 INFORMATION SCIENCE & LIBRARY SCIENCE Pub Date : 2021-12-12 DOI: 10.1162/qss_a_00177
M. Thelwall, Pardeep Sud
Abstract Scientometric research often relies on large-scale bibliometric databases of academic journal articles. Long-term and longitudinal research can be affected if the composition of a database varies over time, and text processing research can be affected if the percentage of articles with abstracts changes. This article therefore assesses changes in the magnitude of the coverage of a major citation index, Scopus, over 121 years from 1900. The results show sustained exponential growth from 1900, except for dips during both world wars, and with increased growth after 2004. Over the same period, the percentage of articles with 500+ character abstracts increased from 1% to 95%. The number of different journals in Scopus also increased exponentially, but slowing down from 2010, with the number of articles per journal being approximately constant until 1980, then tripling due to megajournals and online-only publishing. The breadth of Scopus, in terms of the number of narrow fields with substantial numbers of articles, simultaneously increased from one field having 1,000 articles in 1945 to 308 fields in 2020. Scopus’s international character also radically changed from 68% of first authors from Germany and the United States in 1900 to just 17% in 2020, with China dominating (25%).
Volume 3, pages 37-50
Citations: 22
A quantitative and qualitative open citation analysis of retracted articles in the humanities
IF 6.4 Q1 INFORMATION SCIENCE & LIBRARY SCIENCE Pub Date : 2021-11-09 DOI: 10.1162/qss_a_00222
Ivan Heibi, S. Peroni
Abstract In this article, we show and discuss the results of a quantitative and qualitative analysis of open citations of retracted publications in the humanities domain. Our study was conducted by selecting retracted papers in the humanities domain and marking their main characteristics (e.g., retraction reason). Then, we gathered the citing entities and annotated their basic metadata (e.g., title, venue, subject) and the characteristics of their in-text citations (e.g., intent, sentiment). Using these data, we performed a quantitative and qualitative study of retractions in the humanities, presenting descriptive statistics and a topic modeling analysis of the citing entities’ abstracts and the in-text citation contexts. As part of our main findings, we noticed that there was no drop in the overall number of citations after the year of retraction, with few entities that have either mentioned the retraction or expressed a negative sentiment toward the cited publication. In addition, on several occasions, we noticed a higher concern/awareness by citing entities belonging to the health sciences domain about citing a retracted publication, compared with the humanities and social science domains. Philosophy, arts, and history are the humanities areas that showed higher concern toward the retraction.
Volume 3, pages 953-975
Citations: 5
The data set knowledge graph: Creating a linked open data source for data sets
IF 6.4 Q1 INFORMATION SCIENCE & LIBRARY SCIENCE Pub Date : 2021-11-05 DOI: 10.1162/qss_a_00161
Michael Färber, David Lamprecht
Abstract Several scholarly knowledge graphs have been proposed to model and analyze the academic landscape. However, although the number of data sets has increased remarkably in recent years, these knowledge graphs do not primarily focus on data sets but rather on associated entities such as publications. Moreover, publicly available data set knowledge graphs do not systematically contain links to the publications in which the data sets are mentioned. In this paper, we present an approach for constructing an RDF knowledge graph that fulfills these mentioned criteria. Our data set knowledge graph, DSKG, is publicly available at http://dskg.org and contains metadata of data sets for all scientific disciplines. To ensure high data quality of the DSKG, we first identify suitable raw data set collections for creating the DSKG. We then establish links between the data sets and publications modeled in the Microsoft Academic Knowledge Graph that mention these data sets. As the author names of data sets can be ambiguous, we develop and evaluate a method for author name disambiguation and enrich the knowledge graph with links to ORCID. Overall, our knowledge graph contains more than 2,000 data sets with associated properties, as well as 814,000 links to 635,000 scientific publications. It can be used for a variety of scenarios, facilitating advanced data set search systems and new ways of measuring and awarding the provisioning of data sets.
Volume 2, pages 1324-1355
Citations: 14
A framework for creating knowledge graphs of scientific software metadata
IF 6.4 Q1 INFORMATION SCIENCE & LIBRARY SCIENCE Pub Date : 2021-11-05 DOI: 10.1162/qss_a_00167
Aidan Kelley, D. Garijo
Abstract An increasing number of researchers rely on computational methods to generate or manipulate the results described in their scientific publications. Software created to this end—scientific software—is key to understanding, reproducing, and reusing existing work in many disciplines, ranging from Geosciences to Astronomy or Artificial Intelligence. However, scientific software is usually challenging to find, set up, and compare to similar software due to its disconnected documentation (dispersed in manuals, readme files, websites, and code comments) and the lack of structured metadata to describe it. As a result, researchers have to manually inspect existing tools to understand their differences and incorporate them into their work. This approach scales poorly with the number of publications and tools made available every year. In this paper we address these issues by introducing a framework for automatically extracting scientific software metadata from its documentation (in particular, their readme files); a methodology for structuring the extracted metadata in a Knowledge Graph (KG) of scientific software; and an exploitation framework for browsing and comparing the contents of the generated KG. We demonstrate our approach by creating a KG with metadata from over 10,000 scientific software entries from public code repositories.
Volume 2, pages 1423-1446
Citations: 13
AIDA: A knowledge graph about research dynamics in academia and industry
IF 6.4 Q1 INFORMATION SCIENCE & LIBRARY SCIENCE Pub Date : 2021-11-05 DOI: 10.1162/qss_a_00162
Simone Angioni, Angelo Salatino, Francesco Osborne, Diego Reforgiato Recupero, E. Motta
Abstract Academia and industry share a complex, multifaceted, and symbiotic relationship. Analyzing the knowledge flow between them, understanding which directions have the biggest potential, and discovering the best strategies to harmonize their efforts is a critical task for several stakeholders. Research publications and patents are an ideal medium to analyze this space, but current data sets of scholarly data cannot be used for such a purpose because they lack a high-quality characterization of the relevant research topics and industrial sectors. In this paper, we introduce the Academia/Industry DynAmics (AIDA) Knowledge Graph, which describes 21 million publications and 8 million patents according to the research topics drawn from the Computer Science Ontology. 5.1 million publications and 5.6 million patents are further characterized according to the type of the author’s affiliations and 66 industrial sectors from the proposed Industrial Sectors Ontology (INDUSO). AIDA was generated by an automatic pipeline that integrates data from Microsoft Academic Graph, Dimensions, DBpedia, the Computer Science Ontology, and the Global Research Identifier Database. It is publicly available under CC BY 4.0 and can be downloaded as a dump or queried via a triplestore. We evaluated the different parts of the generation pipeline on a manually crafted gold standard yielding competitive results.
Volume 2, pages 1356-1398
Citations: 22
Data citation and the citation graph
IF 6.4 Q1 INFORMATION SCIENCE & LIBRARY SCIENCE Pub Date : 2021-11-05 DOI: 10.1162/qss_a_00166
P. Buneman, Dennis Dosso, Matteo Lissandrini, G. Silvello
Abstract The citation graph is a computational artifact that is widely used to represent the domain of published literature. It represents connections between published works, such as citations and authorship. Among other things, the graph supports the computation of bibliometric measures such as h-indexes and impact factors. There is now an increasing demand that we should treat the publication of data in the same way that we treat conventional publications. In particular, we should cite data for the same reasons that we cite other publications. In this paper we discuss what is needed for the citation graph to represent data citation. We identify two challenges: to model the evolution of credit appropriately (through references) over time and to model data citation not only to a data set treated as a single object but also to parts of it. We describe an extension of the current citation graph model that addresses these challenges. It is built on two central concepts: citable units and reference subsumption. We discuss how this extension would enable data citation to be represented within the citation graph and how it allows for improvements in current practices for bibliometric computations, both for scientific publications and for data.
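The abstract notes that the citation graph supports bibliometric measures such as h-indexes. A minimal sketch of that computation over a conventional citation graph follows. The function name `h_indexes`, the edge-list inputs, and the sample data are assumptions for illustration; the sketch does not implement the citable-unit or reference-subsumption extension the paper proposes.

```python
from collections import defaultdict


def h_indexes(citations, authorship):
    """Compute an h-index per author.

    citations:  iterable of (citing_paper, cited_paper) edges
    authorship: iterable of (author, paper) edges
    """
    # Collect the distinct citing papers for every cited paper.
    cited_by = defaultdict(set)
    for citing, cited in citations:
        cited_by[cited].add(citing)

    papers_of = defaultdict(set)
    for author, paper in authorship:
        papers_of[author].add(paper)

    result = {}
    for author, papers in papers_of.items():
        # Citation counts in descending order.
        counts = sorted((len(cited_by.get(p, ())) for p in papers), reverse=True)
        h = 0
        for rank, count in enumerate(counts, start=1):
            if count >= rank:
                h = rank  # at least `rank` papers have >= `rank` citations
            else:
                break
        result[author] = h
    return result


# Hypothetical sample graph.
citations = [("p2", "p1"), ("p3", "p1"), ("p4", "p1"),
             ("p4", "p2"), ("p5", "p2"), ("p5", "p3")]
authorship = [("alice", "p1"), ("alice", "p2"), ("alice", "p3"), ("bob", "p3")]

print(h_indexes(citations, authorship))
```

Here alice's papers receive 3, 2, and 1 citations, so two of them have at least two citations each; bob's single paper has one citation.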
Quantitative Science Studies, vol. 2, pp. 1399-1422. Published 2021-11-05.
Citations: 10
New trends in scientific knowledge graphs and research impact assessment
IF 6.4 Q1 INFORMATION SCIENCE & LIBRARY SCIENCE Pub Date : 2021-11-05 DOI: 10.1162/qss_e_00160
P. Manghi, A. Mannocci, Francesco Osborne, Dimitris Sacharidis, Angelo Salatino, Thanasis Vergoulis
In recent decades, we have experienced a continuously increasing publication rate of scientific articles and related research objects (e.g., data sets, software packages). As this trend keeps growing, practitioners in the field of scholarly knowledge are confronted with several challenges. In this special issue, we focus on two major categories of such challenges: (a) those related to the organization of scholarly data to achieve a flexible, context-sensitive, fine-grained, and machine-actionable representation of scholarly knowledge that at the same time is structured, interlinked, and semantically rich, and (b) those related to the design of novel, reliable, and comprehensive metrics to assess scientific impact.
Quantitative Science Studies, vol. 2, pp. 1296-1300. Published 2021-11-05.
Citations: 10