Pub Date : 2024-08-24 DOI: 10.1007/s11192-024-05127-8
Ashraf Maleki, Kim Holmberg
Despite differences in the extent of user engagement, original tweets and retweets of scientific publications are treated as equal events. This study investigates quantifiable differences between tweets and retweets from an altmetric point of view. The Twitter users, text, and media content of two datasets, one containing 742 randomly selected tweets and retweets (371 each) and another with 5898 tweets and retweets (about 3000 each), all linking to scientific articles published in PLoS ONE, were manually categorized. Analysis of the proportions of tweets and retweets indicated that academic and individual accounts produced the majority of original tweets (34% and 55%, respectively) and posted a significantly larger proportion of retweets (41.5% and 81%). Bot accounts, on the other hand, posted significantly more original tweets (20%) than retweets (2%). Natural communication sentences were more common in retweets than in tweets (63% vs. 45%), as were images (41.5% vs. 23%), both showing a significant rise in usage over time. Overall, the findings suggest that the attention scientific articles receive on Twitter may have more to do with human interaction and the inclusion of visual content in the tweets than with the significance of, or genuine interest in, the research results.
Title: Tweeting and retweeting scientific articles: implications for altmetrics
Journal: Scientometrics
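The abstract above reports that some proportions differ "significantly" between original tweets and retweets but does not say which test was used. As a minimal sketch of one common choice, the pooled two-proportion z-test, the Python below compares two shares. The counts are hypothetical: 74 of 371 and 7 of 371 were chosen only to approximate the reported 20% and 2% bot shares, and they are not taken from the paper's data.

```python
import math


def two_proportion_z_test(x1, n1, x2, n2):
    """Pooled two-proportion z-test; returns (z, two-sided p-value)."""
    p1, p2 = x1 / n1, x2 / n2
    # Pooled proportion under the null hypothesis that both shares are equal.
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Two-sided p-value from the standard normal: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


if __name__ == "__main__":
    # Hypothetical counts (assumption): bot-authored items among 371 original
    # tweets and 371 retweets, approximating the reported 20% and 2%.
    z, p_value = two_proportion_z_test(74, 371, 7, 371)
    print(f"z = {z:.2f}, two-sided p = {p_value:.2e}")
```

With samples of this size, a gap as wide as 20% versus 2% gives a very large z statistic, which is consistent with the abstract calling the difference significant. The exact value depends on the real counts.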
Pub Date : 2024-08-22 DOI: 10.1007/s11192-024-05132-x
Eva Seidlmayer, Tetyana Melnychuk, Lukas Galke, Lisa Kühnel, Klaus Tochtermann, Carsten Schultz, Konrad U. Förstner
Based on a large-scale computational analysis of scholarly articles, this study investigates the dynamics of interdisciplinary research in the first year of the COVID-19 pandemic. The study also analyses the reorientation of research away from other topics, which received less attention because of the strong focus on the pandemic. It aims to examine what can be learned from the (failing) interdisciplinarity of coronavirus research and its displacing effects for managing potential similar crises at the scientific level. To explore our research questions, we run several analyses using the COVID-19++ dataset, which contains scholarly publications, preprints from the field of life sciences, and their referenced literature, including publications from a broad scientific spectrum. Our results show the high impact and topic-wise adoption of research related to the COVID-19 crisis. Based on a similarity analysis of scientific topics, which is grounded in concept-embedding learning on graph-structured bibliographic data, we measured the degree of interdisciplinarity of COVID-19 research in 2020. Our findings reveal a low degree of research interdisciplinarity. The analysis of the publications' references indicates the major role of clinical medicine, but also the growing importance of psychiatry and the social sciences in COVID-19 research. A social network analysis shows that an author's high degree of centrality significantly increases their degree of interdisciplinarity.
Title: Research topic displacement and the lack of interdisciplinarity: lessons from the scientific response to COVID-19
Journal: Scientometrics
Pub Date : 2024-08-17 DOI: 10.1007/s11192-024-05120-1
Eleonora Dagienė
This paper proposes an open-science-aligned approach that uses library metadata to evaluate individual books. I analyse the suitability of this approach for individual book assessment and visibility of national books in the library catalogues, to support responsible research evaluation. WorldCat metadata offers valuable insights for the evaluation of books, but the completeness of this metadata varies. Author, contributor, and publisher data require cleaning, while languages, years, formats, editions, and translations provide rich information. Open access data is currently lacking, and national book visibility in WorldCat depends heavily on contributions from national libraries and metadata suppliers. Encouraging national library engagement could boost the global visibility of domestic research. Further exploration is needed regarding long-term preservation, metadata ownership, and technical integration for effective standardisation and improved book evaluation.
Title: Mapping scholarly books: library metadata and research assessment
Journal: Scientometrics
Pub Date : 2024-08-17 DOI: 10.1007/s11192-024-05131-y
Abhirup Nandy, Hiran H. Lathabai, Vivek Kumar Singh
During the last several decades, various indicators and proxies have been proposed to measure research output and its impact for different units. These measurements may be targeted at individuals, institutions, journals, countries, etc. Institutional-level assessment is one such area that has always been and will remain a key challenge to a multitude of stakeholders. Various international rankings as well as different bibliometric indicators have been explored in the context of institutional assessments, though each of them has attracted certain criticisms. Most of the existing indicators, including h-type indicators, mainly focus on research output and/or citations to the research output. They do not reveal the expertise of institutions in different subject areas, which is crucial for understanding the research portfolio of an institution. Recently, a set of expertise measures such as the x and x(g) indices were introduced to determine the expertise of institutions with respect to a specific discipline/field, considering strengths in different finer-level thematic areas of that discipline/field. In this work, an adaptation of the x-index, namely the $x_{d}$-index, is proposed to determine the overall scholarly expertise of an institution, considering its publication pattern and strength in different coarse thematic areas. This indicator helps to identify the core expertise areas and the diversity of the research portfolio of the institution. Further, two variants of the indicator, namely the field-normalized indicator or $x_{d}$(FN)-index and the fractional indicator or $x_{d}(f)$-index, are also introduced to address the effect of field bias and collaborations on the computation of the expertise diversity. The framework can determine the most suitable version of the indicator to use for research portfolio management with the help of correlation analysis. These indicators and the associated framework are demonstrated on a dataset of 136 institutions. Upon rank correlation analysis, no significant difference is noticed between $x_{d}$ and its variants computed using different publication counting methods in this particular dataset, making $x_{d}$ the most suitable indicator in this case. The possibilities offered by the framework for effective management of the research portfolio of an institution by expanding its diversity, and its ability to aid national-level policymakers in the effective management of the scholarly ecosystem of the country, are discussed.
Title: $x_{d}$-index and its variants: a set of overall scholarly expertise diversity indices for the research portfolio management of institutions
Journal: Scientometrics
Pub Date : 2024-08-17 DOI: 10.1007/s11192-024-05128-7
Teodoro Luque-Martínez, Ignacio Luque-Raya
In this study, the scientific production of universities across the world is analysed, disaggregated by research field and speciality. A particular focus is on the strategic analysis of Spanish universities within the international panorama. Clustering techniques are applied to data collected from the widely known and frequently consulted Academic Ranking of World Universities. To do so, indicators are defined that relate to university presence (in both absolute and relative terms), university performance within a specialist field with respect to the rest of the world, and performance within each speciality with respect to the general level of the country. With all that information, strategic clusters of specialities were identified, and an analysis by scientific field at an aggregated level was completed. Among the results, it is worth highlighting the greater international presence of Spanish universities within the specialist clusters of Food Science & Technology and Hospitality & Tourism Management, and their performance below the general average with respect to all universities, except for Remote Sensing, Veterinary Science, and Civil Engineering. The research fields in which the Spanish universities showed greater competitiveness are Life Sciences and Natural Science, whereas the fields of Engineering and Social Science had the lowest presence and level of international competitiveness. A series of recommendations for improvement are advanced concerning measurement of resources, communicative activities, and the orientation of lines of action within some specialities.
Title: Spanish scientific research by field and subject. Strategic analysis with ARWU indicators
Journal: Scientometrics
Pub Date : 2024-08-08 DOI: 10.1007/s11192-024-05119-8
Leo Egghe, Ronald Rousseau
We introduce and define three types of small worlds: small worlds based on the diameter of the network (SWD), those based on the average geodesic distance between nodes (SWA), and those based on the median geodesic distance (SWMd). These types of networks are defined as limiting properties of sequences of sets. We show the exact relation between these three types, namely that each SWD network is also an SWA network and that each SWA network is also an SWMd network. Yet, having the small-world property is a phenomenon that can easily occur in the sense that most networks are small-world networks in one of the three ways. We introduce sequences of distance frequencies, so-called alpha-sequences, and prove a relation between the majorization property between alpha-sequences and small-world properties.
Title: The small-world phenomenon: a model, explanations, characterizations, and examples
Journal: Scientometrics
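The three small-world types in the abstract above rest on three statistics of geodesic distance: the diameter, the average, and the median. The abstract does not give the precise limit conditions that define SWD, SWA, and SWMd, so the sketch below only computes the three statistics for one finite, connected, unweighted graph. The function names and the two example graphs are illustrative assumptions, not taken from the paper.

```python
from collections import deque
from itertools import combinations
from statistics import mean, median


def bfs_distances(adj, source):
    """Geodesic (shortest-path) distances from source in an unweighted graph."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def distance_stats(adj):
    """Return (diameter, average, median) over all unordered node pairs.

    Assumes a connected, undirected graph with at least two nodes, given as
    an adjacency dict {node: [neighbours]}.
    """
    all_dist = {u: bfs_distances(adj, u) for u in adj}
    pair_dists = [all_dist[u][v] for u, v in combinations(adj, 2)]
    return max(pair_dists), mean(pair_dists), median(pair_dists)


def path_graph(n):
    """Illustrative example: nodes 0..n-1 joined in a line."""
    return {i: [j for j in (i - 1, i + 1) if 0 <= j < n] for i in range(n)}


def star_graph(n):
    """Illustrative example: node 0 joined to each of the nodes 1..n-1."""
    adj = {0: list(range(1, n))}
    for i in range(1, n):
        adj[i] = [0]
    return adj


if __name__ == "__main__":
    for name, graph in (("path", path_graph(5)), ("star", star_graph(5))):
        print(name, distance_stats(graph))
```

For any connected graph the average distance cannot exceed the diameter, which matches the direction of the stated implication from SWD to SWA. In these two example families the path's diameter grows with the number of nodes while the star's stays at 2, which shows how differently the statistics can behave along a sequence of growing networks.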
Pub Date : 2024-08-02 DOI: 10.1007/s11192-024-05116-x
Eugenio Petrovich, Sander Verhaegh, Gregor Bös, Claudia Cristalli, Fons Dewulf, Ties van Gemert, Nina IJdens
Standard citation-based bibliometric tools have severe limitations when they are applied to periods in the history of science and the humanities before the advent of now-current citation practices. This paper presents an alternative method involving the extraction and analysis of mentions to map and analyze links between scholars and texts in periods that fall outside the scope of citation-based studies. Focusing on one specific discipline in one particular period and language area (Anglophone philosophy between 1890 and 1979), we describe a procedure to create a mention index by identifying, extracting, and disambiguating mentions in academic publications. Our mention index includes 1,095,765 mention links, extracted from 22,977 articles published in 12 journals. We successfully link 93% of these mentions to specific philosophers, with an estimated precision of 82% to 91%. Moreover, we integrate the mention index into a database named EDHIPHY, which includes data and metadata from multiple sources and enables multidimensional mention analyses. In the final part of the paper, we present four case studies conducted by domain experts, demonstrating the use and the potential of both EDHIPHY and mention analyses more generally.
Title: Bibliometrics beyond citations: introducing mention extraction and analysis
Journal: Scientometrics
Pub Date : 2024-08-01 DOI: 10.1007/s11192-024-05123-y
Jiawei Wang
This study presents the results of a cross-disciplinary and diachronic examination of cohesive devices used in high citation research article (HCRA) titles, a hitherto less-explored subgenre of academic discourse. Based on Halliday and Matthiessen's (2014) Cohesion Model, the research analyzed the use of connectors in a self-constructed corpus of 30,000 HCRA titles from the disciplines of Biology, Chemistry, Linguistics, and Music from 1980 to 2023. Disciplinary and diachronic changes in connectors were compared using two-way multivariate analyses of variance (MANOVA) and follow-up analyses of variance (ANOVA). Major findings indicate that discipline, rather than period, is the determinant of cohesion in HCRA titles, albeit with a medium effect size. Extension and Enhancement prevail in HCRA titles, suggesting an exponential increase in the sophistication and comprehensiveness of information in the curation and dissemination of scientific knowledge. Specifically, cohesion in HCRA titles is predominantly realized by additive, temporal, and causal connectors, with sharp contrasts between soft and hard sciences, indicating that longer titles with these connectors attract readers by harnessing their familiarity with disciplinary knowledge. Quantitative characterization of cohesion in HCRA titles sheds light on how expert writers coherently organize titles to maximize informativeness and research impact, thereby contributing pedagogically to academic writing in English for Academic and Specific Purposes, and empirically to research on the predictability of citation impact.
Title: Quantifying cohesion in high citation research article titles: a cross-disciplinary and diachronic investigation
Journal: Scientometrics
Pub Date : 2024-08-01 DOI: 10.1007/s11192-024-05125-w
M. Ángeles Oviedo-García
Review mills constitute a new category of reviewer misconduct that flies in the face of reviewer ethics and integrity. An analysis of 263 review reports reveals a pattern of generic, vague, and repeated affirmations (identical or very similar boilerplate phrasing), regardless of the scientific content of the papers under review, coupled with coercive citation (perhaps among the main reasons for such behavior); combined, these produce fake reviews. The misconduct associated with review mills goes beyond mere plagiarism (self-plagiarism) of reviewer comments. It is important to quantify the problem and to take urgent measures: (a) to identify the review millers; (b) to rectify the published literature; and (c) to establish procedures for journals and publishers to counter this new type of misconduct.
{"title":"The review mills, not just (self-)plagiarism in review reports, but a step further","authors":"M. Ángeles Oviedo-García","doi":"10.1007/s11192-024-05125-w","DOIUrl":"https://doi.org/10.1007/s11192-024-05125-w","url":null,"abstract":"<p>Review mills sum up a new category of reviewer misconduct that flies in the face of reviewer ethics and integrity. A pattern of generic, vague, and repeated affirmations (identical or very similar boilerplate phrasing) is noted in the analysis of 263 review reports, regardless of the scientific content of the papers under review, coupled with coercive citation (perhaps among the main reasons for such behavior), which when combined produce fake reviews. The misconduct associated with review mills is unlike mere plagiarism (self-plagiarism) of reviewer comments. It is important to quantify the problem and to take urgent measures: (a) to identify the review millers; (b) to rectify the published literature; and (c) to determine procedures for journals and publishers on procedures to counter this new type of misconduct.</p>","PeriodicalId":21755,"journal":{"name":"Scientometrics","volume":"74 1","pages":""},"PeriodicalIF":3.9,"publicationDate":"2024-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141865176","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"管理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2024-08-01 DOI: 10.1007/s11192-024-05114-z
Biao Zhang, Yunwei Chen
Research on innovative content within academic articles plays a vital role in exploring the frontiers of scientific and technological innovation while facilitating the integration of scientific and technological evaluation into academic discourse. To efficiently gather the latest innovative concepts, it is essential to accurately recognize innovative sentences within academic articles. Although several supervised methods exist for classifying article sentences, such as citation function sentences, future work sentences, and formal citation sentences, most of them rely on manual annotation or rule-based matching to construct datasets and often neglect an in-depth exploration of how to improve model performance. To address these limitations, this study introduces a semi-automatic annotation method for innovative sentences (IS) that draws on expert comments, and proposes a data augmentation method based on SAO reconstruction to enlarge the training dataset. We compared and analyzed the effectiveness of multiple algorithms for recognizing IS within academic articles. The study used the full text of academic articles as the research subject and employed the semi-automatic method to annotate IS and create the training dataset. It then validated the semi-automatic annotation method through manual inspection and compared it with rule-based annotation methods. The impact of different augmentation ratios on model performance was also explored. The empirical results reveal the following: (1) The semi-automatic annotation method proposed in this study achieves an accuracy of 0.87239, ensuring the validity of the annotated data while reducing the manual annotation cost. (2) The SAO reconstruction data augmentation method significantly improved the accuracy of machine learning and deep learning algorithms in recognizing IS. (3) When the augmentation ratio in the training set was set to 50%, the trained GPT-2 model outperformed the other algorithms, achieving an ACC of 0.97883 on the test set and an F1 score of 0.95505 in practical application.
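The abstract reports both accuracy (ACC) and an F1 score. The sketch below shows how the two metrics are conventionally computed for a binary task such as "innovative sentence or not", and why they can differ when positive sentences are rare. The confusion-matrix counts are hypothetical and are not taken from the study.

```python
def accuracy(tp, fp, fn, tn):
    """Share of all sentences that were classified correctly."""
    total = tp + fp + fn + tn
    return (tp + tn) / total if total else 0.0


def f1_score(tp, fp, fn):
    """Harmonic mean of precision and recall for the positive class."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts for 200 sentences, 45 of which are truly innovative
tp, fp, fn, tn = 40, 5, 5, 150

acc = accuracy(tp, fp, fn, tn)
f1 = f1_score(tp, fp, fn)
print(f"ACC = {acc:.3f}, F1 = {f1:.3f}")
```

Because true negatives count toward accuracy but not toward F1, a classifier can score higher on ACC than on F1 when most sentences are not innovative, which is one reason to report both.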
{"title":"Automated recognition of innovative sentences in academic articles: semi-automatic annotation for cost reduction and SAO reconstruction for enhanced data","authors":"Biao Zhang, Yunwei Chen","doi":"10.1007/s11192-024-05114-z","DOIUrl":"https://doi.org/10.1007/s11192-024-05114-z","url":null,"abstract":"<p>Research on innovative content within academic articles plays a vital role in exploring the frontiers of scientific and technological innovation while facilitating the integration of scientific and technological evaluation into academic discourse. To efficiently gather the latest innovative concepts, it is essential to accurately recognize innovative sentences within academic articles. Although several supervised methods for classifying article sentences exist, such as citation function sentences, future work sentences, and formal citation sentences, most of these methods rely on manual annotations or rule-based matching to construct datasets, often neglecting an in-depth exploration of model performance enhancement. To address the limitations of existing research in this domain, this study introduces a semi-automatic annotation method for innovative sentences (IS) with the assistance of expert comments information and proposes a data augmentation method by SAO reconstruction to augment the training dataset. Within this paper, we compared and analyzed the effectiveness of multiple algorithms for recognizing IS within academic articles. This study utilized the full text of academic articles as the research subject and employed the semi-automatic method to annotate IS for creating the training dataset. Then, this study validated the effectiveness of the semi-automatic annotation method through manual inspection and compared it with rule-based annotation methods. Additionally, the impacts of different augmentation ratios on model performance were also explored. 
The empirical results reveal the following: (1) The semi-automatic annotation method proposed in this study achieves an accuracy rate of 0.87239, ensuring the validity of annotated data while reducing the manual annotation cost. (2) The SAO reconstruction for data augmentation method significantly improved the accuracy of machine learning and deep learning algorithms in the recognition of IS. (3) When the augmentation ratio in the training set was set to 50%, the trained GPT-2 model was superior to other algorithms, achieving an ACC of 0.97883 in the test set and an F1 score of 0.95505 in practical application.</p>","PeriodicalId":21755,"journal":{"name":"Scientometrics","volume":"150 1","pages":""},"PeriodicalIF":3.9,"publicationDate":"2024-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141882449","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"管理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}