RefCit2vec: embedding models considering references and citations for measuring document similarity
Authors: Chien-chih Huang, Kuang-hua Chen
Journal: Scientometrics (impact factor 3.5; JCR Q2, Computer Science, Interdisciplinary Applications)
Publication date: 2024-07-10
Publication type: Journal Article
DOI: https://doi.org/10.1007/s11192-024-05067-3
Citations: 0
Abstract
This study outlines the intellectual structure of Library and Information Science (LIS) at the venue level with RefCit2vec, an embedding method inspired by word2vec. The reference lists or cited-by lists of 62,077 articles in 35 venues (journals and proceedings) published between 1928 and 2022 are converted into real-valued vectors by four independent RefCit2vec models. The document similarities measured by two of the models exhibit moderate correlations with bibliographic coupling metrics. In contrast, the similarities from the other two models correlate moderately or strongly with co-citation metrics. Each venue is represented by its centroid, the average vector of its constituent documents. When hierarchical agglomerative clustering is applied to the venue centroids, 69% of venues robustly emerge in 6 out of 8 clusters. Four clusters consistently form the library-related branch. The bibliometrics/scientometrics branch contains only 1 cluster, whereas the information-related branch contains 3 clusters. In addition, 43% of venues fall into six subgroups with consistent tree structures. An article is defined as SCIM-alike if it is closer to the SCIM centroid than half of the SCIM articles are. By this definition, 10% of JASIST articles are SCIM-alike based on their reference lists, and 5% are SCIM-alike based on their cited-by lists. The percentage of SCIM-alike articles in JASIST rose above the average between 2008 and 2018 but has dropped below the average since 2019. Having demonstrated these dynamics in LIS, we suggest that citation embedding methods like RefCit2vec can incorporate citation-based, text-based, or authorship features to support varied scenarios in investigating or exploring research fronts and scientific knowledge transfer.
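The centroid and SCIM-alike definitions in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the function names (`centroid`, `cosine_distance`, `alike_share`), the choice of cosine distance, and the toy 2-D vectors are all assumptions made for illustration, since the abstract does not state the distance measure or publish the embeddings.

```python
import math
from statistics import median


def centroid(vectors):
    """Average vector of a venue's document embeddings."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def cosine_distance(u, v):
    """1 - cosine similarity (the distance measure is an assumption)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def alike_share(target_vectors, candidate_vectors):
    """Share of candidate articles that are closer to the target venue's
    centroid than half of the target venue's own articles are."""
    c = centroid(target_vectors)
    # Median distance of the target venue's articles to their own centroid.
    threshold = median(cosine_distance(v, c) for v in target_vectors)
    hits = sum(cosine_distance(v, c) < threshold for v in candidate_vectors)
    return hits / len(candidate_vectors)


# Toy embeddings, invented for illustration; not data from the study.
scim = [[1.0, 0.0], [1.0, 0.2], [1.0, -0.2], [1.0, 1.0]]
jasist = [[1.0, 0.1], [0.0, 1.0], [0.2, 1.0], [-1.0, 0.5]]

share = alike_share(scim, jasist)
print(f"SCIM-alike share of toy JASIST articles: {share:.2f}")
```

Using the median distance as the threshold is one direct reading of "closer to the SCIM centroid than half of SCIM articles are"; a different distance measure would change which articles qualify.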
About the journal:
Scientometrics aims at publishing original studies, short communications, preliminary reports, review papers, letters to the editor and book reviews on scientometrics. The topics covered are results of research concerned with the quantitative features and characteristics of science. Emphasis is placed on investigations in which the development and mechanism of science are studied by means of (statistical) mathematical methods.
The Journal also provides the reader with important up-to-date information about international meetings and events in scientometrics and related fields. Appropriate bibliographic compilations are published as a separate section. Due to its fully interdisciplinary character, Scientometrics is indispensable to research workers and research administrators throughout the world. It provides valuable assistance to librarians and documentalists in central scientific agencies, ministries, research institutes and laboratories.
Scientometrics includes the Journal of Research Communication Studies. Consequently, its aims and scope cover those of the latter, namely, to bring the results of research investigations together in one place, in such a form that they will be of use not only to the investigators themselves but also to the entrepreneurs and research workers who form the object of these studies.