RefCit2vec: embedding models considering references and citations for measuring document similarity

IF 3.5 | Zone 3, Management | Q2, COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS | Scientometrics | Pub Date: 2024-07-10 | DOI: 10.1007/s11192-024-05067-3
Chien-chih Huang, Kuang-hua Chen
Citations: 0

Abstract

This study outlines the intellectual structure of Library and Information Science (LIS) at the venue level using RefCit2vec, an embedding method inspired by word2vec. The reference lists or cited-by lists of 62,077 articles in 35 venues (journals and proceedings) published between 1928 and 2022 are converted into real-valued vectors by four independent RefCit2vec models. The document similarities measured by two of the models exhibit moderate correlations with bibliographic coupling metrics, whereas the similarities from the other two models correlate moderately or strongly with co-citation metrics. Each venue is represented by its centroid, the average vector of its constituent documents. When hierarchical agglomerative clustering is applied to the venue centroids, 69% of venues robustly emerge in 6 out of 8 clusters. Four clusters consistently form the library-related branch. The bibliometrics/scientometrics branch contains only 1 cluster, whereas the information-related branch contains 3 clusters. 43% of venues fall into six subgroups with consistent tree structures. An article is defined as SCIM-alike if it is closer to the SCIM centroid than half of the SCIM articles are. 10% of JASIST articles are SCIM-alike based on their reference lists, and 5% are SCIM-alike based on their cited-by lists. The percentage of SCIM-alike articles in JASIST rose above the average between 2008 and 2018 but has dropped below it since 2019. Beyond demonstrating these dynamics in LIS, citation embedding methods like RefCit2vec can incorporate citation-based, text-based, or authorship features to support varied scenarios for investigating or exploring research fronts and scientific knowledge transfer.
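The centroid and SCIM-alike definitions in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name the distance measure, so cosine distance is assumed here, and the function names (`centroid`, `cosine_distance`, `is_venue_alike`) and the toy 2-D vectors are hypothetical, chosen only for illustration.

```python
import math
from statistics import median


def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(component) / n for component in zip(*vectors)]


def cosine_distance(u, v):
    """1 - cosine similarity. ASSUMPTION: the abstract does not state the metric."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def is_venue_alike(article, venue_vectors):
    """True if `article` is closer to the venue centroid than half of the
    venue's own articles, i.e. below their median distance to the centroid."""
    c = centroid(venue_vectors)
    threshold = median(cosine_distance(v, c) for v in venue_vectors)
    return cosine_distance(article, c) < threshold


# Toy embeddings, made up for illustration; these are not data from the study.
scim_vectors = [[1.0, 0.0], [0.9, 0.1], [0.8, 0.3], [0.6, 0.6]]
jasist_vectors = [[0.9, 0.15], [0.1, 1.0], [0.5, 0.9]]

alike_flags = [is_venue_alike(v, scim_vectors) for v in jasist_vectors]
share_alike = sum(alike_flags) / len(alike_flags)
print(alike_flags, share_alike)
```

Using the median distance as the threshold is one direct reading of "closer than half of SCIM articles are"; the share of flagged articles corresponds to the percentages the abstract reports for JASIST.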

Source journal: Scientometrics (Management Science; Computer Science: Interdisciplinary Applications)
CiteScore: 7.20
Self-citation rate: 17.90%
Articles published: 351
Review time: 1.5 months
About the journal: Scientometrics aims at publishing original studies, short communications, preliminary reports, review papers, letters to the editor and book reviews on scientometrics. The topics covered are results of research concerned with the quantitative features and characteristics of science. Emphasis is placed on investigations in which the development and mechanism of science are studied by means of (statistical) mathematical methods. The Journal also provides the reader with important up-to-date information about international meetings and events in scientometrics and related fields. Appropriate bibliographic compilations are published as a separate section. Due to its fully interdisciplinary character, Scientometrics is indispensable to research workers and research administrators throughout the world. It provides valuable assistance to librarians and documentalists in central scientific agencies, ministries, research institutes and laboratories. Scientometrics includes the Journal of Research Communication Studies. Consequently, its aims and scope cover that of the latter, namely, to bring the results of research investigations together in one place, in such a form that they will be of use not only to the investigators themselves but also to the entrepreneurs and research workers who form the object of these studies.
Latest articles from this journal
Through the secret gate: a study of member-contributed submissions in PNAS
Breach of academic values and misconduct: the case of Sci-Hub
Measuring the global and domestic technological impact of Chinese scientific output: a patent-to-paper citation analysis of science-technology linkage
Evolving patterns of extreme publishing behavior across science
Automated taxonomy alignment via large language models: bridging the gap between knowledge domains