Comparing semantic representation methods for keyword analysis in bibliometric research

Journal of Informetrics; Impact Factor: 3.5; Zone 2 (Management); JCR Q2 (Computer Science, Interdisciplinary Applications); Pub Date: 2024-04-05; DOI: 10.1016/j.joi.2024.101529

Guo Chen, Siqi Hong, Chenxin Du, Panting Wang, Zeyu Yang, Lu Xiao

Published in: Journal of Informetrics, vol. 18, no. 3, Article 101529 (journal article). Full text: https://www.sciencedirect.com/science/article/pii/S1751157724000427

Citations: 0

Abstract

Semantic representation methods play a crucial role in text mining tasks. Although numerous approaches have been proposed and compared in text mining research, the comparison of semantic representation methods specifically for publication keywords in bibliometric studies has received limited attention. This lack of practical evidence makes it difficult for researchers to select suitable methods for obtaining keyword vectors for downstream bibliometric tasks, and may prevent them from achieving optimal results. To address this gap, this study experimentally compares typical semantic representation methods for keywords, aiming to provide quantitative evidence for bibliometric studies. The experiment uses keyword clustering as the fundamental task and evaluates 22 variations of five typical methods across four scientific domains. The methods compared are co-word matrix, co-word network, word embedding, network embedding, and “semantic + structure” integration. The comparison is based on how well the clustering results of these methods fit the “evaluation standard” specific to each domain. The empirical findings show that the co-word matrix performs poorly, whereas the co-word network and word embedding perform satisfactorily. Among the five network embedding algorithms, LINE and Node2Vec outperform DeepWalk, Struc2Vec, and SDNE. Notably, both the “pre-training and fine-tuning” model and the “semantic + structure” model yield unsatisfactory results. Although performance varies across methods, no single approach is universally superior. When selecting methods in practical applications, it is crucial to consider factors such as corpus size and the semantic cohesion of domain keywords. This study advances our understanding of semantic representation methods for keyword analysis and contributes to bibliometric analysis by offering recommendations to help researchers select appropriate methods.
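
To make the co-word matrix representation and the clustering-versus-standard comparison concrete, here is a minimal Python sketch. It is not the authors' pipeline: the toy keyword lists, the threshold-based clustering, and the use of the Rand index as the fit measure are all assumptions chosen for illustration, since the abstract does not name the clustering algorithm or the fit metric.

```python
from itertools import combinations
from math import sqrt

# Hypothetical toy corpus: each inner list holds the keywords of one paper.
papers = [
    ["citation analysis", "h-index", "research evaluation"],
    ["citation analysis", "h-index"],
    ["word embedding", "text mining", "keyword clustering"],
    ["word embedding", "text mining"],
    ["research evaluation", "h-index"],
    ["keyword clustering", "text mining"],
]

def coword_matrix(papers):
    """Return (vocab, matrix); matrix[i][j] counts papers that contain both keywords."""
    vocab = sorted({kw for paper in papers for kw in paper})
    index = {kw: i for i, kw in enumerate(vocab)}
    matrix = [[0] * len(vocab) for _ in vocab]
    for paper in papers:
        for a, b in combinations(sorted(set(paper)), 2):
            matrix[index[a]][index[b]] += 1
            matrix[index[b]][index[a]] += 1
    return vocab, matrix

def cosine(u, v):
    """Cosine similarity between two keyword vectors (rows of the co-word matrix)."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = sqrt(sum(x * x for x in u)) * sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0

def threshold_clusters(vectors, threshold):
    """Assumed stand-in clustering: connected components of the graph that
    links every pair of keywords whose cosine similarity exceeds `threshold`."""
    labels = [-1] * len(vectors)
    current = 0
    for start in range(len(vectors)):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:
            i = stack.pop()
            for j in range(len(vectors)):
                if labels[j] == -1 and cosine(vectors[i], vectors[j]) > threshold:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels

def rand_index(labels_a, labels_b):
    """Fraction of keyword pairs on which two labelings agree
    (both put the pair together, or both keep it apart)."""
    pairs = list(combinations(range(len(labels_a)), 2))
    agree = sum(
        (labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
        for i, j in pairs
    )
    return agree / len(pairs)

vocab, matrix = coword_matrix(papers)
predicted = threshold_clusters(matrix, threshold=0.2)

# Hypothetical "evaluation standard": a hand-made reference grouping of the
# six keywords in `vocab` order (evaluation topics = 0, text-mining topics = 1).
reference = [0, 0, 1, 0, 1, 1]

for keyword, label in zip(vocab, predicted):
    print(f"{keyword}: cluster {label}")
print("Rand index vs. reference:", rand_index(predicted, reference))
```

The same `rand_index` comparison could be applied to labels derived from any of the other representations (co-word network, word embedding, network embedding), which is the kind of side-by-side evaluation the abstract describes.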

Source journal: Journal of Informetrics (Social Sciences: Library and Information Sciences)

CiteScore: 6.40

Self-citation rate: 16.20%

About the journal: Journal of Informetrics (JOI) publishes rigorous, high-quality research on quantitative aspects of information science. The main focus of the journal is on topics in bibliometrics, scientometrics, webometrics, patentometrics, altmetrics and research evaluation. Contributions studying informetric problems using methods from other quantitative fields, such as mathematics, statistics, computer science, economics and econometrics, and network science, are especially encouraged. JOI publishes both theoretical and empirical work. In general, case studies, such as a bibliometric analysis focusing on a specific research field or a specific country, are not considered suitable for publication in JOI unless they contain innovative methodological elements.