Comparing semantic representation methods for keyword analysis in bibliometric research

Journal of Informetrics, Pub Date: 2024-04-05, DOI: 10.1016/j.joi.2024.101529
Guo Chen, Siqi Hong, Chenxin Du, Panting Wang, Zeyu Yang, Lu Xiao
Citations: 0

Abstract

Semantic representation methods play a crucial role in text mining tasks. Although numerous approaches have been proposed and compared in text mining research, comparisons of semantic representation methods specifically for publication keywords in bibliometric studies have received limited attention. This lack of practical evidence makes it difficult for researchers to choose suitable methods for obtaining keyword vectors for downstream bibliometric tasks, which may prevent them from achieving optimal results. To address this gap, this study experimentally compares typical semantic representation methods for keywords, aiming to provide quantitative evidence for bibliometric studies. The experiment uses keyword clustering as the fundamental task and evaluates 22 variants of five typical methods across four scientific domains. The methods compared are the co-word matrix, co-word network, word embedding, network embedding, and “semantic + structure” integration. Each method is assessed by how well its clustering results fit the “evaluation standard” specific to each domain. The empirical findings show that the co-word matrix performs poorly, whereas the co-word network and word embedding techniques perform satisfactorily. Among the five network embedding algorithms, LINE and Node2Vec outperform DeepWalk, Struc2Vec, and SDNE. Notably, both the “pre-training and fine-tuning” model and the “semantic + structure” model perform unsatisfactorily. Nevertheless, despite these differences in performance, no single approach is universally superior. When selecting methods in practice, it is crucial to consider factors such as corpus size and the semantic cohesion of domain keywords. This study advances our understanding of semantic representation methods for keyword analysis and contributes to bibliometric analysis by offering recommendations for selecting appropriate methods.
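The abstract describes the evaluation design only at a high level: represent each keyword as a vector, cluster the vectors, and measure how well the clusters fit a domain-specific evaluation standard. The sketch below illustrates that pipeline for the simplest method compared, the co-word matrix. It is not the authors' implementation. The cosine similarity, the threshold-based connected-components clusterer (a hypothetical stand-in for whatever clustering algorithm the study used), normalized mutual information (NMI) as the fit score, all function names, and the toy data are assumptions made for illustration.

```python
import math
from collections import Counter
from itertools import combinations


def coword_matrix(papers):
    """Build a symmetric keyword co-occurrence (co-word) matrix.

    `papers` is a list of keyword lists, one per publication. Returns
    (vocab, matrix) where matrix[i][j] is the number of papers in which
    vocab[i] and vocab[j] both appear.
    """
    vocab = sorted({kw for kws in papers for kw in kws})
    index = {kw: i for i, kw in enumerate(vocab)}
    matrix = [[0] * len(vocab) for _ in vocab]
    for kws in papers:
        # set() so a keyword listed twice in one paper is counted once
        for a, b in combinations(sorted(set(kws)), 2):
            i, j = index[a], index[b]
            matrix[i][j] += 1
            matrix[j][i] += 1
    return vocab, matrix


def cosine(u, v):
    """Cosine similarity of two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def threshold_clusters(vectors, threshold):
    """Hypothetical stand-in clusterer: link any two keywords whose cosine
    similarity is >= threshold and return connected components as labels."""
    parent = list(range(len(vectors)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in combinations(range(len(vectors)), 2):
        if cosine(vectors[i], vectors[j]) >= threshold:
            parent[find(i)] = find(j)

    # Relabel component roots as 0, 1, 2, ... in order of first appearance
    relabel = {}
    return [relabel.setdefault(find(i), len(relabel)) for i in range(len(vectors))]


def nmi(labels_true, labels_pred):
    """Normalized mutual information with arithmetic-mean normalization."""
    n = len(labels_true)
    ct, cp = Counter(labels_true), Counter(labels_pred)
    joint = Counter(zip(labels_true, labels_pred))
    mi = sum((nij / n) * math.log(n * nij / (ct[t] * cp[p]))
             for (t, p), nij in joint.items())
    h_true = -sum((c / n) * math.log(c / n) for c in ct.values())
    h_pred = -sum((c / n) * math.log(c / n) for c in cp.values())
    denom = (h_true + h_pred) / 2
    return mi / denom if denom else 1.0


# Toy corpus and a hypothetical domain "evaluation standard"
papers = [
    ["word embedding", "bert", "keyword"],
    ["word embedding", "keyword"],
    ["co-word", "network", "clustering"],
    ["network", "clustering"],
]
standard = {"bert": "A", "keyword": "A", "word embedding": "A",
            "clustering": "B", "co-word": "B", "network": "B"}

vocab, matrix = coword_matrix(papers)
predicted = threshold_clusters(matrix, threshold=0.5)
score = nmi([standard[kw] for kw in vocab], predicted)
print(vocab, predicted, round(score, 3))
```

On this toy corpus the two keyword groups never co-occur, so the clusterer recovers them exactly (labels `[0, 1, 1, 0, 1, 0]` over the sorted vocabulary) and NMI is 1.0. In a real comparison, each of the other representations (co-word network, word or network embeddings) would replace `coword_matrix` as the source of keyword vectors, while the clustering and scoring steps stay fixed so that the methods can be compared on equal terms.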
