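Comparing Research Topics through Metatags Analysis: A Multi-module Machine Algorithm Approaches Using Real World Data on Digital Humanities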

Journal of Scientometric Research (IF 0.6, Q3, Information Science & Library Science) | Pub Date: 2024-04-15 | DOI: 10.5530/jscires.13.1.5
Bhaskar Mukherjee, Debasis Majhi, Priya Tiwari, Saloni Chaudhary
Citations: 0

Abstract

The present study extracts, maps, and compares the lexical and semantic similarity between author-provided keywords and machine-extracted terms and topics from the titles and abstracts of articles in an interdisciplinary field, ‘digital humanities’. Author-provided terms (keywords) were first extracted and mapped using visualization software such as Gephi; these terms were then compared with terms extracted from the titles and abstracts of the research articles using NLP-based statistical modules. The interdisciplinarity of significant topics was also measured using the Brillouin index. A set of 7483 articles on digital humanities and its associated fields, downloaded from the Scopus database, was used for this purpose. We observed that research on digital humanities is spread over a considerable number of concepts, such as ‘Industry 4.0’, ‘topic modelling’, and ‘open science’. Further, the machine algorithm-based extraction identified a larger lexical similarity between author-provided keywords and title-extracted keywords than between author-provided keywords and abstract-extracted keywords. The Jaccard similarity of all author keywords with machine-extracted title keywords was 0.83, and the SBERT BiEncoder_score was 0.7374. The top research areas extracted from titles through an unsupervised term-extraction approach included topics such as ‘digital humanities approach’ and ‘digital humanities visualization’, indicating a strong connection to the discipline of digital humanities. The average interdisciplinarity index of the top significant topics ranged from 1.217 to 1.284, with the highest value for ‘computational digital humanities’. Because this study is based on real-world data, it is useful for understanding how far machine algorithm-based text extraction can help the information retrieval process.
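The abstract names two measures that are easy to state concretely: Jaccard similarity between keyword sets, and the Brillouin index as a diversity measure. The following is a minimal sketch of their textbook definitions, not the authors' pipeline. The abstract does not say how terms were normalized, which logarithm base was used, or which categories fed the index, so the lower-casing, the natural logarithm, and all sample data below are assumptions made for illustration. The SBERT bi-encoder score is not covered here.

```python
import math


def jaccard(terms_a, terms_b):
    """Jaccard similarity |A ∩ B| / |A ∪ B| of two term collections.

    Assumption: terms are compared after trimming and lower-casing.
    """
    set_a = {t.strip().lower() for t in terms_a}
    set_b = {t.strip().lower() for t in terms_b}
    union = set_a | set_b
    if not union:
        return 0.0
    return len(set_a & set_b) / len(union)


def brillouin(counts):
    """Brillouin index H = (ln N! - sum_i ln n_i!) / N.

    counts: number of items observed in each category.
    Assumption: natural logarithm; lgamma(n + 1) equals ln(n!).
    """
    counts = [c for c in counts if c > 0]
    total = sum(counts)
    if total == 0:
        return 0.0
    log_factorials = sum(math.lgamma(c + 1) for c in counts)
    return (math.lgamma(total + 1) - log_factorials) / total


# Hypothetical toy data, not taken from the study.
author_keywords = ["Digital Humanities", "topic modelling",
                   "open science", "Industry 4.0"]
title_terms = ["digital humanities", "topic modelling", "visualization"]

# 2 shared terms out of 5 distinct terms.
print(jaccard(author_keywords, title_terms))  # 0.4

# Hypothetical counts of a topic's articles per subject category.
print(brillouin([2, 3, 1]))
```

A Jaccard value close to 1 means the two keyword sets are nearly identical. The Brillouin index grows as items are spread more evenly over more categories, which is why it can serve as an interdisciplinarity indicator.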
Source journal: Journal of Scientometric Research (Information Science & Library Science)
CiteScore: 1.30 | Self-citation rate: 12.50% | Articles published: 52