Juan Pablo Bascur, Suzan Verberne, Nees Jan van Eck, Ludo Waltman
Which topics are best represented by science maps? An analysis of clustering effectiveness for citation and text similarity networks
arXiv - CS - Digital Libraries, 2024-06-10
https://doi.org/arxiv-2406.06454
Abstract
A science map of topics is a visualization that shows topics identified
algorithmically based on the bibliographic metadata of scientific publications.
In practice, not all topics are well represented in a science map. We analyzed
how effectively different topics are represented in science maps created by
clustering biomedical publications. To achieve this, we investigated which
topic categories, obtained from MeSH terms, are better represented in science
maps based on citation or text similarity networks. To evaluate the clustering
effectiveness of topics, we determined the extent to which documents belonging
to the same topic are grouped together in the same cluster. We found that the
best and worst represented topic categories are the same for citation and text
similarity networks. The best represented topic categories are diseases,
psychology, anatomy, organisms, and the techniques and equipment used for
diagnostics and therapy, while the worst represented topic categories are
natural science fields, geographical entities, information sciences, and health
care and occupations. Furthermore, for the diseases and organisms topic
categories and for science maps with smaller clusters, we found that topics
tend to be better represented in citation similarity networks than in text
similarity networks.
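
The idea of measuring how far documents of the same topic end up in the same cluster can be illustrated with a small sketch. The abstract does not specify the metric used in the study, so the score below (the fraction of same-topic document pairs that share a cluster) is only an assumed stand-in, and the function name, document IDs, topic labels, and cluster assignments are all hypothetical.

```python
from collections import Counter


def same_cluster_pair_fraction(topic_docs, cluster_of):
    """Fraction of document pairs within one topic that share a cluster.

    Illustrative proxy only; not necessarily the measure used in the paper.
    Returns None when the topic has fewer than two clustered documents.
    """
    docs = [d for d in topic_docs if d in cluster_of]
    if len(docs) < 2:
        return None
    # Count how many of the topic's documents fall in each cluster.
    sizes = Counter(cluster_of[d] for d in docs)
    # Pairs inside a cluster of size n: n * (n - 1) / 2.
    same = sum(n * (n - 1) // 2 for n in sizes.values())
    total = len(docs) * (len(docs) - 1) // 2
    return same / total


# Hypothetical toy data: document ID -> cluster ID, topic -> document IDs.
cluster_of = {"p1": 0, "p2": 0, "p3": 0, "p4": 1, "p5": 1, "p6": 2}
topics = {
    "topic_a": ["p1", "p2", "p3"],        # all in one cluster
    "topic_b": ["p1", "p4", "p6"],        # spread over three clusters
    "topic_c": ["p1", "p2", "p4", "p5"],  # split evenly over two clusters
}

scores = {t: same_cluster_pair_fraction(d, cluster_of) for t, d in topics.items()}
for topic, score in scores.items():
    print(topic, round(score, 3))
```

Under this assumed score, a topic whose documents all land in one cluster scores 1.0 and a topic scattered across clusters scores close to 0. Running it once on a clustering of a citation network and once on a clustering of a text similarity network would allow the kind of per-topic comparison the abstract describes.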