Lead–lag effect of research between conference papers and journal papers in data mining
Authors: Yue Huang, Runyu Tian
Journal: WIREs Data Mining and Knowledge Discovery
Volume: 35 1
Publication date: 2024-09-24
Publication type: Journal Article
DOI: 10.1002/widm.1561 (https://doi.org/10.1002/widm.1561)
Citation count: 0
Abstract
Examining the lead–lag effect between publication types along a temporal dimension is important for research assessment. In this article, we introduce a novel framework to quantify the lead–lag effect between the research topics of conference papers and journal papers. We first identify research topics with BERTopic, a text‐embedding‐based topic modeling technique, then extract the research topics of each time slice, construct and visualize the topic similarity matrix to reveal the time‐lag direction, and finally quantify the lead–lag effect using four proposed indicators together with average influence topic similarity comparison maps. We analyze 19,166 bibliographic records of top conference papers and journal papers published from 2015 to 2019 in the data mining field and calculate the similarity between the BERTopic topics of each pair of quarterly time slices. The results show that journal paper topics lag behind conference paper topics in the data mining field. The most significant lead–lag effect is 2.5 years, with approximately 33.45% of topics affected by this lag. The methodology has potential for broader application in analyzing lead–lag effects across diverse research areas, offering insights into the state of research development and informing policy decisions.

This article is categorized under: Application Areas > Science and Technology
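The slice-to-slice similarity step described in the abstract can be illustrated with a small sketch. It is not the authors' implementation and does not reproduce their four indicators. It assumes each time slice is already represented by a matrix of topic vectors (random toy vectors stand in for BERTopic topic embeddings), scores each pair of slices by the mean best-match cosine similarity, and averages those scores along the diagonals of the resulting matrix to get one value per lag. The function names `slice_similarity` and `mean_similarity_by_lag` are hypothetical.

```python
import numpy as np


def cosine_similarity_matrix(a, b):
    """Pairwise cosine similarity between the rows of a and the rows of b."""
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a @ b.T


def slice_similarity(conf_slices, jour_slices):
    """S[i, j]: mean, over journal topics in slice j, of the similarity to
    their best-matching conference topic in slice i (an assumed aggregate)."""
    S = np.zeros((len(conf_slices), len(jour_slices)))
    for i, conf_topics in enumerate(conf_slices):
        for j, jour_topics in enumerate(jour_slices):
            sim = cosine_similarity_matrix(conf_topics, jour_topics)
            S[i, j] = sim.max(axis=0).mean()
    return S


def mean_similarity_by_lag(S):
    """Average S along each diagonal. Lag k is the journal slice index minus
    the conference slice index, so k > 0 means journals trail conferences."""
    n, m = S.shape
    return {k: float(np.diagonal(S, offset=k).mean()) for k in range(-(n - 1), m)}


# Toy data: 6 quarterly slices, 3 topics per slice, 8-dimensional vectors.
rng = np.random.default_rng(0)
conf_slices = [rng.normal(size=(3, 8)) for _ in range(6)]

# Journal topics are built to echo conference topics two slices later.
jour_slices = []
for j in range(6):
    if j >= 2:
        jour_slices.append(conf_slices[j - 2] + 0.01 * rng.normal(size=(3, 8)))
    else:
        jour_slices.append(rng.normal(size=(3, 8)))

S = slice_similarity(conf_slices, jour_slices)
lags = mean_similarity_by_lag(S)
best_lag = max(lags, key=lags.get)
print("lag (in slices) with the highest mean similarity:", best_lag)
```

With quarterly slices, a lag of k slices corresponds to k/4 years, so the 2.5-year effect reported in the abstract would appear at k = 10 in a matrix built this way. The longest lags are averaged over very few slice pairs, so their values are noisy in real data.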