An ESTs detection research based on paper entity mapping: Combining scientific text modeling and neural prophet
Authors: Dejian Yu, Bo Xiang
DOI: 10.1016/j.joi.2024.101551
Publication date: 2024-06-08
Publication type: Journal Article
Source: https://www.sciencedirect.com/science/article/pii/S1751157724000646
Abstract
Existing studies on the detection of emerging scientific topics (ESTs) overemphasize newness and neglect the content innovation of knowledge. They also ignore the lag in knowledge diffusion. In this paper, we propose a four-stage detection framework for ESTs that maps emerging attributes from paper entities to scientific topics. To validate the approach, we conduct empirical studies on two markedly different disciplinary datasets, IS-LS and AI, which contain 73,601 and 255,620 publications, respectively. First, we generate 29 and 47 candidate scientific topics, respectively, using topic modeling. Second, we represent the novelty of paper entities with pre-trained language models and map it, together with knowledge distributions, to scientific topic entities to obtain three topic emerging attributes: topic novelty, relative share, and growth. Third, we predict future trends of these attributes with Neural Prophet, which outperforms four baseline models in $R^2$, MAE, and RMSE. Finally, using the predicted future values, the candidate scientific topics are grouped into 8 clusters containing two types of ESTs by means of strategic market theory and a clustering model. From the correlation and feature distribution analysis of the emerging attributes, we find evidence of resilience and scale advantage in the diffusion of scientific knowledge. We also find significant uncertainty in previous citation-based evaluation patterns for scientific topics, caused by the complexity of citation behavior. Overall, this research enriches the theoretical knowledge and detection frameworks of ESTs and provides detailed insights into the comprehensive assessment and dissemination of scientific topics.
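For readers unfamiliar with the three forecast metrics named above, the following is a minimal sketch of their standard textbook definitions in plain Python. It is not the authors' evaluation code: the function names and the sample series are hypothetical, and the numbers are illustrative values, not results from the paper.

```python
import math


def mae(actual, predicted):
    """Mean absolute error: average magnitude of the forecast errors."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean squared error: penalizes large errors more than MAE."""
    squared = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return math.sqrt(squared / len(actual))


def r2(actual, predicted):
    """Coefficient of determination: 1 minus residual over total variance."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


# Hypothetical yearly values of one topic attribute (for example, relative
# share) and a model's forecasts for the same years. Illustrative only.
actual = [0.10, 0.12, 0.15, 0.19]
predicted = [0.11, 0.12, 0.14, 0.20]

print(f"MAE={mae(actual, predicted):.4f}")
print(f"RMSE={rmse(actual, predicted):.4f}")
print(f"R2={r2(actual, predicted):.4f}")
```

Lower MAE and RMSE and a higher $R^2$ indicate a better fit, which is the sense in which one forecasting model can be said to outperform another on these metrics.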