Early identification of scientific breakthroughs through outlier analysis based on research entities

IF 1.5; Zone 3 (Management); Q2 INFORMATION SCIENCE & LIBRARY SCIENCE; Journal of Data and Information Science; Pub Date: 2024-09-04; DOI: 10.2478/jdis-2024-0027

Yang Zhao, Mengting Zhang, Xiaoli Chen, Zhixiong Zhang


Citations: 0

Abstract

Purpose: To address the “anomalies” that occur when scientific breakthroughs emerge, this study looks for the early signs and nascent stages of breakthrough innovations from the perspective of outliers, with the aim of identifying scientific breakthroughs in papers early.

Design/methodology/approach: The study uses semantic technology to extract research entities from the titles and abstracts of papers, so that each paper’s research content is represented by its entities. Outlier detection methods are then applied to measure and analyze the anomalies in breakthrough papers during their early stages. The development and evolution process is traced using literature time tags. Finally, a case study is conducted on the key publications of the 2021 Nobel Prize laureates in Physiology or Medicine.

Findings: Manual analysis of all identified outlier papers verifies the effectiveness of the proposed method for identifying potential scientific breakthroughs early.

Research limitations: The method has been empirically tested only in the biomedical field. More data from other fields are needed to validate its robustness and generalizability.

Practical implications: This study supplements current methods for early identification of scientific breakthroughs and supports technological intelligence decision-making and services.

Originality/value: The study introduces a novel approach to early identification of scientific breakthroughs based on outlier analysis of research entities. Compared with traditional citation-based evaluations, it offers a more sensitive, precise, and fine-grained alternative, which improves the ability to identify nascent breakthrough innovations.
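The approach described in the abstract (represent each paper by its research entities, then look for papers whose entities are anomalous) can be illustrated with a minimal sketch. The abstract does not say which outlier detection method or entity extractor the authors use, so everything below is an assumption made for illustration: entity sets are taken as already extracted, the outlier score is the mean Jaccard distance from one paper to all others, and the paper IDs, entity strings, function names, and the 0.9 threshold are hypothetical and not taken from the study.

```python
from typing import Dict, List, Set


def jaccard_distance(a: Set[str], b: Set[str]) -> float:
    """Return 1 - |a & b| / |a | b|; two empty sets count as identical."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def outlier_scores(papers: Dict[str, Set[str]]) -> Dict[str, float]:
    """Score each paper by its mean Jaccard distance to every other paper.

    This scoring rule is an illustrative stand-in, not the paper's method.
    """
    scores: Dict[str, float] = {}
    for pid, entities in papers.items():
        others = [e for other, e in papers.items() if other != pid]
        if not others:
            scores[pid] = 0.0  # a single paper has nothing to deviate from
            continue
        total = sum(jaccard_distance(entities, e) for e in others)
        scores[pid] = total / len(others)
    return scores


def flag_outliers(scores: Dict[str, float], threshold: float = 0.9) -> List[str]:
    """Return the IDs whose score reaches the (hypothetical) threshold."""
    return sorted(pid for pid, s in scores.items() if s >= threshold)


# Hypothetical toy corpus: paper ID -> research entities already extracted
# from title and abstract. These values are made up for the example.
PAPERS: Dict[str, Set[str]] = {
    "p1": {"capsaicin", "pain", "neuron"},
    "p2": {"capsaicin", "pain", "inflammation"},
    "p3": {"pain", "neuron", "inflammation"},
    "p4": {"ion channel", "cdna cloning", "heat"},
}

if __name__ == "__main__":
    print(flag_outliers(outlier_scores(PAPERS)))  # prints ['p4']
```

In this toy corpus, p4 shares no entity with the other papers, so its mean distance is 1.0 and it is the only paper flagged. A real pipeline would also need the time tags mentioned in the abstract, for example by scoring each paper only against papers published before it.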



Source journal

Journal of Data and Information Science (INFORMATION SCIENCE & LIBRARY SCIENCE)

CiteScore: 3.50

Self-citation rate: 6.70%

Publication count: 495

About the journal: JDIS devotes itself to the study and application of the theories, methods, techniques, services, and infrastructural facilities that use big data to support knowledge discovery for decision and policy making. Its basic emphasis is on work that is big data-based, analytics-centered, knowledge discovery-driven, and supportive of decision making. Special effort goes to knowledge discovery that detects and predicts structures, trends, behaviors, relations, evolutions, and disruptions in research, innovation, business, politics, security, media and communications, and social development. The big data involved may include metadata or full content data, text or non-textual data, structured or unstructured data, domain-specific or cross-domain data, and dynamic or interactive data.

The main areas of interest are:

(1) New theories, methods, and techniques of big data-based data mining, knowledge discovery, and informatics, including but not limited to scientometrics, communication analysis, social network analysis, tech and industry analysis, competitive intelligence, knowledge mapping, evidence-based policy analysis, and predictive analysis.

(2) New methods, architectures, and facilities to develop or improve knowledge infrastructure capable of supporting knowledge organization and sophisticated analytics, including but not limited to ontology construction, knowledge organization, semantic linked data, knowledge integration and fusion, semantic retrieval, domain-specific knowledge infrastructure, and semantic sciences.

(3) New mechanisms, methods, and tools to embed knowledge analytics and knowledge discovery into actual operation, service, or managerial processes, including but not limited to knowledge-assisted scientific discovery and data mining-driven intelligent workflows in learning, communications, and management.

Specific topic areas may include: Knowledge organization; Knowledge discovery and data mining; Knowledge integration and fusion; Semantic Web metrics; Scientometrics; Analytic and diagnostic informetrics; Competitive intelligence; Predictive analysis; Social network analysis and metrics; Semantic and interactively analytic retrieval; Evidence-based policy analysis; Intelligent knowledge production; Knowledge-driven workflow management and decision-making; Knowledge-driven collaboration and its management; Domain knowledge infrastructure with knowledge fusion and analytics; Development of data and information services.