Accelerating Evidence Synthesis in Observational Studies: Development of a Living Natural Language Processing-Assisted Intelligent Systematic Literature Review System.

Impact factor: 3.1 | Zone 3 (Medicine) | JCR Q2 (Medical Informatics) | JMIR Medical Informatics | Pub Date: 2024-10-23 | DOI: 10.2196/54653
Frank J Manion, Jingcheng Du, Dong Wang, Long He, Bin Lin, Jingqi Wang, Siwei Wang, David Eckels, Jan Cervenka, Peter C Fiduccia, Nicole Cossrow, Lixia Yao
JMIR Medical Informatics, volume 12, e54653. Publication type: Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11523763/pdf/
Citation count: 0

Abstract

Background: Systematic literature review (SLR), a robust method to identify and summarize evidence from published sources, is considered to be a complex, time-consuming, labor-intensive, and expensive task.

Objective: This study aimed to present a solution based on natural language processing (NLP) that accelerates and streamlines the SLR process for observational studies using real-world data.

Methods: We followed an agile software development and iterative software engineering methodology to build a customized, intelligent, end-to-end, living NLP-assisted solution for observational SLR tasks. Multiple machine learning-based NLP algorithms were adopted to automate the article screening and data element extraction processes. Following a human-in-the-loop design, the NLP prediction results can be further reviewed and verified by domain experts. The system integrates explainable artificial intelligence to provide evidence for the NLP algorithms' predictions and add transparency to the extracted literature data elements. The system was developed based on 3 existing SLR projects of observational studies, covering epidemiology studies of human papillomavirus-associated diseases, the disease burden of pneumococcal diseases, and cost-effectiveness studies on pneumococcal vaccines.
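The human-in-the-loop step described in the Methods can be pictured with a minimal sketch. The abstract does not say how predictions are presented to reviewers, so everything here is an assumption made for illustration: the `Prediction` record, its `confidence` and `evidence` fields, and the idea of ordering the review queue by ascending confidence are all hypothetical and are not taken from the authors' system.

```python
from dataclasses import dataclass, replace
from typing import List, Optional


@dataclass(frozen=True)
class Prediction:
    """One model decision awaiting expert verification (hypothetical schema)."""
    article_id: str
    label: str                 # model's screening decision, e.g. "include" or "exclude"
    confidence: float          # model score in [0, 1]; assumed to be available
    evidence: str              # text span shown to the reviewer as justification
    reviewed_label: Optional[str] = None  # set once an expert confirms or corrects


def review_queue(predictions: List[Prediction]) -> List[Prediction]:
    """Return unreviewed predictions, least confident first."""
    pending = [p for p in predictions if p.reviewed_label is None]
    return sorted(pending, key=lambda p: p.confidence)


def record_review(prediction: Prediction, expert_label: str) -> Prediction:
    """Return a copy of the prediction carrying the expert's decision."""
    return replace(prediction, reviewed_label=expert_label)


def final_label(prediction: Prediction) -> str:
    """The expert's label wins when present; otherwise fall back to the model's."""
    if prediction.reviewed_label is not None:
        return prediction.reviewed_label
    return prediction.label
```

Keeping the model label and the expert label in separate fields means a correction never overwrites what the model originally predicted, which is one simple way to keep the review trail transparent.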

Results: Our Intelligent SLR Platform covers major SLR steps, including study protocol setting, literature retrieval, abstract screening, full-text screening, data element extraction from full-text articles, results summary, and data visualization. The NLP algorithms achieved accuracy scores of 0.86-0.90 on article screening tasks (framed as text classification tasks) and macroaverage F1 scores of 0.57-0.89 on data element extraction tasks (framed as named entity recognition tasks).
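For readers unfamiliar with the two metrics reported above, the following is a minimal sketch of how accuracy and a macro-averaged F1 score are computed. It is not the authors' evaluation code. It assumes gold and predicted labels are already aligned item by item, whereas named entity recognition is often scored at the entity-span level; the example labels are invented.

```python
from collections import Counter
from typing import Sequence


def accuracy(gold: Sequence[str], pred: Sequence[str]) -> float:
    """Fraction of items whose predicted label equals the gold label."""
    if len(gold) != len(pred) or not gold:
        raise ValueError("gold and pred must be non-empty and of equal length")
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


def macro_f1(gold: Sequence[str], pred: Sequence[str]) -> float:
    """Unweighted mean of per-label F1 scores, so rare labels count as much as common ones."""
    if len(gold) != len(pred) or not gold:
        raise ValueError("gold and pred must be non-empty and of equal length")
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # predicted label p where it was wrong
            fn[g] += 1  # missed an instance of gold label g
    labels = sorted(set(gold) | set(pred))
    scores = []
    for label in labels:
        denom = 2 * tp[label] + fp[label] + fn[label]
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the label never occurs
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Invented screening decisions for four abstracts
gold = ["include", "include", "exclude", "exclude"]
pred = ["include", "exclude", "exclude", "exclude"]
print(accuracy(gold, pred))            # 0.75
print(round(macro_f1(gold, pred), 4))  # 0.7333
```

Because the macro average weights every label equally, a single poorly extracted data element type can pull the score down sharply, which is one plausible reading of the wide 0.57-0.89 range.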

Conclusions: Cutting-edge NLP algorithms expedite SLR for observational studies, giving scientists more time to focus on the quality of data and the synthesis of evidence. In line with the living SLR concept, the system has the potential to keep literature data up to date, enabling scientists to stay current with the literature on observational studies prospectively and continuously.

Source journal: JMIR Medical Informatics (Medicine-Health Informatics)
CiteScore: 7.90
Self-citation rate: 3.10%
Articles published: 173
Review time: 12 weeks
Journal description: JMIR Medical Informatics (JMI, ISSN 2291-9694) is a top-rated, tier A journal that focuses on clinical informatics, big data in health and health care, decision support for health professionals, electronic health records, and eHealth infrastructures and implementation. It focuses on applied, translational research, with a broad readership including clinicians, CIOs, engineers, and industry and health informatics professionals. Published by JMIR Publications, publisher of the Journal of Medical Internet Research (JMIR), the leading eHealth/mHealth journal (Impact Factor 2016: 5.175), JMIR Med Inform has a slightly different scope (placing more emphasis on applications for clinicians and health professionals than on consumers/citizens, which is the focus of JMIR), publishes even faster, and also allows papers that are more technical or more formative than what would be published in the Journal of Medical Internet Research.
Latest articles from this journal:
Factors Contributing to Successful Information System Implementation and Employee Well-Being in Health Care and Social Welfare Professionals: Comparative Cross-Sectional Study.
Bidirectional Long Short-Term Memory-Based Detection of Adverse Drug Reaction Posts Using Korean Social Networking Services Data: Deep Learning Approaches.
Correlation between Diagnosis-related Group Weights and Nursing Time in the Cardiology Department: A Cross-sectional Study.
Data Ownership in the AI-Powered Integrative Health Care Landscape.
Medication Prescription Policy for US Veterans With Metastatic Castration-Resistant Prostate Cancer: Causal Machine Learning Approach.