PP54 Machine Learning For Accelerating Screening In Literature Reviews

International Journal of Technology Assessment in Health Care (IF 2.6; JCR Q2, Health Care Sciences & Services; CAS Medicine Quartile 4). Publication date: 2023-12-14. DOI: 10.1017/s0266462323001988
Mary Chappell, Mary Edwards, Deborah Watkins, Christopher Marshall, Lavinia Ferrante di Ruffano, Anita Fitzgerald, Sara Graziadio
{"title":"PP54 机器学习加速文献综述筛选","authors":"Mary Chappell, Mary Edwards, Deborah Watkins, Christopher Marshall, Lavinia Ferrante di Ruffano, Anita Fitzgerald, Sara Graziadio","doi":"10.1017/s0266462323001988","DOIUrl":null,"url":null,"abstract":"<span>Introduction</span><p>Systematic reviews are important for informing decision-making and primary research, but they can be time consuming and costly. With the advent of machine learning, there is an opportunity to accelerate the review process in study screening. We aimed to understand the literature to make decisions about the use of machine learning for screening in our review workflow.</p><span>Methods</span><p>A pragmatic literature review of PubMed to obtain studies evaluating the accuracy of publicly available machine learning screening tools. A single reviewer used ‘snowballing’ searches to identify studies reporting accuracy data and extracted the sensitivity (ability to correctly identify included studies for a review) and specificity, or workload saved (ability to correctly exclude irrelevant studies).</p><span>Results</span><p>Ten tools (AbstractR, ASReview Lab, Cochrane RCT classifier, Concept encoder, Dpedia, DistillerAI, Rayyan, Research Screener, Robot Analyst, SWIFT-active screener) were evaluated in a total of 16 studies. Fourteen studies were single arm where, although compared with a reference standard (predominantly single reviewer screening), there was no other comparator. Two studies were comparative, where tools were compared with other tools as well as a reference standard. All tools ranked records by probability of inclusion and either (i) applied a cut-point to exclude records or (ii) were used to rank and re-rank records during screening iterations, with screening continuing until most relevant records were obtained. The accuracy of tools varied widely between different studies and review projects. When used in method (ii), at 95 percent to 100 percent sensitivity, tools achieved workload savings of between 7 percent and 99 percent. It was unclear whether evaluations were conducted independent of tool developers.</p><span>Conclusions</span><p>Evaluations suggest the potential for tools to correctly classify studies in screening. However, conclusions are limited since (i) tool accuracy is generally not compared with dual reviewer screening and (ii) the literature lacks comparative studies and, because of between-study heterogeneity, it is not possible to robustly determine the accuracy of tools compared with each other. Independent evaluations are needed.</p>","PeriodicalId":14467,"journal":{"name":"International Journal of Technology Assessment in Health Care","volume":null,"pages":null},"PeriodicalIF":2.6000,"publicationDate":"2023-12-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"PP54 Machine Learning For Accelerating Screening In Literature Reviews\",\"authors\":\"Mary Chappell, Mary Edwards, Deborah Watkins, Christopher Marshall, Lavinia Ferrante di Ruffano, Anita Fitzgerald, Sara Graziadio\",\"doi\":\"10.1017/s0266462323001988\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<span>Introduction</span><p>Systematic reviews are important for informing decision-making and primary research, but they can be time consuming and costly. With the advent of machine learning, there is an opportunity to accelerate the review process in study screening. 
We aimed to understand the literature to make decisions about the use of machine learning for screening in our review workflow.</p><span>Methods</span><p>A pragmatic literature review of PubMed to obtain studies evaluating the accuracy of publicly available machine learning screening tools. A single reviewer used ‘snowballing’ searches to identify studies reporting accuracy data and extracted the sensitivity (ability to correctly identify included studies for a review) and specificity, or workload saved (ability to correctly exclude irrelevant studies).</p><span>Results</span><p>Ten tools (AbstractR, ASReview Lab, Cochrane RCT classifier, Concept encoder, Dpedia, DistillerAI, Rayyan, Research Screener, Robot Analyst, SWIFT-active screener) were evaluated in a total of 16 studies. Fourteen studies were single arm where, although compared with a reference standard (predominantly single reviewer screening), there was no other comparator. Two studies were comparative, where tools were compared with other tools as well as a reference standard. All tools ranked records by probability of inclusion and either (i) applied a cut-point to exclude records or (ii) were used to rank and re-rank records during screening iterations, with screening continuing until most relevant records were obtained. The accuracy of tools varied widely between different studies and review projects. When used in method (ii), at 95 percent to 100 percent sensitivity, tools achieved workload savings of between 7 percent and 99 percent. It was unclear whether evaluations were conducted independent of tool developers.</p><span>Conclusions</span><p>Evaluations suggest the potential for tools to correctly classify studies in screening. However, conclusions are limited since (i) tool accuracy is generally not compared with dual reviewer screening and (ii) the literature lacks comparative studies and, because of between-study heterogeneity, it is not possible to robustly determine the accuracy of tools compared with each other. Independent evaluations are needed.</p>\",\"PeriodicalId\":14467,\"journal\":{\"name\":\"International Journal of Technology Assessment in Health Care\",\"volume\":null,\"pages\":null},\"PeriodicalIF\":2.6000,\"publicationDate\":\"2023-12-14\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"International Journal of Technology Assessment in Health Care\",\"FirstCategoryId\":\"3\",\"ListUrlMain\":\"https://doi.org/10.1017/s0266462323001988\",\"RegionNum\":4,\"RegionCategory\":\"医学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q2\",\"JCRName\":\"HEALTH CARE SCIENCES & SERVICES\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"International Journal of Technology Assessment in Health Care","FirstCategoryId":"3","ListUrlMain":"https://doi.org/10.1017/s0266462323001988","RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"HEALTH CARE SCIENCES & SERVICES","Score":null,"Total":0}
引用次数: 0

Abstract

Introduction

Systematic reviews are important for informing decision-making and primary research, but they can be time-consuming and costly. With the advent of machine learning, there is an opportunity to accelerate the study screening stage of the review process. We aimed to understand the literature in order to make decisions about the use of machine learning for screening in our review workflow.

Methods

We conducted a pragmatic literature review of PubMed to obtain studies evaluating the accuracy of publicly available machine learning screening tools. A single reviewer used ‘snowballing’ searches to identify studies reporting accuracy data and extracted the sensitivity (the ability to correctly identify studies that should be included in a review) and the specificity, or workload saved (the ability to correctly exclude irrelevant studies).
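To make these measures concrete, the sketch below shows one plausible way to compute sensitivity, specificity, and workload saved from per-record screening decisions. The abstract gives no formulas, so the workload-saved definition used here is an assumption, and all names are illustrative.

```python
# Minimal sketch (not from the abstract) of the reported accuracy
# measures, computed from per-record decisions where 1 = relevant
# (include) and 0 = irrelevant (exclude).

def screening_metrics(y_true, y_tool):
    """y_true: reference-standard labels; y_tool: the tool's decisions."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_tool))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_tool))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_tool))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_tool))

    sensitivity = tp / (tp + fn)   # included studies correctly identified
    specificity = tn / (tn + fp)   # irrelevant studies correctly excluded
    # Assumed definition of workload saved: the share of all records the
    # tool removed from manual screening (true and false exclusions).
    workload_saved = (tn + fn) / len(y_true)
    return sensitivity, specificity, workload_saved


# Example: 2 relevant and 8 irrelevant records; the tool keeps both
# relevant records and excludes 6 of the 8 irrelevant ones.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
y_tool = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
print(screening_metrics(y_true, y_tool))  # (1.0, 0.75, 0.6)
```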

Results

Ten tools (AbstractR, ASReview Lab, Cochrane RCT classifier, Concept encoder, Dpedia, DistillerAI, Rayyan, Research Screener, Robot Analyst, SWIFT-active screener) were evaluated across a total of 16 studies. Fourteen studies were single-arm: although each tool was compared with a reference standard (predominantly single-reviewer screening), there was no other comparator. Two studies were comparative, with tools compared against other tools as well as a reference standard. All tools ranked records by probability of inclusion and either (i) applied a cut-point to exclude records or (ii) were used to rank and re-rank records during screening iterations, with screening continuing until most relevant records had been obtained. The accuracy of tools varied widely between studies and review projects. When used in method (ii), at 95 to 100 percent sensitivity, tools achieved workload savings of between 7 and 99 percent. It was unclear whether evaluations were conducted independently of tool developers.
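The rank-and-re-rank workflow of method (ii) is essentially an active-learning loop. The following is a generic sketch of that idea, not the algorithm of any tool named above: the TF-IDF features, logistic-regression classifier, and consecutive-exclusion stopping rule are all assumptions chosen for illustration.

```python
# Illustrative active-learning sketch of the "rank and re-rank" workflow
# (method ii). The model choice and stopping rule are assumptions for
# this example, not details reported in the abstract.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression


def prioritized_screening(abstracts, reviewer, seed_labels,
                          batch_size=50, stop_after=200):
    """abstracts: list of str; reviewer(i) -> 1/0 decision for record i;
    seed_labels: dict {index: 1/0} from an initial screened sample that
    must contain at least one include and one exclude."""
    X = TfidfVectorizer(stop_words="english").fit_transform(abstracts)
    labels = dict(seed_labels)
    run_of_excludes = 0

    while len(labels) < len(abstracts) and run_of_excludes < stop_after:
        # Refit on everything screened so far.
        seen = sorted(labels)
        model = LogisticRegression(max_iter=1000)
        model.fit(X[seen], [labels[i] for i in seen])

        # Re-rank the unscreened pool by predicted probability of inclusion.
        pool = [i for i in range(len(abstracts)) if i not in labels]
        probs = model.predict_proba(X[pool])[:, 1]
        ranked = [i for _, i in sorted(zip(probs, pool), reverse=True)]

        # The reviewer screens only the top-ranked batch each iteration.
        for i in ranked[:batch_size]:
            labels[i] = reviewer(i)
            run_of_excludes = 0 if labels[i] == 1 else run_of_excludes + 1

    return labels  # screened decisions; remaining records are auto-excluded
```

Stopping after a long run of consecutive exclusions is what converts prioritization into workload savings; where that threshold is set trades residual sensitivity risk against the 7 to 99 percent savings range reported above.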

Conclusions

Evaluations suggest that tools have the potential to correctly classify studies during screening. However, conclusions are limited because (i) tool accuracy is generally not compared with dual-reviewer screening and (ii) the literature lacks comparative studies, and because of between-study heterogeneity it is not possible to robustly determine the relative accuracy of the tools. Independent evaluations are needed.
