PP54 Machine Learning For Accelerating Screening In Literature Reviews

International Journal of Technology Assessment in Health Care (IF 2.6; JCR Q2, Health Care Sciences & Services; CAS Medicine Quartile 4). Publication date: 2023-12-14. DOI: 10.1017/s0266462323001988
Mary Chappell, Mary Edwards, Deborah Watkins, Christopher Marshall, Lavinia Ferrante di Ruffano, Anita Fitzgerald, Sara Graziadio
{"title":"PP54 机器学习加速文献综述筛选","authors":"Mary Chappell, Mary Edwards, Deborah Watkins, Christopher Marshall, Lavinia Ferrante di Ruffano, Anita Fitzgerald, Sara Graziadio","doi":"10.1017/s0266462323001988","DOIUrl":null,"url":null,"abstract":"<span>Introduction</span><p>Systematic reviews are important for informing decision-making and primary research, but they can be time consuming and costly. With the advent of machine learning, there is an opportunity to accelerate the review process in study screening. We aimed to understand the literature to make decisions about the use of machine learning for screening in our review workflow.</p><span>Methods</span><p>A pragmatic literature review of PubMed to obtain studies evaluating the accuracy of publicly available machine learning screening tools. A single reviewer used ‘snowballing’ searches to identify studies reporting accuracy data and extracted the sensitivity (ability to correctly identify included studies for a review) and specificity, or workload saved (ability to correctly exclude irrelevant studies).</p><span>Results</span><p>Ten tools (AbstractR, ASReview Lab, Cochrane RCT classifier, Concept encoder, Dpedia, DistillerAI, Rayyan, Research Screener, Robot Analyst, SWIFT-active screener) were evaluated in a total of 16 studies. Fourteen studies were single arm where, although compared with a reference standard (predominantly single reviewer screening), there was no other comparator. Two studies were comparative, where tools were compared with other tools as well as a reference standard. All tools ranked records by probability of inclusion and either (i) applied a cut-point to exclude records or (ii) were used to rank and re-rank records during screening iterations, with screening continuing until most relevant records were obtained. The accuracy of tools varied widely between different studies and review projects. When used in method (ii), at 95 percent to 100 percent sensitivity, tools achieved workload savings of between 7 percent and 99 percent. It was unclear whether evaluations were conducted independent of tool developers.</p><span>Conclusions</span><p>Evaluations suggest the potential for tools to correctly classify studies in screening. However, conclusions are limited since (i) tool accuracy is generally not compared with dual reviewer screening and (ii) the literature lacks comparative studies and, because of between-study heterogeneity, it is not possible to robustly determine the accuracy of tools compared with each other. Independent evaluations are needed.</p>","PeriodicalId":14467,"journal":{"name":"International Journal of Technology Assessment in Health Care","volume":null,"pages":null},"PeriodicalIF":2.6000,"publicationDate":"2023-12-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"PP54 Machine Learning For Accelerating Screening In Literature Reviews\",\"authors\":\"Mary Chappell, Mary Edwards, Deborah Watkins, Christopher Marshall, Lavinia Ferrante di Ruffano, Anita Fitzgerald, Sara Graziadio\",\"doi\":\"10.1017/s0266462323001988\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<span>Introduction</span><p>Systematic reviews are important for informing decision-making and primary research, but they can be time consuming and costly. With the advent of machine learning, there is an opportunity to accelerate the review process in study screening. 
We aimed to understand the literature to make decisions about the use of machine learning for screening in our review workflow.</p><span>Methods</span><p>A pragmatic literature review of PubMed to obtain studies evaluating the accuracy of publicly available machine learning screening tools. A single reviewer used ‘snowballing’ searches to identify studies reporting accuracy data and extracted the sensitivity (ability to correctly identify included studies for a review) and specificity, or workload saved (ability to correctly exclude irrelevant studies).</p><span>Results</span><p>Ten tools (AbstractR, ASReview Lab, Cochrane RCT classifier, Concept encoder, Dpedia, DistillerAI, Rayyan, Research Screener, Robot Analyst, SWIFT-active screener) were evaluated in a total of 16 studies. Fourteen studies were single arm where, although compared with a reference standard (predominantly single reviewer screening), there was no other comparator. Two studies were comparative, where tools were compared with other tools as well as a reference standard. All tools ranked records by probability of inclusion and either (i) applied a cut-point to exclude records or (ii) were used to rank and re-rank records during screening iterations, with screening continuing until most relevant records were obtained. The accuracy of tools varied widely between different studies and review projects. When used in method (ii), at 95 percent to 100 percent sensitivity, tools achieved workload savings of between 7 percent and 99 percent. It was unclear whether evaluations were conducted independent of tool developers.</p><span>Conclusions</span><p>Evaluations suggest the potential for tools to correctly classify studies in screening. However, conclusions are limited since (i) tool accuracy is generally not compared with dual reviewer screening and (ii) the literature lacks comparative studies and, because of between-study heterogeneity, it is not possible to robustly determine the accuracy of tools compared with each other. Independent evaluations are needed.</p>\",\"PeriodicalId\":14467,\"journal\":{\"name\":\"International Journal of Technology Assessment in Health Care\",\"volume\":null,\"pages\":null},\"PeriodicalIF\":2.6000,\"publicationDate\":\"2023-12-14\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"International Journal of Technology Assessment in Health Care\",\"FirstCategoryId\":\"3\",\"ListUrlMain\":\"https://doi.org/10.1017/s0266462323001988\",\"RegionNum\":4,\"RegionCategory\":\"医学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q2\",\"JCRName\":\"HEALTH CARE SCIENCES & SERVICES\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"International Journal of Technology Assessment in Health Care","FirstCategoryId":"3","ListUrlMain":"https://doi.org/10.1017/s0266462323001988","RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"HEALTH CARE SCIENCES & SERVICES","Score":null,"Total":0}
引用次数: 0

Abstract

Introduction

Systematic reviews are important for informing decision-making and primary research, but they can be time-consuming and costly. With the advent of machine learning, there is an opportunity to accelerate the study screening stage of the review process. We aimed to understand the literature in order to make decisions about the use of machine learning for screening in our review workflow.

Methods

We conducted a pragmatic literature review of PubMed to obtain studies evaluating the accuracy of publicly available machine learning screening tools. A single reviewer used ‘snowballing’ searches to identify studies reporting accuracy data and extracted the sensitivity (the ability to correctly identify studies that should be included in a review) and the specificity, or workload saved (the ability to correctly exclude irrelevant studies).
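To make these measures concrete, the sketch below shows one plausible way to compute sensitivity, specificity, and workload saved from per-record screening decisions. The abstract gives no formulas, so the workload-saved definition used here is an assumption, and all names are illustrative.

```python
# Minimal sketch (not from the abstract) of the reported accuracy
# measures, computed from per-record decisions where 1 = relevant
# (include) and 0 = irrelevant (exclude).

def screening_metrics(y_true, y_tool):
    """y_true: reference-standard labels; y_tool: the tool's decisions."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_tool))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_tool))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_tool))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_tool))

    sensitivity = tp / (tp + fn)   # included studies correctly identified
    specificity = tn / (tn + fp)   # irrelevant studies correctly excluded
    # Assumed definition of workload saved: the share of all records the
    # tool removed from manual screening (true and false exclusions).
    workload_saved = (tn + fn) / len(y_true)
    return sensitivity, specificity, workload_saved


# Example: 2 relevant and 8 irrelevant records; the tool keeps both
# relevant records and excludes 6 of the 8 irrelevant ones.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
y_tool = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
print(screening_metrics(y_true, y_tool))  # (1.0, 0.75, 0.6)
```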

Results

Ten tools (AbstractR, ASReview Lab, Cochrane RCT classifier, Concept encoder, Dpedia, DistillerAI, Rayyan, Research Screener, Robot Analyst, SWIFT-active screener) were evaluated across a total of 16 studies. Fourteen studies were single-arm: although each tool was compared with a reference standard (predominantly single-reviewer screening), there was no other comparator. Two studies were comparative, with tools compared against other tools as well as a reference standard. All tools ranked records by probability of inclusion and either (i) applied a cut-point to exclude records or (ii) were used to rank and re-rank records during screening iterations, with screening continuing until most relevant records had been obtained. The accuracy of tools varied widely between studies and review projects. When used in method (ii), at 95 to 100 percent sensitivity, tools achieved workload savings of between 7 and 99 percent. It was unclear whether evaluations were conducted independently of tool developers.
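The rank-and-re-rank workflow of method (ii) is essentially an active-learning loop. The following is a generic sketch of that idea, not the algorithm of any tool named above: the TF-IDF features, logistic-regression classifier, and consecutive-exclusion stopping rule are all assumptions chosen for illustration.

```python
# Illustrative active-learning sketch of the "rank and re-rank" workflow
# (method ii). The model choice and stopping rule are assumptions for
# this example, not details reported in the abstract.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression


def prioritized_screening(abstracts, reviewer, seed_labels,
                          batch_size=50, stop_after=200):
    """abstracts: list of str; reviewer(i) -> 1/0 decision for record i;
    seed_labels: dict {index: 1/0} from an initial screened sample that
    must contain at least one include and one exclude."""
    X = TfidfVectorizer(stop_words="english").fit_transform(abstracts)
    labels = dict(seed_labels)
    run_of_excludes = 0

    while len(labels) < len(abstracts) and run_of_excludes < stop_after:
        # Refit on everything screened so far.
        seen = sorted(labels)
        model = LogisticRegression(max_iter=1000)
        model.fit(X[seen], [labels[i] for i in seen])

        # Re-rank the unscreened pool by predicted probability of inclusion.
        pool = [i for i in range(len(abstracts)) if i not in labels]
        probs = model.predict_proba(X[pool])[:, 1]
        ranked = [i for _, i in sorted(zip(probs, pool), reverse=True)]

        # The reviewer screens only the top-ranked batch each iteration.
        for i in ranked[:batch_size]:
            labels[i] = reviewer(i)
            run_of_excludes = 0 if labels[i] == 1 else run_of_excludes + 1

    return labels  # screened decisions; remaining records are auto-excluded
```

Stopping after a long run of consecutive exclusions is what converts prioritization into workload savings; where that threshold is set trades residual sensitivity risk against the 7 to 99 percent savings range reported above.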

Conclusions

Evaluations suggest that tools have the potential to correctly classify studies during screening. However, conclusions are limited because (i) tool accuracy is generally not compared with dual-reviewer screening and (ii) the literature lacks comparative studies, and because of between-study heterogeneity it is not possible to robustly determine the relative accuracy of the tools. Independent evaluations are needed.
