Extracting PICO Sentences from Clinical Trial Reports using Supervised Distant Supervision.

IF 4.3 3区计算机科学 Q1 AUTOMATION & CONTROL SYSTEMS Journal of Machine Learning Research Pub Date : 2016-01-01

Byron C Wallace, Joël Kuiper, Aakash Sharma, Mingxi Brian Zhu, Iain J Marshall

{"title":"Extracting PICO Sentences from Clinical Trial Reports using Supervised Distant Supervision.","authors":"Byron C Wallace, Joël Kuiper, Aakash Sharma, Mingxi Brian Zhu, Iain J Marshall","doi":"","DOIUrl":null,"url":null,"abstract":"Systematic reviews underpin Evidence Based Medicine (EBM) by addressing precise clinical questions via comprehensive synthesis of all relevant published evidence. Authors of systematic reviews typically define a Population/Problem, Intervention, Comparator, and Outcome (a PICO criteria) of interest, and then retrieve, appraise and synthesize results from all reports of clinical trials that meet these criteria. Identifying PICO elements in the full-texts of trial reports is thus a critical yet time-consuming step in the systematic review process. We seek to expedite evidence synthesis by developing machine learning models to automatically extract sentences from articles relevant to PICO elements. Collecting a large corpus of training data for this task would be prohibitively expensive. Therefore, we derive distant supervision (DS) with which to train models using previously conducted reviews. DS entails heuristically deriving 'soft' labels from an available structured resource. However, we have access only to unstructured, free-text summaries of PICO elements for corresponding articles; we must derive from these the desired sentence-level annotations. To this end, we propose a novel method - supervised distant supervision (SDS) - that uses a small amount of direct supervision to better exploit a large corpus of distantly labeled instances by learning to pseudo-annotate articles using the available DS. We show that this approach tends to outperform existing methods with respect to automated PICO extraction.","PeriodicalId":50161,"journal":{"name":"Journal of Machine Learning Research","volume":"17 ","pages":""},"PeriodicalIF":4.3000,"publicationDate":"2016-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5065023/pdf/","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of Machine Learning Research","FirstCategoryId":"94","ListUrlMain":"","RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"AUTOMATION & CONTROL SYSTEMS","Score":null,"Total":0}

引用次数: 0

Abstract

Systematic reviews underpin Evidence Based Medicine (EBM) by addressing precise clinical questions via comprehensive synthesis of all relevant published evidence. Authors of systematic reviews typically define a Population/Problem, Intervention, Comparator, and Outcome (a PICO criteria) of interest, and then retrieve, appraise and synthesize results from all reports of clinical trials that meet these criteria. Identifying PICO elements in the full-texts of trial reports is thus a critical yet time-consuming step in the systematic review process. We seek to expedite evidence synthesis by developing machine learning models to automatically extract sentences from articles relevant to PICO elements. Collecting a large corpus of training data for this task would be prohibitively expensive. Therefore, we derive distant supervision (DS) with which to train models using previously conducted reviews. DS entails heuristically deriving 'soft' labels from an available structured resource. However, we have access only to unstructured, free-text summaries of PICO elements for corresponding articles; we must derive from these the desired sentence-level annotations. To this end, we propose a novel method - supervised distant supervision (SDS) - that uses a small amount of direct supervision to better exploit a large corpus of distantly labeled instances by learning to pseudo-annotate articles using the available DS. We show that this approach tends to outperform existing methods with respect to automated PICO extraction.

微信好友朋友圈 QQ好友复制链接

本刊更多论文

利用远距离监督从临床试验报告中提取 PICO 句子

系统综述是循证医学（EBM）的基础，它通过全面综合所有已发表的相关证据来解决精确的临床问题。系统性综述的作者通常会定义感兴趣的人群/问题、干预措施、比较者和结果（PICO 标准），然后检索、评估和综合符合这些标准的所有临床试验报告的结果。因此，识别试验报告全文中的 PICO 要素是系统综述过程中一个关键但耗时的步骤。我们试图通过开发机器学习模型来自动提取文章中与 PICO 要素相关的句子，从而加快证据合成的速度。为这项任务收集大量的训练数据将耗资巨大。因此，我们利用以前进行过的综述推导出远距离监督（DS）来训练模型。远距离监督需要从可用的结构化资源中启发式地推导出 "软 "标签。然而，我们只能获得相应文章的非结构化、自由文本的 PICO 要素摘要；我们必须从中推导出所需的句子级注释。为此，我们提出了一种新方法--远距离监督（SDS）--该方法使用少量的直接监督，通过学习使用可用的 DS 对文章进行伪标注，从而更好地利用大量远距离标注实例的语料库。我们的研究表明，这种方法在自动 PICO 提取方面往往优于现有方法。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

求助全文

约1分钟内获得全文去求助

来源期刊

Journal of Machine Learning Research 工程技术-计算机：人工智能

CiteScore

18.80

自引率

0.00%

发文量

审稿时长

3 months

期刊介绍： The Journal of Machine Learning Research (JMLR) provides an international forum for the electronic and paper publication of high-quality scholarly articles in all areas of machine learning. All published papers are freely available online. JMLR has a commitment to rigorous yet rapid reviewing. JMLR seeks previously unpublished papers on machine learning that contain: new principled algorithms with sound empirical validation, and with justification of theoretical, psychological, or biological nature; experimental and/or theoretical studies yielding new insight into the design and behavior of learning in intelligent systems; accounts of applications of existing techniques that shed light on the strengths and weaknesses of the methods; formalization of new learning tasks (e.g., in the context of new applications) and of methods for assessing performance on those tasks; development of new analytical frameworks that advance theoretical studies of practical learning methods; computational models of data from natural learning systems at the behavioral or neural level; or extremely well-written surveys of existing work.

期刊最新文献

Flexible Bayesian Product Mixture Models for Vector Autoregressions. Convergence for nonconvex ADMM, with applications to CT imaging. Effect-Invariant Mechanisms for Policy Generalization. Nonparametric Regression for 3D Point Cloud Learning. Graphical Dirichlet Process for Clustering Non-Exchangeable Grouped Data.