Combining an expert-based medical entity recognizer to a machine-learning system: methods and a case study.

Biomedical Informatics Insights, vol. 6 (Suppl. 1), pp. 51-62. Pub Date: 2013-08-01; eCollection Date: 2013-01-01. DOI: 10.4137/BII.S11770

Pierre Zweigenbaum, Thomas Lavergne, Natalia Grabar, Thierry Hamon, Sophie Rosset, Cyril Grouin

Citations: 8

Abstract

Medical entity recognition is now usually performed by data-driven methods based on supervised machine learning. Expert-based systems, in which linguistic and domain expertise is provided directly to the system, are often combined with data-driven systems. We present a case study in which an existing expert-based medical entity recognition system, Ogmios, is combined with a data-driven system, Caramba, based on a linear-chain Conditional Random Field (CRF) classifier. Our case study specifically highlights the risk of overfitting incurred by an expert-based system. We observe that this overfitting prevents the combination of the two systems from improving precision, recall, or F-measure, and we analyze the underlying mechanisms through a post-hoc feature-level analysis. Wrapping the expert-based system alone as attributes input to a CRF classifier does, however, boost its F-measure from 0.603 to 0.710, bringing it on par with the data-driven system. The generalization of this method remains to be investigated.
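The wrapping strategy described in the abstract, in which the output of an expert-based recognizer becomes the attributes of a CRF classifier, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The toy lexicon and the functions `expert_bio_tags`, `token_features`, `sentence_features`, `bio_spans` and `prf` are hypothetical. The greedy lexicon lookup merely stands in for Ogmios, and the entity labels (problem, treatment, test) are assumed. Scoring assumes exact-match entity spans, which may differ from the paper's evaluation protocol. The per-token attribute dictionaries would then be paired with gold labels and passed to a linear-chain CRF toolkit for training, which is not shown.

```python
from typing import Dict, List, Set, Tuple

# HYPOTHETICAL toy lexicon standing in for an expert-based recognizer such as Ogmios.
# The entity labels (problem, treatment, test) are assumed for illustration only.
LEXICON: Dict[Tuple[str, ...], str] = {
    ("chest", "pain"): "problem",
    ("aspirin",): "treatment",
    ("chest", "x-ray"): "test",
}
MAX_LEN = max(len(key) for key in LEXICON)


def expert_bio_tags(tokens: List[str]) -> List[str]:
    """Tag tokens with BIO labels by greedy longest-match lexicon lookup."""
    lowered = [t.lower() for t in tokens]
    tags = ["O"] * len(tokens)
    i = 0
    while i < len(tokens):
        # Try the longest candidate term starting at position i first.
        for n in range(min(MAX_LEN, len(tokens) - i), 0, -1):
            label = LEXICON.get(tuple(lowered[i:i + n]))
            if label is not None:
                tags[i] = "B-" + label
                for j in range(i + 1, i + n):
                    tags[j] = "I-" + label
                i += n
                break
        else:  # no lexicon term starts at position i
            i += 1
    return tags


def token_features(expert_tags: List[str], i: int) -> Dict[str, str]:
    """Expert-only attributes for token i: its own tag plus a +/-1 tag window."""
    feats = {"bias": "1", "expert": expert_tags[i]}
    if i > 0:
        feats["expert[-1]"] = expert_tags[i - 1]
    else:
        feats["BOS"] = "1"  # beginning of sentence
    if i < len(expert_tags) - 1:
        feats["expert[+1]"] = expert_tags[i + 1]
    else:
        feats["EOS"] = "1"  # end of sentence
    return feats


def sentence_features(tokens: List[str]) -> List[Dict[str, str]]:
    """One attribute dictionary per token, to be paired with gold labels for CRF training."""
    tags = expert_bio_tags(tokens)
    return [token_features(tags, i) for i in range(len(tokens))]


def bio_spans(tags: List[str]) -> Set[Tuple[int, int, str]]:
    """Extract (start, end_exclusive, label) entity spans from a BIO sequence."""
    found = set()
    start, label = None, None
    for i, tag in enumerate(list(tags) + ["O"]):  # sentinel "O" closes a final entity
        continues = tag.startswith("I-") and tag[2:] == label
        if start is not None and not continues:
            found.add((start, i, label))
            start, label = None, None
        if tag != "O" and not continues:
            start, label = i, tag[2:]
    return found


def prf(gold: Set, pred: Set) -> Tuple[float, float, float]:
    """Exact-match precision, recall and F-measure over entity spans."""
    tp = len(gold & pred)
    p = tp / len(pred) if pred else 0.0
    r = tp / len(gold) if gold else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f


if __name__ == "__main__":
    tokens = ["Patient", "reports", "chest", "pain", ",", "given", "aspirin", "and", "heparin", "."]
    gold = ["O", "O", "B-problem", "I-problem", "O", "O", "B-treatment", "O", "B-treatment", "O"]
    pred = expert_bio_tags(tokens)
    print(sentence_features(tokens)[2])
    p, r, f = prf(bio_spans(gold), bio_spans(pred))
    print(f"P={p:.3f} R={r:.3f} F={f:.3f}")  # P=1.000 R=0.667 F=0.800
```

In this sketch the CRF would see only the expert tags and their neighbours, which mirrors the "expert-based system alone" setting. A combined configuration would instead add these attributes to the data-driven system's own token features.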
