Combining an expert-based medical entity recognizer to a machine-learning system: methods and a case study.

Biomedical Informatics Insights. Published 2013-08-01 (eCollection date 2013-01-01). DOI: 10.4137/BII.S11770
Pierre Zweigenbaum, Thomas Lavergne, Natalia Grabar, Thierry Hamon, Sophie Rosset, Cyril Grouin
Volume 6 Suppl 1, pages 51-62.
Citation count: 8

Abstract

Medical entity recognition is currently performed mostly by data-driven methods based on supervised machine learning. Expert-based systems, in which linguistic and domain expertise is provided directly to the system, are often combined with data-driven systems. We present a case study in which an existing expert-based medical entity recognition system, Ogmios, is combined with a data-driven system, Caramba, based on a linear-chain Conditional Random Field (CRF) classifier. Our case study specifically highlights the risk of overfitting incurred by an expert-based system. We observe that this overfitting prevents the combination of the two systems from improving precision, recall, or F-measure, and we analyze the underlying mechanisms through a post-hoc feature-level analysis. Wrapping the expert-based system alone, by feeding its output as attributes to a CRF classifier, does boost its F-measure from 0.603 to 0.710, bringing it on par with the data-driven system. Whether this method generalizes remains to be investigated.
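The central technique in the abstract, wrapping an expert-based recognizer's output as attributes of a CRF classifier, can be illustrated with a minimal sketch. It assumes the expert-based system returns token-span annotations, projects them onto tokens as BIO tags, and emits one feature dictionary per token (the per-sequence input format accepted by CRF toolkits such as sklearn-crfsuite). All function names, feature names, and the `problem` label are hypothetical; this is not the authors' Ogmios or Caramba code. Setting `include_lexical=False` roughly mirrors wrapping the expert system alone, while `True` mirrors combining it with ordinary lexical features.

```python
from typing import Dict, List, Tuple

# Hypothetical expert-system annotation: (start_token, end_token_exclusive, label)
Span = Tuple[int, int, str]


def spans_to_bio(n_tokens: int, spans: List[Span]) -> List[str]:
    """Project entity spans onto tokens as BIO tags (O = outside any span)."""
    tags = ["O"] * n_tokens
    for start, end, label in spans:
        tags[start] = f"B-{label}"
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"
    return tags


def token_features(tokens: List[str], expert_tags: List[str], i: int,
                   include_lexical: bool = True) -> Dict[str, str]:
    """Feature dict for token i: expert-system attributes, optionally plus lexical ones."""
    # The expert system's decision is just another attribute; the CRF
    # learns from annotated training data how much weight to give it.
    feats = {
        "expert.tag": expert_tags[i],
        "expert.prev": expert_tags[i - 1] if i > 0 else "BOS",
        "expert.next": expert_tags[i + 1] if i < len(tokens) - 1 else "EOS",
    }
    if include_lexical:
        word = tokens[i]
        feats["word.lower"] = word.lower()
        feats["word.suffix3"] = word[-3:].lower()
        feats["word.is_title"] = str(word.istitle())
        feats["word.is_digit"] = str(word.isdigit())
    return feats


def sentence_features(tokens: List[str], expert_spans: List[Span],
                      include_lexical: bool = True) -> List[Dict[str, str]]:
    """One feature dict per token, ready to pair with gold BIO labels for CRF training."""
    expert_tags = spans_to_bio(len(tokens), expert_spans)
    return [token_features(tokens, expert_tags, i, include_lexical)
            for i in range(len(tokens))]


if __name__ == "__main__":
    tokens = ["Patient", "denies", "chest", "pain", "."]
    X = sentence_features(tokens, [(2, 4, "problem")])
    print(X[2]["expert.tag"], X[3]["expert.tag"], X[3]["expert.prev"])
    # prints: B-problem I-problem B-problem
```

Because the CRF sees the expert tag as a weighted feature rather than a final decision, it can learn to discount expert annotations that fire unreliably on the training data, which is one plausible reading of why wrapping helps when the expert system alone overfits.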