Case study - Feature engineering inspired by domain experts on real world medical data

Olof Björneld, Martin Carlsson, Welf Löwe
Intelligence-based medicine, vol. 8, Article 100110, 2023. DOI: 10.1016/j.ibmed.2023.100110. https://www.sciencedirect.com/science/article/pii/S2666521223000248

Abstract

Performing data mining projects for knowledge discovery on health data that are produced in daily health care and stored in electronic health records (EHRs) can be time-consuming. This study illustrates that involving a data scientist improves classification performance. We performed a case study comprising two real-world medical research projects, comparing feature engineering and knowledge discovery based on classification performance. Project P1 comprised 82,742 patients and addressed the research question "Can we predict patient falls using EHR data?"; project P2 included 23,396 patients and focused on "Negative side effects of antiepileptic drug consumption on bone structure".

The study yielded three salient results. (i) It is valuable for medical researchers to involve a data scientist when conducting research on real-world medical data. This finding is supported by an analysis of classification metrics obtained with iteratively engineered features. The features were generated by domain experts and computer scientists in collaboration with medical researchers. We named this process domain knowledge-driven feature engineering (KDFE).
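
As a purely illustrative sketch of what a domain-inspired feature might look like, the Python below contrasts a baseline-style indicator ("was an antiepileptic ever prescribed?") with an engineered feature that encodes clinical knowledge about exposure timing: the number of distinct days covered by antiepileptic prescriptions in a window before an index date. The record layout (`Prescription`), the function names `any_exposure` and `exposure_days`, the 365-day window and the use of the ATC prefix `N03` (the ATC group for antiepileptics) are all assumptions made for this example; they are not the features or data model used in P1 or P2.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class Prescription:
    """Hypothetical EHR prescription record (not the study's data model)."""
    patient_id: str
    start: date        # first day covered by the dispensed supply
    days_supply: int   # number of days the dispensed amount covers
    atc_code: str      # ATC code, e.g. "N03..." for antiepileptics


def any_exposure(rx: list[Prescription], patient_id: str, atc_prefix: str) -> int:
    """Baseline-style feature: 1 if the patient ever had a matching prescription, else 0."""
    return int(any(p.patient_id == patient_id and p.atc_code.startswith(atc_prefix)
                   for p in rx))


def exposure_days(rx: list[Prescription], patient_id: str, atc_prefix: str,
                  index_date: date, window_days: int = 365) -> int:
    """Domain-inspired feature (hypothetical): distinct days covered by matching
    prescriptions within [index_date - window_days, index_date)."""
    window_start = index_date - timedelta(days=window_days)
    covered: set[date] = set()
    for p in rx:
        if p.patient_id != patient_id or not p.atc_code.startswith(atc_prefix):
            continue
        for offset in range(p.days_supply):
            day = p.start + timedelta(days=offset)
            if window_start <= day < index_date:
                covered.add(day)  # overlapping supplies are counted only once
    return len(covered)


rx = [
    Prescription("p1", date(2020, 1, 1), 90, "N03AX09"),
    Prescription("p1", date(2020, 3, 1), 90, "N03AX09"),
    Prescription("p2", date(2020, 2, 1), 30, "C07AB02"),
]
print(any_exposure(rx, "p1", "N03"))                    # 1
print(exposure_days(rx, "p1", "N03", date(2020, 6, 1)))  # 150
```

Both features describe the same prescriptions, but only the second one reflects the clinical intuition that the amount and timing of exposure matter; the two 90-day supplies overlap by 30 days, so the patient is covered on 150 distinct days.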

To evaluate classification performance, we used the area under the receiver operating characteristic curve (AUROC). (ii) KDFE provides domain experts with quantifiable benefits. Compared with the baseline, the average classification performance of the engineered features, measured by AUROC, rose from 0.62 to 0.82 for P1 and from 0.61 to 0.89 for P2 (p-values << 0.001). (iii) The engineered features were represented in a systematic structure, which forms the foundation of a theoretical model for automated KDFE (aKDFE).
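
AUROC can be read as the probability that a randomly chosen positive case receives a higher classifier score than a randomly chosen negative case, with ties counting as one half. The Python sketch below computes it directly from that pairwise definition, which is equivalent to the normalized Mann-Whitney U statistic. The function name `auroc` is hypothetical; in practice a library routine such as scikit-learn's `sklearn.metrics.roc_auc_score` would normally be used, and nothing here describes how the study itself computed AUROC or its p-values.

```python
def auroc(labels: list[int], scores: list[float]) -> float:
    """Pairwise (Mann-Whitney) computation of the area under the ROC curve.

    labels: 1 for positive cases (e.g. a fall occurred), 0 for negative cases.
    scores: classifier scores; a higher score means "more likely positive".
    """
    if len(labels) != len(scores):
        raise ValueError("labels and scores must have the same length")
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0  # positive correctly ranked above negative
            elif p == n:
                wins += 0.5  # ties count as half a correct ranking
    return wins / (len(pos) * len(neg))


print(auroc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
print(auroc([0, 1], [0.2, 0.9]))                    # 1.0
print(auroc([0, 1], [0.5, 0.5]))                    # 0.5
```

A value of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation, which is the scale on which increases such as 0.62 to 0.82 should be read. The double loop is quadratic and is meant only for clarity; for cohorts of tens of thousands of patients, a sort-based implementation or a library routine is preferable.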

To our knowledge, this is the first study to demonstrate with quantitative measures that KDFE adds value in real-world settings. The method is, however, not limited to the medical domain; other areas with similar data properties should also benefit from KDFE.
