Using machine learning to improve anaphylaxis case identification in medical claims data.

JAMIA Open (IF 2.5, Q2, Health Care Sciences & Services) | Pub Date: 2023-10-27 | eCollection Date: 2023-12-01 | DOI: 10.1093/jamiaopen/ooad090

Kamil Can Kural, Ilya Mazo, Mark Walderhaug, Luis Santana-Quintero, Konstantinos Karagiannis, Elaine E Thompson, Jeffrey A Kelman, Ravi Goud

JAMIA Open, volume 6, issue 4, article ooad090. Journal Article. Full-text PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10611436/pdf/


Abstract

Objective: Anaphylaxis is a severe, life-threatening allergic reaction, and identifying it accurately in healthcare databases can help harness the potential of "Big Data" for healthcare or public health purposes.

Methods: This study used claims data obtained between October 1, 2015 and February 28, 2019 from the CMS database to examine the utility of machine learning in identifying incident anaphylaxis cases. We created a feature selection pipeline to identify critical features across different datasets. A variety of unsupervised and supervised methods (eg, Sammon mapping and eXtreme Gradient Boosting) were then used to train models on datasets of differing data quality, reflecting the varying availability and potential rarity of ground truth data in medical databases.
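
For orientation, Sammon mapping (one of the unsupervised methods named above) looks for a low-dimensional layout that minimizes the Sammon stress, a weighted mismatch between pairwise distances in the original and projected spaces. The sketch below only evaluates that stress for a given projection; it does not perform the optimization. The function names and the toy points are illustrative assumptions, not the study's code or data.

```python
import math
from itertools import combinations


def euclidean(a, b):
    """Euclidean distance between two equal-length coordinate tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def sammon_stress(high_dim, low_dim):
    """Sammon stress: sum((d_orig - d_proj)^2 / d_orig) / sum(d_orig) over all pairs."""
    total_original = 0.0
    weighted_error = 0.0
    for i, j in combinations(range(len(high_dim)), 2):
        d_orig = euclidean(high_dim[i], high_dim[j])
        d_proj = euclidean(low_dim[i], low_dim[j])
        if d_orig == 0.0:
            continue  # identical points are skipped to avoid dividing by zero
        total_original += d_orig
        weighted_error += (d_orig - d_proj) ** 2 / d_orig
    if total_original == 0.0:
        return 0.0  # all points coincide, so there is nothing to distort
    return weighted_error / total_original


# Hypothetical toy data: four claims, each encoded as three binary features.
claims = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1)]

# A crude 2-D "projection" that simply drops the third feature.
projected = [point[:2] for point in claims]

stress_identity = sammon_stress(claims, claims)
stress_projected = sammon_stress(claims, projected)
print(f"stress (no projection): {stress_identity:.4f}")
print(f"stress (third feature dropped): {stress_projected:.4f}")
```

A stress of zero means every pairwise distance is preserved; larger values mean the projection distorts the data more, with errors on originally close pairs weighted most heavily.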

Results: The resulting machine learning models achieved accuracies ranging from 47.7% to 94.4% when tested on ground truth data. We also found new features that could help experts enhance existing case-finding algorithms.

Discussion: Developing precise algorithms to detect medical outcomes in claims can be a laborious and expensive process, particularly for conditions that present and are coded in diverse ways. We found it beneficial to filter out the highly potent codes used for data curation in order to identify underlying patterns and features. To improve rule-based algorithms where necessary, researchers could use model explainers to determine noteworthy features, which could then be shared with experts and included in the algorithm.
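
The idea of filtering out curation codes before looking for other informative features can be sketched as follows. This is a minimal illustration under stated assumptions, not the study's pipeline: the code names are hypothetical placeholders rather than real billing codes, the toy claims are invented, and a simple difference in code prevalence between cases and non-cases stands in for the model explainers mentioned above.

```python
from collections import Counter

# Hypothetical toy claims: (set of codes on the claim, 1 = confirmed case, 0 = non-case).
claims = [
    ({"ANAPHYLAXIS_DX", "EPINEPHRINE", "ER_VISIT"}, 1),
    ({"ANAPHYLAXIS_DX", "EPINEPHRINE", "URTICARIA_DX"}, 1),
    ({"ANAPHYLAXIS_DX", "URTICARIA_DX", "ER_VISIT"}, 1),
    ({"ER_VISIT", "HYPERTENSION_DX"}, 0),
    ({"HYPERTENSION_DX"}, 0),
    ({"URTICARIA_DX", "HYPERTENSION_DX"}, 0),
]

# Codes assumed to have been used to curate the dataset. They separate the
# classes almost by construction, so they are excluded before ranking.
curation_codes = {"ANAPHYLAXIS_DX"}


def rank_codes(claims, excluded):
    """Rank codes by prevalence among cases minus prevalence among non-cases."""
    n_cases = sum(label for _, label in claims)
    n_controls = len(claims) - n_cases
    if n_cases == 0 or n_controls == 0:
        raise ValueError("need at least one case and one non-case")
    case_counts, control_counts = Counter(), Counter()
    for codes, label in claims:
        target = case_counts if label == 1 else control_counts
        target.update(codes - excluded)
    all_codes = set(case_counts) | set(control_counts)
    scores = {
        code: case_counts[code] / n_cases - control_counts[code] / n_controls
        for code in all_codes
    }
    # Highest score first; ties are broken alphabetically for stable output.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


ranking = rank_codes(claims, curation_codes)
for code, score in ranking:
    print(f"{code}: {score:+.2f}")
```

With the curation code removed, the ranking surfaces the remaining codes that still distinguish cases from non-cases, which is the kind of candidate list that could be passed to experts for review.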

Conclusion: Our work suggests that machine learning models can perform at levels similar to a previously published expert case-finding algorithm, while also having the potential to improve performance or streamline algorithm construction by identifying new relevant features.


