Fraud detection in healthcare claims using machine learning: A systematic review

Artificial Intelligence in Medicine (IF 6.1; Region 2, Medicine; JCR Q1, Computer Science, Artificial Intelligence). Pub Date: 2025-02-01. DOI: 10.1016/j.artmed.2024.103061

Anli du Preez, Sanmitra Bhattacharya, Peter Beling, Edward Bowen

Volume 160, Article 103061. Publication type: Journal Article. Full text: https://www.sciencedirect.com/science/article/pii/S0933365724003038

Citations: 0


Fraud detection in healthcare claims using machine learning: A systematic review

Objective:

Identifying fraud in healthcare programs is crucial, as an estimated 3%–10% of total healthcare expenditures are lost to fraudulent activities. This study presents a systematic literature review of machine learning techniques applied to fraud detection in health insurance claims. We aim to analyze the data and methodologies documented in the literature over the past two decades, providing insights into research challenges and opportunities.

Methods:

We identified research studies on health insurance fraud detection using machine learning approaches from databases such as Google Scholar, Springer-Link journals, Elsevier, PubMed, Excerpta Medica Database (EMBASE), Scopus, the Association for Computing Machinery (ACM) Digital Library, and the Institute of Electrical and Electronics Engineers (IEEE) Xplore Digital Library. We included only articles that presented experimental results of machine learning-based approaches applied to healthcare claims. From the reviewed articles, 137 were selected for the final qualitative and quantitative analyses.

Results:

In recent years, publications on the use of machine learning to detect health insurance fraud have surged. Among these studies, those focused on detecting fraud committed by healthcare providers were the most prevalent, followed by those on fraud committed by patients. The studies cover a wide variety of machine learning algorithms, spanning unsupervised methods (41 studies), supervised methods (94 studies), and hybrid approaches (12 studies). While traditional machine learning approaches remain dominant in this research area, the adoption of advanced deep learning techniques is on the rise. Regarding the type of healthcare claims data used, 30 studies used private data sources, while the rest used publicly available datasets. Data from 16 countries were used, with the majority coming from the United States (96 studies), followed by China (11 studies) and Australia (5 studies).
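The counts reported above can be put in proportion with a few lines of arithmetic. The sketch below uses only the figures stated in the abstract (137 included studies and the per-category counts). The reading that a surplus in the method counts reflects studies counted under more than one category is an inference, not something the abstract states.

```python
# Figures taken from the abstract; variable names are illustrative.
TOTAL_STUDIES = 137  # articles selected for the final analysis

by_method = {"unsupervised": 41, "supervised": 94, "hybrid": 12}
by_country = {"United States": 96, "China": 11, "Australia": 5}
private_data_studies = 30

# Share of the 137 studies in each category, as a percentage.
method_share = {k: 100 * v / TOTAL_STUDIES for k, v in by_method.items()}
country_share = {k: 100 * v / TOTAL_STUDIES for k, v in by_country.items()}

# The method counts add up to more than the number of studies.
# Assumption: the surplus comes from studies that evaluate more than one
# kind of method and are therefore counted more than once.
method_sum = sum(by_method.values())
surplus = method_sum - TOTAL_STUDIES

# "The rest" used public data, so the public count follows by subtraction.
public_data_studies = TOTAL_STUDIES - private_data_studies

for name, pct in {**method_share, **country_share}.items():
    print(f"{name}: {pct:.1f}% of studies")
print(f"method counts sum to {method_sum} ({surplus} more than {TOTAL_STUDIES})")
print(f"public-data studies: {public_data_studies}")
```

Because categories may overlap, the percentages are shares of studies, not a partition, and they need not sum to 100.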

Discussion and Conclusion:

Detecting fraud in healthcare claims using machine learning presents several challenges. These include inconsistent data, the absence of data standardization and integration, privacy concerns, and a limited number of labeled fraudulent cases on which to train models. Future work should focus on enhancing transparency in data preparation, promoting the sharing of fraud investigation outcomes by authorities, and developing benchmark datasets to improve accessibility and comparability. Furthermore, innovative techniques in data sampling, new feature encoding methods for training machine learning models, and exploration of the latest advancements in deep learning can significantly advance research in health insurance fraud detection.
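To make the data sampling point concrete, here is a minimal sketch of random undersampling, one common way to handle a training set with very few labeled fraud cases. The abstract does not name a specific sampling technique, so this choice is an assumption made for illustration. The `undersample` helper, the `is_fraud` field, and the toy claims are all hypothetical.

```python
import random
from collections import Counter


def undersample(records, label_key="is_fraud", seed=0):
    """Randomly drop majority-class records until both classes are equal in size."""
    rng = random.Random(seed)
    positives = [r for r in records if r[label_key]]
    negatives = [r for r in records if not r[label_key]]
    # Sort the two groups by size so the smaller one is treated as the minority.
    minority, majority = sorted((positives, negatives), key=len)
    balanced = minority + rng.sample(majority, len(minority))
    rng.shuffle(balanced)
    return balanced


# Synthetic toy data: 5 fraudulent claims among 100 (not from any real dataset).
claims = [{"claim_id": i, "is_fraud": i < 5} for i in range(100)]

balanced = undersample(claims)
counts = Counter(r["is_fraud"] for r in balanced)
print(counts[True], counts[False])
```

Undersampling discards most of the legitimate claims, which is acceptable for a sketch but costly on real data. Oversampling or class weighting are alternatives that keep all records.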


Source journal

Artificial Intelligence in Medicine (Engineering & Technology: Engineering, Biomedical)

CiteScore: 15.00

Self-citation rate: 2.70%

Articles published: 143

Review time: 6.3 months

Journal description: Artificial Intelligence in Medicine publishes original articles from a wide variety of interdisciplinary perspectives concerning the theory and practice of artificial intelligence (AI) in medicine, medically-oriented human biology, and health care. Artificial intelligence in medicine may be characterized as the scientific discipline pertaining to research studies, projects, and applications that aim at supporting decision-based medical tasks through knowledge- and/or data-intensive computer-based solutions that ultimately support and improve the performance of a human care provider.