Anli du Preez, Sanmitra Bhattacharya, Peter Beling, Edward Bowen
Artificial Intelligence in Medicine, Volume 160, Article 103061. Published 2025-02-01. DOI: 10.1016/j.artmed.2024.103061. URL: https://www.sciencedirect.com/science/article/pii/S0933365724003038
Fraud detection in healthcare claims using machine learning: A systematic review
Objective:
Identifying fraud in healthcare programs is crucial, as an estimated 3%–10% of total healthcare expenditures are lost to fraudulent activities. This study presents a systematic literature review of machine learning techniques applied to fraud detection in health insurance claims. We analyze the data and methodologies documented in the literature over the past two decades and provide insights into research challenges and opportunities.
Methods:
We identified research studies on health insurance fraud detection using machine learning approaches from databases such as Google Scholar, Springer-Link journals, Elsevier, PubMed, Excerpta Medica Database (EMBASE), Scopus, the Association for Computing Machinery (ACM) Digital Library, and the Institute of Electrical and Electronics Engineers (IEEE) Xplore Digital Library. We included only articles that presented experimental results of machine learning-based approaches applied to healthcare claims. Of the reviewed articles, 137 were selected for the final qualitative and quantitative analyses.
Results:
In recent years, there has been a surge in publications on the use of machine learning to detect health insurance fraud. Among these studies, those focused on detecting fraud committed by healthcare providers were the most prevalent, followed by those focused on fraud committed by patients. The studies cover a wide variety of machine learning algorithms, including unsupervised methods (41 studies), supervised methods (94 studies), and hybrid approaches (12 studies). While traditional machine learning approaches remain dominant in this research area, the adoption of advanced deep learning techniques is on the rise. Regarding the type of healthcare claims data used, 30 studies used private data sources, while the rest used publicly available datasets. Data from 16 countries were used, with a majority coming from the United States (96 studies), followed by China (11 studies) and Australia (5 studies).
Discussion and Conclusion:
Detecting fraud in healthcare claims using machine learning presents several challenges. These include inconsistent data, the absence of data standardization and integration, privacy concerns, and a limited number of labeled fraudulent cases on which to train models. Future work should focus on enhancing transparency in data preparation, promoting the sharing of fraud investigation outcomes by authorities, and developing benchmark datasets to improve accessibility and comparability. Furthermore, innovative data sampling techniques, feature encoding methods for training machine learning models, and exploration of the latest advances in deep learning can significantly advance research in health insurance fraud detection.
About the journal:
Artificial Intelligence in Medicine publishes original articles from a wide variety of interdisciplinary perspectives concerning the theory and practice of artificial intelligence (AI) in medicine, medically-oriented human biology, and health care.
Artificial intelligence in medicine may be characterized as the scientific discipline pertaining to research studies, projects, and applications that aim at supporting decision-based medical tasks through knowledge- and/or data-intensive computer-based solutions that ultimately support and improve the performance of a human care provider.