Sleep apnea test prediction based on Electronic Health Records
Lama Abu Tahoun, Amit Shay Green, Tal Patalon, Yaron Dagan, Robert Moskovitch
Journal of Biomedical Informatics, article 104737, published 2024-11-01. DOI: 10.1016/j.jbi.2024.104737
Obstructive Sleep Apnea (OSA) is diagnosed with a polysomnography test, which is often performed only later in life. It is therefore desirable to notify at-risk insured members at earlier ages. To this end, we develop predictive models that use Electronic Health Records (EHR) to predict whether a person will undergo a sleep apnea test after the age of 50. A major challenge is the variability of EHR records across insured members and over the years, which this study also investigates in the context of control matching and prediction. Because there are many temporal variables, the RankLi method was introduced for temporal variable selection. It uses a t-test to compute, for each temporal variable, a divergence score between the target classes. We also investigate whether the number of EHR records should be considered as part of control matching, and whether training separate models for subgroups defined by the number of EHR records is more effective. For each prediction task, we trained four classifiers (1-CNN, LSTM, Random Forest, and Logistic Regression) on data up to the age of 40 or 50, using several different numbers of temporal variables. Including the number of EHR records in control matching proved crucial, and training separate models for population subsets defined by their number of EHR records proved more effective. The deep learning models, particularly the 1-CNN, achieved the highest balanced accuracy and AUC scores in both the male and female groups. In the male group, the best results were obtained at age 50 with 100 temporal variables, yielding a balanced accuracy of 90% and an AUC of 93%.
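The variable-selection idea described above (a t-test divergence score per temporal variable between the target classes, then keeping the highest-ranked variables) can be illustrated with a minimal sketch. This is not the authors' RankLi implementation. It assumes that each temporal variable has already been summarized into one number per patient (for example, a count or a mean before the age cutoff), and it uses the absolute value of Welch's t statistic as the divergence score. The function names and the example variables (`bmi_mean`, `snoring_visits`, `flu_shots`) are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a: list[float], b: list[float]) -> float:
    """Welch's two-sample t statistic (unequal variances).

    Assumption: each sample has at least two values, since
    statistics.variance raises StatisticsError otherwise.
    """
    se = sqrt(variance(a) / len(a) + variance(b) / len(b))
    if se == 0.0:
        # Both samples are constant: no divergence unless the means differ.
        return 0.0 if mean(a) == mean(b) else float("inf")
    return (mean(a) - mean(b)) / se


def rank_temporal_variables(
    features: dict[str, list[float]],
    labels: list[int],
    top_k: int,
) -> list[tuple[str, float]]:
    """Score each (hypothetically pre-aggregated) variable by |t| between
    label-1 and label-0 patients, and return the top_k highest scores."""
    scores = []
    for name, values in features.items():
        pos = [v for v, y in zip(values, labels) if y == 1]
        neg = [v for v, y in zip(values, labels) if y == 0]
        scores.append((name, abs(welch_t(pos, neg))))
    # Highest divergence first; ties broken by name for deterministic output.
    scores.sort(key=lambda item: (-item[1], item[0]))
    return scores[:top_k]


# Toy data: 1 = later underwent a sleep apnea test, 0 = matched control.
labels = [1, 1, 1, 0, 0, 0]
features = {
    "bmi_mean": [34.0, 36.0, 35.0, 25.0, 26.0, 24.0],
    "snoring_visits": [3.0, 2.0, 4.0, 0.0, 1.0, 0.0],
    "flu_shots": [1.0, 0.0, 1.0, 1.0, 0.0, 1.0],
}
top = rank_temporal_variables(features, labels, top_k=2)
print([name for name, _ in top])  # → ['bmi_mean', 'snoring_visits']
```

In practice, the scores should be computed on the training split only, so that variable selection does not leak information from the evaluation data.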
About the journal:
The Journal of Biomedical Informatics reflects a commitment to high-quality original research papers, reviews, and commentaries in the area of biomedical informatics methodology. Although we publish articles motivated by applications in the biomedical sciences (for example, clinical medicine, health care, population health, and translational bioinformatics), the journal emphasizes reports of new methodologies and techniques that have general applicability and that form the basis for the evolving science of biomedical informatics. Articles on medical devices; evaluations of implemented systems (including clinical trials of information technologies); or papers that provide insight into a biological process, a specific disease, or treatment options would generally be more suitable for publication in other venues. Papers on applications of signal processing and image analysis are often more suitable for biomedical engineering journals or other informatics journals, although we do publish papers that emphasize the information management and knowledge representation/modeling issues that arise in the storage and use of biological signals and images. System descriptions are welcome if they illustrate and substantiate the underlying methodology that is the principal focus of the report and an effort is made to address the generalizability and/or range of application of that methodology. Note also that, given the international nature of JBI, papers that deal with specific languages other than English, or with country-specific health systems or approaches, are acceptable for JBI only if they offer generalizable lessons that are relevant to the broad JBI readership, regardless of their country, language, culture, or health system.