A deep attention-based encoder for the prediction of type 2 diabetes longitudinal outcomes from routinely collected health care data
Enrico Manzini, Bogdan Vlacho, Josep Franch-Nadal, Joan Escudero, Ana Génova, Elisenda Reixach, Erich Andrés, Israel Pizarro, Dídac Mauricio, Alexandre Perera-Lluna
Expert Systems with Applications, volume 274, Article 126876. Published 2025-02-20. DOI: 10.1016/j.eswa.2025.126876
https://www.sciencedirect.com/science/article/pii/S0957417425004981
Abstract
Recent evidence indicates that Type 2 Diabetes Mellitus (T2DM) is a complex and highly heterogeneous disease involving various pathophysiological and genetic pathways, which presents clinicians with challenges in disease management. While deep learning models have made significant progress in helping practitioners manage T2DM treatments, several important limitations persist. In this paper, we propose DARE, a model based on the transformer encoder, designed for analyzing longitudinal heterogeneous diabetes data. The model can be easily fine-tuned for various clinical prediction tasks, enabling a computational approach to assist clinicians in the management of the disease. We trained DARE using data from over 200,000 diabetic subjects from the primary healthcare SIDIAP database, which includes diagnosis and drug codes, along with various clinical and analytical measurements. After an unsupervised pre-training phase, we fine-tuned the model to predict three specific clinical outcomes: i) occurrence of comorbidity, ii) achievement of target glycemic control (defined as glycated hemoglobin < 7%) and iii) changes in glucose-lowering treatment. In cross-validation, the embedding vectors generated by DARE outperformed those from baseline models (comorbidities prediction task AUC = 0.88, treatment prediction task AUC = 0.91, HbA1c target prediction task AUC = 0.82). Our findings suggest that attention-based encoders improve on several deep learning and classical baseline models when used to predict clinically relevant outcomes from T2DM longitudinal data.
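The abstract reports each task's performance as an AUC (area under the ROC curve). As a rough illustration of what that number measures, the sketch below computes AUC from binary labels and predicted scores using the rank-sum (Mann-Whitney U) identity. It is not the authors' evaluation code: the function name `roc_auc` and the toy inputs are assumptions made for this example, and the paper's cross-validation procedure is not reproduced here.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via the rank-sum (Mann-Whitney U) identity.

    The result equals the probability that a randomly chosen positive case
    receives a higher score than a randomly chosen negative case, with ties
    counted as one half. Labels are assumed to be 0 (negative) or 1 (positive).
    """
    if len(labels) != len(scores):
        raise ValueError("labels and scores must have the same length")
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative label")

    # Sort indices by score, then give tied scores their average rank.
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1

    # U statistic of the positive class, normalised by the number of pairs.
    pos_rank_sum = sum(r for r, y in zip(ranks, labels) if y == 1)
    u_stat = pos_rank_sum - n_pos * (n_pos + 1) / 2
    return u_stat / (n_pos * n_neg)


if __name__ == "__main__":
    # Hypothetical toy data: 1 = outcome occurred, 0 = it did not.
    toy_labels = [0, 0, 1, 1]
    toy_scores = [0.1, 0.4, 0.35, 0.8]
    print(roc_auc(toy_labels, toy_scores))
```

Read this way, an AUC of 0.88 means that in roughly 88% of positive-negative subject pairs the positive subject receives the higher predicted score; 0.5 corresponds to chance-level ranking and 1.0 to perfect separation.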
About the journal:
Expert Systems With Applications is an international journal dedicated to the exchange of information on expert and intelligent systems used globally in industry, government, and universities. The journal emphasizes original papers covering the design, development, testing, implementation, and management of these systems, offering practical guidelines. It spans various sectors such as finance, engineering, marketing, law, project management, information management, medicine, and more. The journal also welcomes papers on multi-agent systems, knowledge management, neural networks, knowledge discovery, data mining, and other related areas, excluding applications to military/defense systems.