DPD (DePression Detection) Net: a deep neural network for multimodal depression detection
Pub Date: 2024-11-12 | eCollection Date: 2024-12-01 | DOI: 10.1007/s13755-024-00311-9
Manlu He, Erwin M Bakker, Michael S Lew
Depression is one of the most prevalent mental disorders; it can impair people's productivity and lead to severe consequences. Diagnosis is complex, as it often relies on a physician's subjective, interview-based screening. The aim of our work is to propose deep learning models for automatic depression detection using different data modalities, which could assist in the diagnosis of depression. Most current work on automatic depression detection is tested on a single dataset, which can limit robustness, flexibility, and scalability. To alleviate this problem, we design a novel Graph Neural Network-enhanced Transformer model named DePressionDetect Net (DPD Net) that leverages textual, audio, and visual features and can work under two different application settings: the clinical setting and the social media setting. The model consists of a unimodal encoder module for encoding each single modality, a multimodal encoder module for integrating the multimodal information, and a detection module for producing the final prediction. We also propose a model named DePressionDetect-with-EEG Net (DPD-E Net) that incorporates electroencephalography (EEG) signals and speech data for depression detection. Experiments across four benchmark datasets show that DPD Net and DPD-E Net outperform state-of-the-art models on three datasets (the E-DAIC, Twitter depression, and MODMA datasets) and achieve competitive performance on the fourth (the D-vlog dataset). Ablation studies demonstrate the advantages of the proposed modules and the effectiveness of combining diverse modalities for automatic depression detection.
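As a concrete illustration of the three-module layout described above, here is a minimal PyTorch sketch: per-modality encoders that project text, audio, and visual features into a shared space, a multimodal fusion encoder over the modality tokens, and a detection head. The feature dimensions and the plain Transformer fusion are illustrative assumptions; the actual DPD Net enhances the Transformer with a graph neural network and has its own layer configuration.

```python
# Minimal sketch of the three-module design: unimodal encoders, a multimodal
# fusion encoder, and a detection head. Dimensions and the plain Transformer
# fusion are assumptions; the paper's model adds a GNN to the Transformer.
import torch
import torch.nn as nn

class DPDNetSketch(nn.Module):
    def __init__(self, text_dim=768, audio_dim=128, visual_dim=512, d_model=256):
        super().__init__()
        # Unimodal encoder module: project each modality into a shared space.
        self.text_enc = nn.Linear(text_dim, d_model)
        self.audio_enc = nn.Linear(audio_dim, d_model)
        self.visual_enc = nn.Linear(visual_dim, d_model)
        # Multimodal encoder module: fuse the three modality tokens.
        layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=4, batch_first=True)
        self.fusion = nn.TransformerEncoder(layer, num_layers=2)
        # Detection module: binary depressed / not-depressed prediction.
        self.head = nn.Linear(d_model, 2)

    def forward(self, text, audio, visual):
        tokens = torch.stack(
            [self.text_enc(text), self.audio_enc(audio), self.visual_enc(visual)], dim=1
        )  # (batch, 3 modality tokens, d_model)
        fused = self.fusion(tokens).mean(dim=1)  # pool over modality tokens
        return self.head(fused)

logits = DPDNetSketch()(torch.randn(4, 768), torch.randn(4, 128), torch.randn(4, 512))
print(logits.shape)  # torch.Size([4, 2])
```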
{"title":"DPD (DePression Detection) Net: a deep neural network for multimodal depression detection.","authors":"Manlu He, Erwin M Bakker, Michael S Lew","doi":"10.1007/s13755-024-00311-9","DOIUrl":"10.1007/s13755-024-00311-9","url":null,"abstract":"<p><p>Depression is one of the most prevalent mental conditions which could impair people's productivity and lead to severe consequences. The diagnosis of this disease is complex as it often relies on a physician's subjective interview-based screening. The aim of our work is to propose deep learning models for automatic depression detection by using different data modalities, which could assist in the diagnosis of depression. Current works on automatic depression detection mostly are tested on a single dataset, which might lack robustness, flexibility and scalability. To alleviate this problem, we design a novel Graph Neural Network-enhanced Transformer model named DePressionDetect Net (DPD Net) that leverages textual, audio and visual features and can work under two different application settings: the clinical setting and the social media setting. The model consists of a unimodal encoder module for encoding single modality, a multimodal encoder module for integrating the multimodal information, and a detection module for producing the final prediction. We also propose a model named DePressionDetect-with-EEG Net (DPD-E Net) to incorporate Electroencephalography (EEG) signals and speech data for depression detection. Experiments across four benchmark datasets show that DPD Net and DPD-E Net can outperform the state-of-the-art models on three datasets (i.e., E-DAIC dataset, Twitter depression dataset and MODMA dataset), and achieve competitive performance on the fourth one (i.e., D-vlog dataset). Ablation studies demonstrate the advantages of the proposed modules and the effectiveness of combining diverse modalities for automatic depression detection.</p>","PeriodicalId":46312,"journal":{"name":"Health Information Science and Systems","volume":"12 1","pages":"53"},"PeriodicalIF":3.4,"publicationDate":"2024-11-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11557813/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142630203","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Multiple feature selection based on an optimization strategy for causal analysis of health data
Pub Date: 2024-11-12 | eCollection Date: 2024-12-01 | DOI: 10.1007/s13755-024-00312-8
Ruichen Cong, Ou Deng, Shoji Nishimura, Atsushi Ogihara, Qun Jin
Purpose: Recent advancements in information technology and wearable devices have revolutionized healthcare through health data analysis. Identifying significant relationships in complex health data enhances healthcare and public health strategies. In health analytics, causal graphs are important for investigating the relationships among health features. However, they face challenges owing to the large number of features, complexity, and computational demands. Feature selection methods are useful for addressing these challenges. In this paper, we present a framework for multiple feature selection based on an optimization strategy for causal analysis of health data.
Methods: We select multiple health features based on an optimization strategy. First, we define a Weighted Total Score (WTS) index to assess feature importance after combining different feature selection methods. To explore an optimal set of weights for each method, we design a multiple feature selection algorithm integrated with a greedy algorithm. The features are then ranked according to their WTS, enabling selection of the most important ones. After that, causal graphs are constructed based on the selected features, and the statistical significance of the paths is assessed. Finally, evaluation experiments are conducted on a dataset collected for this study and on an open diabetes dataset.
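A minimal sketch of how such a weighted combination and greedy weight search could look, using scikit-learn selectors on the open diabetes dataset. The exact WTS definition, the cross-validated proxy objective, and the greedy step size here are assumptions, not the paper's specification.

```python
# Illustrative Weighted Total Score (WTS): combine per-feature importance from
# several selection methods with learned weights, tuned by a greedy search.
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.ensemble import RandomForestRegressor
from sklearn.feature_selection import mutual_info_regression, f_regression
from sklearn.model_selection import cross_val_score

X, y = load_diabetes(return_X_y=True)

# Per-method importance scores, normalized to [0, 1] so they are comparable.
scores = np.array([
    mutual_info_regression(X, y),
    np.nan_to_num(f_regression(X, y)[0]),
    RandomForestRegressor(random_state=0).fit(X, y).feature_importances_,
])
scores /= scores.max(axis=1, keepdims=True)

def wts(weights):
    return weights @ scores  # weighted total score per feature

def evaluate(weights, k=5):
    # Proxy objective: cross-validated model quality on the top-k features.
    top = np.argsort(wts(weights))[::-1][:k]
    model = RandomForestRegressor(random_state=0)
    return cross_val_score(model, X[:, top], y, cv=3).mean()

# Greedy search: repeatedly bump the single method weight that helps most.
weights = np.ones(len(scores)) / len(scores)
for _ in range(10):
    candidates = [weights + 0.1 * np.eye(len(scores))[i] for i in range(len(scores))]
    weights = max(candidates, key=evaluate)
    weights /= weights.sum()

print("learned weights:", np.round(weights, 2))
print("top features:", np.argsort(wts(weights))[::-1][:5])
```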
Results: The results demonstrate that our approach outperforms baseline models by reducing the number of features while improving model performance. Moreover, the statistical significance of the relationships between features uncovered through causal graphs is validated for both datasets.
Conclusion: By using the proposed framework for multiple feature selection based on an optimization strategy for causal analysis, the number of features is reduced and the causal relationships are uncovered and validated.
{"title":"Multiple feature selection based on an optimization strategy for causal analysis of health data.","authors":"Ruichen Cong, Ou Deng, Shoji Nishimura, Atsushi Ogihara, Qun Jin","doi":"10.1007/s13755-024-00312-8","DOIUrl":"10.1007/s13755-024-00312-8","url":null,"abstract":"<p><strong>Purpose: </strong>Recent advancements in information technology and wearable devices have revolutionized healthcare through health data analysis. Identifying significant relationships in complex health data enhances healthcare and public health strategies. In health analytics, causal graphs are important for investigating the relationships among health features. However, they face challenges owing to the large number of features, complexity, and computational demands. Feature selection methods are useful for addressing these challenges. In this paper, we present a framework for multiple feature selection based on an optimization strategy for causal analysis of health data.</p><p><strong>Methods: </strong>We select multiple health features based on an optimization strategy. First, we define a Weighted Total Score (WTS) index to assess the feature importance after the combination of different feature selection methods. To explore an optimal set of weights for each method, we design a multiple feature selection algorithm integrated with the greedy algorithm. The features are then ranked according to their WTS, enabling selection of the most important ones. After that, causal graphs are constructed based on the selected features, and the statistical significance of the paths is assessed. Furthermore, evaluation experiments are conducted on an experiment dataset collected for this study and an open dataset for diabetes.</p><p><strong>Results: </strong>The results demonstrate that our approach outperforms baseline models by reducing the number of features while improving model performance. Moreover, the statistical significance of the relationships between features uncovered through causal graphs is validated for both datasets.</p><p><strong>Conclusion: </strong>By using the proposed framework for multiple feature selection based on an optimization strategy for causal analysis, the number of features is reduced and the causal relationships are uncovered and validated.</p>","PeriodicalId":46312,"journal":{"name":"Health Information Science and Systems","volume":"12 1","pages":"52"},"PeriodicalIF":4.7,"publicationDate":"2024-11-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11554952/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142630210","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Machine-learning-based prediction of cardiovascular events for hyperlipidemia population with lipid variability and remnant cholesterol as biomarkers
Pub Date: 2024-11-11 | eCollection Date: 2024-12-01 | DOI: 10.1007/s13755-024-00310-w
Zhenzhen Du, Shuang Wang, Ouzhou Yang, Juan He, Yujie Yang, Jing Zheng, Honglei Zhao, Yunpeng Cai
Purpose: Dyslipidemia poses a significant risk for progression to cardiovascular disease (CVD). Despite the identification of numerous risk factors and the proposal of various risk scales, there is still an urgent need for effective models that predict the onset of CVD in the hyperlipidemic population, which are essential for prevention.
Methods: We carried out a retrospective cohort study of 23,548 hyperlipidemia patients from the Shenzhen Health Information Big Data Platform, including 11,723 CVD onset cases over a 3-year follow-up. The population was randomly divided into 70% as an independent training dataset and the remaining 30% as the test set. Four distinct machine-learning algorithms were implemented on the training dataset to develop highly accurate predictive models, and their performance was benchmarked against conventional risk assessment scales. An ablation study was also carried out to analyze the impact of individual risk factors on model performance.
Results: The non-linear algorithm LightGBM excelled in forecasting the incidence of cardiovascular disease within 3 years, achieving an area under the receiver operating characteristic curve (AUROC) of 0.883. This surpassed the conventional logistic regression model, which had an AUROC of 0.725 on identical datasets. In direct comparative analyses, the machine-learning approaches also notably outperformed three traditional risk assessment methods within their respective applicable populations: the Framingham cardiovascular disease risk score, the 2019 ESC/EAS guidelines for the management of dyslipidemia, and the 2016 Chinese recommendations for the management of dyslipidemia in adults. Further analysis of risk factors showed that the variability of blood lipid levels and remnant cholesterol played an important role in indicating an increased risk of CVD.
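The evaluation protocol is straightforward to reproduce in outline. The sketch below pairs a LightGBM classifier against logistic regression under the 70/30 split and compares them by AUROC; synthetic data stands in for the non-public Shenzhen cohort.

```python
# Minimal sketch of the comparison: 70/30 split, LightGBM vs. logistic
# regression, scored by AUROC on the held-out test set.
import lightgbm as lgb
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=5000, n_features=30, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

gbm = lgb.LGBMClassifier(random_state=0).fit(X_tr, y_tr)
logit = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)

for name, model in [("LightGBM", gbm), ("Logistic regression", logit)]:
    auroc = roc_auc_score(y_te, model.predict_proba(X_te)[:, 1])
    print(f"{name}: AUROC = {auroc:.3f}")
```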
Conclusions: We have shown that machine-learning techniques significantly enhance the precision of cardiovascular risk forecasting among hyperlipidemic patients, addressing the heterogeneity and non-linearity of disease prediction. Furthermore, recently suggested biomarkers, including blood lipid variability and remnant cholesterol, are also important predictors of cardiovascular events, underscoring the importance of continuous lipid monitoring and healthcare profiling through big data platforms.
{"title":"Machine-learning-based prediction of cardiovascular events for hyperlipidemia population with lipid variability and remnant cholesterol as biomarkers.","authors":"Zhenzhen Du, Shuang Wang, Ouzhou Yang, Juan He, Yujie Yang, Jing Zheng, Honglei Zhao, Yunpeng Cai","doi":"10.1007/s13755-024-00310-w","DOIUrl":"10.1007/s13755-024-00310-w","url":null,"abstract":"<p><strong>Purpose: </strong>Dyslipidemia poses a significant risk for the progression to cardiovascular diseases. Despite the identification of numerous risk factors and the proposal of various risk scales, there is still an urgent need for effective predictive models for the onset of cardiovascular diseases in the hyperlipidemic population, which are essential for the prevention of CVD.</p><p><strong>Methods: </strong>We carried out a retrospective cohort study with 23,548 hyperlipidemia patients in Shenzhen Health Information Big Data Platform, including 11,723 CVD onset cases in a 3-year follow-up. The population was randomly divided into 70% as an independent training dataset and remaining 30% as test set. Four distinct machine-learning algorithms were implemented on the training dataset with the aim of developing highly accurate predictive models, and their performance was subsequently benchmarked against conventional risk assessment scales. An ablation study was also carried out to analyze the impact of individual risk factors to model performance.</p><p><strong>Results: </strong>The non-linear algorithm, LightGBM, excelled in forecasting the incidence of cardiovascular disease within 3 years, achieving an area under the 'receiver operating characteristic curve' (AUROC) of 0.883. This performance surpassed that of the conventional logistic regression model, which had an AUROC of 0.725, on identical datasets. Concurrently, in direct comparative analyses, machine-learning approaches have notably outperformed the three traditional risk assessment methods within their respective applicable populations. These include the Framingham cardiovascular disease risk score, 2019 ESC/EAS guidelines for the management of dyslipidemia and the 2016 Chinese recommendations for the management of dyslipidemia in adults. Further analysis of risk factors showed that the variability of blood lipid levels and remnant cholesterol played an important role in indicating an increased risk of CVD.</p><p><strong>Conclusions: </strong>We have shown that the application of machine-learning techniques significantly enhances the precision of cardiovascular risk forecasting among hyperlipidemic patients, addressing the critical issue of disease prediction's heterogeneity and non-linearity. 
Furthermore, some recently-suggested biomarkers, including blood lipid variability and remnant cholesterol are also important predictors of cardiovascular events, suggesting the importance of continuous lipid monitoring and healthcare profiling through big data platforms.</p>","PeriodicalId":46312,"journal":{"name":"Health Information Science and Systems","volume":"12 1","pages":"51"},"PeriodicalIF":3.4,"publicationDate":"2024-11-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11551092/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142630206","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Machine learning approach to flare-up detection and clustering in chronic obstructive pulmonary disease (COPD) patients
Pub Date: 2024-10-23 | eCollection Date: 2024-12-01 | DOI: 10.1007/s13755-024-00308-4
Ramón Rueda, Esteban Fabello, Tatiana Silva, Samuel Genzor, Jan Mizera, Ladislav Stanke
Purpose: Chronic obstructive pulmonary disease (COPD) is a prevalent and preventable condition that typically worsens over time. Acute exacerbations of COPD significantly impact disease progression, underscoring the importance of prevention efforts. This observational study aimed to achieve two main objectives: (1) identify patients at risk of exacerbations using an ensemble of clustering algorithms, and (2) classify patients into distinct clusters based on disease severity.
Methods: Data from portable medical devices were analyzed post hoc, using hyperparameter optimization with Self-Organizing Maps (SOM), Density-Based Spatial Clustering of Applications with Noise (DBSCAN), Isolation Forest, and Support Vector Machine (SVM) algorithms to detect flare-ups. Principal Component Analysis (PCA) followed by KMeans clustering was applied to categorize patients by severity.
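The severity-stratification step maps directly onto standard scikit-learn components. The sketch below applies standardization, PCA, and three-cluster KMeans to stand-in device features; the flare-up detection ensemble (SOM, DBSCAN, Isolation Forest, SVM) is not reproduced here.

```python
# Sketch of the stratification step: PCA to compress the device features, then
# KMeans with three clusters, matching the three severity groups reported.
# Feature values are random stand-ins for the portable-device measurements.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(size=(17, 12))  # 17 patients with usable data, 12 device features

X_std = StandardScaler().fit_transform(X)
X_pca = PCA(n_components=2).fit_transform(X_std)  # compress to 2 components
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X_pca)
print("cluster assignments:", labels)
```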
Results: Twenty-five patients were included in the study population; data from 17 patients had the required reliability. Five patients were identified in the highest-deterioration group, with one clinically confirmed exacerbation accurately detected by our ensemble algorithm. PCA and KMeans clustering then grouped patients into three clusters by severity: Cluster 0 started with the least severe characteristics but experienced decline, Cluster 1 consistently showed the most severe characteristics, and Cluster 2 showed slight improvement.
Conclusion: Our approach effectively identified patients at risk of exacerbations and classified them by disease severity. Although promising, the approach would need to be verified on a larger sample with a larger number of recorded clinically verified exacerbations.
{"title":"Machine learning approach to flare-up detection and clustering in chronic obstructive pulmonary disease (COPD) patients.","authors":"Ramón Rueda, Esteban Fabello, Tatiana Silva, Samuel Genzor, Jan Mizera, Ladislav Stanke","doi":"10.1007/s13755-024-00308-4","DOIUrl":"10.1007/s13755-024-00308-4","url":null,"abstract":"<p><strong>Purpose: </strong>Chronic obstructive pulmonary disease (COPD) is a prevalent and preventable condition that typically worsens over time. Acute exacerbations of COPD significantly impact disease progression, underscoring the importance of prevention efforts. This observational study aimed to achieve two main objectives: (1) identify patients at risk of exacerbations using an ensemble of clustering algorithms, and (2) classify patients into distinct clusters based on disease severity.</p><p><strong>Methods: </strong>Data from portable medical devices were analyzed post-hoc using hyperparameter optimization with Self-Organizing Maps (SOM), Density-Based Spatial Clustering of Applications with Noise (DBSCAN), Isolation Forest, and Support Vector Machine (SVM) algorithms, to detect flare-ups. Principal Component Analysis (PCA) followed by KMeans clustering was applied to categorize patients by severity.</p><p><strong>Results: </strong>25 patients were included within the study population, data from 17 patients had the required reliability. Five patients were identified in the highest deterioration group, with one clinically confirmed exacerbation accurately detected by our ensemble algorithm. Then, PCA and KMeans clustering grouped patients into three clusters based on severity: Cluster 0 started with the least severe characteristics but experienced decline, Cluster 1 consistently showed the most severe characteristics, and Cluster 2 showed slight improvement.</p><p><strong>Conclusion: </strong>Our approach effectively identified patients at risk of exacerbations and classified them by disease severity. Although promising, the approach would need to be verified on a larger sample with a larger number of recorded clinically verified exacerbations.</p>","PeriodicalId":46312,"journal":{"name":"Health Information Science and Systems","volume":"12 1","pages":"50"},"PeriodicalIF":3.4,"publicationDate":"2024-10-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11499475/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142516717","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Explainable federated learning scheme for secure healthcare data sharing
Pub Date: 2024-09-13 | eCollection Date: 2024-12-01 | DOI: 10.1007/s13755-024-00306-6
Liutao Zhao, Haoran Xie, Lin Zhong, Yujue Wang
Artificial intelligence has immense potential for applications in smart healthcare. Nowadays, a large amount of medical data collected by wearable or implantable devices has accumulated in Body Area Networks. Unlocking the value of these data can advance the application of artificial intelligence in smart healthcare. To utilize these dispersed data, this paper proposes an innovative Federated Learning scheme that focuses on the challenges of explainability and security in smart healthcare. In the proposed scheme, the federated modeling process and the explainability analysis are independent of each other. By introducing post-hoc explanation techniques to analyze the global model, the scheme avoids the performance degradation caused by pursuing explainability while still revealing the mechanism of the model. In terms of security, first, a fair and efficient client private gradient evaluation method is introduced for explainable evaluation of gradient contributions, quantifying client contributions to federated learning and filtering the impact of low-quality data. Second, to address the privacy issues of medical health data collected by wireless Body Area Networks, a multi-server model is proposed to solve the secure aggregation problem in federated learning. Furthermore, by employing homomorphic secret sharing and homomorphic hashing techniques, a non-interactive, verifiable secure aggregation protocol is proposed, ensuring that client data privacy is protected and that the aggregation results remain correct even in the presence of up to t colluding malicious servers. Experimental results demonstrate that the proposed scheme's explainability is consistent with that of centralized training scenarios and that it shows competitive performance in terms of security and efficiency.
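The multi-server secure aggregation idea can be illustrated with plain additive secret sharing: each client splits its (quantized) gradient into one share per server, no single server ever sees a whole gradient, and the per-server sums reconstruct the aggregate. This toy sketch omits the homomorphic hashing that provides verifiability in the paper's protocol.

```python
# Toy secure aggregation via additive secret sharing over a prime field:
# servers only see shares, yet the sum of server totals equals the sum of
# all client gradients. Verifiability (homomorphic hashing) is omitted.
import numpy as np

PRIME = 2**31 - 1  # arithmetic modulo a prime
N_SERVERS = 3

def share(vec, n=N_SERVERS):
    """Split an integer vector into n additive shares modulo PRIME."""
    shares = [np.random.randint(0, PRIME, size=vec.shape, dtype=np.int64)
              for _ in range(n - 1)]
    shares.append((vec - sum(shares)) % PRIME)
    return shares

# Client gradients, quantized to integers for field arithmetic.
gradients = [np.array([5, 12, 7]), np.array([3, 1, 9]), np.array([8, 4, 2])]

# Each server s only ever receives one share from each client.
server_sums = [np.zeros(3, dtype=np.int64) for _ in range(N_SERVERS)]
for grad in gradients:
    for s, piece in enumerate(share(grad % PRIME)):
        server_sums[s] = (server_sums[s] + piece) % PRIME

aggregate = sum(server_sums) % PRIME
print("securely aggregated gradient:", aggregate)  # [16 17 18]
print("plaintext check:", sum(gradients))          # matches
```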
{"title":"Explainable federated learning scheme for secure healthcare data sharing.","authors":"Liutao Zhao, Haoran Xie, Lin Zhong, Yujue Wang","doi":"10.1007/s13755-024-00306-6","DOIUrl":"10.1007/s13755-024-00306-6","url":null,"abstract":"<p><p>Artificial intelligence has immense potential for applications in smart healthcare. Nowadays, a large amount of medical data collected by wearable or implantable devices has been accumulated in Body Area Networks. Unlocking the value of this data can better explore the applications of artificial intelligence in the smart healthcare field. To utilize these dispersed data, this paper proposes an innovative Federated Learning scheme, focusing on the challenges of explainability and security in smart healthcare. In the proposed scheme, the federated modeling process and explainability analysis are independent of each other. By introducing post-hoc explanation techniques to analyze the global model, the scheme avoids the performance degradation caused by pursuing explainability while understanding the mechanism of the model. In terms of security, firstly, a fair and efficient client private gradient evaluation method is introduced for explainable evaluation of gradient contributions, quantifying client contributions in federated learning and filtering the impact of low-quality data. Secondly, to address the privacy issues of medical health data collected by wireless Body Area Networks, a multi-server model is proposed to solve the secure aggregation problem in federated learning. Furthermore, by employing homomorphic secret sharing and homomorphic hashing techniques, a non-interactive, verifiable secure aggregation protocol is proposed, ensuring that client data privacy is protected and the correctness of the aggregation results is maintained even in the presence of up to <i>t</i> colluding malicious servers. Experimental results demonstrate that the proposed scheme's explainability is consistent with that of centralized training scenarios and shows competitive performance in terms of security and efficiency.</p><p><strong>Graphical abstract: </strong></p>","PeriodicalId":46312,"journal":{"name":"Health Information Science and Systems","volume":"12 1","pages":"49"},"PeriodicalIF":3.4,"publicationDate":"2024-09-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11399375/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142298293","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Comorbidity progression analysis: patient stratification and comorbidity prediction using temporal comorbidity network
Pub Date: 2024-09-12 | eCollection Date: 2024-12-01 | DOI: 10.1007/s13755-024-00307-5
Ye Liang, Chonghui Guo, Hailin Li
Objective: The study aims to identify distinct population-specific comorbidity progression patterns, detect potential comorbidities in a timely manner, and gain a better understanding of the progression of comorbid conditions among patients.
Methods: This work presents a comorbidity progression analysis framework that utilizes temporal comorbidity networks (TCN) for patient stratification and comorbidity prediction. We propose a TCN construction approach that uses patients' longitudinal, temporal diagnosis data. We then employ the TCN for patient stratification through preliminary analysis and typical prescription analysis, uncovering potential comorbidity progression patterns in different patient groups. Finally, we propose an innovative comorbidity prediction method based on a distance-matched temporal comorbidity network (TCN-DM). This method identifies patients with similar disease prevalence and disease-transition patterns and combines their diagnosis information with that of the current patient to predict potential comorbidities at the patient's next visit.
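To make the TCN construction step concrete, the toy sketch below builds a directed, weighted transition structure from ordered per-visit diagnosis sets: nodes are diagnosis codes, and edge weights count how often a diagnosis is followed by a newly appearing one. The exact edge definition in the paper may differ, and the distance-matching (TCN-DM) step is not shown.

```python
# Toy temporal comorbidity network: count transitions from existing diagnoses
# to newly appearing ones across consecutive visits.
from collections import Counter
from itertools import pairwise  # Python 3.10+

# Each patient: an ordered list of per-visit diagnosis sets (toy ICD-9 codes).
patients = {
    "p1": [{"428"}, {"428", "585"}, {"428", "585", "250"}],
    "p2": [{"428", "401"}, {"428", "250"}],
}

edges = Counter()
for visits in patients.values():
    for prev, curr in pairwise(visits):
        for d_prev in prev:
            for d_curr in curr - prev:  # only newly appearing diagnoses
                edges[(d_prev, d_curr)] += 1

for (src, dst), weight in edges.most_common():
    print(f"{src} -> {dst}: {weight}")
```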
Results: This study validated the framework on the real-world MIMIC-III dataset, with heart failure (HF) as the disease of interest for investigating comorbidity progression in HF patients. With the TCN, the study identified four distinct HF subgroups, revealing the progression of comorbidities in patients. Furthermore, compared to other methods, TCN-DM demonstrated better predictive performance, with F1-scores ranging from 0.454 to 0.612.
Conclusions: This study can identify comorbidity patterns for individuals and population, and offer promising prediction for future comorbidity developments in patients.
{"title":"Comorbidity progression analysis: patient stratification and comorbidity prediction using temporal comorbidity network.","authors":"Ye Liang, Chonghui Guo, Hailin Li","doi":"10.1007/s13755-024-00307-5","DOIUrl":"10.1007/s13755-024-00307-5","url":null,"abstract":"<p><strong>Objective: </strong>The study aims to identify distinct population-specific comorbidity progression patterns, timely detect potential comorbidities, and gain better understanding of the progression of comorbid conditions among patients.</p><p><strong>Methods: </strong>This work presents a comorbidity progression analysis framework that utilizes temporal comorbidity networks (TCN) for patient stratification and comorbidity prediction. We propose a TCN construction approach that utilizes longitudinal, temporal diagnosis data of patients to construct their TCN. Subsequently, we employ the TCN for patient stratification by conducting preliminary analysis, and typical prescription analysis to uncover potential comorbidity progression patterns in different patient groups. Finally, we propose an innovative comorbidity prediction method by utilizing the distance-matched temporal comorbidity network (TCN-DM). This method identifies similar patients with disease prevalence and disease transition patterns and combines their diagnosis information with that of the current patient to predict potential comorbidity at the patient's next visit.</p><p><strong>Results: </strong>This study validated the capability of the framework using a real-world dataset MIMIC-III, with heart failure (HF) as interested disease to investigate comorbidity progression in HF patients. With TCN, this study can identify four significant distinctive HF subgroups, revealing the progression of comorbidities in patients. Furthermore, compared to other methods, TCN-DM demonstrated better predictive performance with F1-Score values ranging from 0.454 to 0.612, showcasing its superiority.</p><p><strong>Conclusions: </strong>This study can identify comorbidity patterns for individuals and population, and offer promising prediction for future comorbidity developments in patients.</p>","PeriodicalId":46312,"journal":{"name":"Health Information Science and Systems","volume":"12 1","pages":"48"},"PeriodicalIF":3.4,"publicationDate":"2024-09-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11393239/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142298292","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Explainable depression symptom detection in social media
Pub Date: 2024-09-06 | eCollection Date: 2024-12-01 | DOI: 10.1007/s13755-024-00303-9
Eliseo Bao, Anxo Pérez, Javier Parapar
Users of social platforms often perceive these sites as supportive spaces to post about their mental health issues. Those conversations contain important traces of individuals' health risks. Recently, researchers have exploited this online information to construct mental health detection models, which aim to identify users at risk on platforms like Twitter, Reddit, or Facebook. Most of these models focus on achieving good classification results while ignoring the explainability and interpretability of their decisions. Recent research has pointed out the importance of using clinical markers, such as the use of symptoms, to improve health professionals' trust in computational models. In this paper, we introduce transformer-based architectures designed to detect and explain the appearance of depressive symptom markers in user-generated content from social media. We present two approaches: (i) training one model to classify and a separate model to explain the classifier's decision, and (ii) unifying the two tasks within a single model. For the latter approach, we also investigate the performance of recent conversational Large Language Models (LLMs) using both in-context learning and fine-tuning. Our models provide natural language explanations aligned with validated symptoms, enabling clinicians to interpret the decisions more effectively. We evaluate our approaches on recent symptom-focused datasets, using both offline metrics and expert-in-the-loop evaluations to assess the quality of the models' explanations. Our findings demonstrate that it is possible to achieve good classification results while generating interpretable symptom-based explanations.
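As a minimal illustration of the classification half of the first approach, the sketch below runs a transformer sequence classifier over a user post with the Hugging Face transformers API; the backbone name and binary label set are placeholders, and the explanation model and LLM variants are not shown.

```python
# Minimal transformer classifier over a social media post. The backbone and
# label set are stand-ins; the head is untrained here and would be fine-tuned
# on a symptom-annotated dataset in practice.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

MODEL = "distilbert-base-uncased"  # assumed stand-in backbone
tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForSequenceClassification.from_pretrained(MODEL, num_labels=2)

posts = ["I haven't slept properly in weeks and nothing feels worth doing."]
batch = tokenizer(posts, padding=True, truncation=True, return_tensors="pt")

with torch.no_grad():
    probs = model(**batch).logits.softmax(dim=-1)
print(probs)  # [P(no symptom marker), P(symptom marker)]
```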
{"title":"Explainable depression symptom detection in social media.","authors":"Eliseo Bao, Anxo Pérez, Javier Parapar","doi":"10.1007/s13755-024-00303-9","DOIUrl":"10.1007/s13755-024-00303-9","url":null,"abstract":"<p><p>Users of social platforms often perceive these sites as supportive spaces to post about their mental health issues. Those conversations contain important traces about individuals' health risks. Recently, researchers have exploited this online information to construct mental health detection models, which aim to identify users at risk on platforms like Twitter, Reddit or Facebook. Most of these models are focused on achieving good classification results, ignoring the explainability and interpretability of the decisions. Recent research has pointed out the importance of using clinical markers, such as the use of symptoms, to improve trust in the computational models by health professionals. In this paper, we introduce transformer-based architectures designed to detect and explain the appearance of depressive symptom markers in user-generated content from social media. We present two approaches: (i) train a model to classify, and another one to explain the classifier's decision separately and (ii) unify the two tasks simultaneously within a single model. Additionally, for this latter manner, we also investigated the performance of recent conversational Large Language Models (LLMs) utilizing both in-context learning and finetuning. Our models provide natural language explanations, aligning with validated symptoms, thus enabling clinicians to interpret the decisions more effectively. We evaluate our approaches using recent symptom-focused datasets, using both offline metrics and expert-in-the-loop evaluations to assess the quality of our models' explanations. Our findings demonstrate that it is possible to achieve good classification results while generating interpretable symptom-based explanations.</p>","PeriodicalId":46312,"journal":{"name":"Health Information Science and Systems","volume":"12 1","pages":"47"},"PeriodicalIF":4.7,"publicationDate":"2024-09-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11379836/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142156307","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A lightweight network based on multi-feature pseudo-color mapping for arrhythmia recognition
Pub Date: 2024-09-04 | DOI: 10.1007/s13755-024-00304-8
Yijun Ma, Junyan Li, Jinbiao Zhang, Jilin Wang, Guozhen Sun, Yatao Zhang
Heartbeat classification is a crucial tool for arrhythmia diagnosis. In this study, a multi-feature pseudo-color mapping (MfPc Mapping) is proposed, and a lightweight FlexShuffleNet is designed to classify heartbeats. MfPc Mapping converts one-dimensional (1-D) electrocardiogram (ECG) recordings into corresponding two-dimensional (2-D) multi-feature RGB graphs, offering excellent interpretability and data visualization. FlexShuffleNet is a lightweight network that can be adapted to classification tasks of varying complexity by tuning hyperparameters. The method has three steps: data preprocessing, which includes de-noising the raw ECG recordings, removing baseline drift, extracting heartbeats, and balancing the data; transforming the heartbeats using MfPc Mapping; and, finally, classifying the heartbeats into 14 categories with FlexShuffleNet. Evaluated on the test set of the MIT-BIH arrhythmia database (MIT/BIH DB), the method yielded an accuracy of 99.77%, sensitivity of 94.60%, precision of 89.83%, specificity of 99.85%, and F1-score of 0.9125 on the 14-category classification task. Additional validation on the Shandong Province Hospital database (SPH DB) yielded an accuracy of 92.08%, sensitivity of 93.63%, precision of 91.25%, specificity of 99.85%, and F1-score of 0.9315. These results show the satisfactory performance of the proposed method.
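The sketch below illustrates the general idea behind a multi-feature pseudo-color mapping: derive three feature series from a 1-D heartbeat (here raw amplitude, slope, and local energy, which are assumptions), normalize each to 0-255, and stack them as the R, G, B channels of a 2-D image suitable for a 2-D CNN. The specific features and spatial layout used by MfPc Mapping are not reproduced here.

```python
# Illustrative 1-D-to-2-D pseudo-color mapping: three feature series become
# the RGB channels of a small image. Feature choices and layout are assumed.
import numpy as np

def to_uint8(x):
    x = (x - x.min()) / (x.max() - x.min() + 1e-12)
    return (255 * x).astype(np.uint8)

def mfpc_map(beat, side=16):
    beat = beat[: side * side]                     # fixed-length beat segment
    r = to_uint8(beat)                             # raw amplitude
    g = to_uint8(np.gradient(beat))                # first derivative (slope)
    b = to_uint8(np.convolve(beat**2, np.ones(5) / 5, mode="same"))  # local energy
    return np.stack([c.reshape(side, side) for c in (r, g, b)], axis=-1)

beat = np.sin(np.linspace(0, 8 * np.pi, 256)) + 0.1 * np.random.randn(256)
image = mfpc_map(beat)
print(image.shape, image.dtype)  # (16, 16, 3) uint8 -- ready for a 2-D CNN
```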
{"title":"A lightweight network based on multi-feature pseudo-color mapping for arrhythmia recognition.","authors":"Yijun Ma, Junyan Li, Jinbiao Zhang, Jilin Wang, Guozhen Sun, Yatao Zhang","doi":"10.1007/s13755-024-00304-8","DOIUrl":"10.1007/s13755-024-00304-8","url":null,"abstract":"<p><p>Heartbeats classification is a crucial tool for arrhythmia diagnosis. In this study, a multi-feature pseudo-color mapping (MfPc Mapping) was proposed, and a lightweight FlexShuffleNet was designed to classify heartbeats. MfPc Mapping converts one-dimensional (1-D) electrocardiogram (ECG) recordings into corresponding two-dimensional (2-D) multi-feature RGB graphs, and it offers good excellent interpretability and data visualization. FlexShuffleNet is a lightweight network that can be adapted to classification tasks of varying complexity by tuning hyperparameters. The method has three steps. The first step is data preprocessing, which includes de-noising the raw ECG recordings, removing baseline drift, extracting heartbeats, and performing data balancing, the second step is transforming the heartbeats using MfPc Mapping. Finally, the FlexShuffleNet is employed to classify heartbeats into 14 categories. This study was evaluated on the test set of the MIT-BIH arrhythmia database (MIT/BIH DB), and it yielded the results i.e., accuracy of 99.77%, sensitivity of 94.60%, precision of 89.83% and specificity of 99.85% and F1-score of 0.9125 in 14-category classification task. Additionally, validation on Shandong Province Hospital database (SPH DB) yielded the results i.e., accuracy of 92.08%, sensitivity of 93.63%, precision of 91.25% and specificity of 99.85% and F1-score of 0.9315. The results show the satisfied performance of the proposed method.</p>","PeriodicalId":46312,"journal":{"name":"Health Information Science and Systems","volume":"12 1","pages":"46"},"PeriodicalIF":3.4,"publicationDate":"2024-09-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11371975/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142141311","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Tree hole rescue: an AI approach for suicide risk detection and online suicide intervention
Pub Date: 2024-09-03 | eCollection Date: 2024-12-01 | DOI: 10.1007/s13755-024-00298-3
Zhisheng Huang, Qing Hu
Adolescent suicide has become an important social issue of general concern. Many young people express their suicidal feelings and intentions through online social media, e.g., Twitter or Microblog. The "tree hole" is the Chinese name for places on the Web where people post secrets. It opens the possibility of using artificial intelligence and big data technology to detect posts expressing suicidal signals in such "tree hole" social media. We have developed Web-based intelligent agents (i.e., AI-based programs) that monitor the "tree hole" websites in Microblog every day using knowledge graph technology. We have organized the Tree-hole Rescue Team, consisting of more than 1000 volunteers, to carry out suicide rescue interventions according to the daily monitoring notifications. From 2018 to 2023, the Tree-hole Rescue Team prevented more than 6600 suicides; a few thousand people were saved within those 6 years. In this paper, we present the basic technology of the Web-based Tree Hole intelligent agents and elaborate on how the agents discover suicide attempts and issue the corresponding monitoring notifications, and how the volunteers of the Tree-hole Rescue Team conduct online suicide intervention. This research also shows that the knowledge graph approach can be used for semantic analysis of social media.
{"title":"Tree hole rescue: an AI approach for suicide risk detection and online suicide intervention.","authors":"Zhisheng Huang, Qing Hu","doi":"10.1007/s13755-024-00298-3","DOIUrl":"10.1007/s13755-024-00298-3","url":null,"abstract":"<p><p>Adolescent suicide has become an important social issue of general concern. Many young people express their suicidal feelings and intentions through online social media, e.g., Twitter, Microblog. The \"tree hole\" is the Chinese name for places on the Web where people post secrets. It provides the possibility of using Artificial Intelligence and big data technology to detect the posts where someone express the suicidal signal from those \"tree hole\" social media. We have developed the Web-based intelligent agents (i.e., AI-based programs) which can monitor the \"tree hole\" websites in Microblog every day by using knowledge graph technology. We have organized Tree-hole Rescue Team, which consists of more than 1000 volunteers, to carry out suicide rescue intervention according to the daily monitoring notifications. From 2018 to 2023, Tree-hole Rescue Team has prevented more than 6600 suicides. A few thousands of people have been saved within those 6 years. In this paper, we present the basic technology of Web-based Tree Hole intelligent agents and elaborate how the intelligent agents can discover suicide attempts and issue corresponding monitoring notifications and how the volunteers of Tree Hole Rescue Team can conduct online suicide intervention. This research also shows that the knowledge graph approach can be used for the semantic analysis on social media.</p>","PeriodicalId":46312,"journal":{"name":"Health Information Science and Systems","volume":"12 1","pages":"45"},"PeriodicalIF":3.4,"publicationDate":"2024-09-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11371955/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142141312","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Convolutional neural network framework for EEG-based ADHD diagnosis in children
Pub Date: 2024-08-31 | eCollection Date: 2024-12-01 | DOI: 10.1007/s13755-024-00305-7
Umaisa Hassan, Amit Singhal
Purpose: Attention-deficit hyperactivity disorder (ADHD) is a significant psychiatric and neurodevelopmental disorder that is prevalent worldwide. The prevalence of ADHD among school children in India is estimated to range from 5% to 8%, although certain studies have reported rates as high as 11%. Utilizing electroencephalography (EEG) signals for the early detection and classification of ADHD in children is therefore crucial.
Methods: In this study, we introduce a CNN architecture characterized by its simplicity, comprising solely two convolutional layers. Our approach involves pre-processing EEG signals through a band-pass filter and segmenting them into 5-s frames. Following this, the frames undergo normalization and canonical correlation analysis. Subsequently, the proposed CNN architecture is employed for training and testing purposes.
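A skeleton of this pipeline is easy to sketch: band-pass filtering with SciPy, segmentation into 5-s frames, and a two-convolutional-layer network in PyTorch. The sampling rate, filter band, and layer sizes below are assumptions, and the canonical correlation analysis step is omitted.

```python
# Pipeline skeleton: band-pass filter 19-channel EEG, cut 5-s frames,
# normalize, and classify with a two-conv-layer CNN. Rates and sizes assumed.
import numpy as np
import torch
import torch.nn as nn
from scipy.signal import butter, filtfilt

FS = 128  # assumed sampling rate (Hz)
b, a = butter(4, [0.5, 40], btype="bandpass", fs=FS)

def preprocess(eeg):  # eeg: (channels, samples)
    filtered = filtfilt(b, a, eeg, axis=1)
    usable = (filtered.shape[1] // (5 * FS)) * 5 * FS
    frames = filtered[:, :usable].reshape(eeg.shape[0], -1, 5 * FS)
    frames = frames.transpose(1, 0, 2)                # (n_frames, 19, 640)
    return (frames - frames.mean()) / frames.std()    # global normalization

class TwoConvCNN(nn.Module):
    def __init__(self, channels=19):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv1d(channels, 32, kernel_size=7, stride=2), nn.ReLU(),
            nn.Conv1d(32, 64, kernel_size=7, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),
        )
        self.classifier = nn.Linear(64, 2)  # ADHD vs. control

    def forward(self, x):
        return self.classifier(self.features(x).squeeze(-1))

frames = preprocess(np.random.randn(19, 60 * FS))  # one minute of toy EEG
logits = TwoConvCNN()(torch.tensor(frames, dtype=torch.float32))
print(logits.shape)  # (12 frames, 2 classes)
```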
Results: Our methodology yields remarkable results, with 100% accuracy, sensitivity, and specificity when utilizing the complete 19-channel EEG signals for diagnosing ADHD in children. However, employing the entire set of EEG channels presents challenges related to the computational complexity. Therefore, we investigate the feasibility of using only frontal brain EEG channels for ADHD detection, which yields an accuracy of 99.08%.
Conclusions: The proposed method yields high accuracy and is easy to implement, hence, it has the potential for widespread practical deployment to diagnose ADHD.
{"title":"Convolutional neural network framework for EEG-based ADHD diagnosis in children.","authors":"Umaisa Hassan, Amit Singhal","doi":"10.1007/s13755-024-00305-7","DOIUrl":"10.1007/s13755-024-00305-7","url":null,"abstract":"<p><strong>Purpose: </strong>Attention-deficit hyperactivity disorder (ADHD) stands as a significant psychiatric and neuro-developmental disorder with global prevalence. The prevalence of ADHD among school children in India is estimated to range from 5% to 8%. However, certain studies have reported higher prevalence rates, reaching as high as 11%. Utilizing electroencephalography (EEG) signals for the early detection and classification of ADHD in children is crucial.</p><p><strong>Methods: </strong>In this study, we introduce a CNN architecture characterized by its simplicity, comprising solely two convolutional layers. Our approach involves pre-processing EEG signals through a band-pass filter and segmenting them into 5-s frames. Following this, the frames undergo normalization and canonical correlation analysis. Subsequently, the proposed CNN architecture is employed for training and testing purposes.</p><p><strong>Results: </strong>Our methodology yields remarkable results, with 100% accuracy, sensitivity, and specificity when utilizing the complete 19-channel EEG signals for diagnosing ADHD in children. However, employing the entire set of EEG channels presents challenges related to the computational complexity. Therefore, we investigate the feasibility of using only frontal brain EEG channels for ADHD detection, which yields an accuracy of 99.08%.</p><p><strong>Conclusions: </strong>The proposed method yields high accuracy and is easy to implement, hence, it has the potential for widespread practical deployment to diagnose ADHD.</p>","PeriodicalId":46312,"journal":{"name":"Health Information Science and Systems","volume":"12 1","pages":"44"},"PeriodicalIF":3.4,"publicationDate":"2024-08-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11365922/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142120855","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}