Accurate blood glucose (BG) prediction is of great benefit to the treatment of diabetes. Generally, clinical physicians must comprehensively analyze various factors, such as a patient's body temperature, meals, sleep, insulin injections, continuous glucose monitoring (CGM) readings, and other information, to evaluate the fluctuation trend of blood glucose. To address this problem, this paper proposes a multivariate blood glucose prediction method based on mixed feature clustering. It clusters time series data with diverse or mixed features related to blood glucose, effectively leveraging their correlations and distribution characteristics. By combining incremental clustering of multivariate time series with transfer learning, the method achieves online prediction of blood glucose levels. The experimental results indicate that the proposed method decreases the prediction error (RMSE) by 4.2% at a prediction horizon (PH) of 30 min and by 5.9% at PH = 60 min. Compared with other prediction methods, the training time of the multivariate prediction method is reduced by 5.2% (PH = 30 min) and 4.7% (PH = 60 min). The method was also validated and compared with other approaches on a real dataset, showing lower prediction error and better prediction performance at PH = 30, 45, 60, 75, and 90 min. Compared with traditional univariate and multivariate time series prediction methods, the approach proposed in this paper significantly improves the accuracy and robustness of blood glucose prediction. According to the evaluation results on the OhioT1DM dataset and data from the Sixth People's Hospital of Shanghai, the proposed method has better generalization performance and clinical acceptability.
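The abstract does not spell out implementation details; purely as a rough Python sketch of the general idea, the snippet below incrementally clusters multivariate windows with MiniBatchKMeans and warm-starts a per-cluster regressor from a shared global model (the "transfer" step) before updating it online. The window layout, feature scaling, and choice of estimators are illustrative assumptions, not the authors' method.

```python
# Illustrative sketch only: incremental clustering of multivariate windows plus
# per-cluster models warm-started ("transferred") from a shared global model.
# Feature layout, window length, and estimators are assumptions, not the paper's method.
import copy
import numpy as np
from sklearn.cluster import MiniBatchKMeans
from sklearn.linear_model import SGDRegressor

rng = np.random.default_rng(0)
WINDOW = 12          # hypothetical: 12 past CGM samples (1 h at 5-min resolution)
N_CLUSTERS = 3       # hypothetical number of mixed-feature clusters

# Toy multivariate stream: [CGM window..., insulin dose, carbohydrate intake]
def make_sample():
    cgm = 120 + np.cumsum(rng.normal(0, 3, WINDOW))
    insulin, carbs = rng.uniform(0, 5), rng.uniform(0, 60)
    x = np.concatenate([cgm / 180.0, [insulin / 5.0, carbs / 60.0]])  # crude normalization
    y = (cgm[-1] + rng.normal(0, 5)) / 180.0        # future BG value, same scale
    return x, y

clusterer = MiniBatchKMeans(n_clusters=N_CLUSTERS, random_state=0)
global_model = SGDRegressor(random_state=0)
cluster_models = {}

# Warm-up phase: fit the clusterer and the global model on an initial batch.
X0, y0 = zip(*[make_sample() for _ in range(200)])
X0, y0 = np.array(X0), np.array(y0)
clusterer.partial_fit(X0)
global_model.partial_fit(X0, y0)

# Online phase: route each new window to a cluster; the cluster model starts as a
# copy of the global model (the "transfer") and is then updated incrementally.
for _ in range(500):
    x, y = make_sample()
    x = x.reshape(1, -1)
    clusterer.partial_fit(x)
    c = int(clusterer.predict(x)[0])
    if c not in cluster_models:
        cluster_models[c] = copy.deepcopy(global_model)
    y_hat = cluster_models[c].predict(x)[0]     # online prediction before updating
    cluster_models[c].partial_fit(x, [y])
    global_model.partial_fit(x, [y])
```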
{"title":"A new multivariate blood glucose prediction method with hybrid feature clustering and online transfer learning.","authors":"Fuqiang You, Guo Zhao, Xinyu Zhang, Ziheng Zhang, Jinli Cao, Hongru Li","doi":"10.1007/s13755-024-00313-7","DOIUrl":"10.1007/s13755-024-00313-7","url":null,"abstract":"<p><p>Accurate blood glucose (BG) prediction is greatly benefit for the treatment of diabetes. Generally, clinical physicians are required to comprehensively analyze various factors, such as patient's body temperature, meal, sleep, insulin injection, continuous glucose monitoring (CGM), and other information, to evaluate the fluctuation trend of blood glucose. To address this problem, this paper proposes a multivariate blood glucose prediction method based on mixed feature clustering. It clusters time series data with diverse or mixed features related to blood glucose, effectively leveraging correlations and distribution characteristics. By combining incremental clustering of multivariate time series with transfer learning, this method achieves online prediction of blood glucose levels. The experimental results indicate that the proposed method can decrease the prediction error RMSE by 4.2% (PH=30min) and 5.9% (PH=60min). Compared with other prediction methods, the training time of the multivariate prediction method is reduced by 5.2% (PH=30min) and 4.7% (PH=60min). It was also validated and compared with other methods in a real dataset. The proposed method in this study has lower prediction error and better prediction performance in the prediction horizon (PH) of PH=30, 45, 60, 75, and 90 min, respectively. Compared with the traditional unitary and multivariate time series prediction method, the approach proposed in this paper significantly improves the accuracy and robustness of blood glucose prediction. According to the evaluation results on the data set from OhioT1DM and the Sixth People's Hospital of Shanghai, the proposed method has better generalization performance and clinical acceptability.</p>","PeriodicalId":46312,"journal":{"name":"Health Information Science and Systems","volume":"12 1","pages":"57"},"PeriodicalIF":4.7,"publicationDate":"2024-11-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11570574/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142677071","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-11-16. eCollection Date: 2024-12-01. DOI: 10.1007/s13755-024-00314-6
Xi Cao, Yong-Feng Ge, Kate Wang, Ying Lin
Purpose: Cognitive diagnostic tests (CDTs) assess cognitive skills at a fine-grained level, providing detailed insights into the mastery profiles of test-takers. Constructing CDTs that satisfy many practical requirements is challenging, and traditional construction algorithms address these challenges only partially, focusing on a limited number of constraints. This paper utilizes a meta-heuristic algorithm to produce high-quality tests while handling more constraints simultaneously.
Methods: This paper presents a memetic ant colony optimization (MACO) algorithm for constructing CDTs while considering multiple constraints. The MACO method utilizes pheromone trails to represent successful test constructions from the past. Additionally, it innovatively integrates item quality and constraint adherence into heuristic information to manage multiple constraints simultaneously. The method evaluates the assembled tests based on the diagnosis index and constraint satisfaction. Another innovation of MACO is the incorporation of a local search strategy to further enhance diagnostic accuracy by partially optimizing item selection. The optimal local search parameter settings are explored through a parameter investigation. A series of simulation experiments validate the effectiveness of MACO under various conditions.
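As an informal illustration of the construction scheme just described (pheromone-guided item selection plus a memetic local-search step), the following Python sketch assembles a fixed-length test from a toy item bank. The fitness function, constraint definition, and parameter values are simplifying assumptions rather than the paper's formulation.

```python
# Illustrative sketch of the general MACO idea (ant colony construction + local search)
# for assembling a fixed-length test from an item bank. The fitness, heuristic and
# constraint definitions here are simplified stand-ins, not the paper's formulation.
import numpy as np

rng = np.random.default_rng(1)
N_ITEMS, TEST_LEN, N_ANTS, N_ITER = 100, 20, 15, 30
quality = rng.uniform(0.2, 1.0, N_ITEMS)          # e.g., item discrimination
category = rng.integers(0, 4, N_ITEMS)            # content category per item
TARGET_PER_CAT = TEST_LEN // 4                     # constraint: balanced categories

def constraint_penalty(test):
    counts = np.bincount(category[test], minlength=4)
    return np.abs(counts - TARGET_PER_CAT).sum()

def fitness(test):
    # Reward average item quality, penalize constraint violations.
    return quality[test].mean() - 0.05 * constraint_penalty(test)

pheromone = np.ones(N_ITEMS)
heuristic = quality.copy()                         # heuristic information: item quality
best_test, best_fit = None, -np.inf

for _ in range(N_ITER):
    for _ in range(N_ANTS):
        # Probabilistic construction guided by pheromone and heuristic information.
        probs = pheromone * heuristic
        probs = probs / probs.sum()
        test = rng.choice(N_ITEMS, size=TEST_LEN, replace=False, p=probs)
        # Memetic step: simple local search that tries single-item swaps.
        for _ in range(10):
            out_pos = rng.integers(TEST_LEN)
            candidate = rng.integers(N_ITEMS)
            if candidate in test:
                continue
            trial = test.copy()
            trial[out_pos] = candidate
            if fitness(trial) > fitness(test):
                test = trial
        f = fitness(test)
        if f > best_fit:
            best_test, best_fit = test, f
    # Pheromone update: evaporation plus deposit on items of the best test so far.
    pheromone *= 0.9
    pheromone[best_test] += max(best_fit, 0.0)

print("best fitness:", round(best_fit, 3),
      "category counts:", np.bincount(category[best_test], minlength=4))
```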
Results: The results demonstrate the strong ability of meta-heuristic algorithms to handle multiple constraints and achieve high statistical performance. MACO exhibited superior performance in generating high-quality CDTs while meeting multiple constraints, particularly for mixed- and low-discrimination item banks. It achieved faster convergence than standard ant colony optimization in most scenarios.
Conclusions: MACO provides an effective solution for multi-constrained CDT construction, especially for shorter tests and item banks with mixed or lower discrimination. The experimental results also suggest that the suitability of different optimization approaches may depend on specific test conditions, such as the characteristics of the item bank and the length of the test.
{"title":"Memetic ant colony optimization for multi-constrained cognitive diagnostic test construction.","authors":"Xi Cao, Yong-Feng Ge, Kate Wang, Ying Lin","doi":"10.1007/s13755-024-00314-6","DOIUrl":"10.1007/s13755-024-00314-6","url":null,"abstract":"<p><strong>Purpose: </strong>Cognitive diagnostic tests (CDTs) assess cognitive skills at a more granular level, providing detailed insights into the mastery profile of test-takers. Traditional algorithms for constructing CDTs have partially addressed these challenges, focusing on a limited number of constraints. This paper intends to utilize a meta-heuristic algorithm to produce high-quality tests and handle more constraints simultaneously.</p><p><strong>Methods: </strong>This paper presents a memetic ant colony optimization (MACO) algorithm for constructing CDTs while considering multiple constraints. The MACO method utilizes pheromone trails to represent successful test constructions from the past. Additionally, it innovatively integrates item quality and constraint adherence into heuristic information to manage multiple constraints simultaneously. The method evaluates the assembled tests based on the diagnosis index and constraint satisfaction. Another innovation of MACO is the incorporation of a local search strategy to further enhance diagnostic accuracy by partially optimizing item selection. The optimal local search parameter settings are explored through a parameter investigation. A series of simulation experiments validate the effectiveness of MACO under various conditions.</p><p><strong>Results: </strong>The results demonstrate the great ability of meta-heuristic algorithms to handle multiple constraints and achieve high statistical performance. MACO exhibited superior performance in generating high-quality CDTs while meeting multiple constraints, particularly for mixed and low discrimination item banks. It achieved faster convergence than the ant colony optimization in most scenarios.</p><p><strong>Conclusions: </strong>MACO provides an effective solution for multi-constrained CDT construction, especially for shorter tests and item banks with mixed or lower discrimination. The experimental results also suggest that the suitability of different optimization approaches may depend on specific test conditions, such as the characteristics of the item bank and the length of the test.</p>","PeriodicalId":46312,"journal":{"name":"Health Information Science and Systems","volume":"12 1","pages":"56"},"PeriodicalIF":4.7,"publicationDate":"2024-11-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11569084/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142668916","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Over the past few decades, a variety of significant scientific breakthroughs have been achieved in the fields of brain encoding and decoding using functional magnetic resonance imaging (fMRI). Many studies have examined how the human brain reacts to visual stimuli. However, the relationship between fMRI images and the video sequences viewed by humans remains complex and is often studied using large transformer models. In this paper, we investigate the correlation between videos presented to participants during an experiment and the resulting fMRI images. To achieve this, we propose a method for creating a linear model that predicts changes in fMRI signals based on video sequence images. A linear model is constructed for each individual voxel in the fMRI image, assuming that the image sequence satisfies the Markov property. Through comprehensive qualitative experiments, we demonstrate the relationship between the two time series. We hope that our findings contribute to a deeper understanding of the human brain's reaction to external stimuli and provide a basis for future research in this area.
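To make the modeling idea concrete, the following Python sketch fits one least-squares model per voxel that predicts the next fMRI value from the previous value (the Markov assumption) and features of the current video frame. The synthetic data and feature choices are assumptions for illustration only, not the paper's experimental setup.

```python
# Illustrative sketch: one linear model per voxel, predicting the next fMRI value
# from the previous value (Markov assumption) and features of the current video frame.
# Synthetic data and plain least squares are used here purely for illustration.
import numpy as np

rng = np.random.default_rng(0)
T, N_VOXELS, N_FRAME_FEATURES = 200, 50, 10

video_feats = rng.normal(size=(T, N_FRAME_FEATURES))       # e.g., per-frame embeddings
true_w = rng.normal(size=(N_VOXELS, N_FRAME_FEATURES))
fmri = np.zeros((T, N_VOXELS))
for t in range(1, T):                                       # toy signal: AR(1) + stimulus drive
    fmri[t] = 0.6 * fmri[t - 1] + 0.1 * video_feats[t] @ true_w.T + rng.normal(0, 0.1, N_VOXELS)

# Design matrix for time t: [previous voxel value, current frame features, bias].
coeffs = np.zeros((N_VOXELS, N_FRAME_FEATURES + 2))
for v in range(N_VOXELS):
    X = np.column_stack([fmri[:-1, v], video_feats[1:], np.ones(T - 1)])
    y = fmri[1:, v]
    coeffs[v], *_ = np.linalg.lstsq(X, y, rcond=None)

# One-step-ahead prediction for the last time point of each voxel.
X_last = np.column_stack([fmri[-2:-1, :].T,
                          np.tile(video_feats[-1], (N_VOXELS, 1)),
                          np.ones((N_VOXELS, 1))])
pred = np.einsum("vf,vf->v", coeffs, X_last)
print("mean absolute one-step error:", np.abs(pred - fmri[-1]).mean())
```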
{"title":"Forecasting fMRI images from video sequences: linear model analysis.","authors":"Daniil Dorin, Nikita Kiselev, Andrey Grabovoy, Vadim Strijov","doi":"10.1007/s13755-024-00315-5","DOIUrl":"10.1007/s13755-024-00315-5","url":null,"abstract":"<p><p>Over the past few decades, a variety of significant scientific breakthroughs have been achieved in the fields of brain encoding and decoding using the functional magnetic resonance imaging (fMRI). Many studies have been conducted on the topic of human brain reaction to visual stimuli. However, the relationship between fMRI images and video sequences viewed by humans remains complex and is often studied using large transformer models. In this paper, we investigate the correlation between videos presented to participants during an experiment and the resulting fMRI images. To achieve this, we propose a method for creating a linear model that predicts changes in fMRI signals based on video sequence images. A linear model is constructed for each individual voxel in the fMRI image, assuming that the image sequence follows a Markov property. Through the comprehensive qualitative experiments, we demonstrate the relationship between the two time series. We hope that our findings contribute to a deeper understanding of the human brain's reaction to external stimuli and provide a basis for future research in this area.</p>","PeriodicalId":46312,"journal":{"name":"Health Information Science and Systems","volume":"12 1","pages":"55"},"PeriodicalIF":4.7,"publicationDate":"2024-11-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11568086/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142648946","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Purpose: Kidney stone disease (KSD) is a common urological disorder with an increasing incidence worldwide. The extensive knowledge about KSD is dispersed across multiple databases, making it difficult to visualize and represent its hierarchy and connections. This paper aims to construct a disease-specific knowledge graph for KSD to enhance the effective utilization of knowledge by medical professionals and to promote clinical research and discovery.
Methods: Text parsing and semantic analysis were conducted on KSD-related literature from PubMed, and concept annotation based on biomedical ontologies was used to generate semantic data in RDF format. Moreover, public databases were integrated to construct a large-scale knowledge graph for KSD. Additionally, case studies were carried out to demonstrate the practical utility of the developed knowledge graph.
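As a small illustration of how annotated concepts can be turned into RDF triples, the following Python sketch uses rdflib. The namespace, predicates, and identifiers are invented for the example and are not the KSDKG schema.

```python
# Illustrative sketch of turning annotated concepts from an article into RDF triples
# with rdflib. The namespace, predicate names, and identifiers are made up for the
# example and are not the KSDKG schema.
from rdflib import Graph, Literal, Namespace, RDF, URIRef

EX = Namespace("http://example.org/ksdkg/")        # hypothetical namespace
g = Graph()
g.bind("ex", EX)

# Suppose concept annotation of a PubMed abstract produced these (concept, type) pairs.
article = URIRef(EX["pmid_12345678"])              # hypothetical article identifier
annotations = [
    ("kidney_stone_disease", "Disease"),
    ("oxalobacter_formigenes", "Microbe"),
    ("potassium_citrate", "Drug"),
]

g.add((article, RDF.type, EX.Article))
g.add((article, EX.title, Literal("Example article about kidney stone disease")))
for concept_id, concept_type in annotations:
    concept = URIRef(EX[concept_id])
    g.add((concept, RDF.type, EX[concept_type]))
    g.add((article, EX.mentions, concept))         # article-to-concept edge

print(g.serialize(format="turtle"))
```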
Results: We proposed and implemented a Kidney Stone Disease Knowledge Graph (KSDKG) covering more than 90 million triples. The graph comprises semantic data extracted from 29,174 articles, integrating available data from UMLS, SNOMED CT, MeSH, DrugBank and the Microbe-Disease Knowledge Graph. Through three application cases, we retrieved and discovered information on microbes, drugs and diseases associated with KSD. The results illustrate that the KSDKG can integrate diverse medical knowledge and provide new clinical insights for identifying the underlying mechanisms of KSD.
Conclusion: The KSDKG efficiently uses the knowledge graph to reveal hidden knowledge associations, facilitating semantic search and query answering. As a blueprint for developing disease-specific knowledge graphs, it offers valuable contributions to medical research.
{"title":"KSDKG: construction and application of knowledge graph for kidney stone disease based on biomedical literature and public databases.","authors":"Jianping Man, Yufei Shi, Zhensheng Hu, Rui Yang, Zhisheng Huang, Yi Zhou","doi":"10.1007/s13755-024-00309-3","DOIUrl":"10.1007/s13755-024-00309-3","url":null,"abstract":"<p><strong>Purpose: </strong>Kidney stone disease (KSD) is a common urological disorder with an increasing incidence worldwide. The extensive knowledge about KSD is dispersed across multiple databases, challenging the visualization and representation of its hierarchy and connections. This paper aims at constructing a disease-specific knowledge graph for KSD to enhance the effective utilization of knowledge by medical professionals and promote clinical research and discovery.</p><p><strong>Methods: </strong>Text parsing and semantic analysis were conducted on literature related to KSD from PubMed, with concept annotation based on biomedical ontology being utilized to generate semantic data in RDF format. Moreover, public databases were integrated to construct a large-scale knowledge graph for KSD. Additionally, case studies were carried out to demonstrate the practical utility of the developed knowledge graph.</p><p><strong>Results: </strong>We proposed and implemented a Kidney Stone Disease Knowledge Graph (KSDKG), covering more than 90 million triples. This graph comprised semantic data extracted from 29,174 articles, integrating available data from UMLS, SNOMED CT, MeSH, DrugBank and Microbe-Disease Knowledge Graph. Through the application of three cases, we retrieved and discovered information on microbes, drugs and diseases associated with KSD. The results illustrated that the KSDKG can integrate diverse medical knowledge and provide new clinical insights for identifying the underlying mechanisms of KSD.</p><p><strong>Conclusion: </strong>The KSDKG efficiently utilizes knowledge graph to reveal hidden knowledge associations, facilitating semantic search and response. As a blueprint for developing disease-specific knowledge graphs, it offers valuable contributions to medical research.</p>","PeriodicalId":46312,"journal":{"name":"Health Information Science and Systems","volume":"12 1","pages":"54"},"PeriodicalIF":4.7,"publicationDate":"2024-11-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11564440/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142648856","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-11-12. eCollection Date: 2024-12-01. DOI: 10.1007/s13755-024-00311-9
Manlu He, Erwin M Bakker, Michael S Lew
Depression is one of the most prevalent mental conditions; it can impair people's productivity and lead to severe consequences. The diagnosis of this disease is complex, as it often relies on a physician's subjective interview-based screening. The aim of our work is to propose deep learning models for automatic depression detection using different data modalities, which could assist in the diagnosis of depression. Current work on automatic depression detection is mostly tested on a single dataset, which may limit robustness, flexibility and scalability. To alleviate this problem, we design a novel Graph Neural Network-enhanced Transformer model named DePressionDetect Net (DPD Net) that leverages textual, audio and visual features and can work under two different application settings: the clinical setting and the social media setting. The model consists of a unimodal encoder module for encoding single modalities, a multimodal encoder module for integrating the multimodal information, and a detection module for producing the final prediction. We also propose a model named DePressionDetect-with-EEG Net (DPD-E Net) to incorporate electroencephalography (EEG) signals and speech data for depression detection. Experiments across four benchmark datasets show that DPD Net and DPD-E Net can outperform the state-of-the-art models on three datasets (i.e., the E-DAIC dataset, the Twitter depression dataset and the MODMA dataset), and achieve competitive performance on the fourth one (i.e., the D-vlog dataset). Ablation studies demonstrate the advantages of the proposed modules and the effectiveness of combining diverse modalities for automatic depression detection.
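The abstract names only the module layout; the following PyTorch sketch mirrors that three-module structure (unimodal encoders, a multimodal encoder, a detection head) with placeholder layers. All dimensions and layer choices are assumptions, and the actual DPD Net uses GNN-enhanced Transformer components that are not reproduced here.

```python
# Minimal sketch of the three-module layout described in the abstract: unimodal
# encoders, a multimodal encoder, and a detection head. Layer choices and sizes are
# assumptions; the paper's model uses GNN-enhanced Transformer components.
import torch
import torch.nn as nn

class ToyDepressionDetector(nn.Module):
    def __init__(self, text_dim=768, audio_dim=128, visual_dim=256, d_model=128):
        super().__init__()
        # Unimodal encoder module: one projection per modality (stand-in for real encoders).
        self.text_enc = nn.Linear(text_dim, d_model)
        self.audio_enc = nn.Linear(audio_dim, d_model)
        self.visual_enc = nn.Linear(visual_dim, d_model)
        # Multimodal encoder module: a Transformer over the three modality tokens.
        layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=4, batch_first=True)
        self.fusion = nn.TransformerEncoder(layer, num_layers=2)
        # Detection module: binary depressed / not-depressed prediction.
        self.head = nn.Linear(d_model, 2)

    def forward(self, text, audio, visual):
        tokens = torch.stack(
            [self.text_enc(text), self.audio_enc(audio), self.visual_enc(visual)], dim=1
        )                                    # (batch, 3 modality tokens, d_model)
        fused = self.fusion(tokens).mean(dim=1)
        return self.head(fused)

model = ToyDepressionDetector()
logits = model(torch.randn(4, 768), torch.randn(4, 128), torch.randn(4, 256))
print(logits.shape)  # torch.Size([4, 2])
```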
{"title":"DPD (DePression Detection) Net: a deep neural network for multimodal depression detection.","authors":"Manlu He, Erwin M Bakker, Michael S Lew","doi":"10.1007/s13755-024-00311-9","DOIUrl":"10.1007/s13755-024-00311-9","url":null,"abstract":"<p><p>Depression is one of the most prevalent mental conditions which could impair people's productivity and lead to severe consequences. The diagnosis of this disease is complex as it often relies on a physician's subjective interview-based screening. The aim of our work is to propose deep learning models for automatic depression detection by using different data modalities, which could assist in the diagnosis of depression. Current works on automatic depression detection mostly are tested on a single dataset, which might lack robustness, flexibility and scalability. To alleviate this problem, we design a novel Graph Neural Network-enhanced Transformer model named DePressionDetect Net (DPD Net) that leverages textual, audio and visual features and can work under two different application settings: the clinical setting and the social media setting. The model consists of a unimodal encoder module for encoding single modality, a multimodal encoder module for integrating the multimodal information, and a detection module for producing the final prediction. We also propose a model named DePressionDetect-with-EEG Net (DPD-E Net) to incorporate Electroencephalography (EEG) signals and speech data for depression detection. Experiments across four benchmark datasets show that DPD Net and DPD-E Net can outperform the state-of-the-art models on three datasets (i.e., E-DAIC dataset, Twitter depression dataset and MODMA dataset), and achieve competitive performance on the fourth one (i.e., D-vlog dataset). Ablation studies demonstrate the advantages of the proposed modules and the effectiveness of combining diverse modalities for automatic depression detection.</p>","PeriodicalId":46312,"journal":{"name":"Health Information Science and Systems","volume":"12 1","pages":"53"},"PeriodicalIF":4.7,"publicationDate":"2024-11-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11557813/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142630203","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-11-12. eCollection Date: 2024-12-01. DOI: 10.1007/s13755-024-00312-8
Ruichen Cong, Ou Deng, Shoji Nishimura, Atsushi Ogihara, Qun Jin
Purpose: Recent advancements in information technology and wearable devices have revolutionized healthcare through health data analysis. Identifying significant relationships in complex health data enhances healthcare and public health strategies. In health analytics, causal graphs are important for investigating the relationships among health features. However, they face challenges owing to the large number of features, complexity, and computational demands. Feature selection methods are useful for addressing these challenges. In this paper, we present a framework for multiple feature selection based on an optimization strategy for causal analysis of health data.
Methods: We select multiple health features based on an optimization strategy. First, we define a Weighted Total Score (WTS) index to assess feature importance after combining different feature selection methods. To explore an optimal set of weights for each method, we design a multiple feature selection algorithm integrated with a greedy algorithm. The features are then ranked according to their WTS, enabling selection of the most important ones. After that, causal graphs are constructed based on the selected features, and the statistical significance of the paths is assessed. Furthermore, evaluation experiments are conducted on an experimental dataset collected for this study and an open dataset for diabetes.
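The following Python sketch illustrates the general WTS idea under stated assumptions: several feature-scoring methods are normalized and combined with weights, a greedy loop adjusts the weights, and the top-ranked features are evaluated with a simple classifier. The scoring functions, weight steps, and evaluation model are placeholders, not the paper's exact algorithm.

```python
# Illustrative sketch of combining several feature selection methods into a single
# Weighted Total Score (WTS) and greedily searching for the weight of each method.
# The scoring functions, weight grid, and evaluation model are simplifying assumptions.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import f_classif, mutual_info_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=300, n_features=20, n_informative=5, random_state=0)

def normalize(s):
    return (s - s.min()) / (s.max() - s.min() + 1e-12)

# Per-method importance scores, normalized to [0, 1] so they can be weighted and summed.
scores = np.vstack([
    normalize(f_classif(X, y)[0]),                                           # ANOVA F-score
    normalize(mutual_info_classif(X, y, random_state=0)),                    # mutual information
    normalize(RandomForestClassifier(random_state=0).fit(X, y).feature_importances_),
])

def evaluate(weights, top_k=8):
    wts = weights @ scores                       # Weighted Total Score per feature
    selected = np.argsort(wts)[::-1][:top_k]
    clf = LogisticRegression(max_iter=1000)
    return cross_val_score(clf, X[:, selected], y, cv=5).mean()

# Greedy weight search: adjust one method's weight at a time, keeping improvements.
weights = np.ones(scores.shape[0]) / scores.shape[0]
best = evaluate(weights)
for _ in range(3):                               # a few greedy passes
    for i in range(len(weights)):
        for delta in (-0.1, 0.1):
            trial = weights.copy()
            trial[i] = max(trial[i] + delta, 0.0)
            if trial.sum() == 0:
                continue
            trial = trial / trial.sum()
            acc = evaluate(trial)
            if acc > best:
                weights, best = trial, acc

print("learned weights:", np.round(weights, 2), "CV accuracy:", round(best, 3))
```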
Results: The results demonstrate that our approach outperforms baseline models by reducing the number of features while improving model performance. Moreover, the statistical significance of the relationships between features uncovered through causal graphs is validated for both datasets.
Conclusion: By using the proposed framework for multiple feature selection based on an optimization strategy for causal analysis, the number of features is reduced and the causal relationships are uncovered and validated.
{"title":"Multiple feature selection based on an optimization strategy for causal analysis of health data.","authors":"Ruichen Cong, Ou Deng, Shoji Nishimura, Atsushi Ogihara, Qun Jin","doi":"10.1007/s13755-024-00312-8","DOIUrl":"10.1007/s13755-024-00312-8","url":null,"abstract":"<p><strong>Purpose: </strong>Recent advancements in information technology and wearable devices have revolutionized healthcare through health data analysis. Identifying significant relationships in complex health data enhances healthcare and public health strategies. In health analytics, causal graphs are important for investigating the relationships among health features. However, they face challenges owing to the large number of features, complexity, and computational demands. Feature selection methods are useful for addressing these challenges. In this paper, we present a framework for multiple feature selection based on an optimization strategy for causal analysis of health data.</p><p><strong>Methods: </strong>We select multiple health features based on an optimization strategy. First, we define a Weighted Total Score (WTS) index to assess the feature importance after the combination of different feature selection methods. To explore an optimal set of weights for each method, we design a multiple feature selection algorithm integrated with the greedy algorithm. The features are then ranked according to their WTS, enabling selection of the most important ones. After that, causal graphs are constructed based on the selected features, and the statistical significance of the paths is assessed. Furthermore, evaluation experiments are conducted on an experiment dataset collected for this study and an open dataset for diabetes.</p><p><strong>Results: </strong>The results demonstrate that our approach outperforms baseline models by reducing the number of features while improving model performance. Moreover, the statistical significance of the relationships between features uncovered through causal graphs is validated for both datasets.</p><p><strong>Conclusion: </strong>By using the proposed framework for multiple feature selection based on an optimization strategy for causal analysis, the number of features is reduced and the causal relationships are uncovered and validated.</p>","PeriodicalId":46312,"journal":{"name":"Health Information Science and Systems","volume":"12 1","pages":"52"},"PeriodicalIF":4.7,"publicationDate":"2024-11-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11554952/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142630210","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-11-11. eCollection Date: 2024-12-01. DOI: 10.1007/s13755-024-00310-w
Zhenzhen Du, Shuang Wang, Ouzhou Yang, Juan He, Yujie Yang, Jing Zheng, Honglei Zhao, Yunpeng Cai
Purpose: Dyslipidemia poses a significant risk for progression to cardiovascular disease (CVD). Despite the identification of numerous risk factors and the proposal of various risk scales, there is still an urgent need for effective models that predict the onset of CVD in the hyperlipidemic population, which are essential for CVD prevention.
Methods: We carried out a retrospective cohort study with 23,548 hyperlipidemia patients in the Shenzhen Health Information Big Data Platform, including 11,723 CVD onset cases over a 3-year follow-up. The population was randomly divided into 70% as an independent training dataset and the remaining 30% as the test set. Four distinct machine-learning algorithms were implemented on the training dataset with the aim of developing highly accurate predictive models, and their performance was subsequently benchmarked against conventional risk assessment scales. An ablation study was also carried out to analyze the impact of individual risk factors on model performance.
Results: The non-linear algorithm LightGBM excelled in forecasting the incidence of cardiovascular disease within 3 years, achieving an area under the receiver operating characteristic curve (AUROC) of 0.883. This performance surpassed that of the conventional logistic regression model, which had an AUROC of 0.725 on identical datasets. Concurrently, in direct comparative analyses, the machine-learning approaches notably outperformed three traditional risk assessment methods within their respective applicable populations: the Framingham cardiovascular disease risk score, the 2019 ESC/EAS guidelines for the management of dyslipidemia, and the 2016 Chinese recommendations for the management of dyslipidemia in adults. Further analysis of risk factors showed that the variability of blood lipid levels and remnant cholesterol played an important role in indicating an increased risk of CVD.
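As a rough, self-contained illustration of this kind of comparison (not the study's data, features, or tuning), the snippet below trains a LightGBM classifier and a logistic regression baseline on synthetic data and reports AUROC on a held-out 30% split.

```python
# Illustrative sketch: LightGBM versus a logistic regression baseline, scored by AUROC
# on a held-out 30% split. Synthetic data stand in for the hyperlipidemia cohort,
# which is not publicly available.
import lightgbm as lgb
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=5000, n_features=30, n_informative=10,
                           weights=[0.6, 0.4], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

lgbm = lgb.LGBMClassifier(n_estimators=300, learning_rate=0.05, random_state=0)
lgbm.fit(X_tr, y_tr)
logreg = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)

for name, model in [("LightGBM", lgbm), ("Logistic regression", logreg)]:
    auroc = roc_auc_score(y_te, model.predict_proba(X_te)[:, 1])
    print(f"{name}: AUROC = {auroc:.3f}")
```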
Conclusions: We have shown that the application of machine-learning techniques significantly enhances the precision of cardiovascular risk forecasting among hyperlipidemic patients, addressing the heterogeneity and non-linearity inherent in disease prediction. Furthermore, some recently suggested biomarkers, including blood lipid variability and remnant cholesterol, are also important predictors of cardiovascular events, underscoring the importance of continuous lipid monitoring and healthcare profiling through big data platforms.
{"title":"Machine-learning-based prediction of cardiovascular events for hyperlipidemia population with lipid variability and remnant cholesterol as biomarkers.","authors":"Zhenzhen Du, Shuang Wang, Ouzhou Yang, Juan He, Yujie Yang, Jing Zheng, Honglei Zhao, Yunpeng Cai","doi":"10.1007/s13755-024-00310-w","DOIUrl":"10.1007/s13755-024-00310-w","url":null,"abstract":"<p><strong>Purpose: </strong>Dyslipidemia poses a significant risk for the progression to cardiovascular diseases. Despite the identification of numerous risk factors and the proposal of various risk scales, there is still an urgent need for effective predictive models for the onset of cardiovascular diseases in the hyperlipidemic population, which are essential for the prevention of CVD.</p><p><strong>Methods: </strong>We carried out a retrospective cohort study with 23,548 hyperlipidemia patients in Shenzhen Health Information Big Data Platform, including 11,723 CVD onset cases in a 3-year follow-up. The population was randomly divided into 70% as an independent training dataset and remaining 30% as test set. Four distinct machine-learning algorithms were implemented on the training dataset with the aim of developing highly accurate predictive models, and their performance was subsequently benchmarked against conventional risk assessment scales. An ablation study was also carried out to analyze the impact of individual risk factors to model performance.</p><p><strong>Results: </strong>The non-linear algorithm, LightGBM, excelled in forecasting the incidence of cardiovascular disease within 3 years, achieving an area under the 'receiver operating characteristic curve' (AUROC) of 0.883. This performance surpassed that of the conventional logistic regression model, which had an AUROC of 0.725, on identical datasets. Concurrently, in direct comparative analyses, machine-learning approaches have notably outperformed the three traditional risk assessment methods within their respective applicable populations. These include the Framingham cardiovascular disease risk score, 2019 ESC/EAS guidelines for the management of dyslipidemia and the 2016 Chinese recommendations for the management of dyslipidemia in adults. Further analysis of risk factors showed that the variability of blood lipid levels and remnant cholesterol played an important role in indicating an increased risk of CVD.</p><p><strong>Conclusions: </strong>We have shown that the application of machine-learning techniques significantly enhances the precision of cardiovascular risk forecasting among hyperlipidemic patients, addressing the critical issue of disease prediction's heterogeneity and non-linearity. 
Furthermore, some recently-suggested biomarkers, including blood lipid variability and remnant cholesterol are also important predictors of cardiovascular events, suggesting the importance of continuous lipid monitoring and healthcare profiling through big data platforms.</p>","PeriodicalId":46312,"journal":{"name":"Health Information Science and Systems","volume":"12 1","pages":"51"},"PeriodicalIF":4.7,"publicationDate":"2024-11-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11551092/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142630206","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-10-23. eCollection Date: 2024-12-01. DOI: 10.1007/s13755-024-00308-4
Ramón Rueda, Esteban Fabello, Tatiana Silva, Samuel Genzor, Jan Mizera, Ladislav Stanke
Purpose: Chronic obstructive pulmonary disease (COPD) is a prevalent and preventable condition that typically worsens over time. Acute exacerbations of COPD significantly impact disease progression, underscoring the importance of prevention efforts. This observational study aimed to achieve two main objectives: (1) identify patients at risk of exacerbations using an ensemble of clustering algorithms, and (2) classify patients into distinct clusters based on disease severity.
Methods: Data from portable medical devices were analyzed post hoc to detect flare-ups, using hyperparameter optimization with Self-Organizing Map (SOM), Density-Based Spatial Clustering of Applications with Noise (DBSCAN), Isolation Forest, and Support Vector Machine (SVM) algorithms. Principal Component Analysis (PCA) followed by KMeans clustering was applied to categorize patients by severity.
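A minimal Python sketch of this two-step pipeline is shown below, with outlier votes from DBSCAN, Isolation Forest, and a one-class SVM followed by PCA and KMeans on per-patient summaries. The feature construction, thresholds, and voting rule are assumptions, and the SOM component of the original ensemble is omitted for brevity.

```python
# Illustrative sketch of the two steps described above: (1) flag possible flare-ups by
# majority vote of several outlier detectors, and (2) group patients by severity with
# PCA followed by KMeans. Features and thresholds are assumptions; SOM is omitted.
import numpy as np
from sklearn.cluster import DBSCAN, KMeans
from sklearn.decomposition import PCA
from sklearn.ensemble import IsolationForest
from sklearn.preprocessing import StandardScaler
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)
# Toy device records: e.g., SpO2, heart rate, respiratory rate per patient-day.
records = np.column_stack([
    rng.normal(94, 2, 300), rng.normal(85, 8, 300), rng.normal(18, 3, 300),
])
X = StandardScaler().fit_transform(records)

# Step 1: ensemble flare-up (outlier) detection by majority vote.
votes = np.zeros(len(X), dtype=int)
votes += (DBSCAN(eps=0.8, min_samples=5).fit_predict(X) == -1).astype(int)   # noise points
votes += (IsolationForest(random_state=0).fit_predict(X) == -1).astype(int)
votes += (OneClassSVM(nu=0.05).fit_predict(X) == -1).astype(int)
flareup_days = np.where(votes >= 2)[0]
print("days flagged as possible flare-ups:", len(flareup_days))

# Step 2: severity clustering of per-patient summaries with PCA + KMeans.
patient_summaries = records.reshape(30, 10, 3).mean(axis=1)   # toy: 30 patients x 10 days
Z = PCA(n_components=2).fit_transform(StandardScaler().fit_transform(patient_summaries))
severity_cluster = KMeans(n_clusters=3, random_state=0, n_init=10).fit_predict(Z)
print("patients per severity cluster:", np.bincount(severity_cluster))
```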
Results: Twenty-five patients were included in the study population; data from 17 of them had the required reliability. Five patients were identified in the highest deterioration group, with one clinically confirmed exacerbation accurately detected by our ensemble algorithm. PCA and KMeans clustering then grouped patients into three clusters based on severity: Cluster 0 started with the least severe characteristics but experienced decline, Cluster 1 consistently showed the most severe characteristics, and Cluster 2 showed slight improvement.
Conclusion: Our approach effectively identified patients at risk of exacerbations and classified them by disease severity. Although promising, the approach would need to be verified on a larger sample with a larger number of recorded clinically verified exacerbations.
{"title":"Machine learning approach to flare-up detection and clustering in chronic obstructive pulmonary disease (COPD) patients.","authors":"Ramón Rueda, Esteban Fabello, Tatiana Silva, Samuel Genzor, Jan Mizera, Ladislav Stanke","doi":"10.1007/s13755-024-00308-4","DOIUrl":"10.1007/s13755-024-00308-4","url":null,"abstract":"<p><strong>Purpose: </strong>Chronic obstructive pulmonary disease (COPD) is a prevalent and preventable condition that typically worsens over time. Acute exacerbations of COPD significantly impact disease progression, underscoring the importance of prevention efforts. This observational study aimed to achieve two main objectives: (1) identify patients at risk of exacerbations using an ensemble of clustering algorithms, and (2) classify patients into distinct clusters based on disease severity.</p><p><strong>Methods: </strong>Data from portable medical devices were analyzed post-hoc using hyperparameter optimization with Self-Organizing Maps (SOM), Density-Based Spatial Clustering of Applications with Noise (DBSCAN), Isolation Forest, and Support Vector Machine (SVM) algorithms, to detect flare-ups. Principal Component Analysis (PCA) followed by KMeans clustering was applied to categorize patients by severity.</p><p><strong>Results: </strong>25 patients were included within the study population, data from 17 patients had the required reliability. Five patients were identified in the highest deterioration group, with one clinically confirmed exacerbation accurately detected by our ensemble algorithm. Then, PCA and KMeans clustering grouped patients into three clusters based on severity: Cluster 0 started with the least severe characteristics but experienced decline, Cluster 1 consistently showed the most severe characteristics, and Cluster 2 showed slight improvement.</p><p><strong>Conclusion: </strong>Our approach effectively identified patients at risk of exacerbations and classified them by disease severity. Although promising, the approach would need to be verified on a larger sample with a larger number of recorded clinically verified exacerbations.</p>","PeriodicalId":46312,"journal":{"name":"Health Information Science and Systems","volume":"12 1","pages":"50"},"PeriodicalIF":4.7,"publicationDate":"2024-10-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11499475/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142516717","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-09-13. eCollection Date: 2024-12-01. DOI: 10.1007/s13755-024-00306-6
Liutao Zhao, Haoran Xie, Lin Zhong, Yujue Wang
Artificial intelligence has immense potential for applications in smart healthcare. Nowadays, a large amount of medical data collected by wearable or implantable devices has accumulated in Body Area Networks, and unlocking the value of these data can open up further applications of artificial intelligence in smart healthcare. To utilize these dispersed data, this paper proposes an innovative Federated Learning scheme focusing on the challenges of explainability and security in smart healthcare. In the proposed scheme, the federated modeling process and the explainability analysis are independent of each other. By introducing post-hoc explanation techniques to analyze the global model, the scheme avoids the performance degradation caused by pursuing explainability while still providing insight into the mechanism of the model. In terms of security, firstly, a fair and efficient method for evaluating clients' private gradients is introduced, providing an explainable evaluation of gradient contributions, quantifying client contributions to federated learning, and filtering out the impact of low-quality data. Secondly, to address the privacy issues of medical and health data collected by wireless Body Area Networks, a multi-server model is proposed to solve the secure aggregation problem in federated learning. Furthermore, by employing homomorphic secret sharing and homomorphic hashing techniques, a non-interactive, verifiable secure aggregation protocol is proposed, ensuring that client data privacy is protected and that the aggregation results remain correct even in the presence of up to t colluding malicious servers. Experimental results demonstrate that the proposed scheme's explainability is consistent with that of centralized training scenarios and that it shows competitive performance in terms of security and efficiency.
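To illustrate the multi-server aggregation idea in the simplest possible terms, the sketch below uses plain additive secret sharing over a prime field: each client splits its fixed-point-encoded update into one share per server, and only the combination of all servers' partial sums reveals the aggregate. The verification layer based on homomorphic hashing is not reproduced here, and all values are toy examples rather than the paper's protocol.

```python
# Illustrative sketch of additive secret sharing for multi-server secure aggregation:
# each client splits its model update into random shares, one per server, so that no
# single server sees the update, yet the servers' summed shares reconstruct the sum of
# all updates. This toy omits the homomorphic hashing / verification layer of the paper.
import random

PRIME = 2**61 - 1          # arithmetic is done modulo a large prime
N_SERVERS = 3

def share(value, n=N_SERVERS, prime=PRIME):
    """Split an integer into n additive shares that sum to value mod prime."""
    shares = [random.randrange(prime) for _ in range(n - 1)]
    shares.append((value - sum(shares)) % prime)
    return shares

def encode(x, scale=10**6):
    """Fixed-point encoding so float gradients can be shared as integers."""
    return int(round(x * scale)) % PRIME

def decode(x, scale=10**6):
    if x > PRIME // 2:      # map back to the signed range
        x -= PRIME
    return x / scale

# Each client produces one gradient value (a single scalar for simplicity).
client_gradients = [0.25, -0.10, 0.40]

# Clients send the i-th share of their encoded gradient to server i.
server_inboxes = [[] for _ in range(N_SERVERS)]
for g in client_gradients:
    for i, s in enumerate(share(encode(g))):
        server_inboxes[i].append(s)

# Each server aggregates locally; only the combination of all servers reveals the sum.
server_partials = [sum(box) % PRIME for box in server_inboxes]
aggregate = decode(sum(server_partials) % PRIME)
print("securely aggregated gradient sum:", aggregate)   # ~0.55
```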
{"title":"Explainable federated learning scheme for secure healthcare data sharing.","authors":"Liutao Zhao, Haoran Xie, Lin Zhong, Yujue Wang","doi":"10.1007/s13755-024-00306-6","DOIUrl":"10.1007/s13755-024-00306-6","url":null,"abstract":"<p><p>Artificial intelligence has immense potential for applications in smart healthcare. Nowadays, a large amount of medical data collected by wearable or implantable devices has been accumulated in Body Area Networks. Unlocking the value of this data can better explore the applications of artificial intelligence in the smart healthcare field. To utilize these dispersed data, this paper proposes an innovative Federated Learning scheme, focusing on the challenges of explainability and security in smart healthcare. In the proposed scheme, the federated modeling process and explainability analysis are independent of each other. By introducing post-hoc explanation techniques to analyze the global model, the scheme avoids the performance degradation caused by pursuing explainability while understanding the mechanism of the model. In terms of security, firstly, a fair and efficient client private gradient evaluation method is introduced for explainable evaluation of gradient contributions, quantifying client contributions in federated learning and filtering the impact of low-quality data. Secondly, to address the privacy issues of medical health data collected by wireless Body Area Networks, a multi-server model is proposed to solve the secure aggregation problem in federated learning. Furthermore, by employing homomorphic secret sharing and homomorphic hashing techniques, a non-interactive, verifiable secure aggregation protocol is proposed, ensuring that client data privacy is protected and the correctness of the aggregation results is maintained even in the presence of up to <i>t</i> colluding malicious servers. Experimental results demonstrate that the proposed scheme's explainability is consistent with that of centralized training scenarios and shows competitive performance in terms of security and efficiency.</p><p><strong>Graphical abstract: </strong></p>","PeriodicalId":46312,"journal":{"name":"Health Information Science and Systems","volume":"12 1","pages":"49"},"PeriodicalIF":4.7,"publicationDate":"2024-09-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11399375/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142298293","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-09-12. eCollection Date: 2024-12-01. DOI: 10.1007/s13755-024-00307-5
Ye Liang, Chonghui Guo, Hailin Li
Objective: The study aims to identify distinct population-specific comorbidity progression patterns, detect potential comorbidities in a timely manner, and gain a better understanding of the progression of comorbid conditions among patients.
Methods: This work presents a comorbidity progression analysis framework that utilizes temporal comorbidity networks (TCN) for patient stratification and comorbidity prediction. We propose an approach that uses patients' longitudinal, temporal diagnosis data to construct their TCN. Subsequently, we employ the TCN for patient stratification, conducting preliminary analysis and typical prescription analysis to uncover potential comorbidity progression patterns in different patient groups. Finally, we propose an innovative comorbidity prediction method based on the distance-matched temporal comorbidity network (TCN-DM). This method identifies similar patients according to disease prevalence and disease transition patterns and combines their diagnosis information with that of the current patient to predict potential comorbidity at the patient's next visit.
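As a toy illustration of TCN construction under assumed inputs, the sketch below counts consecutive diagnosis transitions in a few invented patient records and stores them as weighted directed edges with networkx; the paper's construction and the distance-matching step of TCN-DM are more involved than this.

```python
# Illustrative sketch of building a temporal comorbidity network (TCN) from ordered
# diagnosis sequences: nodes are diagnoses, and directed edge weights count how often
# one diagnosis is followed by another in a patient's record. The toy sequences and
# the simple counting scheme are assumptions, not the paper's full construction.
import networkx as nx

# Toy longitudinal records: each list is one patient's diagnoses in visit order.
patient_sequences = [
    ["hypertension", "diabetes", "heart_failure"],
    ["hypertension", "atrial_fibrillation", "heart_failure"],
    ["diabetes", "chronic_kidney_disease", "heart_failure"],
    ["hypertension", "diabetes", "chronic_kidney_disease"],
]

tcn = nx.DiGraph()
for seq in patient_sequences:
    for earlier, later in zip(seq, seq[1:]):          # consecutive diagnosis transitions
        if tcn.has_edge(earlier, later):
            tcn[earlier][later]["weight"] += 1
        else:
            tcn.add_edge(earlier, later, weight=1)

# The weighted out-edges of a diagnosis suggest likely next comorbidities.
for u, v, w in sorted(tcn.edges(data="weight"), key=lambda e: -e[2]):
    print(f"{u} -> {v}: {w}")
```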
Results: This study validated the capability of the framework using a real-world dataset, MIMIC-III, with heart failure (HF) as the disease of interest to investigate comorbidity progression in HF patients. With the TCN, the study identified four distinctive HF subgroups, revealing the progression of comorbidities in patients. Furthermore, compared to other methods, TCN-DM demonstrated better predictive performance, with F1-score values ranging from 0.454 to 0.612, showcasing its superiority.
Conclusions: This study can identify comorbidity patterns at both the individual and population level and offers promising predictions of future comorbidity development in patients.
{"title":"Comorbidity progression analysis: patient stratification and comorbidity prediction using temporal comorbidity network.","authors":"Ye Liang, Chonghui Guo, Hailin Li","doi":"10.1007/s13755-024-00307-5","DOIUrl":"10.1007/s13755-024-00307-5","url":null,"abstract":"<p><strong>Objective: </strong>The study aims to identify distinct population-specific comorbidity progression patterns, timely detect potential comorbidities, and gain better understanding of the progression of comorbid conditions among patients.</p><p><strong>Methods: </strong>This work presents a comorbidity progression analysis framework that utilizes temporal comorbidity networks (TCN) for patient stratification and comorbidity prediction. We propose a TCN construction approach that utilizes longitudinal, temporal diagnosis data of patients to construct their TCN. Subsequently, we employ the TCN for patient stratification by conducting preliminary analysis, and typical prescription analysis to uncover potential comorbidity progression patterns in different patient groups. Finally, we propose an innovative comorbidity prediction method by utilizing the distance-matched temporal comorbidity network (TCN-DM). This method identifies similar patients with disease prevalence and disease transition patterns and combines their diagnosis information with that of the current patient to predict potential comorbidity at the patient's next visit.</p><p><strong>Results: </strong>This study validated the capability of the framework using a real-world dataset MIMIC-III, with heart failure (HF) as interested disease to investigate comorbidity progression in HF patients. With TCN, this study can identify four significant distinctive HF subgroups, revealing the progression of comorbidities in patients. Furthermore, compared to other methods, TCN-DM demonstrated better predictive performance with F1-Score values ranging from 0.454 to 0.612, showcasing its superiority.</p><p><strong>Conclusions: </strong>This study can identify comorbidity patterns for individuals and population, and offer promising prediction for future comorbidity developments in patients.</p>","PeriodicalId":46312,"journal":{"name":"Health Information Science and Systems","volume":"12 1","pages":"48"},"PeriodicalIF":4.7,"publicationDate":"2024-09-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11393239/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142298292","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}