Pub Date: 2024-11-15. DOI: 10.1016/j.compbiomed.2024.109393
Tiago A.H. Fonseca, Cristiana P. Von Rekowski, Rúben Araújo, M. Conceição Oliveira, Gonçalo C. Justino, Luís Bento, Cecília R.C. Calado
Serum metabolome analysis is essential for identifying disease biomarkers and predicting patient outcomes in precision medicine. This study compares Ultra-High Performance Liquid Chromatography-High-Resolution Mass Spectrometry (UHPLC-HRMS) with Fourier Transform Infrared (FTIR) spectroscopy for acquiring the serum metabolome of critically ill patients in relation to invasive mechanical ventilation (IMV) and for predicting death. Three groups of 8 patients were considered. Group A did not require IMV and survived hospitalization, while Groups B and C required IMV. Group C patients died a median of 5 days after sample harvest. Good prediction models were achieved when comparing Groups A to B and B to C using data from both platforms, with UHPLC-HRMS showing 8–17 % higher accuracies (≥83 %). However, developing predictive models using metabolite sets was not feasible when comparing unbalanced populations, i.e., Groups A and B combined versus Group C. In that setting, FTIR spectroscopy enabled the development of a model with 83 % accuracy. Overall, UHPLC-HRMS data yield more robust prediction models when comparing homogeneous populations, potentially enhancing understanding of metabolic mechanisms and improving patient therapy adjustments. FTIR spectroscopy is more suitable for unbalanced populations. Its simplicity, speed, cost-effectiveness, and high-throughput operation make it ideal for large-scale studies and clinical translation in complex populations.
Title: Comparison of two metabolomics-platforms to discover biomarkers in critically ill patients from serum analysis. Computers in Biology and Medicine, vol. 184, Article 109393.
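The difficulty with unbalanced populations can be illustrated with a small calculation. The sketch below reads "three groups of 8 patients" as 8 patients per group, so that Groups A and B combined (16) face Group C (8). The label lists and helper functions are hypothetical and are not taken from the study, whose validation protocol is not described in the abstract. A model that always predicts the larger class already reaches about 67 % accuracy while detecting no Group C patient, which is why balanced accuracy is often reported alongside plain accuracy.

```python
def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label equals the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def balanced_accuracy(y_true, y_pred):
    """Mean of the per-class recalls, so that each class counts equally."""
    recalls = []
    for label in sorted(set(y_true)):
        idx = [i for i, t in enumerate(y_true) if t == label]
        recalls.append(sum(y_pred[i] == label for i in idx) / len(idx))
    return sum(recalls) / len(recalls)


# Hypothetical labels: 0 = Groups A and B combined (16 patients), 1 = Group C (8 patients).
y_true = [0] * 16 + [1] * 8
# A trivial "model" that always predicts the majority class.
y_majority = [0] * len(y_true)

print(f"accuracy          = {accuracy(y_true, y_majority):.3f}")
print(f"balanced accuracy = {balanced_accuracy(y_true, y_majority):.3f}")
```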
Pub Date: 2024-11-15. DOI: 10.1016/j.compbiomed.2024.109346
Achraf Djemal, Ahmed Yahia Kallel, Cherif Ouni, Rihem El Baccouch, Dhouha Bouchaala, Fatma Kammoun Feki, Chahnez Charfi Triki, Ahmed Fakhfakh, Olfa Kanoun
The diagnosis of epilepsy based on visual inspection of electroencephalogram (EEG) signals is inherently complex and prone to error, even for physicians, mainly due to the large number of signals involved and the variability between individuals. These same challenges make the development of portable epilepsy diagnostic systems for everyday use difficult. Key obstacles include the immense complexity of signal processing and the inherent ambiguity in accurately classifying disease. For these reasons, we propose in this paper the use of compressive sensing (CS) to condense EEG signals while preserving relevant information, allowing seizure classification based on systematically selected features of the reconstructed signals. Based on a dataset comprising EEG recordings from 13 epileptic patients with various seizure types, we explore the discrete cosine transform (DCT) and random matrix multiplication for compression ratios ranging from 5% to 70%, balancing data reduction with signal fidelity. Following feature extraction, selection was performed based on mutual information and a correlation matrix to preserve only the most relevant features for analysis. For classification, following a comparison of suitable machine learning models, XGBoost was chosen as it achieves a classification accuracy of 98.78%. The CS method was implemented on an STM32 microcontroller and a Raspberry Pi for reconstruction and classification, to demonstrate feasibility as an embedded system. At 70% compression, significant improvements were observed: 70% file size reduction, 84% decrease in transmission time (from 2518.532 s to 400.392 s), and substantial energy savings (e.g., from 11.5±0.707 mWh to 4.5±0.707 mWh for Patient 12). Signal quality was maintained, with a PSNR of 16.15±3.98 and a Pearson correlation coefficient of 0.68±0.15. The proposed system highlights the potential for efficient, portable, real-time epilepsy diagnosis systems that achieve precise and fully automated seizure classification.
Title: Fast processing and classification of epileptic seizures based on compressed EEG signals. Computers in Biology and Medicine, vol. 184, Article 109346.
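The measurement step of compressive sensing and the reported transmission-time saving can be sketched in a few lines. The example assumes that a compression ratio of 70 % means keeping roughly 30 % as many measurements as there are input samples, and it uses a Gaussian random matrix. The function names, the seed, and the sine test signal are illustrative and are not taken from the paper. Reconstruction (for instance through a sparse DCT representation) is deliberately left out.

```python
import math
import random


def measurement_matrix(m, n, seed=0):
    """Gaussian random matrix of shape (m, n), scaled by 1/sqrt(m) (an assumed choice)."""
    rng = random.Random(seed)
    scale = 1.0 / math.sqrt(m)
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(n)] for _ in range(m)]


def compress(signal, compression_ratio, seed=0):
    """Project an n-sample window onto m = round(n * (1 - ratio)) random measurements."""
    n = len(signal)
    m = max(1, round(n * (1.0 - compression_ratio)))
    phi = measurement_matrix(m, n, seed)
    return [sum(p * x for p, x in zip(row, signal)) for row in phi]


def relative_reduction(before, after):
    """Fractional decrease from `before` to `after`."""
    return (before - after) / before


# Hypothetical 256-sample window standing in for one EEG segment.
signal = [math.sin(2 * math.pi * 5 * t / 256) for t in range(256)]
y = compress(signal, 0.70)
print(len(signal), "samples ->", len(y), "measurements")

# Transmission times quoted in the abstract (seconds).
reduction = relative_reduction(2518.532, 400.392)
print(f"transmission time reduced by {reduction:.1%}")
```

The second print reproduces the roughly 84 % decrease stated in the abstract from its two quoted times.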
Pub Date: 2024-11-15. DOI: 10.1016/j.compbiomed.2024.109336
Nataliia Molchanova, Vatsal Raina, Andrey Malinin, Francesco La Rosa, Adrien Depeursinge, Mark Gales, Cristina Granziera, Henning Müller, Mara Graziani, Meritxell Bach Cuadra
This paper explores uncertainty quantification (UQ) as an indicator of the trustworthiness of automated deep-learning (DL) tools in the context of white matter lesion (WML) segmentation from magnetic resonance imaging (MRI) scans of multiple sclerosis (MS) patients. Our study focuses on two principal aspects of uncertainty in structured output segmentation tasks. First, we postulate that a reliable uncertainty measure should indicate predictions likely to be incorrect with high uncertainty values. Second, we investigate the merit of quantifying uncertainty at different anatomical scales (voxel, lesion, or patient). We hypothesize that uncertainty at each scale is related to specific types of errors. Our study aims to confirm this relationship by conducting separate analyses for in-domain and out-of-domain settings. Our primary methodological contributions are (i) the development of novel measures for quantifying uncertainty at lesion and patient scales, derived from structural prediction discrepancies, and (ii) the extension of an error retention curve analysis framework to facilitate the evaluation of UQ performance at both lesion and patient scales. The results from a multi-centric MRI dataset of 444 patients demonstrate that our proposed measures more effectively capture model errors at the lesion and patient scales compared to measures that average voxel-scale uncertainty values. We provide the UQ protocols code at https://github.com/Medical-Image-Analysis-Laboratory/MS_WML_uncs.
Title: Structural-based uncertainty in deep learning across anatomical scales: Analysis in white matter lesion segmentation. Computers in Biology and Medicine, vol. 184, Article 109336.
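The idea behind an error retention curve can be sketched generically. Items (voxels, lesions, or patients) are ranked by uncertainty; the most uncertain ones are handed to an oracle and counted as error-free, and the remaining error is tracked as more items are retained. A good uncertainty measure flags the erroneous items first, so its curve has a smaller area. The sketch below is not the authors' implementation (their code is in the repository linked in the abstract); it assumes one scalar error and one scalar uncertainty per item, and all names and numbers are hypothetical.

```python
def error_retention_curve(errors, uncertainties):
    """Return curve[k] = mean error when the k most certain items keep the
    model prediction and all other items are replaced by the ground truth."""
    n = len(errors)
    order = sorted(range(n), key=lambda i: uncertainties[i])  # most certain first
    curve = [0.0]  # retaining nothing leaves no model error
    running = 0.0
    for i in order:
        running += errors[i]
        curve.append(running / n)
    return curve


def area_under_curve(curve):
    """Trapezoidal area over the retention fraction in [0, 1]; lower is better."""
    n = len(curve) - 1
    return sum((curve[k] + curve[k + 1]) / 2 for k in range(n)) / n


# Hypothetical per-item errors (1 = wrong prediction, 0 = correct).
errors = [0, 0, 1, 1]
informative = [0.1, 0.2, 0.8, 0.9]    # high uncertainty on the wrong items
uninformative = [0.9, 0.8, 0.2, 0.1]  # high uncertainty on the correct items

print(area_under_curve(error_retention_curve(errors, informative)))
print(area_under_curve(error_retention_curve(errors, uninformative)))
```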
Pub Date: 2024-11-15. DOI: 10.1016/j.compbiomed.2024.109376
Tzu-Ching Shih, You-Cheng Yu, Tang-Chuan Wang
Age-related hearing loss (ARHL) is primarily attributed to inner-ear factors, yet the role of age-related middle ear characteristics remains elusive. Employing a finite element (FE) model, we conducted a comparative analysis with clinical data extracted from a retrospective cohort study involving 90 younger adults (mean age = 38.1 ± 7.7) and 111 older adults (mean age = 63.8 ± 8.4). The clinical dataset encompassed air-bone gap (ABG) measurements obtained through pure-tone audiometry (PTA) at frequencies of 0.5, 1.0, 2.0, and 4.0 kHz. FE results quantified the normalized stapes displacement value of the simulated form of air-bone gap (ABGSim) between the two age groups. The Mann-Whitney U test, with a significance threshold set at P < 0.05, was employed for statistical analysis. Furthermore, the study employs simulated auditory risk unit (ARU) results to evaluate basilar membrane (BM) damage. A significant intergroup discrepancy surfaces at 1.0 kHz (ABGSim = 1.0; ABG: P = 0.008), with pronounced BM damage occurring within the speech frequency range (0.5–4.0 kHz) among older adults. The ARU consistently localizes within the 3–18 mm region from the base for both age groups. In conclusion, older adults exhibited significant conductive hearing loss (CHL) at 1.0 kHz but demonstrated a modest enhancement in middle ear sound transmission efficiency at 2.0 kHz. Furthermore, our research indicates that aging exacerbates damage to the BM when exposed to speech frequency excitation exceeding 90 dB sound pressure level (dB SPL).
Title: Understanding age-related middle ear properties and basilar membrane damage in hearing loss: A finite element analysis and retrospective cohort study. Computers in Biology and Medicine, vol. 184, Article 109376.
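The Mann-Whitney U test used for the between-group comparison can be sketched without external libraries. The version below ranks the pooled values (averaging ranks for ties), computes U from the rank sum, and derives a two-sided p-value from the normal approximation, which is a reasonable simplification for group sizes such as 90 and 111. It applies no tie or continuity correction, and the sample values are hypothetical rather than data from the study.

```python
import math


def average_ranks(values):
    """1-based ranks of `values`, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def mann_whitney_u(x, y):
    """Return (U, two-sided p) using the normal approximation
    (no tie correction, no continuity correction)."""
    n1, n2 = len(x), len(y)
    ranks = average_ranks(list(x) + list(y))
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    u = min(u1, n1 * n2 - u1)
    mu = n1 * n2 / 2
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mu) / sigma
    return u, math.erfc(abs(z) / math.sqrt(2))


# Hypothetical air-bone gap values (dB) for two small groups.
younger = [5, 10, 15]
older = [20, 25, 30]
u_stat, p_value = mann_whitney_u(younger, older)
print(u_stat, p_value < 0.05)
```

For real analyses, an established implementation that handles exact p-values and tie corrections is preferable to this sketch.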
Pub Date: 2024-11-14. DOI: 10.1016/j.compbiomed.2024.109365
Yunfa Ding, Anxia Deng, Hao Yu, Hongbing Zhang, Tengfei Qi, Jipei He, Chenjun He, Hou Jie, Zihao Wang, Liangpin Wu
Objectives
This study aims to identify a potential association between Crohn's disease (CD), a chronic inflammatory bowel condition, and metabolic syndrome (MetS), which is characterized by a cluster of metabolic abnormalities, including high blood pressure, abnormal lipid levels, and overweight. While a link between CD and MetS has been suggested in the medical community, the underlying molecular mechanisms remain largely unexplored.
Methods
Using microarray data from the Gene Expression Omnibus (GEO) database, we conducted a differential gene expression analysis and applied Weighted Gene Co-expression Network Analysis (WGCNA) to identify genes shared between CD and MetS. To further elucidate the functions of these shared genes, we performed Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analyses and constructed protein-protein interaction (PPI) networks. For key gene screening, we used Random Forest and Least Absolute Shrinkage and Selection Operator (LASSO) regression and constructed a diagnostic prediction model with the Extreme Gradient Boosting (XGBoost) algorithm. Additionally, CIBERSORT and Gene Set Variation Analysis (GSVA) were employed to examine the relationships between these genes and immune cell infiltration, as well as metabolic pathways. Mendelian randomization and colocalization analyses were also conducted to explore causal links between genes and disease. Lastly, single-cell RNA sequencing (scRNA-seq) was used to validate the functionality of these key genes.
Results
Through the use of the limma R package and WGCNA, we identified 1767 co-expressed genes common to both CD and MetS, which are notably enriched in pathways related to immune responses and metabolic regulation. After thorough analysis, 34 key genes were highlighted, demonstrating their potential utility in prognostic models. These genes were closely linked to tissue immune responses and metabolic functions. Subsequent scRNA-seq analysis confirmed the strong diagnostic potential of PIM2 and PBX2, with especially prominent expression in T and B cells.
Conclusion
This study identifies shared regulatory genes between CD and MetS, advancing the development of precise diagnostic tools. In particular, PIM2 and PBX2 were found to be positively associated with hypoxia and hemoglobin metabolism pathways, suggesting their involvement in the modulation of cellular processes. These findings improve our understanding of the molecular mechanisms underlying the comorbidity of CD and MetS, offering novel targets for integrated therapeutic interventions.
Title: Integrative multi-omics analysis of Crohn's disease and metabolic syndrome: Unveiling the underlying molecular mechanisms of comorbidity. Computers in Biology and Medicine, vol. 184, Article 109365.
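For the key-gene screening step named in the Methods, LASSO regression selects genes by shrinking most coefficients to exactly zero. A common formulation for a binary disease label is the L1-penalized logistic regression below. Whether the authors used this exact loss, and how the penalty weight $\lambda$ was chosen, is not stated in the abstract, so the symbols are assumptions for illustration: $x_i$ is the expression vector of sample $i$ over $p$ candidate genes, $y_i \in \{0, 1\}$ is the disease label, and $n$ is the number of samples.

```latex
(\hat{\beta}_0, \hat{\beta}) = \arg\min_{\beta_0,\,\beta}\;
  -\frac{1}{n}\sum_{i=1}^{n}\Big[\, y_i\,(\beta_0 + x_i^{\top}\beta)
  - \log\!\big(1 + e^{\beta_0 + x_i^{\top}\beta}\big) \Big]
  + \lambda \sum_{j=1}^{p} \lvert \beta_j \rvert
```

Genes with $\hat{\beta}_j \neq 0$ are retained; a larger $\lambda$ yields a shorter gene list.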
Pub Date: 2024-11-14. DOI: 10.1016/j.compbiomed.2024.109328
Qiao Ning, Zedong Qi
As an important post-translational modification, glutarylation plays a crucial role in a variety of cellular functions. Recently, diverse computational methods for glutarylation site identification have been proposed. However, the class imbalance problem, caused by data noise and the uncertainty of non-glutarylation sites, remains a great challenge. In this article, we propose a novel semi-supervised learning algorithm, called WGAN-GP_Glu, for identifying reliable non-glutarylation lysine sites among those without glutarylation annotation. WGAN-GP_Glu is a multi-module framework that mainly comprises a reliable negative sample selection module, a deep feature extraction module, and a glutarylation site prediction module. In the reliable negative sample selection module, we design an improved variant of the Wasserstein GAN with Gradient Penalty (WGAN-GP), named ReliableWGAN-GP, consisting of three parts: two generators, G1 and G2, and a discriminator D. It can select reliable non-glutarylation samples from a large number of unlabeled samples. Generator G1 is used to generate noise data from unlabeled samples. For generator G2, both the positive samples and the noise data are used as inputs to improve the discriminative capability of discriminator D. Then, a convolutional neural network and a bidirectional long short-term memory network combined with an attention mechanism are used to extract deep features for glutarylation samples and reliable non-glutarylation samples. Finally, a glutarylation site prediction module based on three fully connected layers is designed to make class predictions for samples. The sensitivity, specificity, accuracy and Matthews correlation coefficient of WGAN-GP_Glu on the independent test data set reach 90.58 %, 95.82 %, 94.44 % and 0.8645, respectively, surpassing existing methods for glutarylation site prediction. Therefore, WGAN-GP_Glu can serve as a powerful tool for identifying glutarylation sites, and the ReliableWGAN-GP algorithm is effective in selecting reliable negative samples. The data and code are available at https://github.com/xbbxhbc/WGAN-GP_Glu.git.
Title: WGAN-GP_Glu: A semi-supervised model based on double generator-Wasserstein GAN with gradient penalty algorithm for glutarylation site identification. Computers in Biology and Medicine, vol. 184, Article 109328.
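The four reported metrics all follow from the counts of a binary confusion matrix. The sketch below shows the standard definitions; the counts in the example are hypothetical round numbers chosen for easy checking and are not the paper's test-set results.

```python
import math


def binary_metrics(tp, fn, tn, fp):
    """Sensitivity, specificity, accuracy, and Matthews correlation
    coefficient (MCC) from the four confusion-matrix counts."""
    sensitivity = tp / (tp + fn)  # recall on positive (glutarylation) sites
    specificity = tn / (tn + fp)  # recall on negative sites
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return sensitivity, specificity, accuracy, mcc


# Hypothetical counts: 10 positive and 20 negative sites.
sn, sp, acc, mcc = binary_metrics(tp=9, fn=1, tn=19, fp=1)
print(f"Sn={sn:.2f}  Sp={sp:.2f}  Acc={acc:.4f}  MCC={mcc:.2f}")
```

Because the MCC uses all four counts, it stays informative when negatives greatly outnumber positives, which is the situation the abstract describes.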
Pub Date: 2024-11-14. DOI: 10.1016/j.compbiomed.2024.109414
Francisco Traquete, Marta Sousa Silva, António E.N. Ferreira
Untargeted metabolomics is an extremely useful approach for the discrimination of biological systems and biomarker identification. However, data analysis workflows are complex and face many challenges. Two of these challenges are the demand for large sample sizes and the possibility of severe class imbalance, which is particularly common in clinical studies. The latter can make statistical models less generalizable, increase the risk of overfitting and skew the analysis in favour of the majority class. One possible approach to mitigate this problem is data augmentation. However, the use of artificial data requires adequate data augmentation methods and criteria for assessing the quality of the generated data.
In this work, we used Conditional Wasserstein Generative Adversarial Networks with Gradient Penalty (CWGAN-GPs) for data augmentation of metabolomics data. Using a set of benchmark datasets, we applied several criteria for the evaluation of the quality of generated data and assessed the performance of supervised predictive models trained with datasets that included such data. CWGAN-GP models generated realistic data with identical characteristics to real samples, mostly avoiding mode collapse. Furthermore, in cases of class imbalance, the performance of predictive models improved by supplementing the minority class with generated samples. This is evident for high quality datasets with well separated classes. Conversely, model improvements were quite modest for high class overlap datasets. This trend was confirmed by using synthetic datasets with different class separation levels. Data augmentation is a viable procedure to alleviate class imbalance problems but is not universally beneficial in metabolomics.
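The idea of supplementing the minority class with generated samples can be illustrated with a minimal sketch. It is not the authors' pipeline: `placeholder_generator` is a hypothetical stand-in that merely jitters real samples, used here only where a trained conditional generator such as a CWGAN-GP would be plugged in. The function names, the jitter value and the toy data are all assumptions made for illustration.

```python
import random
from collections import Counter


def samples_needed(labels):
    """Per class, how many synthetic samples would match the majority class size."""
    counts = Counter(labels)
    target = max(counts.values())
    return {cls: target - n for cls, n in counts.items()}


def placeholder_generator(real_samples, n, rng, jitter=0.05):
    """Hypothetical stand-in for a trained conditional generator.

    Picks a real sample of the class and adds small Gaussian noise to each
    feature. This is NOT a CWGAN-GP; it only marks where one would be called.
    """
    generated = []
    for _ in range(n):
        base = rng.choice(real_samples)
        generated.append([v + rng.gauss(0.0, jitter * (abs(v) or 1.0)) for v in base])
    return generated


def augment_minority(X, y, generator=placeholder_generator, seed=0):
    """Append generated samples until every class matches the majority class size."""
    rng = random.Random(seed)
    X_aug, y_aug = list(X), list(y)
    for cls, n_missing in samples_needed(y).items():
        if n_missing == 0:
            continue
        real = [x for x, label in zip(X, y) if label == cls]
        X_aug.extend(generator(real, n_missing, rng))
        y_aug.extend([cls] * n_missing)
    return X_aug, y_aug


# Toy data: four "control" samples and one "disease" sample, two features each.
X = [[1.0, 2.0], [1.1, 2.1], [0.9, 1.8], [1.2, 2.2], [5.0, 6.0]]
y = ["control", "control", "control", "control", "disease"]
X_bal, y_bal = augment_minority(X, y)
print(Counter(y_bal))
```

In a setup like this, generated samples would normally be added to the training split only, so that model performance is still measured on real data.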
Enhancing supervised analysis of imbalanced untargeted metabolomics datasets using a CWGAN-GP framework for data augmentation. Computers in biology and medicine, vol. 184, Article 109414.
Pub Date : 2024-11-14 DOI: 10.1016/j.compbiomed.2024.109355
Adil Bahaj , Mounir Ghogho
Objective:
This study aims at automatically quantifying and modelling the uncertainty of facts in biomedical knowledge graphs (BKGs) based on their textual supporting evidence using deep learning techniques.
Materials and Methods:
A sentence transformer is employed to extract deep features of sentences, which are then used to classify sentence factuality with a naive Bayes classifier. For each fact and its supporting evidence in a source KG, the deep feature extractor and the classifier are used to quantify the factuality of each sentence; these factuality levels are then transformed to numerical values in [0, 1] before being averaged to obtain the confidence score of the fact.
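The final averaging step can be sketched as follows. This is a sketch under stated assumptions: the factuality labels and their numerical values in `FACTUALITY_TO_SCORE` are hypothetical (the abstract only says that values lie in [0, 1]), and the per-sentence labels are taken as given rather than produced by a sentence transformer and naive Bayes classifier.

```python
from statistics import mean

# Hypothetical mapping from factuality labels to values in [0, 1].
# Both the label names and the numbers are illustrative assumptions.
FACTUALITY_TO_SCORE = {
    "fact": 1.0,
    "probable": 0.75,
    "possible": 0.5,
    "doubtful": 0.25,
    "counterfact": 0.0,
}


def fact_confidence(sentence_labels):
    """Average the numerical factuality of all sentences supporting one fact."""
    if not sentence_labels:
        raise ValueError("a fact needs at least one supporting sentence")
    return mean(FACTUALITY_TO_SCORE[label] for label in sentence_labels)


# One fact supported by three sentences with different factuality levels.
print(fact_confidence(["fact", "probable", "possible"]))
```

Averaging means that a fact backed by many hedged sentences receives a lower confidence score than one backed by a few assertive sentences, regardless of how many supporting sentences there are.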
Results:
The fact classification feature extractor enhances the separability of classes in the embedding space. This helped the fact classification model achieve better performance than existing factuality classification approaches based on hand-crafted features. Uncertainty quantification and modelling were demonstrated on SemMedDB by creating USemMedDB, showing KGB2U’s ability to process large BKGs. A subset of USemMedDB facts is modelled to demonstrate the correlation between the structure of the uncertain BKG and the confidence scores. The best-trained model is used to predict confidence scores of existing and unseen facts. The top-ranked unseen facts were grounded using scientific evidence, showing KGB2U’s ability to discover new knowledge.
Conclusion:
The literature supporting BKG facts can be used to automatically quantify their uncertainty. Additionally, the resulting uncertain biomedical KGs can be used for knowledge discovery. The BKG2U interface and source code are available at http://biofunk.datanets.org/ and https://github.com/BahajAdil/KBG2U, respectively.
A step towards quantifying, modelling and exploring uncertainty in biomedical knowledge graphs. Computers in biology and medicine, vol. 184, Article 109355.
Pub Date : 2024-11-14 DOI: 10.1016/j.compbiomed.2024.109306
Panteleimon Chriskos , Christos A. Frantzidis , Christina S. Plomariti , Emmanouil Papanastasiou , Athanasia Pataka , Chrysoula Kourtidou-Papadeli , Panagiotis D. Bamidis
Background and Objective:
Sleep is an essential biological function that is critical for a healthy and fulfilling life. Available sleep quality assessment tools rely on long questionnaires that cover a long period of time and do not take into account daily physical activity patterns and individual lifestyles.
Methods:
In this paper we present SmartHypnos, an Android application that supports low-end devices. It enables users to report their sleep quality, monitor their physical activity and exercise intensity, and receive personalized recommendations aimed at increasing sleep quality. The application functionalities are implemented through sleep quality evaluation questions, a passive step counter and efficient data storage. Personal data are stored locally, protecting user privacy. All these are integrated into a single interface that requires a single device, has a low learning difficulty and is easy to use. SmartHypnos was evaluated in a pilot study involving 48 adults (ages 18-50) who used the application for seven days, subsequently submitted their data, which is possible directly through the interface, and evaluated the application through an appropriate questionnaire.
Results:
SmartHypnos was rated positively by users, especially in terms of learnability, ease of use and stability, with a mean score over 8. Task completion time and ease, simplicity, user comfort and recommendation utility were scored with a mean over 7. The correlations between the extracted features were in accordance with prior works.
Conclusions:
SmartHypnos has the potential to become a sleep monitoring and intervention tool readily available to the general public, including vulnerable populations of low socio-economic status.
SmartHypnos: An Android application for low-cost sleep self-monitoring and personalized recommendation generation. Computers in biology and medicine, vol. 184, Article 109306.
Pub Date : 2024-11-13 DOI: 10.1016/j.compbiomed.2024.109417
Wanting Li , Yongxiu Yang , Guangfei Li , Félix Nieto-del-Amor , Gema Prats-Boluda , Javier Garcia-Casado , Yiyao Ye-Lin , Dongmei Hao
Preterm birth is a common and severe pregnancy complication, causing significant health, developmental, and economic problems. Accurate diagnosis of imminent labor in women with threatened preterm labor (TPL) is crucial. Electrohysterography (EHG), which represents uterine myometrial electrical activity, is a potential tool for predicting preterm birth. Increased cell synchronization is fundamental to generating high-intensity and coordinated uterine myometrial electrical activity as labor approaches. The present work aimed to evaluate synchronization measures from multichannel EHG signals for predicting labor in less than 24 h (time to delivery, TTD < 24 h vs. TTD ≥ 24 h) and for discriminating between imminent labor (TTD < 1 week) and non-imminent labor (TTD ≥ 1 week) in women with TPL. We computed three synchronization measures: the imaginary component of coherence, phase lag index, and weighted phase lag index (wPLI), within three specific frequency bandwidths (fast wave low (FWL): 0.1–0.34 Hz, fast wave high (FWH): 0.34–1 Hz, and whole bandwidth: 0.1–1 Hz), from 115 pregnant women (26–41 weeks of gestation). Our results revealed that multichannel EHG synchronization measures significantly increased closer to delivery (labor > non-labor, imminent > non-imminent). Indeed, wPLI in the FWH bandwidth exhibited a positive correlation with gestational age (p < 0.001, correlation coefficient = 0.35) and an inverse relationship with time to delivery (p < 0.001, correlation coefficient = −0.33). wPLI better distinguishes imminent from non-imminent labor in women with TPL, especially for electrode pairs in the vertical direction, which has been reported as the predominant direction of uterine activity propagation. The three synchronization measures computed in the FWL and FWH bandwidths provided complementary information for predicting labor in less than 24 h and imminent labor in women with TPL, achieving F1-scores of 93 % (84.2–93 %) and 99.5 % (85.2–99.5 %), respectively.
Our results suggest that EHG synchronization analysis provides new sensitive metrics for discriminating imminent labor, which could potentially be used to improve preterm birth prediction and to understand uterine electrical activity dynamics.
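Two of the synchronization measures named above, the phase lag index and the weighted phase lag index, can be sketched from their standard definitions. This is a sketch only: the code assumes the cross-spectrum between two EHG channels has already been estimated once per segment at the frequency of interest, and the input values are made up. How the study segmented the signals and averaged across the FWL and FWH bands is not stated in the abstract and is not reproduced here.

```python
def phase_lag_index(cross_spectra):
    """PLI: absolute value of the mean sign of the imaginary cross-spectrum."""
    signs = [(s.imag > 0) - (s.imag < 0) for s in cross_spectra]
    return abs(sum(signs)) / len(signs)


def weighted_phase_lag_index(cross_spectra):
    """wPLI: |mean(Im S)| / mean(|Im S|), so each sign is weighted by its magnitude."""
    imag = [s.imag for s in cross_spectra]
    denom = sum(abs(v) for v in imag)
    if denom == 0:
        return 0.0  # no imaginary component at all: no measurable phase lag
    return abs(sum(imag)) / denom


# Hypothetical per-segment cross-spectrum values for one channel pair.
consistent_lag = [1 + 1j, 2 + 0.5j, 0.5 + 2j]   # channel x always leads
mixed_lag = [1 + 2j, 1 - 1j, 1 + 1j]            # lead/lag flips between segments

print(phase_lag_index(consistent_lag), weighted_phase_lag_index(consistent_lag))
print(phase_lag_index(mixed_lag), weighted_phase_lag_index(mixed_lag))
```

Both measures use only the imaginary part of the cross-spectrum, which makes them insensitive to zero-lag coupling such as a common source picked up by both electrodes. wPLI additionally down-weights segments whose imaginary part is close to zero, where the sign is easily flipped by noise.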
Synchronization study of electrohysterography for discrimination of imminent delivery in pregnant women with threatened preterm labor. Computers in biology and medicine, vol. 184, Article 109417.