Pub Date: 2024-07-11 DOI: 10.1186/s13040-024-00374-0
Emily R Hannon, Carmen J Marsit, Arlene E Dent, Paula Embury, Sidney Ogolla, David Midem, Scott M Williams, James W Kazura
Background: Changing cell-type proportions can confound studies of differential gene expression or DNA methylation (DNAm) from peripheral blood mononuclear cells (PBMCs). We examined how cell-type proportions derived from the transcriptome versus the methylome (DNAm) influence estimates of differentially expressed genes (DEGs) and differentially methylated positions (DMPs).
Methods: Transcriptome and DNAm data were obtained from PBMC RNA and DNA of Kenyan children (n = 8) before, during, and 6 weeks following uncomplicated malaria. DEGs and DMPs between time points were detected using cell-type adjusted modeling with CIBERSORTx or IDOL, respectively.
Results: Most major cell types and principal components had moderate to high correlation between the two deconvolution methods (r = 0.60-0.96). Estimates of cell-type proportions and DEGs or DMPs were largely unaffected by the method, with the greatest discrepancy in the estimation of neutrophils.
Conclusion: Variation in cell-type proportions is captured similarly by both transcriptomic and methylome deconvolution methods for most major cell types.
Title: Transcriptome- and DNA methylation-based cell-type deconvolutions produce similar estimates of differential gene expression and differential methylation. Biodata Mining 17(1): 21. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11241886/pdf/
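The agreement between the two deconvolution methods reported in the Results (r = 0.60-0.96) can be illustrated with a small correlation sketch. It assumes Pearson's r, which the abstract does not specify, and the proportions below are made-up values for one hypothetical cell type in eight samples, not data from the study.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of cross-products and sums of squares around the means.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical proportions of one cell type in 8 samples (illustrative only).
expression_based = [0.10, 0.12, 0.08, 0.15, 0.11, 0.09, 0.14, 0.13]
methylation_based = [0.11, 0.13, 0.07, 0.16, 0.10, 0.10, 0.15, 0.12]

r = pearson_r(expression_based, methylation_based)
print(f"r = {r:.2f}")
```

Repeating this per cell type would give one coefficient for each, which is one plausible way to arrive at a range of r values like the one quoted.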
Pub Date: 2024-07-01 DOI: 10.1186/s13040-024-00369-x
Lin Wang, Jiaming Su, Zhongjie Liu, Shaowei Ding, Yaotan Li, Baoluo Hou, Yuxin Hu, Zhaoxi Dong, Jingyi Tang, Hongfang Liu, Weijing Liu
Background: Diabetic nephropathy (DN) is a major microvascular complication of diabetes and has become the leading cause of end-stage renal disease worldwide. A considerable number of DN patients progress to irreversible end-stage renal disease because the disease is not diagnosed early. Reliable biomarkers that support early diagnosis and treatment are therefore needed. The migration of immune cells to the kidney is considered a key step in the progression of DN-related vascular injury, so markers of this process may be particularly helpful for early diagnosis and for predicting the progression of DN.
Methods: Gene chip data were retrieved from the GEO database using the search term 'diabetic nephropathy'. The 'limma' software package was used to identify differentially expressed genes (DEGs) between DN and control samples. Gene set enrichment analysis (GSEA) was performed on genes obtained from the Molecular Signatures Database (MSigDB). The R package 'WGCNA' was used to identify gene modules associated with tubulointerstitial injury in DN, and these modules were intersected with immune-related DEGs to identify target genes. Gene ontology (GO) enrichment analysis and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analysis were performed on the differentially expressed genes using the 'clusterProfiler' package in R. Three methods, least absolute shrinkage and selection operator (LASSO), support vector machine recursive feature elimination (SVM-RFE) and random forest (RF), were used to select immune-related biomarkers for diagnosis. We retrieved the tubulointerstitial dataset from the Nephroseq database to construct an external validation dataset. Unsupervised clustering analysis of the expression levels of immune-related biomarkers was performed using the 'ConsensusClusterPlus' R package. Urine was collected from patients who visited Dongzhimen Hospital of Beijing University of Chinese Medicine from September 2021 to March 2023, and ELISA was used to detect the mRNA expression level of immune-related biomarkers in urine. Pearson correlation analysis was used to assess the relationship between immune-related biomarker expression and renal function in DN patients.
Results: Four microarray datasets from the GEO database were included in the analysis: GSE30122, GSE47185, GSE99340 and GSE104954. These datasets included 63 DN patients and 55 healthy controls. A total of 9415 genes were detected in the data set. We found 153 differentially expressed immune-related genes, of which 112 were up-regulated and 41 were down-regulated, and identified 119 overlapping genes. GO analysis showed that they were involved in various biological processes including leukocyte-mediated immunity. KEGG analysis showed that these target genes were mainly involved in the formation of phagosomes in Staphylococcus aureus infection. Among these
Title: Identification of immune-associated biomarkers of diabetes nephropathy tubulointerstitial injury based on machine learning: a bioinformatics multi-chip integrated analysis. Biodata Mining 17(1): 20. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11218417/pdf/
Pub Date: 2024-06-26 DOI: 10.1186/s13040-024-00372-2
Yunfei Yin, Zheng Yuan, Islam Md Tanvir, Xianjian Bao
Missing data in electronic medical records seriously limits the practical application of biomedical data, so filling in these missing values effectively is a meaningful research effort. Current state-of-the-art methods focus on using Generative Adversarial Networks (GANs) to fill the missing values of electronic medical records and have achieved breakthrough progress. However, on datasets with high missing rates, the imputation accuracy of these methods decreases sharply. This motivates us to explore the uncertainty of GANs and improve GAN-based imputation methods. In this paper, the GRUD (Gate Recurrent Unit Decay) network and the UGAN (Uncertainty Generative Adversarial Network) are proposed and combined into a single model called UGAN-GRUD. UGAN-GRUD uses the GAN to generate imputed values and then leverages GRUD to compensate for their errors. The UGAN learns the distribution pattern and uncertainty of the data iteratively through its Generator and Discriminator. The GRUD network compensates for the UGAN by using a time decay factor, which allows it to learn the specific temporal relations in electronic medical records. Experiments on publicly available biomedical datasets show that UGAN-GRUD outperforms the current state-of-the-art methods, with average improvements of 13% in RMSE (Root Mean Squared Error) and 24.5% in MAPE (Mean Absolute Percentage Error).
Title: Electronic medical records imputation by temporal Generative Adversarial Network. Biodata Mining 17(1): 19. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11202349/pdf/
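The two error metrics named in the abstract, RMSE and MAPE, can be sketched as follows. This is a minimal illustration using the standard textbook definitions; the function names and the sample values are hypothetical, and the paper's exact evaluation protocol (for example, which entries are masked and scored) is not given in the abstract.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error between true and imputed values."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty, equal-length sequences")
    squared = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return math.sqrt(squared / len(actual))


def mape(actual, predicted):
    """Mean absolute percentage error, in percent.

    Undefined when any true value is zero, so that case is rejected.
    """
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty, equal-length sequences")
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when a true value is 0")
    ratios = sum(abs((a - p) / a) for a, p in zip(actual, predicted))
    return 100.0 * ratios / len(actual)


# Hypothetical held-out measurements and their imputed counterparts.
true_values = [72.0, 98.6, 120.0, 80.0]
imputed_values = [70.0, 99.0, 126.0, 78.0]

print(f"RMSE = {rmse(true_values, imputed_values):.3f}")
print(f"MAPE = {mape(true_values, imputed_values):.2f}%")
```

Lower is better for both metrics. RMSE penalizes large individual errors more heavily, while MAPE expresses error relative to the magnitude of each true value.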
Pub Date: 2024-06-22 DOI: 10.1186/s13040-024-00370-4
Yusuf Brima, Marcellin Atemkeng
Deep learning shows great promise for medical image analysis but often lacks explainability, hindering its adoption in healthcare. Attribution techniques that explain model reasoning can potentially increase trust in deep learning among clinical stakeholders. In the literature, much of the research on attribution in medical imaging focuses on visual inspection rather than statistical quantitative analysis. In this paper, we propose an image-based saliency framework to enhance the explainability of deep learning models in medical image analysis. We use adaptive path-based gradient integration, gradient-free techniques, and class activation mapping along with its derivatives to attribute predictions from brain tumor MRI and COVID-19 chest X-ray datasets made by recent deep convolutional neural network models. The proposed framework integrates qualitative and statistical quantitative assessments, employing Accuracy Information Curves (AICs) and Softmax Information Curves (SICs) to measure the effectiveness of saliency methods in retaining critical image information and their correlation with model predictions. Visual inspections indicate that methods such as ScoreCAM, XRAI, GradCAM, and GradCAM++ consistently produce focused and clinically interpretable attribution maps. These methods highlighted possible biomarkers, exposed model biases, and offered insights into the links between input features and predictions, demonstrating their ability to elucidate model reasoning on these datasets. Empirical evaluations reveal that ScoreCAM and XRAI are particularly effective in retaining relevant image regions, as reflected in their higher AUC values.
However, SICs highlight variability, with instances of random saliency masks outperforming established methods, emphasizing the need to combine visual and empirical metrics for a comprehensive evaluation. The results underscore the importance of selecting appropriate saliency methods for specific medical imaging tasks and suggest that combining qualitative and quantitative approaches can enhance the transparency, trustworthiness, and clinical adoption of deep learning models in healthcare. This study advances model explainability to increase trust in deep learning among healthcare stakeholders by revealing the rationale behind predictions. Future research should refine empirical metrics for stability and reliability, include more diverse imaging modalities, and focus on improving model explainability to support clinical decision-making.
Title: Saliency-driven explainable deep learning in medical imaging: bridging visual explainability and statistical quantitative analysis. Biodata Mining 17(1): 18. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11193223/pdf/
Pub Date: 2024-06-18 DOI: 10.1186/s13040-024-00371-3
Zhiping Paul Wang, Priyanka Bhandary, Yizhou Wang, Jason H Moore
GPT-4, as the most advanced version of OpenAI's large language models, has attracted widespread attention, rapidly becoming an indispensable AI tool across various areas. This includes its exploration by scientists for diverse applications. Our study focused on assessing GPT-4's capabilities in generating text, tables, and diagrams for biomedical review papers. We also assessed the consistency in text generation by GPT-4, along with potential plagiarism issues when employing this model for the composition of scientific review papers. Based on the results, we suggest the development of enhanced functionalities in ChatGPT, aiming to meet the needs of the scientific community more effectively. This includes enhancements in uploaded document processing for reference materials, a deeper grasp of intricate biomedical concepts, more precise and efficient information distillation for table generation, and a further refined model specifically tailored for scientific diagram creation.
Title: Using GPT-4 to write a scientific review article: a pilot evaluation study. Biodata Mining 17(1): 16. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11184879/pdf/
Pub Date: 2024-06-11 DOI: 10.1186/s13040-024-00368-y
Carolina Del-Valle-Soto, Ramon A Briseño, Leonardo J Valdivia, Juan Arturo Nolazco-Flores
The development of neuroscientific techniques enabling the recording of brain and peripheral nervous system activity has fueled research in cognitive science. Recent technological advancements offer new possibilities for inducing behavioral change, particularly through cost-effective Internet-based interventions. However, limitations in laboratory equipment volume have hindered the generalization of results to real-life contexts. The advent of Internet of Things (IoT) devices, such as wearables, equipped with sensors and microchips, has ushered in a new era in behavior change techniques. Wearables, including smartwatches, electronic tattoos, and more, are poised for massive adoption, with an expected annual growth rate of 55% over the next five years. These devices enable personalized instructions, leading to increased productivity and efficiency, particularly in industrial production. Additionally, the healthcare sector has seen a significant demand for wearables, with over 80% of global consumers willing to use them for health monitoring. This research explores the primary biometric applications of wearables and their impact on users' well-being, focusing on the integration of behavior change techniques facilitated by IoT devices. Wearables have revolutionized health monitoring by providing real-time feedback, personalized interventions, and gamification. They encourage positive behavior changes by delivering immediate feedback, tailored recommendations, and gamified experiences, leading to sustained improvements in health. Furthermore, wearables seamlessly integrate with digital platforms, enhancing their impact through social support and connectivity. However, privacy and data security concerns must be addressed to maintain users' trust. As technology continues to advance, the refinement of IoT devices' design and functionality is crucial for promoting behavior change and improving health outcomes. 
This study aims to investigate the effects of behavior change techniques facilitated by wearables on individuals' health outcomes and the role of wearables in promoting a healthier lifestyle.
Title: Unveiling wearables: exploring the global landscape of biometric applications and vital signs and behavioral impact. Biodata Mining 17(1): 15. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11165804/pdf/
A knowledge graph can effectively showcase the essential characteristics of data and is increasingly emerging as a significant means of integrating information in the field of artificial intelligence. Coronary artery plaque represents a significant etiology of cardiovascular events, posing a diagnostic challenge for clinicians who are confronted with a multitude of nonspecific symptoms. To visualize the hierarchical relationship network graph of the molecular mechanisms underlying plaque properties and symptom phenotypes, patient symptomatology was extracted from electronic health record data from real-world clinical settings. Phenotypic networks were constructed utilizing clinical data and protein-protein interaction networks. Machine learning techniques, including convolutional neural networks, Dijkstra's algorithm, and gene ontology semantic similarity, were employed to quantify clinical and biological features within the network. The resulting features were then utilized to train a K-nearest neighbor model, yielding 23 symptoms, 41 association rules, and 61 hub genes across the three types of plaques studied, achieving an area under the curve of 92.5%. Weighted correlation network analysis and pathway enrichment were subsequently utilized to identify lipid status-related genes and inflammation-associated pathways that could help explain the differences in plaque properties. To confirm the validity of the network graph model, we conducted coexpression analysis of the hub genes to evaluate their potential diagnostic value. Additionally, we investigated immune cell infiltration, examined the correlations between hub genes and immune cells, and validated the reliability of the identified biological pathways. By integrating clinical data and molecular network information, this biomedical knowledge graph model effectively elucidated the potential molecular mechanisms that link symptoms, diseases, and molecules.
Title: The biomedical knowledge graph of symptom phenotype in coronary artery plaque: machine learning-based analysis of real-world clinical data. Authors: Jia-Ming Huan, Xiao-Jie Wang, Yuan Li, Shi-Jun Zhang, Yuan-Long Hu, Yun-Lun Li. Journal: Biodata Mining 17(1), article 13. Pub Date: 2024-05-21. DOI: 10.1186/s13040-024-00365-1. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11110203/pdf/
Recent studies have found a strong correlation between the triglyceride-glucose (TyG) index or the atherogenic index of plasma (AIP) and cardiovascular disease (CVD) risk. However, non-invasive and rapid prediction of cardiovascular risk remains under-studied. We aimed to develop and validate a machine learning model for predicting cardiovascular risk from variables encompassing clinical questionnaires and oculomics. We collected data from the Korean National Health and Nutrition Examination Survey (KNHANES). The training dataset (80% of the 2008 to 2011 KNHANES data) was used for model development, with internal validation on the remaining 20%. An external validation dataset from 2012 assessed the model's capacity to predict the TyG-index or AIP in new cases. The final dataset included 32,122 participants. Machine learning models using 25 algorithms were trained on oculomics measurements and clinical questionnaires to predict the range of the TyG-index and AIP. The area under the receiver operating characteristic curve (AUC), accuracy, precision, recall, and F1 score were used to evaluate model performance. Based on large-scale cohort studies, we set TyG-index cut-off points at 8.0, 8.75 (upper one-third of values), and 8.93 (upper one-fourth of values), and AIP cut-offs at 0.318 and 0.34. Values above these thresholds indicated elevated cardiovascular risk. For TyG-index cut-offs of 8.0, 8.75, and 8.93, the best-performing algorithm achieved internal validation AUCs of 0.812, 0.873, and 0.911, respectively, and external validation AUCs of 0.809, 0.863, and 0.901. For the AIP cut-off of 0.34, internal and external validation achieved similar AUCs of 0.849 and 0.842. Performance was slightly lower for the 0.318 cut-off, with AUCs of 0.844 and 0.836. Significant gender-based variations were noted for the TyG-index at 8.0 (male AUC=0.832, female AUC=0.790) and 8.75 (male AUC=0.874, female AUC=0.862) and for AIP at 0.318 (male AUC=0.853, female AUC=0.825) and 0.34 (male AUC=0.858, female AUC=0.831). AUCs were similar between genders (male AUC=0.907 versus female AUC=0.906) only at the TyG-index cut-off of 8.93. We have established a simple, non-invasive machine learning model with good clinical value for predicting cardiovascular risk in the general population.
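The abstract does not spell out how the two indices are computed, so the following is a minimal sketch assuming the commonly used definitions: TyG-index = ln(fasting triglycerides [mg/dL] × fasting glucose [mg/dL] / 2) and AIP = log10(triglycerides / HDL cholesterol), both in mmol/L. The function names and the `exceeds` helper are hypothetical and only illustrate how a measurement could be turned into the binary "above cut-off" targets that the models predict; the cut-off values are the ones quoted in the abstract.

```python
import math

# Standard unit conversion factors (mg/dL per mmol/L).
TG_MGDL_PER_MMOLL = 88.57    # triglycerides
HDL_MGDL_PER_MMOLL = 38.67   # HDL cholesterol

# Cut-offs quoted in the abstract.
TYG_CUTOFFS = (8.0, 8.75, 8.93)
AIP_CUTOFFS = (0.318, 0.34)


def tyg_index(tg_mgdl: float, glucose_mgdl: float) -> float:
    """TyG-index under the assumed definition ln(TG * glucose / 2), both in mg/dL."""
    return math.log(tg_mgdl * glucose_mgdl / 2.0)


def aip(tg_mgdl: float, hdl_mgdl: float) -> float:
    """AIP under the assumed definition log10(TG / HDL-C), both converted to mmol/L."""
    tg_mmol = tg_mgdl / TG_MGDL_PER_MMOLL
    hdl_mmol = hdl_mgdl / HDL_MGDL_PER_MMOLL
    return math.log10(tg_mmol / hdl_mmol)


def exceeds(value: float, cutoffs: tuple) -> dict:
    """Hypothetical helper: one binary label per cut-off (True = elevated risk)."""
    return {c: value > c for c in cutoffs}


# Illustrative inputs only (not study data).
tyg = tyg_index(tg_mgdl=150.0, glucose_mgdl=100.0)
tyg_labels = exceeds(tyg, TYG_CUTOFFS)
aip_labels = exceeds(aip(tg_mgdl=150.0, hdl_mgdl=50.0), AIP_CUTOFFS)
```

Each cut-off yields a separate binary classification task, which is consistent with the abstract reporting one AUC per cut-off.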
Title: Machine-learning-based models to predict cardiovascular risk using oculomics and clinic variables in KNHANES. Authors: Yuqi Zhang, Sijin Li, Weijie Wu, Yanqing Zhao, Jintao Han, Chao Tong, Niansang Luo, Kun Zhang. Journal: Biodata Mining. Pub Date: 2024-04-22. DOI: 10.1186/s13040-024-00363-3
Pub Date: 2024-04-17. DOI: 10.1186/s13040-024-00362-4
Selcen Ari Yuka, Alper Yilmaz
Competing endogenous RNAs (ceRNAs) play key roles in cellular molecular mechanisms through cross-talk in post-transcriptional interactions. Because ceRNA cross-talk depends strongly on the abundance of free transcripts, studies of it, whether large- or small-scale, generally integrate tissue transcriptomic data with correlation analyses. This abundance-dependent nature suggests that ceRNA dynamics may fluctuate by tissue and condition. However, no comprehensive study has investigated ceRNA interactions in normal tissue, the ceRNAs that are lost or appear in cancerous tissue, or their interactions. In this study, we comprehensively analyzed tumor-specific ceRNA fluctuations in the three highest-incidence cancers, lung adenocarcinoma (LUAD), prostate adenocarcinoma (PRAD), and breast invasive carcinoma (BRCA), compared to healthy lung, prostate, and breast tissues, respectively. We found that in LUAD, PRAD, and BRCA, 3,204, 1,233, and 406 ceRNAs, respectively, engage in post-transcriptional cross-talk within tumor tissues but are absent from such interactions in the corresponding healthy samples. We also found that 90 of these tumor-specific ceRNAs are shared by the three cancer types. Among the 90 ceRNAs that directly interact with miRNAs, we uncovered a core network of 165 miRNAs and 63 ceRNAs that should be considered in future RNA-targeted and RNA-mediated approaches and could be used in these three aggressive cancer types. More specifically, in this core interaction network, ceRNAs such as GALNT7, KLF9, and DAB2 and miRNAs such as miR-106a/b-5p, miR-20a-5p, and miR-519d-3p may have potential as common targets in the three cancers. In contrast to conventional methods that construct ceRNA networks from genes differentially expressed relative to normal tissues, our proposed approach identifies ceRNA players by considering their context within ceRNA:miRNA interactions. Our results have the potential to reveal distinct and common ceRNA interactions across cancer types and to pinpoint critical RNAs, thereby paving the way for RNA-based strategies against cancer.
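The comparison described above (ceRNAs active in tumor networks but not in the matching normal networks, then those shared across the three cancers) reduces to set differences and an intersection. The sketch below assumes each network has already been reduced to the set of ceRNAs that participate in it; the identifiers are hypothetical placeholders, not results from the study, and `tumor_specific` is an illustrative helper rather than the authors' pipeline.

```python
def tumor_specific(tumor: set, normal: set) -> set:
    """ceRNAs that interact in the tumor network but not in the normal one."""
    return tumor - normal


# Hypothetical placeholder identifiers (not study data).
networks = {
    "LUAD": {"tumor": {"ceRNA_1", "ceRNA_2", "ceRNA_3", "ceRNA_9"},
             "normal": {"ceRNA_9"}},
    "PRAD": {"tumor": {"ceRNA_1", "ceRNA_2", "ceRNA_7"},
             "normal": {"ceRNA_7", "ceRNA_8"}},
    "BRCA": {"tumor": {"ceRNA_1", "ceRNA_2", "ceRNA_4"},
             "normal": set()},
}

# Tumor-specific ceRNAs per cancer type.
specific = {
    cancer: tumor_specific(sets["tumor"], sets["normal"])
    for cancer, sets in networks.items()
}

# ceRNAs that are tumor-specific in all three cancer types.
shared = set.intersection(*specific.values())
```

In the study, the analogous per-cancer sets contain 3,204, 1,233, and 406 ceRNAs, and the shared set contains 90.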
Title: Decoding dynamic miRNA:ceRNA interactions unveils therapeutic insights and targets across predominant cancer landscapes. Journal: Biodata Mining. DOI: 10.1186/s13040-024-00362-4
Pub Date: 2024-04-16. DOI: 10.1186/s13040-024-00361-5
Jianchang Hu, Silke Szymczak
Gene network information is believed to be beneficial for disease module and pathway identification, but it has not been explicitly utilized in the standard random forest (RF) algorithm for gene expression data analysis. We investigate the performance of a network-guided RF in which the network information is summarized into sampling probabilities for the predictor variables, which are then used in the construction of the RF. Our simulation results suggest that network-guided RF does not provide better disease prediction than the standard RF. In terms of disease gene discovery, if disease genes form one or more modules, network-guided RF identifies them more accurately. In addition, when disease status is independent of the genes in the given network, using network information can produce spurious gene selection results, especially for hub genes. Our empirical analysis of two balanced microarray and RNA-Seq breast cancer datasets from The Cancer Genome Atlas (TCGA) for classification of progesterone receptor (PR) status also demonstrates that network-guided RF can identify genes from PGR-related pathways, which leads to a better-connected module of identified genes. Gene networks can provide additional information to aid gene expression analysis for disease module and pathway identification. However, they need to be used with caution, and the results need to be validated to guard against spurious gene selection. More robust approaches to incorporating such information into RF construction also warrant further study.
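The core idea, summarizing a network into per-gene sampling probabilities that bias which predictors are considered at each split, can be sketched as follows. The abstract does not say how the probabilities are derived, so the degree-plus-floor weighting here is purely an illustrative assumption, and both function names are hypothetical. Candidate genes are drawn without replacement using Efraimidis-Spirakis keys (u^(1/w)), a standard technique swapped in for whatever the authors' implementation does.

```python
import random


def sampling_probabilities(adjacency, floor=1.0):
    """Summarize a gene network into per-gene sampling probabilities.

    Assumption (not from the paper): weight = node degree + floor, so that
    isolated genes keep a non-zero chance of being sampled.
    """
    weights = [sum(row) + floor for row in adjacency]
    total = sum(weights)
    return [w / total for w in weights]


def draw_candidates(probs, mtry, rng):
    """Draw mtry distinct gene indices, weighted by probs, for one tree split."""
    # Efraimidis-Spirakis: the genes with the largest u**(1/w) keys form a
    # weighted sample without replacement.
    keys = {j: rng.random() ** (1.0 / p) for j, p in enumerate(probs)}
    return sorted(keys, key=keys.get, reverse=True)[:mtry]


# Toy 5-gene network: gene 0 is a hub, gene 4 is isolated.
adjacency = [
    [0, 1, 1, 1, 0],
    [1, 0, 0, 0, 0],
    [1, 0, 0, 0, 0],
    [1, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
]
probs = sampling_probabilities(adjacency)
candidates = draw_candidates(probs, mtry=2, rng=random.Random(0))
```

Under this weighting the hub gene is offered as a split candidate most often, which illustrates why hub genes can be selected spuriously when disease status is actually unrelated to the network.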
Title: Evaluation of network-guided random forest for disease gene discovery. Journal: Biodata Mining. DOI: 10.1186/s13040-024-00361-5