Biodata Mining: Latest Articles

Transcriptome- and DNA methylation-based cell-type deconvolutions produce similar estimates of differential gene expression and differential methylation.
IF 4 | Region 3 (Biology) | Q1, Mathematical & Computational Biology | Pub Date: 2024-07-11 | DOI: 10.1186/s13040-024-00374-0
Emily R Hannon, Carmen J Marsit, Arlene E Dent, Paula Embury, Sidney Ogolla, David Midem, Scott M Williams, James W Kazura

Background: Changing cell-type proportions can confound studies of differential gene expression or DNA methylation (DNAm) from peripheral blood mononuclear cells (PBMCs). We examined how cell-type proportions derived from the transcriptome versus the methylome (DNAm) influence estimates of differentially expressed genes (DEGs) and differentially methylated positions (DMPs).

Methods: Transcriptome and DNAm data were obtained from PBMC RNA and DNA of Kenyan children (n = 8) before, during, and 6 weeks following uncomplicated malaria. DEGs and DMPs between time points were detected using cell-type adjusted modeling with Cibersortx or IDOL, respectively.

Results: Most major cell types and principal components had moderate to high correlation between the two deconvolution methods (r = 0.60-0.96). Estimates of cell-type proportions and DEGs or DMPs were largely unaffected by the method, with the greatest discrepancy in the estimation of neutrophils.

Conclusion: Variation in cell-type proportions is captured similarly by both transcriptomic and methylome deconvolution methods for most major cell types.
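
As an illustration of the agreement measure reported in the Results (r between the two deconvolution methods), the following is a minimal sketch of a Pearson correlation between two sets of cell-type proportion estimates. The function name `pearson_r` and the sample values are hypothetical and are not taken from the study; the abstract does not state how the authors computed r.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical proportions of one cell type in 8 samples (illustrative values only).
rna_based = [0.12, 0.18, 0.09, 0.22, 0.15, 0.11, 0.20, 0.14]   # transcriptome-derived
dnam_based = [0.10, 0.20, 0.08, 0.19, 0.16, 0.13, 0.21, 0.12]  # methylome-derived

r = pearson_r(rna_based, dnam_based)
print(f"r = {r:.2f}")
```

Computing one such coefficient per cell type would give a range of values comparable in form to the r = 0.60-0.96 quoted above.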

Biodata Mining 17(1), 21. Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11241886/pdf/
Citations: 0
Identification of immune-associated biomarkers of diabetes nephropathy tubulointerstitial injury based on machine learning: a bioinformatics multi-chip integrated analysis.
IF 4 | Region 3 (Biology) | Q1, Mathematical & Computational Biology | Pub Date: 2024-07-01 | DOI: 10.1186/s13040-024-00369-x
Lin Wang, Jiaming Su, Zhongjie Liu, Shaowei Ding, Yaotan Li, Baoluo Hou, Yuxin Hu, Zhaoxi Dong, Jingyi Tang, Hongfang Liu, Weijing Liu

Background: Diabetic nephropathy (DN) is a major microvascular complication of diabetes and has become the leading cause of end-stage renal disease worldwide. A considerable number of DN patients progress irreversibly to end-stage renal disease because the disease cannot be diagnosed early. Reliable biomarkers that support early diagnosis and treatment therefore need to be identified. The migration of immune cells to the kidney is considered a key step in the progression of DN-related vascular injury, so markers of this process may be especially helpful for the early diagnosis and progression prediction of DN.

Methods: Gene chip data were retrieved from the GEO database using the search term 'diabetic nephropathy'. The 'limma' software package was used to identify differentially expressed genes (DEGs) between DN and control samples. Gene set enrichment analysis (GSEA) was performed on genes obtained from the Molecular Signatures Database (MSigDB). The R package 'WGCNA' was used to identify gene modules associated with tubulointerstitial injury in DN, and these were intersected with immune-related DEGs to identify target genes. Gene Ontology (GO) enrichment analysis and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analysis were performed on differentially expressed genes using the 'ClusterProfiler' software package in R. Three methods, least absolute shrinkage and selection operator (LASSO), support vector machine recursive feature elimination (SVM-RFE) and random forest (RF), were used to select immune-related biomarkers for diagnosis. We retrieved the tubulointerstitial dataset from the Nephroseq database to construct an external validation dataset. Unsupervised clustering analysis of the expression levels of immune-related biomarkers was performed using the 'ConsensusClusterPlus' R software package. Urine was collected from patients who visited Dongzhimen Hospital of Beijing University of Chinese Medicine from September 2021 to March 2023, and ELISA was used to detect the mRNA expression level of immune-related biomarkers in urine. Pearson correlation analysis was used to assess the relationship between immune-related biomarker expression and renal function in DN patients.

Results: Four microarray datasets from the GEO database were included in the analysis: GSE30122, GSE47185, GSE99340 and GSE104954. These datasets included 63 DN patients and 55 healthy controls. A total of 9415 genes were detected in the data set. We found 153 differentially expressed immune-related genes, of which 112 genes were up-regulated and 41 genes were down-regulated, and 119 overlapping genes were identified. GO analysis showed that they were involved in various biological processes including leukocyte-mediated immunity. KEGG analysis showed that these target genes were mainly involved in the formation of phagosomes in Staphylococcus aureus infection. Among these 119 overlapping genes, machine learning identified AGR2, CCR2, CEBPD, CISH, CX3CR1, DEFB1 and FSTL1 as potential tubulointerstitial immune-related biomarkers. External validation indicated that these markers had diagnostic efficacy in distinguishing DN patients from healthy controls. Clinical studies showed that the expression of AGR2, CX3CR1 and FSTL1 in urine samples of DN patients was negatively correlated with GFR, that the expression of CX3CR1 and FSTL1 was positively correlated with serum creatinine, and that the expression of DEFB1 was negatively correlated with serum creatinine. In addition, the expression of CX3CR1 in DN urine samples was positively correlated with proteinuria, while the expression of DEFB1 was negatively correlated with proteinuria. Finally, according to the level of proteinuria, DN patients were divided into a nephrotic proteinuria group (n = 24) and a subnephrotic proteinuria group. An unpaired t test showed significant differences in urinary AGR2, CCR2 and DEFB1 between the two groups (P […]).

Conclusions: Our study provides new insights into the role of immune-related biomarkers in DN tubulointerstitial injury and suggests potential targets for the early diagnosis and treatment of DN patients. Seven genes (AGR2, CCR2, CEBPD, CISH, CX3CR1, DEFB1, FSTL1) are promising sensitive biomarkers that may affect the progression of DN by regulating the immune inflammatory response. However, further comprehensive studies are needed to fully understand their exact molecular mechanisms and functional pathways in DN.

Biodata Mining 17(1), 20. Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11218417/pdf/
Citations: 0
Electronic medical records imputation by temporal Generative Adversarial Network.
IF 4 | Region 3 (Biology) | Q1, Mathematical & Computational Biology | Pub Date: 2024-06-26 | DOI: 10.1186/s13040-024-00372-2
Yunfei Yin, Zheng Yuan, Islam Md Tanvir, Xianjian Bao

Missing values in electronic medical records seriously limit the practical application of biomedical data, so filling in these missing data effectively is a meaningful research effort. Current state-of-the-art methods use Generative Adversarial Networks (GANs) to impute the missing values of electronic medical records and have achieved breakthrough progress. However, on datasets with high missing rates, the imputation accuracy of these methods decreases sharply. This motivates us to explore the uncertainty of GANs and improve GAN-based imputation methods. In this paper, the GRUD (Gate Recurrent Unit Decay) network and the UGAN (Uncertainty Generative Adversarial Network) are proposed and combined into a model called UGAN-GRUD. UGAN-GRUD uses the GAN to generate imputed values and then leverages GRUD to compensate them. The UGAN iteratively learns the distribution pattern and uncertainty of the data through its Generator and Discriminator. The GRUD network compensates the UGAN by using a time decay factor, which allows it to learn the specific temporal relations in electronic medical records. Experiments on publicly available biomedical datasets show that UGAN-GRUD outperforms current state-of-the-art methods, with average improvements of 13% in RMSE (Root Mean Squared Error) and 24.5% in MAPE (Mean Absolute Percentage Error).
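
The two error metrics named in the abstract can be sketched as follows. This is a minimal illustration using the standard textbook definitions of RMSE and MAPE, evaluated on entries whose true values are known; the function names and the sample values are hypothetical, and the abstract does not describe the authors' exact evaluation protocol.

```python
import math


def rmse(true_values, imputed_values):
    """Root Mean Squared Error between true and imputed values."""
    if len(true_values) != len(imputed_values) or not true_values:
        raise ValueError("need two non-empty sequences of equal length")
    squared = [(t - p) ** 2 for t, p in zip(true_values, imputed_values)]
    return math.sqrt(sum(squared) / len(squared))


def mape(true_values, imputed_values):
    """Mean Absolute Percentage Error, in percent. True values must be non-zero."""
    if len(true_values) != len(imputed_values) or not true_values:
        raise ValueError("need two non-empty sequences of equal length")
    if any(t == 0 for t in true_values):
        raise ValueError("MAPE is undefined when a true value is zero")
    ratios = [abs((t - p) / t) for t, p in zip(true_values, imputed_values)]
    return 100.0 * sum(ratios) / len(ratios)


# Hypothetical held-out measurements and their imputed counterparts.
truth = [2.0, 4.0, 5.0, 10.0]
imputed = [2.5, 3.0, 5.0, 8.0]

print(f"RMSE = {rmse(truth, imputed):.4f}")
print(f"MAPE = {mape(truth, imputed):.1f}%")
```

RMSE penalizes large individual errors more heavily, while MAPE expresses error relative to the magnitude of each true value, which is one reason the two are often reported together.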

Biodata Mining 17(1), 19. Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11202349/pdf/
Citations: 0
Saliency-driven explainable deep learning in medical imaging: bridging visual explainability and statistical quantitative analysis.
IF 4 | Region 3 (Biology) | Q1, Mathematical & Computational Biology | Pub Date: 2024-06-22 | DOI: 10.1186/s13040-024-00370-4
Yusuf Brima, Marcellin Atemkeng

Deep learning shows great promise for medical image analysis but often lacks explainability, hindering its adoption in healthcare. Attribution techniques that explain model reasoning can potentially increase trust in deep learning among clinical stakeholders. In the literature, much of the research on attribution in medical imaging focuses on visual inspection rather than statistical quantitative analysis. In this paper, we propose an image-based saliency framework to enhance the explainability of deep learning models in medical image analysis. We use adaptive path-based gradient integration, gradient-free techniques, and class activation mapping along with its derivatives to attribute predictions from brain tumor MRI and COVID-19 chest X-ray datasets made by recent deep convolutional neural network models. The proposed framework integrates qualitative and statistical quantitative assessments, employing Accuracy Information Curves (AICs) and Softmax Information Curves (SICs) to measure the effectiveness of saliency methods in retaining critical image information and their correlation with model predictions. Visual inspections indicate that methods such as ScoreCAM, XRAI, GradCAM, and GradCAM++ consistently produce focused and clinically interpretable attribution maps. These methods highlighted possible biomarkers, exposed model biases, and offered insights into the links between input features and predictions, demonstrating their ability to elucidate model reasoning on these datasets. Empirical evaluations reveal that ScoreCAM and XRAI are particularly effective in retaining relevant image regions, as reflected in their higher AUC values. However, SICs highlight variability, with instances of random saliency masks outperforming established methods, emphasizing the need to combine visual and empirical metrics for a comprehensive evaluation. The results underscore the importance of selecting appropriate saliency methods for specific medical imaging tasks and suggest that combining qualitative and quantitative approaches can enhance the transparency, trustworthiness, and clinical adoption of deep learning models in healthcare. This study advances model explainability to increase trust in deep learning among healthcare stakeholders by revealing the rationale behind predictions. Future research should refine empirical metrics for stability and reliability, include more diverse imaging modalities, and focus on improving model explainability to support clinical decision-making.

Biodata Mining 17(1), 18. Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11193223/pdf/
Citations: 0
Using GPT-4 to write a scientific review article: a pilot evaluation study.
IF 4.5 | Region 3 (Biology) | Q1, Mathematical & Computational Biology | Pub Date: 2024-06-18 | DOI: 10.1186/s13040-024-00371-3
Zhiping Paul Wang, Priyanka Bhandary, Yizhou Wang, Jason H Moore

GPT-4, as the most advanced version of OpenAI's large language models, has attracted widespread attention, rapidly becoming an indispensable AI tool across various areas. This includes its exploration by scientists for diverse applications. Our study focused on assessing GPT-4's capabilities in generating text, tables, and diagrams for biomedical review papers. We also assessed the consistency in text generation by GPT-4, along with potential plagiarism issues when employing this model for the composition of scientific review papers. Based on the results, we suggest the development of enhanced functionalities in ChatGPT, aiming to meet the needs of the scientific community more effectively. This includes enhancements in uploaded document processing for reference materials, a deeper grasp of intricate biomedical concepts, more precise and efficient information distillation for table generation, and a further refined model specifically tailored for scientific diagram creation.

Biodata Mining 17(1), 16. Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11184879/pdf/
Citations: 0
Unveiling wearables: exploring the global landscape of biometric applications and vital signs and behavioral impact.
IF 4.5 | Region 3 (Biology) | Q1, Mathematical & Computational Biology | Pub Date: 2024-06-11 | DOI: 10.1186/s13040-024-00368-y
Carolina Del-Valle-Soto, Ramon A Briseño, Leonardo J Valdivia, Juan Arturo Nolazco-Flores

The development of neuroscientific techniques enabling the recording of brain and peripheral nervous system activity has fueled research in cognitive science. Recent technological advancements offer new possibilities for inducing behavioral change, particularly through cost-effective Internet-based interventions. However, limitations in laboratory equipment volume have hindered the generalization of results to real-life contexts. The advent of Internet of Things (IoT) devices, such as wearables, equipped with sensors and microchips, has ushered in a new era in behavior change techniques. Wearables, including smartwatches, electronic tattoos, and more, are poised for massive adoption, with an expected annual growth rate of 55% over the next five years. These devices enable personalized instructions, leading to increased productivity and efficiency, particularly in industrial production. Additionally, the healthcare sector has seen a significant demand for wearables, with over 80% of global consumers willing to use them for health monitoring. This research explores the primary biometric applications of wearables and their impact on users' well-being, focusing on the integration of behavior change techniques facilitated by IoT devices. Wearables have revolutionized health monitoring by providing real-time feedback, personalized interventions, and gamification. They encourage positive behavior changes by delivering immediate feedback, tailored recommendations, and gamified experiences, leading to sustained improvements in health. Furthermore, wearables seamlessly integrate with digital platforms, enhancing their impact through social support and connectivity. However, privacy and data security concerns must be addressed to maintain users' trust. As technology continues to advance, the refinement of IoT devices' design and functionality is crucial for promoting behavior change and improving health outcomes. 
This study aims to investigate the effects of behavior change techniques facilitated by wearables on individuals' health outcomes and the role of wearables in promoting a healthier lifestyle.

Citations: 0
The biomedical knowledge graph of symptom phenotype in coronary artery plaque: machine learning-based analysis of real-world clinical data.
IF 4.5 Zone 3 (Biology) Q1 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date: 2024-05-21 DOI: 10.1186/s13040-024-00365-1
Jia-Ming Huan, Xiao-Jie Wang, Yuan Li, Shi-Jun Zhang, Yuan-Long Hu, Yun-Lun Li

A knowledge graph can effectively showcase the essential characteristics of data and is increasingly emerging as a significant means of integrating information in the field of artificial intelligence. Coronary artery plaque represents a significant etiology of cardiovascular events, posing a diagnostic challenge for clinicians who are confronted with a multitude of nonspecific symptoms. To visualize the hierarchical relationship network graph of the molecular mechanisms underlying plaque properties and symptom phenotypes, patient symptomatology was extracted from electronic health record data from real-world clinical settings. Phenotypic networks were constructed utilizing clinical data and protein‒protein interaction networks. Machine learning techniques, including convolutional neural networks, Dijkstra's algorithm, and gene ontology semantic similarity, were employed to quantify clinical and biological features within the network. The resulting features were then utilized to train a K-nearest neighbor model, yielding 23 symptoms, 41 association rules, and 61 hub genes across the three types of plaques studied, achieving an area under the curve of 92.5%. Weighted correlation network analysis and pathway enrichment were subsequently utilized to identify lipid status-related genes and inflammation-associated pathways that could help explain the differences in plaque properties. To confirm the validity of the network graph model, we conducted coexpression analysis of the hub genes to evaluate their potential diagnostic value. Additionally, we investigated immune cell infiltration, examined the correlations between hub genes and immune cells, and validated the reliability of the identified biological pathways. By integrating clinical data and molecular network information, this biomedical knowledge graph model effectively elucidated the potential molecular mechanisms that link symptoms, diseases, and molecules.
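The abstract names Dijkstra's algorithm as one of the tools used to quantify features within the network but gives no details. As a rough illustration only, the sketch below computes shortest-path distances from one node of a small weighted graph. The gene names, edge weights, and the idea of using distance from a seed gene as a feature are all hypothetical and are not taken from the paper.

```python
import heapq

def dijkstra(graph, source):
    """Return shortest-path distances from `source` in a weighted graph.

    `graph` maps each node to a dict of {neighbor: non-negative edge weight}.
    """
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist.get(node, float("inf")):
            continue  # stale heap entry; a shorter path was already found
        for neighbor, weight in graph.get(node, {}).items():
            candidate = d + weight
            if candidate < dist.get(neighbor, float("inf")):
                dist[neighbor] = candidate
                heapq.heappush(heap, (candidate, neighbor))
    return dist

# Hypothetical toy protein-protein interaction network (assumed weights).
toy_ppi = {
    "GENE_A": {"GENE_B": 0.2, "GENE_C": 0.9},
    "GENE_B": {"GENE_A": 0.2, "GENE_C": 0.3},
    "GENE_C": {"GENE_A": 0.9, "GENE_B": 0.3, "GENE_D": 0.5},
    "GENE_D": {"GENE_C": 0.5},
}

distances = dijkstra(toy_ppi, "GENE_A")
for gene, d in sorted(distances.items()):
    print(gene, round(d, 3))
```

In this toy graph the indirect route from GENE_A to GENE_C through GENE_B is shorter than the direct edge, which is the kind of network-level information a per-edge measure would miss.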

Citations: 0
Machine-learning-based models to predict cardiovascular risk using oculomics and clinic variables in KNHANES
IF 4.5 Zone 3 (Biology) Q1 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date: 2024-04-22 DOI: 10.1186/s13040-024-00363-3
Yuqi Zhang, Sijin Li, Weijie Wu, Yanqing Zhao, Jintao Han, Chao Tong, Niansang Luo, Kun Zhang
Recent research has found a strong correlation between the triglyceride-glucose (TyG) index or the atherogenic index of plasma (AIP) and cardiovascular disease (CVD) risk. However, there is a lack of research on non-invasive and rapid prediction of cardiovascular risk. We aimed to develop and validate a machine-learning model for predicting cardiovascular risk based on variables encompassing clinical questionnaires and oculomics. We collected data from the Korean National Health and Nutrition Examination Survey (KNHANES). The training dataset (80% of the 2008 to 2011 KNHANES data) was used for machine learning model development, with internal validation using the remaining 20%. An external validation dataset from the year 2012 assessed the model’s predictive capacity for TyG-index or AIP in new cases. We included 32,122 participants in the final dataset. Machine learning models using 25 algorithms were trained on oculomics measurements and clinical questionnaires to predict the range of TyG-index and AIP. The area under the receiver operating characteristic curve (AUC), accuracy, precision, recall, and F1 score were used to evaluate the performance of our machine learning models. Based on large-scale cohort studies, we determined TyG-index cut-off points at 8.0, 8.75 (upper one-third values), 8.93 (upper one-fourth values), and AIP cut-offs at 0.318, 0.34. Values surpassing these thresholds indicated elevated cardiovascular risk. The best-performing algorithm revealed TyG-index cut-offs at 8.0, 8.75, and 8.93 with internal validation AUCs of 0.812, 0.873, and 0.911, respectively. External validation AUCs were 0.809, 0.863, and 0.901. For AIP at 0.34, internal and external validation achieved similar AUCs of 0.849 and 0.842. Slightly lower performance was seen for the 0.318 cut-off, with AUCs of 0.844 and 0.836. Significant gender-based variations were noted for TyG-index at 8.0 (male AUC=0.832, female AUC=0.790) and 8.75 (male AUC=0.874, female AUC=0.862) and AIP at 0.318 (male AUC=0.853, female AUC=0.825) and 0.34 (male AUC=0.858, female AUC=0.831). Gender similarity in AUC (male AUC=0.907 versus female AUC=0.906) was observed only when the TyG-index cut-off point equals 8.93. We have established a simple and effective non-invasive machine learning model that has good clinical value for predicting cardiovascular risk in the general population.
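The abstract uses the TyG index and AIP as prediction targets without defining them. The sketch below uses the definitions commonly found in the literature: TyG = ln(fasting triglycerides [mg/dL] × fasting glucose [mg/dL] / 2), and AIP = log10(triglycerides / HDL-C) with both in mmol/L. It then turns each value into a binary label at the cut-offs quoted above. The function names and the unit-conversion constants are assumptions made for illustration; the paper's exact preprocessing is not stated in the abstract.

```python
import math

# Commonly used mg/dL-per-mmol/L conversion factors (assumed here, not from the paper).
TG_MGDL_PER_MMOLL = 88.57
HDL_MGDL_PER_MMOLL = 38.67

def tyg_index(tg_mgdl, glucose_mgdl):
    """Triglyceride-glucose index from fasting values in mg/dL."""
    return math.log(tg_mgdl * glucose_mgdl / 2.0)

def aip(tg_mgdl, hdl_mgdl):
    """Atherogenic index of plasma: log10 of the molar TG / HDL-C ratio."""
    tg_mmol = tg_mgdl / TG_MGDL_PER_MMOLL
    hdl_mmol = hdl_mgdl / HDL_MGDL_PER_MMOLL
    return math.log10(tg_mmol / hdl_mmol)

def label_risk(value, cutoff):
    """Return 1 if the value surpasses the cut-off (elevated risk), else 0."""
    return int(value > cutoff)

# Hypothetical participant: TG 150 mg/dL, glucose 100 mg/dL, HDL-C 40 mg/dL.
tyg = tyg_index(150, 100)
aip_value = aip(150, 40)

for cutoff in (8.0, 8.75, 8.93):
    print(f"TyG {tyg:.3f} > {cutoff}: {label_risk(tyg, cutoff)}")
for cutoff in (0.318, 0.34):
    print(f"AIP {aip_value:.3f} > {cutoff}: {label_risk(aip_value, cutoff)}")
```

Each cut-off defines a separate binary classification task, which is why the abstract reports one AUC per cut-off rather than a single score.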
Citations: 0
Decoding dynamic miRNA:ceRNA interactions unveils therapeutic insights and targets across predominant cancer landscapes
IF 4.5 Zone 3 (Biology) Q1 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date: 2024-04-17 DOI: 10.1186/s13040-024-00362-4
Selcen Ari Yuka, Alper Yilmaz
Competing endogenous RNAs play key roles in cellular molecular mechanisms through cross-talk in post-transcriptional interactions. Studies on ceRNA cross-talk, which is particularly dependent on the abundance of free transcripts, generally involve large- and small-scale studies involving the integration of transcriptomic data from tissues and correlation analyses. This abundance-dependent nature of ceRNA interactions suggests that tissue- and condition-specific ceRNA dynamics may fluctuate. However, there are no comprehensive studies investigating the ceRNA interactions in normal tissue, ceRNAs that are lost and/or appear in cancerous tissues or their interactions. In this study, we comprehensively analyzed the tumor-specific ceRNA fluctuations observed in the three highest-incidence cancers, LUAD, PRAD, and BRCA, compared to healthy lung, prostate, and breast tissues, respectively. Our observations pertaining to tumor-specific competing endogenous RNA (ceRNA) interactions revealed that, in the cases of lung adenocarcinoma (LUAD), prostate adenocarcinoma (PRAD), and breast invasive carcinoma (BRCA), 3,204, 1,233, and 406 ceRNAs, respectively, engage in post-transcriptional intercommunication within tumor tissues, in contrast to their absence in corresponding healthy samples. We also found that 90 ceRNAs are shared by the three cancer types and that these ceRNAs participate in ceRNA interactions in tumor tissues compared to those in normal tissues. Among the 90 ceRNAs that directly interact with miRNAs, we uncovered a core network of 165 miRNAs and 63 ceRNAs that should be considered in RNA-targeted and RNA-mediated approaches in future studies and could be used in these three aggressive cancer types. More specifically, in this core interaction network, ceRNAs such as GALNT7, KLF9, and DAB2 and miRNAs like miR-106a/b-5p, miR-20a-5p, and miR-519d-3p may have potential as common targets in the three critical cancers. 
In contrast to conventional methods that construct ceRNA networks using differentially expressed genes compared to normal tissues, our proposed approach identifies ceRNA players by considering their context within the ceRNA:miRNA interactions. Our results have the potential to reveal distinct and common ceRNA interactions in cancer types and to pinpoint critical RNAs, thereby paving the way for RNA-based strategies in the battle against cancer.
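The comparison described above (ceRNAs that interact in tumor tissue but not in matched healthy tissue, then the subset shared by all three cancers) reduces to set differences followed by an intersection. The sketch below shows only that bookkeeping step. The ceRNA identifiers and their memberships are invented placeholders, and how the paper decides that a ceRNA "participates" in an interaction is not specified in the abstract.

```python
def tumor_specific(tumor_cernas, normal_cernas):
    """ceRNAs seen interacting in tumor tissue but absent from normal tissue."""
    return set(tumor_cernas) - set(normal_cernas)

# Hypothetical placeholder data: (tumor set, normal set) per cancer type.
toy_data = {
    "LUAD": ({"R1", "R2", "R3", "R4"}, {"R4"}),
    "PRAD": ({"R1", "R2", "R5"}, {"R5"}),
    "BRCA": ({"R1", "R2", "R6"}, set()),
}

specific = {
    cancer: tumor_specific(tumor, normal)
    for cancer, (tumor, normal) in toy_data.items()
}

# ceRNAs that are tumor-specific in every cancer type considered.
shared = set.intersection(*specific.values())

print({cancer: sorted(items) for cancer, items in specific.items()})
print(sorted(shared))
```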
Citations: 0
Evaluation of network-guided random forest for disease gene discovery
IF 4.5 Zone 3 (Biology) Q1 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date: 2024-04-16 DOI: 10.1186/s13040-024-00361-5
Jianchang Hu, Silke Szymczak
Gene network information is believed to be beneficial for disease module and pathway identification, but has not been explicitly utilized in the standard random forest (RF) algorithm for gene expression data analysis. We investigate the performance of a network-guided RF in which the network information is summarized into a sampling probability of predictor variables, which is then used in the construction of the RF. Our simulation results suggest that network-guided RF does not provide better disease prediction than the standard RF. In terms of disease gene discovery, if disease genes form module(s), network-guided RF identifies them more accurately. In addition, when disease status is independent of the genes in the given network, spurious gene selection results can occur when using network information, especially on hub genes. Our empirical analysis on two balanced microarray and RNA-Seq breast cancer datasets from The Cancer Genome Atlas (TCGA) for classification of progesterone receptor (PR) status also demonstrates that network-guided RF can identify genes from PGR-related pathways, which leads to a better connected module of identified genes. Gene networks can provide additional information to aid gene expression analysis for disease module and pathway identification. However, they need to be used with caution, and the results need to be validated to guard against spurious gene selection. More robust approaches to incorporating such information into RF construction also warrant further study.
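The core idea, replacing the uniform choice of candidate predictors with a network-derived sampling probability, can be sketched in a few lines. The version below assumes, purely for illustration, that the probability is proportional to node degree plus one. The abstract does not say how the paper summarizes the network, so this weighting, the function names, and the toy genes are all hypothetical.

```python
import random

def sampling_probabilities(genes, edges):
    """Turn an undirected gene network into per-gene sampling probabilities.

    Assumption: weight = degree + 1, so isolated genes remain selectable.
    """
    degree = {g: 0 for g in genes}
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    total = sum(d + 1 for d in degree.values())
    return {g: (degree[g] + 1) / total for g in genes}

def draw_candidates(probs, mtry, rng):
    """Draw `mtry` distinct candidate genes for a split, weighted by `probs`."""
    remaining = dict(probs)
    chosen = []
    for _ in range(min(mtry, len(remaining))):
        names = list(remaining)
        weights = [remaining[n] for n in names]
        pick = rng.choices(names, weights=weights, k=1)[0]
        chosen.append(pick)
        del remaining[pick]  # sample without replacement
    return chosen

# Hypothetical toy network: G1 is a hub, G5 is isolated.
genes = ["G1", "G2", "G3", "G4", "G5"]
edges = [("G1", "G2"), ("G1", "G3"), ("G1", "G4")]

probs = sampling_probabilities(genes, edges)
rng = random.Random(0)
candidates = draw_candidates(probs, mtry=2, rng=rng)
print(probs)
print(candidates)
```

Under this degree-based weighting the hub G1 is offered as a split candidate far more often than G5, regardless of whether it is related to the outcome. That is consistent with the abstract's warning about spurious selection of hub genes.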
Citations: 0