Biodata Mining: latest articles (page 8)

Electronic medical records imputation by temporal Generative Adversarial Network.

IF 4, Zone 3 (Biology), Q1 MATHEMATICAL & COMPUTATIONAL BIOLOGY

Biodata Mining

Pub Date : 2024-06-26 DOI: 10.1186/s13040-024-00372-2

Yunfei Yin, Zheng Yuan, Islam Md Tanvir, Xianjian Bao

Missing data in electronic medical records seriously limit the practical use of biomedical data, so filling in these lost values effectively is a meaningful research problem. Current state-of-the-art methods use Generative Adversarial Networks (GANs) to impute the missing values of electronic medical records and have achieved breakthrough progress. However, on datasets with high missing rates, the imputation accuracy of these methods decreases sharply. This motivates us to explore the uncertainty of GANs and to improve GAN-based imputation methods. In this paper, we propose the GRUD (Gate Recurrent Unit Decay) network and the UGAN (Uncertainty Generative Adversarial Network) and combine them into a single model, UGAN-GRUD. UGAN-GRUD uses the GAN to generate imputed values and then uses GRUD to compensate them. The UGAN learns the distribution pattern and the uncertainty of the data by iterating between the Generator and the Discriminator. The GRUD network compensates the UGAN through a time decay factor, which lets it learn the specific temporal relations in electronic medical records. Experiments on publicly available biomedical datasets show that UGAN-GRUD outperforms the current state-of-the-art methods, with average improvements of 13% in RMSE (Root Mean Squared Error) and 24.5% in MAPE (Mean Absolute Percentage Error).
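The two reported metrics can be illustrated with a minimal Python sketch. It assumes that errors are scored only at positions that were missing and then imputed, and that an "improvement" is the relative reduction of the error against a baseline; the abstract states neither convention, and all values and function names below are hypothetical.

```python
import math


def rmse(true_vals, imputed_vals):
    """Root mean squared error over paired values."""
    squared = [(t - p) ** 2 for t, p in zip(true_vals, imputed_vals)]
    return math.sqrt(sum(squared) / len(squared))


def mape(true_vals, imputed_vals):
    """Mean absolute percentage error in percent; true values must be non-zero."""
    ratios = [abs((t - p) / t) for t, p in zip(true_vals, imputed_vals)]
    return 100.0 * sum(ratios) / len(ratios)


def masked_pairs(truth, imputed, missing_mask):
    """Keep only the positions that were missing (mask == 1) and therefore imputed."""
    kept = [(t, p) for t, p, m in zip(truth, imputed, missing_mask) if m == 1]
    return [t for t, _ in kept], [p for _, p in kept]


def relative_improvement(baseline_error, new_error):
    """Percentage reduction of an error metric relative to a baseline (assumed reading)."""
    return 100.0 * (baseline_error - new_error) / baseline_error


# Hypothetical measurements: the first value was observed, the other three were imputed.
truth = [80.0, 100.0, 120.0, 50.0]
imputed = [80.0, 90.0, 126.0, 55.0]
mask = [0, 1, 1, 1]

t, p = masked_pairs(truth, imputed, mask)
print(round(rmse(t, p), 3), round(mape(t, p), 3))
```

Under this reading, a method whose RMSE drops from 1.00 to 0.87 against a baseline would be reported as a 13% improvement.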

Biodata Mining 17(1): 19. Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11202349/pdf/

Citations: 0

Saliency-driven explainable deep learning in medical imaging: bridging visual explainability and statistical quantitative analysis.

IF 4, Zone 3 (Biology), Q1 MATHEMATICAL & COMPUTATIONAL BIOLOGY

Biodata Mining

Pub Date : 2024-06-22 DOI: 10.1186/s13040-024-00370-4

Yusuf Brima, Marcellin Atemkeng

Deep learning shows great promise for medical image analysis but often lacks explainability, hindering its adoption in healthcare. Attribution techniques that explain model reasoning can potentially increase trust in deep learning among clinical stakeholders. In the literature, much of the research on attribution in medical imaging focuses on visual inspection rather than statistical quantitative analysis. In this paper, we propose an image-based saliency framework to enhance the explainability of deep learning models in medical image analysis. We use adaptive path-based gradient integration, gradient-free techniques, and class activation mapping along with its derivatives to attribute predictions from brain tumor MRI and COVID-19 chest X-ray datasets made by recent deep convolutional neural network models. The proposed framework integrates qualitative and statistical quantitative assessments, employing Accuracy Information Curves (AICs) and Softmax Information Curves (SICs) to measure the effectiveness of saliency methods in retaining critical image information and their correlation with model predictions. Visual inspections indicate that methods such as ScoreCAM, XRAI, GradCAM, and GradCAM++ consistently produce focused and clinically interpretable attribution maps. These methods highlighted possible biomarkers, exposed model biases, and offered insights into the links between input features and predictions, demonstrating their ability to elucidate model reasoning on these datasets. Empirical evaluations reveal that ScoreCAM and XRAI are particularly effective in retaining relevant image regions, as reflected in their higher AUC values. However, SICs highlight variability, with instances of random saliency masks outperforming established methods, emphasizing the need to combine visual and empirical metrics for a comprehensive evaluation. The results underscore the importance of selecting appropriate saliency methods for specific medical imaging tasks and suggest that combining qualitative and quantitative approaches can enhance the transparency, trustworthiness, and clinical adoption of deep learning models in healthcare. This study advances model explainability to increase trust in deep learning among healthcare stakeholders by revealing the rationale behind predictions. Future research should refine empirical metrics for stability and reliability, include more diverse imaging modalities, and focus on improving model explainability to support clinical decision-making.
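The idea of "retaining critical image information" can be pictured with a deliberately simplified Python sketch: keep only the most salient fraction of pixels and replace the rest with a baseline value, so that a model could then be re-scored on the reduced image. This is an assumption-laden illustration rather than the AIC/SIC procedure itself, which the abstract does not specify; the function name, the baseline value, and the toy image are hypothetical.

```python
def keep_top_salient(image, saliency, fraction, baseline=0.0):
    """Return a copy of `image` that keeps the top `fraction` of pixels by saliency.

    Pixels below the saliency threshold are replaced by `baseline`.
    Ties at the threshold are all kept, so slightly more pixels may survive.
    """
    flat = sorted((s for row in saliency for s in row), reverse=True)
    k = max(1, round(fraction * len(flat)))  # number of pixels to keep
    threshold = flat[k - 1]
    return [
        [px if s >= threshold else baseline for px, s in zip(img_row, sal_row)]
        for img_row, sal_row in zip(image, saliency)
    ]


# Hypothetical 2x2 grayscale image and saliency map.
image = [[1, 2], [3, 4]]
saliency = [[0.1, 0.9], [0.4, 0.7]]
masked = keep_top_salient(image, saliency, fraction=0.5)
print(masked)
```

Sweeping `fraction` from small to large and recording a model's accuracy or softmax score on each reduced image would give one curve per saliency method; comparing the areas under such curves is one way a random mask could be shown to match or beat an established method.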

Biodata Mining 17(1): 18. Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11193223/pdf/

Citations: 0

Using GPT-4 to write a scientific review article: a pilot evaluation study.

IF 4.5, Zone 3 (Biology), Q1 MATHEMATICAL & COMPUTATIONAL BIOLOGY

Biodata Mining

Pub Date : 2024-06-18 DOI: 10.1186/s13040-024-00371-3

Zhiping Paul Wang, Priyanka Bhandary, Yizhou Wang, Jason H Moore

GPT-4, as the most advanced version of OpenAI's large language models, has attracted widespread attention, rapidly becoming an indispensable AI tool across various areas. This includes its exploration by scientists for diverse applications. Our study focused on assessing GPT-4's capabilities in generating text, tables, and diagrams for biomedical review papers. We also assessed the consistency in text generation by GPT-4, along with potential plagiarism issues when employing this model for the composition of scientific review papers. Based on the results, we suggest the development of enhanced functionalities in ChatGPT, aiming to meet the needs of the scientific community more effectively. This includes enhancements in uploaded document processing for reference materials, a deeper grasp of intricate biomedical concepts, more precise and efficient information distillation for table generation, and a further refined model specifically tailored for scientific diagram creation.

Citations: 0

Unveiling wearables: exploring the global landscape of biometric applications and vital signs and behavioral impact.

IF 4.5, Zone 3 (Biology), Q1 MATHEMATICAL & COMPUTATIONAL BIOLOGY

Biodata Mining

Pub Date : 2024-06-11 DOI: 10.1186/s13040-024-00368-y

Carolina Del-Valle-Soto, Ramon A Briseño, Leonardo J Valdivia, Juan Arturo Nolazco-Flores

The development of neuroscientific techniques enabling the recording of brain and peripheral nervous system activity has fueled research in cognitive science. Recent technological advancements offer new possibilities for inducing behavioral change, particularly through cost-effective Internet-based interventions. However, limitations in laboratory equipment volume have hindered the generalization of results to real-life contexts. The advent of Internet of Things (IoT) devices, such as wearables, equipped with sensors and microchips, has ushered in a new era in behavior change techniques. Wearables, including smartwatches, electronic tattoos, and more, are poised for massive adoption, with an expected annual growth rate of 55% over the next five years. These devices enable personalized instructions, leading to increased productivity and efficiency, particularly in industrial production. Additionally, the healthcare sector has seen a significant demand for wearables, with over 80% of global consumers willing to use them for health monitoring. This research explores the primary biometric applications of wearables and their impact on users' well-being, focusing on the integration of behavior change techniques facilitated by IoT devices. Wearables have revolutionized health monitoring by providing real-time feedback, personalized interventions, and gamification. They encourage positive behavior changes by delivering immediate feedback, tailored recommendations, and gamified experiences, leading to sustained improvements in health. Furthermore, wearables seamlessly integrate with digital platforms, enhancing their impact through social support and connectivity. However, privacy and data security concerns must be addressed to maintain users' trust. As technology continues to advance, the refinement of IoT devices' design and functionality is crucial for promoting behavior change and improving health outcomes. 
This study aims to investigate the effects of behavior change techniques facilitated by wearables on individuals' health outcomes and the role of wearables in promoting a healthier lifestyle.

Biodata Mining 17(1): 15. Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11165804/pdf/

Citations: 0

The biomedical knowledge graph of symptom phenotype in coronary artery plaque: machine learning-based analysis of real-world clinical data.

IF 4.5, Zone 3 (Biology), Q1 MATHEMATICAL & COMPUTATIONAL BIOLOGY

Biodata Mining

Pub Date : 2024-05-21 DOI: 10.1186/s13040-024-00365-1

Jia-Ming Huan, Xiao-Jie Wang, Yuan Li, Shi-Jun Zhang, Yuan-Long Hu, Yun-Lun Li

A knowledge graph can effectively showcase the essential characteristics of data and is increasingly emerging as a significant means of integrating information in the field of artificial intelligence. Coronary artery plaque represents a significant etiology of cardiovascular events, posing a diagnostic challenge for clinicians who are confronted with a multitude of nonspecific symptoms. To visualize the hierarchical relationship network graph of the molecular mechanisms underlying plaque properties and symptom phenotypes, patient symptomatology was extracted from electronic health record data from real-world clinical settings. Phenotypic networks were constructed utilizing clinical data and protein-protein interaction networks. Machine learning techniques, including convolutional neural networks, Dijkstra's algorithm, and gene ontology semantic similarity, were employed to quantify clinical and biological features within the network. The resulting features were then utilized to train a K-nearest neighbor model, yielding 23 symptoms, 41 association rules, and 61 hub genes across the three types of plaques studied, achieving an area under the curve of 92.5%. Weighted correlation network analysis and pathway enrichment were subsequently utilized to identify lipid status-related genes and inflammation-associated pathways that could help explain the differences in plaque properties. To confirm the validity of the network graph model, we conducted coexpression analysis of the hub genes to evaluate their potential diagnostic value. Additionally, we investigated immune cell infiltration, examined the correlations between hub genes and immune cells, and validated the reliability of the identified biological pathways. By integrating clinical data and molecular network information, this biomedical knowledge graph model effectively elucidated the potential molecular mechanisms that link symptoms, diseases, and molecules.
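Dijkstra's algorithm, named among the techniques above, computes shortest-path distances in a weighted graph; such distances are one plausible way to quantify how close a symptom node sits to genes in an interaction network. The Python sketch below shows the standard algorithm on a tiny hypothetical graph. The node names and edge weights are invented for illustration, and the abstract does not say how the authors applied the algorithm.

```python
import heapq


def dijkstra(graph, source):
    """Shortest-path distance from `source` to every reachable node.

    `graph` maps each node to a dict of {neighbor: non-negative edge weight}.
    """
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist.get(node, float("inf")):
            continue  # stale heap entry; a shorter path was already found
        for neighbor, weight in graph.get(node, {}).items():
            candidate = d + weight
            if candidate < dist.get(neighbor, float("inf")):
                dist[neighbor] = candidate
                heapq.heappush(heap, (candidate, neighbor))
    return dist


# Hypothetical undirected symptom-gene network (edges listed in both directions).
network = {
    "chest_pain": {"gene_A": 1.0, "gene_B": 4.0},
    "gene_A": {"chest_pain": 1.0, "gene_B": 2.0},
    "gene_B": {"chest_pain": 4.0, "gene_A": 2.0, "gene_C": 1.0},
    "gene_C": {"gene_B": 1.0},
}
distances = dijkstra(network, "chest_pain")
print(distances)
```

In this toy network the path through `gene_A` reaches `gene_B` at distance 3.0, shorter than the direct edge of weight 4.0, which is the kind of indirect relationship a shortest-path feature captures.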

Biodata Mining 17(1): 13. Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11110203/pdf/

Citations: 0

Machine-learning-based models to predict cardiovascular risk using oculomics and clinic variables in KNHANES

IF 4.5, Zone 3 (Biology), Q1 MATHEMATICAL & COMPUTATIONAL BIOLOGY

Biodata Mining

Pub Date : 2024-04-22 DOI: 10.1186/s13040-024-00363-3

Yuqi Zhang, Sijin Li, Weijie Wu, Yanqing Zhao, Jintao Han, Chao Tong, Niansang Luo, Kun Zhang

Recent research has found a strong correlation between the triglyceride-glucose (TyG) index or the atherogenic index of plasma (AIP) and cardiovascular disease (CVD) risk. However, there is a lack of research on non-invasive and rapid prediction of cardiovascular risk. We aimed to develop and validate a machine-learning model for predicting cardiovascular risk based on variables encompassing clinical questionnaires and oculomics. We collected data from the Korean National Health and Nutrition Examination Survey (KNHANES). The training dataset (80% of the 2008 to 2011 KNHANES data) was used for machine learning model development, with internal validation using the remaining 20%. An external validation dataset from the year 2012 assessed the model’s predictive capacity for TyG-index or AIP in new cases. We included 32122 participants in the final dataset. Machine learning models built with 25 algorithms were trained on oculomics measurements and clinical questionnaires to predict the range of TyG-index and AIP. The area under the receiver operating characteristic curve (AUC), accuracy, precision, recall, and F1 score were used to evaluate the performance of our machine learning models. Based on large-scale cohort studies, we determined TyG-index cut-off points at 8.0, 8.75 (upper one-third values), 8.93 (upper one-fourth values), and AIP cut-offs at 0.318, 0.34. Values surpassing these thresholds indicated elevated cardiovascular risk. The best-performing algorithm revealed TyG-index cut-offs at 8.0, 8.75, and 8.93 with internal validation AUCs of 0.812, 0.873, and 0.911, respectively. External validation AUCs were 0.809, 0.863, and 0.901. For AIP at 0.34, internal and external validation achieved similar AUCs of 0.849 and 0.842. Slightly lower performance was seen for the 0.318 cut-off, with AUCs of 0.844 and 0.836. Significant gender-based variations were noted for TyG-index at 8 (male AUC=0.832, female AUC=0.790) and 8.75 (male AUC=0.874, female AUC=0.862) and AIP at 0.318 (male AUC=0.853, female AUC=0.825) and 0.34 (male AUC=0.858, female AUC=0.831). Gender similarity in AUC (male AUC=0.907 versus female AUC=0.906) was observed only when the TyG-index cut-off point equals 8.93. We have established a simple and effective non-invasive machine learning model that has good clinical value for predicting cardiovascular risk in the general population.
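The prediction targets and the AUC metric can be sketched in a few lines of Python. The sketch assumes the commonly used definitions TyG = ln(fasting triglycerides [mg/dL] × fasting glucose [mg/dL] / 2) and AIP = log10(triglycerides / HDL cholesterol, both in mmol/L); the abstract does not restate these formulas, and the sample values, scores, and function names are hypothetical.

```python
import math


def tyg_index(triglycerides_mg_dl, glucose_mg_dl):
    """Triglyceride-glucose index (commonly used definition, assumed here)."""
    return math.log(triglycerides_mg_dl * glucose_mg_dl / 2.0)


def aip(triglycerides_mmol_l, hdl_mmol_l):
    """Atherogenic index of plasma (commonly used definition, assumed here)."""
    return math.log10(triglycerides_mmol_l / hdl_mmol_l)


def label_above(value, cutoff):
    """Binary target: 1 if the index surpasses the cut-off, else 0."""
    return 1 if value > cutoff else 0


def roc_auc(labels, scores):
    """AUC as the probability that a random positive outscores a random negative.

    Tied scores count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical participant: triglycerides 150 mg/dL, fasting glucose 100 mg/dL.
tyg = tyg_index(150.0, 100.0)
targets = [label_above(tyg, cutoff) for cutoff in (8.0, 8.75, 8.93)]
print(round(tyg, 3), targets)

# Hypothetical model scores for four participants with known labels.
print(roc_auc([1, 0, 1, 0], [0.9, 0.4, 0.35, 0.2]))
```

Each cut-off turns the continuous index into a separate binary classification task, which is why the abstract reports one AUC per cut-off rather than a single figure.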

Journal Article. Full text: https://doi.org/10.1186/s13040-024-00363-3

Citations: 0

Decoding dynamic miRNA:ceRNA interactions unveils therapeutic insights and targets across predominant cancer landscapes

IF 4.5 Zone 3 (Biology) Q1 MATHEMATICAL & COMPUTATIONAL BIOLOGY

Biodata Mining

Pub Date : 2024-04-17 DOI: 10.1186/s13040-024-00362-4

Selcen Ari Yuka, Alper Yilmaz

Competing endogenous RNAs (ceRNAs) play key roles in cellular molecular mechanisms through cross-talk in post-transcriptional interactions. Because ceRNA cross-talk depends strongly on the abundance of free transcripts, studies of it, whether large- or small-scale, generally integrate transcriptomic data from tissues with correlation analyses. This abundance-dependent nature of ceRNA interactions suggests that tissue- and condition-specific ceRNA dynamics may fluctuate. However, no comprehensive study has investigated ceRNA interactions in normal tissue, the ceRNAs that are lost and/or appear in cancerous tissues, or their interactions. In this study, we comprehensively analyzed the tumor-specific ceRNA fluctuations observed in the three highest-incidence cancers, lung adenocarcinoma (LUAD), prostate adenocarcinoma (PRAD), and breast invasive carcinoma (BRCA), compared to healthy lung, prostate, and breast tissues, respectively. We found that 3,204, 1,233, and 406 ceRNAs in LUAD, PRAD, and BRCA, respectively, engage in post-transcriptional cross-talk within tumor tissues but are absent from such interactions in the corresponding healthy samples. We also found that 90 ceRNAs are shared by the three cancer types and that these ceRNAs participate in ceRNA interactions in tumor tissues compared to those in normal tissues. Among the 90 ceRNAs that directly interact with miRNAs, we uncovered a core network of 165 miRNAs and 63 ceRNAs that should be considered in RNA-targeted and RNA-mediated approaches in future studies and could be used in these three aggressive cancer types. More specifically, in this core interaction network, ceRNAs such as GALNT7, KLF9, and DAB2 and miRNAs such as miR-106a/b-5p, miR-20a-5p, and miR-519d-3p may have potential as common targets in the three cancers. In contrast to conventional methods that construct ceRNA networks from genes differentially expressed relative to normal tissues, our proposed approach identifies ceRNA players by considering their context within the ceRNA:miRNA interactions. Our results have the potential to reveal distinct and common ceRNA interactions in cancer types and to pinpoint critical RNAs, thereby paving the way for RNA-based strategies in the battle against cancer.
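The bookkeeping behind "present in tumor interactions, absent in healthy tissue, shared by all three cancers" can be illustrated with plain set operations. This is only a toy sketch: the function name `tumor_specific` and the `GENE_*` placeholders are hypothetical, the gene sets are invented, and it does not reproduce how the study decides that a transcript participates in a ceRNA:miRNA interaction.

```python
# Toy illustration only: these sets are invented, not the study's data.
# GALNT7, KLF9 and DAB2 are names quoted in the abstract; GENE_* are placeholders.

def tumor_specific(tumor: set, normal: set) -> set:
    """ceRNAs seen in tumor-tissue interactions but not in matched normal tissue."""
    return tumor - normal

interacting = {
    "LUAD": {"tumor": {"GALNT7", "KLF9", "DAB2", "GENE_A", "GENE_B"},
             "normal": {"GENE_A"}},
    "PRAD": {"tumor": {"GALNT7", "KLF9", "DAB2", "GENE_C"},
             "normal": {"GENE_C"}},
    "BRCA": {"tumor": {"GALNT7", "KLF9", "DAB2", "GENE_B", "GENE_D"},
             "normal": {"GENE_D", "GENE_E"}},
}

# Per-cancer tumor-specific ceRNAs (set difference).
specific = {cancer: tumor_specific(t["tumor"], t["normal"])
            for cancer, t in interacting.items()}

# ceRNAs that are tumor-specific in every cancer type (set intersection).
shared = set.intersection(*specific.values())

print(sorted(shared))  # ['DAB2', 'GALNT7', 'KLF9']
```

In this toy data `GENE_B` is tumor-specific in LUAD and BRCA but not in PRAD, so it drops out of the shared set.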



Citations: 0

Evaluation of network-guided random forest for disease gene discovery

IF 4.5 Zone 3 (Biology) Q1 MATHEMATICAL & COMPUTATIONAL BIOLOGY

Biodata Mining

Pub Date : 2024-04-16 DOI: 10.1186/s13040-024-00361-5

Jianchang Hu, Silke Szymczak

Gene network information is believed to be beneficial for disease module and pathway identification, but has not been explicitly utilized in the standard random forest (RF) algorithm for gene expression data analysis. We investigate the performance of a network-guided RF in which the network information is summarized into sampling probabilities for the predictor variables, which are then used in constructing the RF. Our simulation results suggest that network-guided RF does not provide better disease prediction than the standard RF. In terms of disease gene discovery, if disease genes form module(s), network-guided RF identifies them more accurately. In addition, when disease status is independent of the genes in the given network, using network information can produce spurious gene selection results, especially for hub genes. Our empirical analysis of two balanced microarray and RNA-Seq breast cancer datasets from The Cancer Genome Atlas (TCGA) for classification of progesterone receptor (PR) status also demonstrates that network-guided RF can identify genes from PGR-related pathways, which leads to a better-connected module of identified genes. Gene networks can provide additional information to aid gene expression analysis for disease module and pathway identification, but they need to be used with caution, and the results need to be validated to guard against spurious gene selection. More robust approaches to incorporating such information into RF construction also warrant further study.
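To make the idea of "network information summarized into a sampling probability of predictor variables" concrete, here is a minimal sketch. It rests on an assumption: each gene's weight is a baseline plus its node degree. The abstract does not say how the network is summarized, so the degree-based weighting, the function names, and the toy network are all hypothetical.

```python
import random

def sampling_probabilities(edges, genes, baseline=1.0):
    """Turn a gene network into one sampling probability per gene.

    Assumption (not from the abstract): weight = baseline + node degree.
    """
    degree = {g: 0 for g in genes}
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    weights = [baseline + degree[g] for g in genes]
    total = sum(weights)
    return [w / total for w in weights]

def sample_predictors(genes, probs, k, rng):
    """Draw k distinct genes, weighted by probs (sequential draws without replacement)."""
    pool = list(genes)
    weights = list(probs)
    chosen = []
    for _ in range(k):
        i = rng.choices(range(len(pool)), weights=weights, k=1)[0]
        chosen.append(pool.pop(i))
        weights.pop(i)
    return chosen

# Toy network: one hub connected to three genes, plus one isolated gene.
genes = ["HUB", "G1", "G2", "G3", "ISOLATED"]
edges = [("HUB", "G1"), ("HUB", "G2"), ("HUB", "G3")]

probs = sampling_probabilities(edges, genes)
rng = random.Random(0)  # fixed seed so the sketch is repeatable
subset = sample_predictors(genes, probs, k=2, rng=rng)  # candidate predictors for one split
```

Under this weighting the hub gene is offered to the trees most often regardless of whether it is related to the disease, which gives some intuition for the abstract's warning about spurious selection of hub genes.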



Citations: 0

MOCAT: multi-omics integration with auxiliary classifiers enhanced autoencoder

IF 4.5 Zone 3 (Biology) Q1 MATHEMATICAL & COMPUTATIONAL BIOLOGY

Biodata Mining

Pub Date : 2024-03-05 DOI: 10.1186/s13040-024-00360-6

Xiaohui Yao, Xiaohan Jiang, Haoran Luo, Hong Liang, Xiufen Ye, Yanhui Wei, Shan Cong

Integrating multi-omics data is emerging as a critical approach to enhancing our understanding of complex diseases. Innovative computational methods capable of managing high-dimensional and heterogeneous datasets are required to unlock the full potential of such rich and diverse data. We propose a Multi-Omics integration framework with auxiliary Classifiers-enhanced AuToencoders (MOCAT) to utilize intra- and inter-omics information comprehensively. Additionally, attention mechanisms with confidence learning are incorporated for enhanced feature representation and trustworthy prediction. Extensive experiments were conducted on four benchmark datasets (BRCA, ROSMAP, LGG, and KIPAN) to evaluate the effectiveness of our proposed model. Our model significantly improved most evaluation measurements and consistently surpassed the state-of-the-art methods. Ablation studies showed that the auxiliary classifiers significantly boosted classification accuracy in the ROSMAP and LGG datasets. Moreover, the attention mechanisms and confidence evaluation block contributed to improvements in the predictive accuracy and generalizability of our model. The proposed framework exhibits superior performance in disease classification and biomarker discovery, establishing itself as a robust and versatile tool for analyzing multi-layer biological data. This study highlights the significance of elaborately designed deep learning methodologies in dissecting complex disease phenotypes and improving the accuracy of disease predictions.
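The abstract does not give MOCAT's training objective. As a generic illustration of what "an autoencoder enhanced by an auxiliary classifier" can mean, the sketch below adds a weighted classification loss to a reconstruction loss. The function names, the `aux_weight` parameter, and the numbers are assumptions for illustration, not the paper's formulation.

```python
import math

def mse(x, x_hat):
    """Mean squared reconstruction error between an input and its reconstruction."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)

def cross_entropy(class_probs, label):
    """Negative log-probability that the classifier assigns to the true class."""
    return -math.log(class_probs[label])

def joint_loss(x, x_hat, class_probs, label, aux_weight=0.5):
    """Reconstruction loss plus a weighted auxiliary classification loss.

    aux_weight is a hypothetical hyperparameter balancing the two terms.
    """
    return mse(x, x_hat) + aux_weight * cross_entropy(class_probs, label)

# One toy sample: a 3-feature omics vector, its reconstruction,
# and the auxiliary classifier's predicted class probabilities.
loss = joint_loss([1.0, 0.0, 2.0], [0.5, 0.0, 2.0], [0.25, 0.75], label=1)
```

The design intuition is that the classification term pushes the latent representation to stay predictive of the label while the reconstruction term keeps it faithful to the input.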



Citations: 0

Interpreting drug synergy in breast cancer with deep learning using target-protein inhibition profiles.

IF 4.5 Zone 3 (Biology) Q1 MATHEMATICAL & COMPUTATIONAL BIOLOGY

Biodata Mining

Pub Date : 2024-02-29 DOI: 10.1186/s13040-024-00359-z

Thanyawee Srithanyarat, Kittisak Taoma, Thana Sutthibutpong, Marasri Ruengjitchatchawalya, Monrudee Liangruksa, Teeraphan Laomettachit

Background: Breast cancer is the most common malignancy among women worldwide. Despite advances in treating breast cancer over the past decades, drug resistance and adverse effects remain challenging. Recent therapeutic progress has shifted toward using drug combinations for better treatment efficiency. However, with a growing number of potential small-molecule cancer inhibitors, in silico strategies to predict pharmacological synergy before experimental trials are required to compensate for time and cost restrictions. Many deep learning models have been previously proposed to predict the synergistic effects of drug combinations with high performance. However, these models heavily relied on a large number of drug chemical structural fingerprints as their main features, which made model interpretation a challenge.

Results: This study developed a deep neural network model that predicts synergy between small-molecule pairs based on their inhibitory activities against 13 selected key proteins. The synergy prediction model achieved a Pearson correlation coefficient between model predictions and experimental data of 0.63 across five breast cancer cell lines. BT-549 and MCF-7 achieved the highest correlation of 0.67 when considering individual cell lines. Despite achieving a moderate correlation compared to previous deep learning models, our model offers a distinctive advantage in terms of interpretability. Using the inhibitory activities against key protein targets as the main features allowed a straightforward interpretation of the model since the individual features had direct biological meaning. By tracing the synergistic interactions of compounds through their target proteins, we gained insights into the patterns our model recognized as indicative of synergistic effects.

Conclusions: The framework employed in the present study lays the groundwork for future advancements, especially in model interpretation. By combining deep learning techniques and target-specific models, this study shed light on potential patterns of target-protein inhibition profiles that could be exploited in breast cancer treatment.
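The Results above report model quality as a Pearson correlation coefficient between predicted and experimental synergy. For readers unfamiliar with the metric, the following is a small self-contained sketch of how it is computed. The function name `pearson_r` and the toy scores are hypothetical and unrelated to the study's data.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)

# Invented synergy scores for four hypothetical drug pairs.
predicted = [1.0, 2.0, 3.0, 4.0]
measured = [2.0, 4.0, 6.0, 8.5]
r = pearson_r(predicted, measured)
```

A value of 1 means the predictions rank and scale perfectly linearly with the measurements, 0 means no linear relationship, and the 0.63 reported above sits in between.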


Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10905801/pdf/

Citations: 0