Latest articles from Interdisciplinary Sciences: Computational Life Sciences
Predicting CircRNA-Disease Associations Based on Heterogeneous Graph Neural Network and Knowledge Graph Attribute Mining Attention.
IF 3.9; Zone 2 (Biology); Q1, Mathematical & Computational Biology; Pub Date: 2025-09-01; Epub Date: 2025-05-13; DOI: 10.1007/s12539-025-00706-6
Wei Lan, Cong Peng, Hongyu Zhang, Chunling Li, Qingfeng Chen, Xin Xiao, Zhiqiang Wang

Exploring associations between circular RNAs (circRNAs) and diseases contributes to a deeper understanding of disease pathogenesis. Many computational methods have been proposed for identifying circRNA-disease associations. However, these methods still have limitations, such as ignoring the effect of noise. In this paper, we propose a new knowledge graph attribute mining attention network (KAATCDA) that predicts circRNA-disease associations using a knowledge graph attribute network (KGA) and an attribute mining attention network (AMA). First, KGA is used to learn the feature representation of diseases. Then, AMA is used to obtain circRNA features that are similar to the disease feature representations. Finally, circRNA-disease association scores are predicted from the circRNA and disease feature representations. Five-fold cross-validation experiments on two datasets demonstrate that KAATCDA outperforms other state-of-the-art methods. In addition, a case study shows that our method can effectively predict unknown circRNA-disease associations.

Pages: 586-597
Citations: 0
Efficient Storage and Analysis of Genomic Data: A k-mer Frequency Mapping and Image Representation Method.
IF 3.9; Zone 2 (Biology); Q1, Mathematical & Computational Biology; Pub Date: 2025-09-01; Epub Date: 2024-10-21; DOI: 10.1007/s12539-024-00659-2
Hatice Busra Luleci, Selcen Ari Yuka, Alper Yilmaz

k-mer frequencies are crucial for understanding DNA sequence patterns and structure, with applications in motif discovery, genome classification, and short read assembly. However, the exponential increase in the dimension of frequency tables with increasing k-mer length poses storage challenges. In this study, we present a novel method for compressing k-mer data without information loss, aiming to optimize storage and analysis processes. We employed Chaos Game Representation (CGR) to map k-mers to coordinates and used these components to generate raster images of k-mers. The CGR maps were partitioned and labeled based on substrings, with each substring mapped to a subframe, creating a fractal-like structure. The entire k-mer frequency set of each genomic sequence was represented as a single image, with each pixel corresponding to a specific k-mer and its occurrence. This approach reduced file size by up to 16-fold compared to plain text and 3-fold compared to binary format. Furthermore, we demonstrated the feasibility of performing alignment-free similarity analyses on images derived from k-mer frequencies of whole genome sequences from 14 plant species. Our results highlight the potential of this method as a fast and efficient tool for accessing, processing, and analyzing large biological sequence datasets, enabling the retrieval of k-mer frequencies and image reconstruction.
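
The mapping described above can be sketched roughly as follows: each k-mer is assigned to one pixel of a 2^k x 2^k grid by CGR-style quadrant subdivision, and its count is stored at that pixel. The corner layout, the function names, and the bit order (the first base selects the coarsest quadrant, whereas classical CGR iteration gives that role to the last base) are assumptions made for illustration, not the authors' implementation.

```python
from collections import Counter

# Assumed corner assignment: each base contributes one bit to x and one bit to y.
CORNERS = {"A": (0, 0), "C": (0, 1), "G": (1, 1), "T": (1, 0)}


def kmer_to_pixel(kmer):
    """Map a k-mer to (row, col) in a 2^k x 2^k grid by quadrant subdivision."""
    x = y = 0
    for base in kmer:
        bx, by = CORNERS[base]
        x = (x << 1) | bx  # each base halves the current frame horizontally
        y = (y << 1) | by  # and vertically
    return y, x


def kmer_image(sequence, k):
    """Count every k-mer in `sequence` and store each count at its pixel."""
    size = 1 << k
    image = [[0] * size for _ in range(size)]
    counts = Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))
    for kmer, n in counts.items():
        # Skip k-mers containing ambiguous bases such as N.
        if all(base in CORNERS for base in kmer):
            row, col = kmer_to_pixel(kmer)
            image[row][col] = n
    return image
```

Because every k-mer has a fixed pixel, the frequency of a given k-mer can be read back from the image by computing its coordinates again, which is what makes the representation lossless in this sketch.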

Pages: 691-697
Citations: 0
MultiKD-DTA: Enhancing Drug-Target Affinity Prediction Through Multiscale Feature Extraction.
IF 3.9; Zone 2 (Biology); Q1, Mathematical & Computational Biology; Pub Date: 2025-09-01; Epub Date: 2025-02-28; DOI: 10.1007/s12539-025-00697-4
Riqian Hu, Ruiquan Ge, Guojian Deng, Jin Fan, Bowen Tang, Changmiao Wang

The discovery and development of novel pharmaceutical agents is characterized by high costs, lengthy timelines, and significant safety concerns. Traditional drug discovery involves pharmacologists manually screening drug molecules against protein targets, focusing on binding within protein cavities. However, this manual process is slow and inherently limited. Given these constraints, the use of deep learning techniques to predict drug-target interaction (DTI) affinities is both significant and promising for future applications. This paper introduces an innovative deep learning architecture designed to enhance the prediction of DTI affinities. The model ingeniously combines graph neural networks, pre-trained large-scale protein models, and attention mechanisms to improve performance. In this framework, molecular structures are represented as graphs and processed through graph neural networks and multiscale convolutional networks to facilitate feature extraction. Simultaneously, protein sequences are encoded using pre-trained ESM-2 large models and processed with bidirectional long short-term memory networks. Subsequently, the molecular and protein embeddings derived from these processes are integrated within a fusion module to compute affinity scores. Experimental results demonstrate that our proposed model outperforms existing methods on two publicly available datasets.

Pages: 555-565
Citations: 0
Reconstructing Waddington Landscape from Cell Migration and Proliferation.
IF 3.9; Zone 2 (Biology); Q1, Mathematical & Computational Biology; Pub Date: 2025-09-01; Epub Date: 2025-01-07; DOI: 10.1007/s12539-024-00686-z
Yourui Han, Bolin Chen, Zhongwen Bi, Jianjun Zhang, Youpeng Hu, Jun Bian, Ruiming Kang, Xuequn Shang

The Waddington landscape was initially proposed to depict cell differentiation and has since been extended to explain phenomena such as reprogramming. The landscape serves as a concrete representation of cellular differentiation potential, yet the precise representation of this potential remains an unsolved problem, posing significant challenges to reconstructing the Waddington landscape. The characterization of cellular differentiation potential typically relies on transcriptomic signatures of known markers. Numerous computational models based on various energy indicators, such as Shannon entropy, have been proposed. While these models can effectively characterize cellular differentiation potential, most of them lack corresponding dynamical interpretations, which are crucial for enhancing our understanding of cell fate transitions. Therefore, from the perspective of cell migration and proliferation, a feasible framework was developed for calculating a dynamically interpretable energy indicator to reconstruct the Waddington landscape based on sparse autoencoders and the reaction-diffusion-advection equation. Within this framework, typical cellular developmental processes, such as hematopoiesis and reprogramming, were dynamically simulated and their corresponding Waddington landscapes were reconstructed. Furthermore, dynamic simulation and reconstruction were also conducted for special developmental processes, such as embryogenesis and the epithelial-mesenchymal transition. Ultimately, these diverse cell fate transitions were amalgamated into a unified Waddington landscape.
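
To make the entropy-based energy indicators mentioned above concrete, here is a minimal sketch of Shannon entropy computed on a single cell's expression profile. Treating the normalized expression vector as a probability distribution is an assumption for illustration; published entropy-based potency measures are typically more elaborate, and this is not the framework proposed in the paper.

```python
import math


def shannon_entropy(expression):
    """Shannon entropy (in bits) of a non-negative vector normalized to sum to 1."""
    total = sum(expression)
    if total <= 0:
        raise ValueError("expression vector must have a positive sum")
    # Zero entries contribute nothing to the entropy, so they are skipped.
    probs = [v / total for v in expression if v > 0]
    return -sum(p * math.log2(p) for p in probs)
```

Under this reading, a profile spread evenly over many genes has high entropy, while a profile dominated by a few genes has low entropy.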

Pages: 541-554
Citations: 0
HiSVision: A Method for Detecting Large-Scale Structural Variations Based on Hi-C Data and Detection Transformer.
IF 3.9; Zone 2 (Biology); Q1, Mathematical & Computational Biology; Pub Date: 2025-09-01; Epub Date: 2024-12-23; DOI: 10.1007/s12539-024-00677-0
Haixia Zhai, Chengyao Dong, Tao Wang, Junwei Luo

Structural variation (SV) is an important component of the diversity of the human genome. Many studies have shown that SV has a significant impact on human disease and is strongly associated with the development of cancer. In recent years, the Hi-C sequencing technique has been shown to be useful for detecting large-scale SVs, and several methods have been proposed for identifying SVs from Hi-C data. However, due to the complexity of the 3D genome structure, accurately identifying SVs from the Hi-C contact matrix remains a challenging task. Here, we present HiSVision, a method for identifying large-scale SVs from Hi-C data using a detection transformer framework. Inspired by object detection networks, we transform the Hi-C contact matrix into images, then identify candidate SV regions on the images with a detection transformer, and finally filter SVs based on features around the breakpoints. Experimental results show that HiSVision outperforms existing methods in terms of precision and F1 score on cancer cell lines and simulated datasets. The source code and data are available from https://github.com/dcy99/HiSVision .
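
The first step above, turning a contact matrix into an image, might look something like the sketch below. The log scaling and the 8-bit grayscale range are assumptions chosen because they are common when visualizing Hi-C maps; the function name is hypothetical, and the actual preprocessing used by HiSVision may differ (see the linked repository).

```python
import math


def contact_matrix_to_image(matrix):
    """Log-scale a contact matrix and rescale it to 8-bit grayscale (0-255)."""
    # log1p compresses the large dynamic range of raw contact counts.
    logged = [[math.log1p(v) for v in row] for row in matrix]
    peak = max(max(row) for row in logged)
    if peak == 0:
        return [[0] * len(row) for row in logged]
    return [[round(255 * v / peak) for v in row] for row in logged]
```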

Pages: 519-527
Citations: 0
Towards Reliable Healthcare Imaging: A Multifaceted Approach in Class Imbalance Handling for Medical Image Segmentation.
IF 3.9; Zone 2 (Biology); Q1, Mathematical & Computational Biology; Pub Date: 2025-09-01; Epub Date: 2025-07-07; DOI: 10.1007/s12539-025-00726-2
Lijuan Cui, Mingquan Xu, Chao Liu, Tianyu Liu, Xiaoting Yan, Yan Zhang, Xiaofeng Yang

Class imbalance is a dominant challenge in medical image segmentation when dealing with MRI images from highly imbalanced datasets. This study introduces a comprehensive, multifaceted approach to enhance the accuracy and reliability of segmentation models under such conditions. Our model integrates advanced data augmentation, innovative algorithmic adjustments, and novel architectural features to address skewed class label distributions effectively. To cover multiple aspects of the training process, we customized the data augmentation technique for medical imaging with multi-dimensional angles. This multi-dimensional augmentation helps to reduce the bias towards majority classes. We implemented two attention mechanisms, the Enhanced Attention Module (EAM) and spatial attention, which sharpen the model's focus on the most relevant features. Further, our architecture incorporates a dual decoder system and a Pooling Integration Layer (PIL) to capture accurate foreground and background details. We also introduce a hybrid loss function designed to handle class imbalance by guiding the training process. For the experiments, we used multiple datasets, including Digital Database Thyroid Image (DDTI), Breast Ultrasound Images Dataset (BUSI), and LiTS MICCAI 2017, to demonstrate the performance of the proposed network using key evaluation metrics: IoU, Dice coefficient, precision, and recall.
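
Two of the evaluation metrics named above, the Dice coefficient and IoU, score only the overlap between predicted and true foreground, which is why they are preferred over pixel accuracy when background pixels dominate. A minimal sketch for binary masks follows; the flat-list input format and the convention for two empty masks are assumptions made here, not details from the paper.

```python
def dice_and_iou(pred, target):
    """Return (Dice, IoU) for two flat binary masks of equal length."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same length")
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    pred_sum = sum(1 for p in pred if p)
    target_sum = sum(1 for t in target if t)
    union = pred_sum + target_sum - intersection
    if union == 0:
        # Both masks are empty: treated as a perfect match (assumed convention).
        return 1.0, 1.0
    dice = 2 * intersection / (pred_sum + target_sum)
    iou = intersection / union
    return dice, iou
```

For example, a prediction that shares one foreground pixel with the target when each mask has two foreground pixels gives a Dice of 0.5 and an IoU of 1/3, regardless of how many background pixels surround them.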

Pages: 614-633
Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12289783/pdf/
Citations: 0
A Novel Drug-Disease Association Prediction Method Based on Deep Non-Negative Matrix Factorization with Local Graph Feature.
IF 3.9; Zone 2 (Biology); Q1, Mathematical & Computational Biology; Pub Date: 2025-09-01; Epub Date: 2025-07-07; DOI: 10.1007/s12539-025-00733-3
Mengyun Yang, Bin Yang, Jiajun Chen, Xiwei Tang, Guihua Duan

Computational drug repurposing utilizes data analysis and predictive models to identify new uses for existing drugs and new drugs, significantly improving research efficiency and reducing costs compared to traditional screening methods. Because current computational models are limited in extracting deep key features, we develop a novel drug repurposing model based on deep non-negative matrix factorization (DNMF-DDA) to enhance the accuracy of drug-disease association predictions. The model leverages similarity and known association data to extract low-rank features from complex data spaces, allowing for the prediction of potential drug-disease associations. To improve performance for novel drugs, we apply the k-nearest neighbors (KNN) algorithm for preprocessing, increasing the density of the matrix's prior information. Next, we construct two integrated matrices based on the similarities of drugs and diseases, respectively, and the optimized association data. During deep matrix factorization, we incorporate graph Laplacian and relaxed regularization constraints to optimize local graph features. This multi-layer optimization enhances the model's understanding of complex drug-disease relationships, effectively mitigating the negative impact of insufficient prior information during cold-start tests. Furthermore, we incorporate non-negativity constraints to ensure that the prediction results are biologically meaningful. To evaluate the performance of DNMF-DDA, we conducted cold-start tests and 10-fold cross-validation on three datasets and systematically compared it with five state-of-the-art drug repurposing methods. The results demonstrate that DNMF-DDA performs exceptionally well in predicting drug-disease associations, significantly outperforming existing approaches. Our proposed method not only handles high-dimensional data efficiently but also exhibits superior performance, providing new insights for drug development. Moreover, the case study further validated the practical value of the DNMF-DDA model in real applications.
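
The KNN preprocessing step above can be pictured with the simplified sketch below: a drug with no known associations borrows a similarity-weighted average of the association rows of its k most similar drugs, which makes the matrix less sparse before factorization. The function name, the restriction to all-zero rows, and the weighting scheme are assumptions for illustration and may not match the procedure used in DNMF-DDA.

```python
def knn_fill(assoc, sim, k=2):
    """Fill all-zero rows of `assoc` from the k most similar rows.

    assoc: drug x disease 0/1 association matrix (list of lists)
    sim:   drug x drug similarity matrix (list of lists)
    """
    filled = [[float(v) for v in row] for row in assoc]
    for i, row in enumerate(assoc):
        if any(row):
            continue  # this drug already has known associations
        # Rank the other drugs by similarity to drug i and keep the top k.
        neighbors = sorted(
            (j for j in range(len(assoc)) if j != i),
            key=lambda j: sim[i][j],
            reverse=True,
        )[:k]
        weight = sum(sim[i][j] for j in neighbors)
        if weight == 0:
            continue  # no informative neighbors, so the row stays empty
        filled[i] = [
            sum(sim[i][j] * assoc[j][d] for j in neighbors) / weight
            for d in range(len(row))
        ]
    return filled
```

The filled values are fractional scores rather than confirmed associations, so in this sketch they act only as soft prior information for a cold-start drug.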

{"title":"A Novel Drug-Disease Association Prediction Method Based on Deep Non-Negative Matrix Factorization with Local Graph Feature.","authors":"Mengyun Yang, Bin Yang, Jiajun Chen, Xiwei Tang, Guihua Duan","doi":"10.1007/s12539-025-00733-3","DOIUrl":"10.1007/s12539-025-00733-3","url":null,"abstract":"<p><p>Computational drug repurposing utilizes data analysis and predictive models to identify new uses for existing drugs and new drugs, significantly improving research efficiency and reducing costs compared to traditional screening methods. Due to the limitations of current computational models in extracting deep key features, we develop a novel drug repurposing model based on the deep non-negative matrix factorization (DNMF-DDA) to enhance the accuracy of drug-disease association predictions. The model leverages similarity and known association data to extract low-rank features from complex data spaces, allowing for the prediction of potential drug-disease associations. To improve performance for novel drugs, we apply the k-nearest neighbors (KNN) algorithm for preprocessing, increasing the density of the matrix's prior information. Next, we construct two integrated matrices based on the similarities of drugs and diseases, respectively, and the optimized association data. During deep matrix factorization, we incorporate graph Laplacian and relaxed regularization constraints to optimize local graph features. This multi-layer optimization enhances the model's understanding of complex drug-disease relationships, effectively mitigating the negative impact of insufficient prior information during cold-start tests. Furthermore, we incorporate non-negativity constraints to ensure that the prediction results are biologically meaningful. To evaluate the performance of DNMF-DDA, we conducted cold-start test and 10-fold cross-validation on three datasets and systematically compared it with five state-of-the-art drug repurposing methods. 
The results demonstrate that DNMF-DDA performs exceptionally well in predicting drug-disease associations, significantly outperforming existing approaches. Our proposed method not only efficiently handles high-dimensional data but also exhibits superior performance, providing new insights for drug development. Moreover, the case study further validated the significant practical value of the DNMF-DDA model in practical applications.</p>","PeriodicalId":13670,"journal":{"name":"Interdisciplinary Sciences: Computational Life Sciences","volume":" ","pages":"598-613"},"PeriodicalIF":3.9,"publicationDate":"2025-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12289730/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144583785","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
iEnhancer-GDM: A Deep Learning Framework Based on Generative Adversarial Network and Multi-head Attention Mechanism to Identify Enhancers and Their Strength.
IF 3.9 Zone 2 (Biology) Q1 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date: 2025-09-01 Epub Date: 2025-05-07 DOI: 10.1007/s12539-025-00703-9
Xiaomei Yang, Meng Liao, Bin Ye, Junfeng Xia, Jianping Zhao

Enhancers are short DNA fragments capable of significantly increasing the frequency of gene transcription. They often act on their target genes over long distances, in either cis or trans configurations. Identifying enhancers is challenging because of their variable positions and sensitivities. Genetic variants within enhancer regions have been implicated in human diseases, highlighting the critical importance of enhancer identification and strength prediction. Here, we develop a two-layer predictor named iEnhancer-GDM to identify enhancers and predict their strength. To address the limited size of the enhancer training dataset, which can cause model overfitting and low classification accuracy, we introduce a Wasserstein generative adversarial network with gradient penalty (WGAN-GP) to augment the dataset. We employ a dna2vec embedding layer to encode raw DNA sequences into numerical feature representations, and then integrate a multi-scale convolutional neural network, a bidirectional long short-term memory network, and a multi-head attention mechanism for feature representation and classification. Our results validate the effectiveness of WGAN-GP-based data augmentation. On an independent test dataset, iEnhancer-GDM outperforms existing models, with improvements of 2.45% for enhancer identification and 11.5% for enhancer strength prediction. iEnhancer-GDM advances precise enhancer identification and strength prediction, thereby helping to clarify the functions of enhancers and their associations in genomics.
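
The dna2vec encoding step in the abstract above operates on k-mers, so a raw sequence must first be split into overlapping k-mer tokens and mapped to integer indices before an embedding layer can look them up. The following is a minimal sketch of that preprocessing only. The abstract does not state the k-mer length or stride, so `k=3` and `stride=1` are assumptions, and the function names `kmer_tokens`, `build_vocab`, and `encode` are hypothetical rather than taken from the iEnhancer-GDM code.

```python
from typing import Dict, Iterable, List


def kmer_tokens(sequence: str, k: int = 3, stride: int = 1) -> List[str]:
    """Split a DNA sequence into overlapping k-mers (k and stride are assumed values)."""
    if k <= 0 or stride <= 0:
        raise ValueError("k and stride must be positive")
    seq = sequence.upper()
    # A sequence shorter than k yields no tokens.
    return [seq[i:i + k] for i in range(0, len(seq) - k + 1, stride)]


def build_vocab(token_lists: Iterable[List[str]]) -> Dict[str, int]:
    """Assign each distinct k-mer an integer index in order of first appearance."""
    vocab: Dict[str, int] = {}
    for tokens in token_lists:
        for token in tokens:
            if token not in vocab:
                vocab[token] = len(vocab)
    return vocab


def encode(sequence: str, vocab: Dict[str, int], k: int = 3) -> List[int]:
    """Convert a sequence into the index list an embedding layer would consume."""
    return [vocab[token] for token in kmer_tokens(sequence, k)]
```

In a full pipeline the resulting index list would be fed to a pretrained k-mer embedding table; that part is omitted here because the abstract gives no details about it.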

Citations: 0
NPI-HetGNN: A Prediction Model of ncRNA-Protein Interactions Based on Heterogeneous Graph Neural Networks.
IF 3.9 Zone 2 (Biology) Q1 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date: 2025-09-01 Epub Date: 2025-06-02 DOI: 10.1007/s12539-025-00716-4
Fan Zhang, Chaoyang Liu, Binjie Wang, Xiaopan Chen, Xinhong Zhang

Non-coding RNAs (ncRNAs) are one of the components of the epigenetic mechanisms that regulate gene expression. Studying ncRNA-protein interactions (NPI) can help to explore a wide range of biological features and related diseases. Traditional NPI research methods often require expensive equipment and considerable time and labor. With the abundant samples accumulated from traditional experiments, computational methods have made remarkable progress in the study of NPI. A heterogeneous graph neural network is a deep learning method that combines heterogeneous types of data with network topology. In this study, we propose NPI-HetGNN, a model for NPI prediction based on heterogeneous graph neural networks. First, initial features are constructed by integrating the sequence properties of the ncRNA and protein data with the topology of the heterogeneous connections. Then, multilevel homogeneous subgraphs are obtained and their semantic information is aggregated by metapath walking. At the same time, the homogeneous node information is fused within the subgraph metapath. To enhance the feature extraction ability of the network, an energy-constrained self-attention module is introduced. Because wet-lab validation was not available, this study relies on computational verification. The performance of NPI-HetGNN is verified experimentally on four benchmark datasets, and ablation experiments confirm the comprehensiveness and validity of the model design. The experimental results show that, compared with six state-of-the-art methods, NPI-HetGNN achieves very satisfactory results on all four datasets.
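
The metapath walking mentioned in the abstract above can be illustrated with a generic metapath-constrained random walk: at each step the walker may only move to a neighbor whose node type matches the next entry of the metapath. This is a sketch of the general technique, not the NPI-HetGNN implementation. The abstract does not specify which metapaths are used or how subgraphs are built, so the ncRNA-protein-ncRNA metapath, the toy edges, and the names `build_adjacency` and `metapath_walk` are all assumptions.

```python
import random
from typing import Dict, List, Sequence, Tuple

Node = Tuple[str, str]  # (node_type, node_id)


def build_adjacency(edges: Sequence[Tuple[Node, Node]]) -> Dict[Node, List[Node]]:
    """Build an undirected adjacency list from typed edges."""
    adj: Dict[Node, List[Node]] = {}
    for u, v in edges:
        adj.setdefault(u, []).append(v)
        adj.setdefault(v, []).append(u)
    return adj


def metapath_walk(adj: Dict[Node, List[Node]], start: Node,
                  metapath: Sequence[str], rng: random.Random) -> List[Node]:
    """Walk from `start`, visiting node types in the order given by `metapath`."""
    if start[0] != metapath[0]:
        raise ValueError("start node type must match the first metapath type")
    walk = [start]
    for next_type in metapath[1:]:
        # Keep only neighbors of the required type; stop early if none exist.
        candidates = [n for n in adj.get(walk[-1], []) if n[0] == next_type]
        if not candidates:
            break
        walk.append(rng.choice(candidates))
    return walk


# Toy heterogeneous graph (hypothetical data): two ncRNAs and two proteins.
EDGES = [
    (("ncRNA", "r1"), ("protein", "p1")),
    (("ncRNA", "r2"), ("protein", "p1")),
    (("ncRNA", "r2"), ("protein", "p2")),
]
ADJ = build_adjacency(EDGES)
WALK = metapath_walk(ADJ, ("ncRNA", "r1"), ["ncRNA", "protein", "ncRNA"],
                     random.Random(0))
```

The ncRNA nodes that end up at the two ends of such walks are the pairs that an ncRNA-only (homogeneous) subgraph would connect, which is one common way to derive homogeneous subgraphs from a heterogeneous network.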

Citations: 0
Empowering Graph Neural Network-Based Computational Drug Repositioning with Large Language Model-Inferred Knowledge Representation.
IF 3.9 Zone 2 (Biology) Q1 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date: 2025-09-01 Epub Date: 2024-09-26 DOI: 10.1007/s12539-024-00654-7
Yaowen Gu, Zidu Xu, Carl Yang

Computational drug repositioning, through predicting drug-disease associations (DDA), offers significant potential for discovering new drug indications. Current methods apply graph neural networks (GNN) to drug-disease heterogeneous networks to predict DDAs, achieving notable performance compared to traditional machine learning and matrix factorization approaches. However, these methods depend heavily on network topology, are hampered by incomplete and noisy network data, and overlook the wealth of biomedical knowledge available. Large language models (LLMs), in contrast, excel at graph search and relational reasoning, which may help integrate comprehensive biomedical knowledge into drug and disease profiles. In this study, we first investigate the contribution of LLM-inferred knowledge representation to drug repositioning and DDA prediction. A zero-shot prompting template was designed for the LLM to extract high-quality knowledge descriptions for drug and disease entities, followed by embedding generation from language models to transform the discrete text into continuous numerical representations. Then, we proposed LLM-DDA with three different model architectures, LLM-DDA(Node Feat), LLM-DDA(Dual GNN), and LLM-DDA(GNN-AE), to investigate the best fusion mode for LLM-based embeddings. Extensive experiments on four DDA benchmarks show that LLM-DDA(GNN-AE) achieved the best performance compared to 11 baselines, with overall relative improvements of 23.22% in AUPR, 17.20% in F1-Score, and 25.35% in precision. Meanwhile, selected case studies involving Prednisone and Allergic Rhinitis highlighted the model's capability to identify reliable DDAs and knowledge descriptions, supported by existing literature. This study showcases the utility of LLMs in drug repositioning, as well as their generality and applicability to other biomedical relation prediction tasks.
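
The zero-shot prompting step described in the abstract above can be pictured as filling a fixed template with an entity type and name, with no worked examples included in the prompt. The sketch below shows only that idea. The abstract does not reproduce the actual template, so the wording of `PROMPT_TEMPLATE`, the aspects listed in `ASPECTS`, and the function name `build_prompt` are hypothetical, and the call to an LLM and the subsequent text embedding are deliberately left out.

```python
# Hypothetical template wording; the paper's real template is not given in the abstract.
PROMPT_TEMPLATE = (
    "You are a biomedical expert. Describe the {entity_type} \"{name}\" "
    "in one short paragraph covering: {aspects}."
)

# Assumed aspects to request for each entity type.
ASPECTS = {
    "drug": "mechanism of action, known indications, and molecular targets",
    "disease": "symptoms, underlying pathology, and standard treatments",
}


def build_prompt(entity_type: str, name: str) -> str:
    """Fill the zero-shot template for one drug or disease entity."""
    if entity_type not in ASPECTS:
        raise ValueError(f"unsupported entity type: {entity_type}")
    return PROMPT_TEMPLATE.format(
        entity_type=entity_type,
        name=name.strip(),
        aspects=ASPECTS[entity_type],
    )


# Entities named in the abstract's case studies.
DRUG_PROMPT = build_prompt("drug", "Prednisone")
DISEASE_PROMPT = build_prompt("disease", "Allergic Rhinitis")
```

Each prompt would be sent to an LLM, and the returned description would then be passed through a text-embedding model to obtain the numerical node representation that the GNN consumes.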

Citations: 0