Exploring associations between circular RNAs (circRNAs) and diseases contributes to a deeper understanding of disease pathogenesis. Many computational methods have been proposed for identifying circRNA-disease associations. However, these methods still have limitations, such as ignoring the effect of noise. In this paper, we propose a new knowledge graph attribute mining attention network (KAATCDA) that predicts circRNA-disease associations based on a knowledge graph attribute network (KGA) and an attribute mining attention network (AMA). First, KGA is used to learn the feature representations of diseases. Then, AMA is used to obtain circRNA features that are similar to the disease feature representations. Finally, circRNA-disease association scores are predicted from the circRNA and disease feature representations. Five-fold cross-validation experiments on two datasets demonstrate that KAATCDA outperforms other state-of-the-art methods. In addition, a case study shows that our method can effectively predict unknown circRNA-disease associations.
{"title":"Predicting CircRNA-Disease Associations Based on Heterogeneous Graph Neural Network and Knowledge Graph Attribute Mining Attention.","authors":"Wei Lan, Cong Peng, Hongyu Zhang, Chunling Li, Qingfeng Chen, Xin Xiao, Zhiqiang Wang","doi":"10.1007/s12539-025-00706-6","DOIUrl":"10.1007/s12539-025-00706-6","url":null,"PeriodicalId":13670,"journal":{"name":"Interdisciplinary Sciences: Computational Life Sciences","volume":" ","pages":"586-597"},"PeriodicalIF":3.9,"publicationDate":"2025-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144010476","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-09-01 Epub Date: 2024-10-21 DOI: 10.1007/s12539-024-00659-2
Hatice Busra Luleci, Selcen Ari Yuka, Alper Yilmaz
k-mer frequencies are crucial for understanding DNA sequence patterns and structure, with applications in motif discovery, genome classification, and short read assembly. However, the exponential increase in the dimension of frequency tables with increasing k-mer length poses storage challenges. In this study, we present a novel method for compressing k-mer data without information loss, aiming to optimize storage and analysis processes. We employed Chaos Game Representation (CGR) to map k-mers to coordinates and used these components to generate raster images of k-mers. The CGR maps were partitioned and labeled based on substrings, with each substring mapped to a subframe, creating a fractal-like structure. The entire k-mer frequency set of each genomic sequence was represented as a single image, with each pixel corresponding to a specific k-mer and its occurrence. This approach reduced file size by up to 16-fold compared to plain text and 3-fold compared to binary format. Furthermore, we demonstrated the feasibility of performing alignment-free similarity analyses on images derived from k-mer frequencies of whole genome sequences from 14 plant species. Our results highlight the potential of this method as a fast and efficient tool for accessing, processing, and analyzing large biological sequence datasets, enabling the retrieval of k-mer frequencies and image reconstruction.
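The mapping from k-mers to pixels described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the corner assignment in `CORNER_BITS`, the choice that the first base selects the outermost quadrant, and the function names `kmer_to_pixel`, `pixel_to_kmer`, and `kmer_image` are all assumptions made for illustration. Each base contributes one bit to the x coordinate and one bit to the y coordinate, so every k-mer lands on a unique pixel of a 2^k by 2^k grid, and k-mers sharing a prefix fall into the same subframe.

```python
from collections import Counter

# Assumed corner assignment (one x bit and one y bit per base);
# the abstract does not specify which base goes to which corner.
CORNER_BITS = {"A": (0, 0), "C": (0, 1), "G": (1, 1), "T": (1, 0)}
BITS_TO_BASE = {bits: base for base, bits in CORNER_BITS.items()}


def kmer_to_pixel(kmer):
    """Map a k-mer to (x, y) on a 2^k x 2^k grid, first base = outermost quadrant."""
    x = y = 0
    for base in kmer:
        bx, by = CORNER_BITS[base]
        x = (x << 1) | bx
        y = (y << 1) | by
    return x, y


def pixel_to_kmer(x, y, k):
    """Invert kmer_to_pixel: recover the k-mer stored at pixel (x, y)."""
    bases = []
    for shift in range(k - 1, -1, -1):
        bases.append(BITS_TO_BASE[((x >> shift) & 1, (y >> shift) & 1)])
    return "".join(bases)


def kmer_image(sequence, k):
    """Count k-mers in a sequence and store each count at its pixel."""
    size = 1 << k
    image = [[0] * size for _ in range(size)]
    counts = Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))
    for kmer, n in counts.items():
        # Skip k-mers containing symbols other than A, C, G, T (for example N).
        if all(base in CORNER_BITS for base in kmer):
            x, y = kmer_to_pixel(kmer)
            image[y][x] = n
    return image
```

Because the mapping is a bijection between k-mers and pixels, the frequency table can be read back from the image with `pixel_to_kmer`, which is the sense in which such a representation loses no information. Writing the grid to an actual raster file format is left out of this sketch.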
{"title":"Efficient Storage and Analysis of Genomic Data: A k-mer Frequency Mapping and Image Representation Method.","authors":"Hatice Busra Luleci, Selcen Ari Yuka, Alper Yilmaz","doi":"10.1007/s12539-024-00659-2","DOIUrl":"10.1007/s12539-024-00659-2","url":null,"PeriodicalId":13670,"journal":{"name":"Interdisciplinary Sciences: Computational Life Sciences","volume":" ","pages":"691-697"},"PeriodicalIF":3.9,"publicationDate":"2025-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142464357","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-09-01 Epub Date: 2025-02-28 DOI: 10.1007/s12539-025-00697-4
Riqian Hu, Ruiquan Ge, Guojian Deng, Jin Fan, Bowen Tang, Changmiao Wang
The discovery and development of novel pharmaceutical agents is characterized by high costs, lengthy timelines, and significant safety concerns. Traditional drug discovery involves pharmacologists manually screening drug molecules against protein targets, focusing on binding within protein cavities. However, this manual process is slow and inherently limited. Given these constraints, using deep learning techniques to predict drug-target interaction (DTI) affinities is a promising alternative. This paper introduces a deep learning architecture designed to improve the prediction of DTI affinities. The model combines graph neural networks, pre-trained large-scale protein models, and attention mechanisms. In this framework, molecular structures are represented as graphs and processed through graph neural networks and multiscale convolutional networks to extract features. In parallel, protein sequences are encoded using pre-trained ESM-2 large models and processed with bidirectional long short-term memory networks. The resulting molecular and protein embeddings are then integrated within a fusion module to compute affinity scores. Experimental results demonstrate that our proposed model outperforms existing methods on two publicly available datasets.
{"title":"MultiKD-DTA: Enhancing Drug-Target Affinity Prediction Through Multiscale Feature Extraction.","authors":"Riqian Hu, Ruiquan Ge, Guojian Deng, Jin Fan, Bowen Tang, Changmiao Wang","doi":"10.1007/s12539-025-00697-4","DOIUrl":"10.1007/s12539-025-00697-4","url":null,"PeriodicalId":13670,"journal":{"name":"Interdisciplinary Sciences: Computational Life Sciences","volume":" ","pages":"555-565"},"PeriodicalIF":3.9,"publicationDate":"2025-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143523301","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The Waddington landscape was initially proposed to depict cell differentiation and has been extended to explain phenomena such as reprogramming. The landscape serves as a concrete representation of cellular differentiation potential, yet the precise representation of this potential remains an unsolved problem, posing significant challenges to reconstructing the Waddington landscape. The characterization of cellular differentiation potential typically relies on transcriptomic signatures of known markers. Numerous computational models based on various energy indicators, such as Shannon entropy, have been proposed. While these models can effectively characterize cellular differentiation potential, most of them lack corresponding dynamical interpretations, which are crucial for enhancing our understanding of cell fate transitions. Therefore, from the perspective of cell migration and proliferation, a feasible framework was developed for calculating a dynamically interpretable energy indicator to reconstruct the Waddington landscape based on sparse autoencoders and the reaction-diffusion-advection equation. Within this framework, typical cellular developmental processes, such as hematopoiesis and reprogramming, were dynamically simulated and their corresponding Waddington landscapes were reconstructed. Furthermore, dynamic simulation and reconstruction were also conducted for special developmental processes, such as embryogenesis and the epithelial-mesenchymal transition. Ultimately, these diverse cell fate transitions were amalgamated into a unified Waddington landscape.
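As a small illustration of the kind of energy indicator this abstract mentions, the sketch below computes the Shannon entropy of a single cell's expression profile. This is only the generic entropy formula, not the dynamically interpretable indicator developed in the paper; the function name `shannon_entropy` and the choice to normalize raw expression values into a probability distribution are assumptions made for illustration.

```python
import math


def shannon_entropy(expression):
    """Shannon entropy (in bits) of a non-negative expression profile.

    The profile is normalized to sum to 1 (an assumed preprocessing step);
    zero entries contribute nothing to the sum.
    """
    total = sum(expression)
    if total <= 0:
        raise ValueError("expression profile must have a positive sum")
    probs = [value / total for value in expression if value > 0]
    return -sum(p * math.log2(p) for p in probs)
```

A profile spread evenly over many genes gives a high value, while a profile dominated by one gene gives a value near zero. Entropy-based models commonly read the former as higher differentiation potential, but that interpretation is an assumption here rather than a claim from the abstract.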
{"title":"Reconstructing Waddington Landscape from Cell Migration and Proliferation.","authors":"Yourui Han, Bolin Chen, Zhongwen Bi, Jianjun Zhang, Youpeng Hu, Jun Bian, Ruiming Kang, Xuequn Shang","doi":"10.1007/s12539-024-00686-z","DOIUrl":"10.1007/s12539-024-00686-z","url":null,"PeriodicalId":13670,"journal":{"name":"Interdisciplinary Sciences: Computational Life Sciences","volume":" ","pages":"541-554"},"PeriodicalIF":3.9,"publicationDate":"2025-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142948366","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-09-01 Epub Date: 2024-12-23 DOI: 10.1007/s12539-024-00677-0
Haixia Zhai, Chengyao Dong, Tao Wang, Junwei Luo
Structural variation (SV) is an important component of the diversity of the human genome. Many studies have shown that SV has a significant impact on human disease and is strongly associated with the development of cancer. In recent years, the Hi-C sequencing technique has been shown to be useful for detecting large-scale SVs, and several methods have been proposed for identifying SVs from Hi-C data. However, due to the complexity of the 3D genome structure, accurately identifying SVs from the Hi-C contact matrix remains a challenging task. Here, we present HiSVision, a method for identifying large-scale SVs from Hi-C data using a detection transformer framework. Inspired by object detection networks, we transform the Hi-C contact matrix into images, then identify candidate SV regions on the images with a detection transformer, and finally filter SVs based on features around the breakpoints. Experimental results show that HiSVision outperforms existing methods in terms of precision and F1 score on cancer cell lines and simulated datasets. The source code and data are available from https://github.com/dcy99/HiSVision.
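The first step described in this abstract, turning a Hi-C contact matrix into an image, could look roughly like the sketch below. The abstract does not say how HiSVision performs this conversion, so the log scaling, the 8-bit grayscale range, and the function name `contact_matrix_to_image` are assumptions chosen because contact counts usually span several orders of magnitude. For the authors' actual preprocessing, see the repository linked in the abstract.

```python
import math


def contact_matrix_to_image(matrix):
    """Convert a matrix of non-negative contact counts to 8-bit grayscale.

    Counts are compressed with log(1 + x) and then scaled linearly so the
    largest value maps to 255 (both steps are assumed, not taken from the paper).
    """
    logged = [[math.log1p(value) for value in row] for row in matrix]
    peak = max(max(row) for row in logged)
    if peak == 0:
        # An all-zero matrix becomes an all-black image.
        return [[0] * len(row) for row in logged]
    return [[round(255 * value / peak) for value in row] for row in logged]
```

The result is a list of rows of integers in the range 0 to 255, which an object detection model could consume after conversion to its expected tensor or image format.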
{"title":"HiSVision: A Method for Detecting Large-Scale Structural Variations Based on Hi-C Data and Detection Transformer.","authors":"Haixia Zhai, Chengyao Dong, Tao Wang, Junwei Luo","doi":"10.1007/s12539-024-00677-0","DOIUrl":"10.1007/s12539-024-00677-0","url":null,"PeriodicalId":13670,"journal":{"name":"Interdisciplinary Sciences: Computational Life Sciences","volume":" ","pages":"519-527"},"PeriodicalIF":3.9,"publicationDate":"2025-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142876974","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-09-01 Epub Date: 2025-07-07 DOI: 10.1007/s12539-025-00726-2
Lijuan Cui, Mingquan Xu, Chao Liu, Tianyu Liu, Xiaoting Yan, Yan Zhang, Xiaofeng Yang
Class imbalance is a dominant challenge in medical image segmentation when dealing with MRI images from highly imbalanced datasets. This study introduces a multifaceted approach to improve the accuracy and reliability of segmentation models under such conditions. Our model integrates data augmentation, algorithmic adjustments, and new architectural features to address skewed class label distributions. To cover multiple aspects of the training process, we customized the data augmentation technique for medical imaging with multi-dimensional angles; this multi-dimensional augmentation helps to reduce the bias towards majority classes. We implemented two attention mechanisms, an Enhanced Attention Module (EAM) and spatial attention, which focus the model on the most relevant features. Further, our architecture incorporates a dual decoder system and a Pooling Integration Layer (PIL) to capture accurate foreground and background details. We also introduce a hybrid loss function designed to handle class imbalance by guiding the training process. For the experiments, we used multiple datasets, such as Digital Database Thyroid Image (DDTI), Breast Ultrasound Images Dataset (BUSI), and LiTS MICCAI 2017, to demonstrate the performance of the proposed network using key evaluation metrics, i.e., IoU, Dice coefficient, precision, and recall.
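The four evaluation metrics named at the end of this abstract have standard definitions for binary masks, sketched below. The function names and the convention of returning 0.0 when a denominator is zero are assumptions for illustration; the paper may handle empty masks differently. Masks are passed as flat sequences of 0 and 1, with 1 marking the foreground class.

```python
def confusion_counts(pred, truth):
    """Count true positives, false positives, and false negatives."""
    if len(pred) != len(truth):
        raise ValueError("masks must have the same number of pixels")
    tp = sum(1 for p, t in zip(pred, truth) if p == 1 and t == 1)
    fp = sum(1 for p, t in zip(pred, truth) if p == 1 and t == 0)
    fn = sum(1 for p, t in zip(pred, truth) if p == 0 and t == 1)
    return tp, fp, fn


def safe_div(numerator, denominator):
    """Divide, returning 0.0 for an empty denominator (assumed convention)."""
    return numerator / denominator if denominator else 0.0


def segmentation_metrics(pred, truth):
    """Return IoU, Dice coefficient, precision, and recall for binary masks."""
    tp, fp, fn = confusion_counts(pred, truth)
    return {
        "iou": safe_div(tp, tp + fp + fn),
        "dice": safe_div(2 * tp, 2 * tp + fp + fn),
        "precision": safe_div(tp, tp + fp),
        "recall": safe_div(tp, tp + fn),
    }
```

None of these metrics counts true negatives, which is why they remain informative when the background class dominates the image. IoU and Dice are monotonically related by Dice = 2 IoU / (1 + IoU), so they rank models identically on a single image even though their values differ.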
{"title":"Towards Reliable Healthcare Imaging: A Multifaceted Approach in Class Imbalance Handling for Medical Image Segmentation.","authors":"Lijuan Cui, Mingquan Xu, Chao Liu, Tianyu Liu, Xiaoting Yan, Yan Zhang, Xiaofeng Yang","doi":"10.1007/s12539-025-00726-2","DOIUrl":"10.1007/s12539-025-00726-2","url":null,"PeriodicalId":13670,"journal":{"name":"Interdisciplinary Sciences: Computational Life Sciences","volume":" ","pages":"614-633"},"PeriodicalIF":3.9,"publicationDate":"2025-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12289783/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144583786","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-09-01 Epub Date: 2025-07-07 DOI: 10.1007/s12539-025-00733-3
Mengyun Yang, Bin Yang, Jiajun Chen, Xiwei Tang, Guihua Duan
Computational drug repurposing uses data analysis and predictive models to identify new uses for existing drugs and new drugs, significantly improving research efficiency and reducing costs compared to traditional screening methods. Because current computational models are limited in extracting deep key features, we develop a novel drug repurposing model based on deep non-negative matrix factorization (DNMF-DDA) to enhance the accuracy of drug-disease association predictions. The model leverages similarity and known association data to extract low-rank features from complex data spaces, allowing for the prediction of potential drug-disease associations. To improve performance for novel drugs, we apply the k-nearest neighbors (KNN) algorithm for preprocessing, increasing the density of the matrix's prior information. Next, we construct two integrated matrices based on the similarities of drugs and diseases, respectively, and the optimized association data. During deep matrix factorization, we incorporate graph Laplacian and relaxed regularization constraints to optimize local graph features. This multi-layer optimization enhances the model's understanding of complex drug-disease relationships, effectively mitigating the negative impact of insufficient prior information during cold-start tests. Furthermore, we incorporate non-negativity constraints to ensure that the prediction results are biologically meaningful. To evaluate the performance of DNMF-DDA, we conducted cold-start tests and 10-fold cross-validation on three datasets and systematically compared it with five state-of-the-art drug repurposing methods. The results demonstrate that DNMF-DDA performs well in predicting drug-disease associations, significantly outperforming existing approaches. Our proposed method not only efficiently handles high-dimensional data but also exhibits superior performance, providing new insights for drug development. Moreover, the case study further validated the practical value of the DNMF-DDA model.
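To make the role of the non-negativity constraint concrete, here is a minimal sketch of plain single-layer non-negative matrix factorization with the classic multiplicative update rules. It is deliberately much simpler than DNMF-DDA: there is no deep (multi-layer) factorization, no KNN preprocessing, and no graph Laplacian or relaxed regularization term. The function names, the rank, the iteration count, and the small constant `eps` are all assumptions for illustration.

```python
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    """Transpose a matrix given as a list of rows."""
    return [list(col) for col in zip(*m)]


def frobenius_error(v, w, h):
    """Squared Frobenius norm of V - WH."""
    wh = matmul(w, h)
    return sum((v[i][j] - wh[i][j]) ** 2
               for i in range(len(v)) for j in range(len(v[0])))


def nmf(v, rank, iterations=200, seed=0, eps=1e-9):
    """Factor a non-negative matrix V (drugs x diseases) as W @ H.

    Multiplicative updates only rescale entries by non-negative ratios,
    so W and H stay non-negative if they start that way.
    """
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.random() + eps for _ in range(rank)] for _ in range(n)]
    h = [[rng.random() + eps for _ in range(m)] for _ in range(rank)]
    for _ in range(iterations):
        # H <- H * (W^T V) / (W^T W H), element-wise
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(m)]
             for i in range(rank)]
        # W <- W * (V H^T) / (W H H^T), element-wise
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(rank)]
             for i in range(n)]
    return w, h
```

After fitting, the entries of `matmul(w, h)` at positions where the input association matrix is zero can be read as predicted association scores. Because every factor is non-negative, those scores are non-negative as well, which is the property the abstract ties to biologically meaningful predictions.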
{"title":"A Novel Drug-Disease Association Prediction Method Based on Deep Non-Negative Matrix Factorization with Local Graph Feature.","authors":"Mengyun Yang, Bin Yang, Jiajun Chen, Xiwei Tang, Guihua Duan","doi":"10.1007/s12539-025-00733-3","DOIUrl":"10.1007/s12539-025-00733-3","url":null,"PeriodicalId":13670,"journal":{"name":"Interdisciplinary Sciences: Computational Life Sciences","volume":" ","pages":"598-613"},"PeriodicalIF":3.9,"publicationDate":"2025-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12289730/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144583785","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-09-01 Epub Date: 2025-05-07 DOI: 10.1007/s12539-025-00703-9
Xiaomei Yang, Meng Liao, Bin Ye, Junfeng Xia, Jianping Zhao
Enhancers are short DNA fragments capable of significantly increasing the frequency of gene transcription. They often exert their effects on targeted genes over long distances, either in cis or in trans configurations. Identifying enhancers poses a challenge due to their variable position and sensitivities. Genetic variants within enhancer regions have been implicated in human diseases, highlighting the critical importance of enhancer identification and strength prediction. Here, we develop a two-layer predictor named iEnhancer-GDM to identify enhancers and to predict enhancer strength. To address the challenges posed by the limited size of the enhancer training dataset, which could cause issues such as model overfitting and low classification accuracy, we introduce a Wasserstein generative adversarial network with gradient penalty (WGAN-GP) to augment the dataset. We employ a dna2vec embedding layer to encode raw DNA sequences into numerical feature representations, and then integrate a multi-scale convolutional neural network, a bidirectional long short-term memory network, and a multi-head attention mechanism for feature representation and classification. Our results validate the effectiveness of data augmentation with WGAN-GP. Our model iEnhancer-GDM achieves superior performance on an independent test dataset, outperforming existing models with improvements of 2.45% for enhancer identification and 11.5% for enhancer strength prediction. iEnhancer-GDM advances precise enhancer identification and strength prediction, thereby helping to understand the functions of enhancers and their associations in genomics.
{"title":"iEnhancer-GDM: A Deep Learning Framework Based on Generative Adversarial Network and Multi-head Attention Mechanism to Identify Enhancers and Their Strength.","authors":"Xiaomei Yang, Meng Liao, Bin Ye, Junfeng Xia, Jianping Zhao","doi":"10.1007/s12539-025-00703-9","DOIUrl":"10.1007/s12539-025-00703-9","url":null,"PeriodicalId":13670,"journal":{"name":"Interdisciplinary Sciences: Computational Life Sciences","volume":" ","pages":"662-672"},"PeriodicalIF":3.9,"publicationDate":"2025-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144012930","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-09-01 Epub Date: 2025-06-02 DOI: 10.1007/s12539-025-00716-4
Fan Zhang, Chaoyang Liu, Binjie Wang, Xiaopan Chen, Xinhong Zhang
Non-coding RNAs (ncRNAs) are one of the components of the epigenetic mechanisms that regulate gene expression. Studying ncRNA-protein interactions (NPI) can help to explore a wide range of biological features and related diseases. Traditional NPI research methods often require expensive equipment as well as considerable time and labor. With the abundant samples accumulated from traditional experiments, remarkable progress has been made in the study of NPI by computational methods. A heterogeneous graph neural network is a deep learning method that integrates heterogeneous types of data with network topology. In this study, we propose an NPI-HetGNN model for NPI prediction based on heterogeneous graph neural networks. Firstly, initial features are constructed by integrating the sequence properties of ncRNA and protein data with the topology of heterogeneous connections. Then, multilevel homogeneous subgraphs are obtained and their semantic information is aggregated by metapath walking. At the same time, the homogeneous node information is fused within the subgraph metapath. To enhance the feature extraction ability of the network, an energy-constrained self-attention module is introduced. Because wet-lab validation was not available, this study relies on computational verification. The performance of the NPI-HetGNN model is experimentally verified on four benchmark datasets. Ablation experiments also confirm the comprehensiveness and validity of our model design. The experimental results show that, compared with six state-of-the-art methods, NPI-HetGNN achieves satisfactory results on all four datasets.
{"title":"NPI-HetGNN: A Prediction Model of ncRNA-Protein Interactions Based on Heterogeneous Graph Neural Networks.","authors":"Fan Zhang, Chaoyang Liu, Binjie Wang, Xiaopan Chen, Xinhong Zhang","doi":"10.1007/s12539-025-00716-4","DOIUrl":"10.1007/s12539-025-00716-4","url":null,"abstract":"<p><p>Non-coding RNAs (ncRNAs) are one of the components of epigenetic mechanisms that regulates gene expression. Studying ncRNA-protein interactions (NPI) can help to explore a wide range of biological features and related diseases. Traditional NPI research methods often require expensive equipment, a lot of time and labor. With the abundant samples accumulated from traditional experiments, remarkable progress has been made in the study of NPI by computational methods. Heterogeneous graph neural network is a deep learning method that synthesizes heterogeneous types of data as well as network topology. In this study, we propose an NPI-HetGNN model for NPI prediction based on heterogeneous graph neural networks. Firstly, initial features are constructed by integrating the sequence properties of ncRNA and protein data as well as the topology of heterogeneous connections. Then, the multilevel homogeneous subgraph is obtained and its semantic information is aggregated by metapath walking. At the same time, the homogeneous node information is fused within the subgraph metapath. To enhance feature extraction ability of the network, an energy-constrained self-attention module is introduced. Due to the lack of wet lab validation conditions, this study adopts computational verification. The performance of the NPI-HetGNN model on four benchmark datasets is experimentally verified. Ablation experiments also confirmed the comprehensiveness and validity of our model design. 
The experimental results show that comparing with six state-of-the-art methods, our NPI-HetGNN achieves very satisfactory results on all four datasets.</p>","PeriodicalId":13670,"journal":{"name":"Interdisciplinary Sciences: Computational Life Sciences","volume":" ","pages":"730-743"},"PeriodicalIF":3.9,"publicationDate":"2025-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144198997","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-09-01 Epub Date: 2024-09-26 DOI: 10.1007/s12539-024-00654-7
Yaowen Gu, Zidu Xu, Carl Yang
Computational drug repositioning, through predicting drug-disease associations (DDA), offers significant potential for discovering new drug indications. Current methods apply graph neural networks (GNN) to drug-disease heterogeneous networks to predict DDAs, achieving notable performance compared to traditional machine learning and matrix factorization approaches. However, these methods depend heavily on network topology, are hampered by incomplete and noisy network data, and overlook the wealth of biomedical knowledge available. By contrast, large language models (LLMs) excel in graph search and relational reasoning, which can possibly enhance the integration of comprehensive biomedical knowledge into drug and disease profiles. In this study, we first investigate the contribution of LLM-inferred knowledge representation in drug repositioning and DDA prediction. A zero-shot prompting template was designed for the LLM to extract high-quality knowledge descriptions for drug and disease entities, followed by embedding generation from language models to transform the discrete text into continuous numerical representations. Then, we proposed LLM-DDA with three different model architectures (LLM-DDA<sub>Node Feat</sub>, LLM-DDA<sub>Dual GNN</sub>, LLM-DDA<sub>GNN-AE</sub>) to investigate the best fusion mode for LLM-based embeddings. Extensive experiments on four DDA benchmarks show that LLM-DDA<sub>GNN-AE</sub> achieved the best performance compared to 11 baselines, with overall relative improvements of 23.22% in AUPR, 17.20% in F1-Score, and 25.35% in precision. Meanwhile, selected case studies involving Prednisone and Allergic Rhinitis highlighted the model's capability to identify reliable DDAs and knowledge descriptions, supported by existing literature. This study showcases the utility of LLMs in drug repositioning, as well as their generality and applicability to other biomedical relation prediction tasks.
{"title":"Empowering Graph Neural Network-Based Computational Drug Repositioning with Large Language Model-Inferred Knowledge Representation.","authors":"Yaowen Gu, Zidu Xu, Carl Yang","doi":"10.1007/s12539-024-00654-7","DOIUrl":"10.1007/s12539-024-00654-7","url":null,"abstract":"<p><p>Computational drug repositioning, through predicting drug-disease associations (DDA), offers significant potential for discovering new drug indications. Current methods incorporate graph neural networks (GNN) on drug-disease heterogeneous networks to predict DDAs, achieving notable performances compared to traditional machine learning and matrix factorization approaches. However, these methods depend heavily on network topology, hampered by incomplete and noisy network data, and overlook the wealth of biomedical knowledge available. Correspondingly, large language models (LLMs) excel in graph search and relational reasoning, which can possibly enhance the integration of comprehensive biomedical knowledge into drug and disease profiles. In this study, we first investigate the contribution of LLM-inferred knowledge representation in drug repositioning and DDA prediction. A zero-shot prompting template was designed for LLM to extract high-quality knowledge descriptions for drug and disease entities, followed by embedding generation from language models to transform the discrete text to continual numerical representation. Then, we proposed LLM-DDA with three different model architectures (LLM-DDA<sub>Node Feat</sub>, LLM-DDA<sub>Dual GNN</sub>, LLM-DDA<sub>GNN-AE</sub>) to investigate the best fusion mode for LLM-based embeddings. Extensive experiments on four DDA benchmarks show that, LLM-DDA<sub>GNN-AE</sub> achieved the optimal performance compared to 11 baselines with the overall relative improvement in AUPR of 23.22%, F1-Score of 17.20%, and precision of 25.35%. 
Meanwhile, selected case studies of involving Prednisone and Allergic Rhinitis highlighted the model's capability to identify reliable DDAs and knowledge descriptions, supported by existing literature. This study showcases the utility of LLMs in drug repositioning with its generality and applicability in other biomedical relation prediction tasks.</p>","PeriodicalId":13670,"journal":{"name":"Interdisciplinary Sciences: Computational Life Sciences","volume":" ","pages":"698-715"},"PeriodicalIF":3.9,"publicationDate":"2025-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142346018","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}