Copy number variation (CNV) is a major type of structural variation (SV) that plays critical roles in genetic diversity and disease. Many CNV detection tools have been developed. Although each tool has advantages in specific scenarios, they still suffer from suboptimal sensitivity, imprecise breakpoint resolution, and reduced robustness in complex sequencing environments. Developing more effective CNV detection tools that build on the strengths of existing ones remains a significant challenge in the field. To fully leverage the detection results of existing tools and improve the accuracy of CNV detection under complex sequencing conditions, a new method called SSLCNV (semi-supervised learning framework for CNV detection) is proposed. It combines consensus-based pseudo-labeling with density-based clustering. SSLCNV generates high-confidence pseudo-labels by intersecting CNV predictions from four representative tools (CNVkit, GROM-RD, Matchclips2, OTSUCNV) and uses these as core seeds for clustering. Additionally, SSLCNV introduces a new z-score constraint into the DBSCAN algorithm to enhance clustering accuracy. By leveraging the improved DBSCAN and incorporating reliable labels, SSLCNV effectively detects CNVs from partially labeled and unlabeled data. Comprehensive evaluations on both simulated and real datasets demonstrate that SSLCNV consistently achieves higher F1-scores than existing tools across diverse sequencing depths and tumor purities. Importantly, it maintains robust performance under low-coverage conditions, yielding higher recall without a substantial loss in precision. SSLCNV offers a scalable and accurate solution for CNV detection that is particularly advantageous in scenarios with complex genomic backgrounds.
SSLCNV: A Semi-supervised Learning Framework for Accurate Copy Number Variation Detection. Ruchao Du, Jinxin Dong, Hua Jiang, Minyong Qi, Yuxi Zhang, Ranran Sun, Mengke Xu. Interdisciplinary Sciences: Computational Life Sciences. Pub Date: 2025-11-27. DOI: 10.1007/s12539-025-00795-3 (https://doi.org/10.1007/s12539-025-00795-3)
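The consensus step in the SSLCNV abstract (keeping only regions that all four callers agree on) can be illustrated with a small interval-intersection sketch. This is not the authors' implementation: the function names, the half-open `(start, end)` interval format, and the toy coordinates are assumptions made for illustration. The abstract does not say how the z-score enters DBSCAN, so only the statistic itself is shown.

```python
from statistics import mean, pstdev


def intersect_two(a, b):
    """Intersect two sorted lists of non-overlapping half-open (start, end) intervals."""
    out, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        start = max(a[i][0], b[j][0])
        end = min(a[i][1], b[j][1])
        if start < end:
            out.append((start, end))
        # Advance whichever interval ends first.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return out


def consensus_regions(calls_per_tool):
    """Return the regions reported by every tool (hypothetical pseudo-label seeds)."""
    tools = [sorted(calls) for calls in calls_per_tool]
    consensus = tools[0]
    for calls in tools[1:]:
        consensus = intersect_two(consensus, calls)
    return consensus


def z_scores(depths):
    """Standardize per-bin read depths; returns zeros if the depths do not vary."""
    mu, sigma = mean(depths), pstdev(depths)
    return [(d - mu) / sigma if sigma else 0.0 for d in depths]


# Toy calls from four hypothetical callers on one chromosome.
calls = [
    [(100, 500), (900, 1200)],
    [(150, 450), (1000, 1300)],
    [(120, 480), (950, 1100)],
    [(200, 600), (1050, 1150)],
]
seeds = consensus_regions(calls)
```

Strict intersection trades recall for precision in the seed set, which fits the role of high-confidence pseudo-labels; the remaining bins are left for the clustering stage to resolve.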
Pub Date: 2025-11-24. DOI: 10.1007/s12539-025-00787-3
Huan Liu, Hanyu Luo, Lingyun Luo, Pingjian Ding
Purpose: Enhancers are critical non-coding regulatory elements, but their prediction remains challenging due to their variability and the absence of clear sequence motifs. This study aims to improve enhancer classification through a novel framework that integrates DNA sequence and shape features, addressing the limitations of sequence-only models and improving prediction performance across diverse genomic contexts.
Methods: We propose iEnhancer-Flow, a dual-branch model that integrates DNABERT-2 for extracting robust sequence representations and a hybrid convolutional network-based branch for DNA shape information. Drawing inspiration from central-difference techniques in image processing, the shape branch utilizes similar methods to capture local structural variations. The extracted sequence and shape features are fused via a flow attention mechanism to facilitate dynamic interaction between these complementary feature sets. The combined features are further enhanced with a weighted residual connection and attention pooling before being passed to an MLP classifier for final enhancer prediction.
Results: iEnhancer-Flow consistently outperformed competing methods, achieving significant improvements in balanced accuracy (Bacc), Matthews correlation coefficient (MCC), and other key metrics across six of the eight cell lines tested. For the remaining two cell lines, the model achieved comparable performance across several key metrics, suggesting its stability and robustness in diverse biological contexts.
Conclusion: The integration of sequence and DNA shape information in iEnhancer-Flow marks a significant advancement in enhancer prediction by capturing complementary regulatory signals beyond traditional sequence features. These findings suggest that understanding genomic regulation requires a comprehensive view, incorporating both sequence and structural contexts.
iEnhancer-Flow: Integrating Transformer-Based Sequence Learning with DNA Shape Insights for Robust Enhancer Prediction. Huan Liu, Hanyu Luo, Lingyun Luo, Pingjian Ding. Interdisciplinary Sciences: Computational Life Sciences. Pub Date: 2025-11-24. DOI: 10.1007/s12539-025-00787-3 (https://doi.org/10.1007/s12539-025-00787-3)
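The iEnhancer-Flow abstract says the shape branch borrows central-difference ideas to capture local structural variation. A minimal sketch of the plain finite-difference version of that idea follows. It is an assumption-laden illustration: the paper's branch may use a learned central-difference convolution instead, and the function name and the toy minor-groove-width values are invented here.

```python
def central_difference(values):
    """Approximate the local slope of a 1D shape profile.

    Interior positions use d[i] = (x[i+1] - x[i-1]) / 2;
    the two ends fall back to one-sided differences.
    """
    n = len(values)
    if n < 2:
        return [0.0] * n
    diffs = [float(values[1] - values[0])]
    for i in range(1, n - 1):
        diffs.append((values[i + 1] - values[i - 1]) / 2.0)
    diffs.append(float(values[-1] - values[-2]))
    return diffs


# Hypothetical per-base minor groove width values (made-up numbers).
profile = [5.0, 5.5, 4.0, 4.0, 6.0]
slopes = central_difference(profile)
```

The difference signal highlights where the shape profile changes rather than its absolute level, which is the kind of local variation the abstract refers to.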
Drug-drug interactions (DDIs) are crucial throughout various stages of drug development. Using computer-aided methods to predict DDIs accurately can enhance clinical safety and accelerate drug discovery. However, most existing deep learning methods rely heavily on the connectivity information between drugs. Neglecting the large number of potential DDI relationships can hinder a model's ability to extract meaningful information, thereby limiting its generalization capacity. To address these limitations, we propose IMF-DDI, an innovative DDI prediction framework that obtains drug molecule representations by combining information from multiple external entities. First, our proposed information mapping module enables the model to capture the associations between drug molecules in terms of their interactions with multiple external entities. Meanwhile, the multi-source information fusion module efficiently integrates information from multiple external entities to generate the final representations of drug molecules. We designed three distinct experimental tasks to validate the effectiveness of IMF-DDI. Our method establishes the current state of the art across all tasks on the DrugBank dataset and achieves the best performance in most tasks on the TWOSIDES dataset.
IMF-DDI: Information Mapping and Fusion Framework for Drug-drug Interaction Prediction. Xiaoyang Li, Yuhao Zhang, Yafei Liu, Xinyu Lu, Peirong Ma, Yafei Li, Masaru Kitsuregawa, Yanhui Gu. Interdisciplinary Sciences: Computational Life Sciences. Pub Date: 2025-11-20. DOI: 10.1007/s12539-025-00781-9 (https://doi.org/10.1007/s12539-025-00781-9)
Pub Date: 2025-11-18. DOI: 10.1007/s12539-025-00789-1
Jie Wang, Xin Huang, Hulin Kuang, Cheng Yan
Cancer is a complex and lethal disease influenced by multiple factors, and accurate subtyping is crucial for personalized treatment and prognostic evaluation. Although deep learning has made progress in cancer subtype identification, existing methods still struggle to capture high-order biological relationships, often overlook omics-specific information, and suffer from information loss caused by conventional feature fusion strategies. To address these challenges, we propose Subtype-HM, a novel cancer subtype identification method based on hypergraph learning and multi-omics data. We employ multi-level hypergraphs to model complex biological structures and design a hypergraph propagation network to capture both intra- and inter-omics correlations, effectively modeling high-order biological relationships. To preserve omics-specific semantics and enrich hypergraph representations, we introduce a parallel discriminator-guided attention module that extracts omics-specific features and complements the correlated representation with unique omics-specific information. Furthermore, to avoid the information loss caused by feature fusion, we propose a multi-omics contrastive entropy alignment that aligns subtype predictions across omics while retaining their unique semantics. Experimental results on TCGA cancer datasets demonstrate that Subtype-HM outperforms 14 methods in cancer subtype identification, achieving the highest average survival-analysis score ([Formula: see text] = 5.0) and number of enriched clinical parameters (3.1 on average). The identified subtypes demonstrate high biological interpretability through GO and KEGG enrichment analyses.
Subtype-HM: A Novel Cancer Subtype Identification Method Based on Hypergraph Learning and Multi-omics Data. Jie Wang, Xin Huang, Hulin Kuang, Cheng Yan. Interdisciplinary Sciences: Computational Life Sciences. Pub Date: 2025-11-18. DOI: 10.1007/s12539-025-00789-1 (https://doi.org/10.1007/s12539-025-00789-1)
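To make the idea of hypergraph propagation in the Subtype-HM abstract concrete, here is a minimal sketch of one generic, parameter-free propagation step: each hyperedge averages its member nodes, then each node averages the hyperedges it belongs to. This is a textbook-style smoothing step under assumed names and toy data, not the propagation network described in the paper, which is learned and multi-level.

```python
def hypergraph_propagate(incidence, features):
    """One node -> hyperedge -> node averaging step.

    incidence[v][e] is 1 if node v belongs to hyperedge e, else 0.
    features[v] is the feature vector of node v.
    """
    n_nodes, n_edges, dim = len(incidence), len(incidence[0]), len(features[0])

    # Step 1: each hyperedge takes the mean feature of its member nodes.
    edge_feats = []
    for e in range(n_edges):
        members = [v for v in range(n_nodes) if incidence[v][e]]
        if members:
            edge_feats.append(
                [sum(features[v][d] for v in members) / len(members) for d in range(dim)]
            )
        else:
            edge_feats.append([0.0] * dim)

    # Step 2: each node takes the mean feature of the hyperedges containing it.
    out = []
    for v in range(n_nodes):
        edges = [e for e in range(n_edges) if incidence[v][e]]
        if edges:
            out.append(
                [sum(edge_feats[e][d] for e in edges) / len(edges) for d in range(dim)]
            )
        else:
            out.append(list(features[v]))  # isolated node keeps its own features
    return out


# Three hypothetical samples; hyperedge 0 = {0, 1}, hyperedge 1 = {1, 2}.
H = [[1, 0], [1, 1], [0, 1]]
X = [[0.0], [2.0], [4.0]]
smoothed = hypergraph_propagate(H, X)
```

Because a hyperedge can join any number of nodes, a single step mixes information across whole groups of samples, which is what distinguishes it from pairwise graph propagation.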
The treatment of hypopharyngeal cancer faces complex challenges, and accurate prediction of chemotherapy sensitivity is crucial for personalized treatment. In this study, a multimodal fusion network based on deep learning was used to classify the chemotherapy sensitivity of hypopharyngeal cancer, and the prediction accuracy was improved by integrating 3D CT images and radiomic features. The preprocessed and enhanced 3D CT images were analyzed by 3D ResNet branches to extract spatial features; the radiomic features screened by LASSO regression were processed by three layers of fully connected branches to analyze the tabular data. The extracted vectors were fused by fully connected layers, using complementary advantages to capture complex spatial dependencies and detailed radiomic features. Experiments on the manually segmented NKU-TMU-hphc dataset (containing 102 hypopharyngeal cancer CT images) showed that the multimodal fusion network had high accuracy and outperformed single-modality methods and other models in multiple evaluation indicators. Statistical analysis was performed on the extracted features and clinical characteristics. The model effectively integrates image and clinical data, provides a new method for chemotherapy sensitivity classification, and is expected to improve personalized medicine.
HPCSMN: A Classification Method of Chemotherapy Sensitivity of Hypopharyngeal Cancer Based on Multimodal Network. Weiqi Fu, Haiyan Li, Xiongwen Quan, Xudong Wang, Wanwan Huang, Han Zhang. Interdisciplinary Sciences: Computational Life Sciences. Pub Date: 2025-11-18. DOI: 10.1007/s12539-025-00783-7 (https://doi.org/10.1007/s12539-025-00783-7)
Drug development is a lengthy and intricate process, where predicting drug-target affinity (DTA) is a vital step. Although traditional experimental techniques yield accurate and reliable results, their high cost and limited throughput render them impractical for large-scale applications. In contrast, computational approaches offer notable advantages in terms of scalability and operational efficiency. However, most existing models focus solely on either sequence information or molecular graph structure, limiting their capacity to capture the multifaceted nature of drug-target interactions. In the present work, we propose GSF-DTA, a novel graph-sequence fusion framework for DTA prediction. GSF-DTA integrates graph-based structural features and sequence-derived semantic representations to capture the interplay between drugs and targets. Quantitative evaluations demonstrate that GSF-DTA achieves superior predictive accuracy and exhibits strong generalization capabilities on the large-scale BindingDB dataset. Notably, GSF-DTA demonstrates robust performance in cold-start scenarios, enabling effective prediction for previously unseen drugs or targets. Extensive ablation studies and interpretability analyses further validate the effectiveness and transparency of our approach. Overall, GSF-DTA provides a promising and generalizable strategy for improving DTA prediction accuracy, contributing to the acceleration of drug design and discovery.
GSF-DTA: An Innovative Graph-Sequence Fusion Framework for Drug-Target Affinity Prediction. Guiyang Zhang, Yuemei Wang, Danni Zhao, Pengmian Feng, Ting Zhang, Huachao Bin, Wei Chen. Interdisciplinary Sciences: Computational Life Sciences. Pub Date: 2025-11-13. DOI: 10.1007/s12539-025-00782-8 (https://doi.org/10.1007/s12539-025-00782-8)
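The cold-start evaluation mentioned in the GSF-DTA abstract depends on how the data are split: every drug (or target) in the test set must be absent from training. A minimal sketch of a cold-drug split follows. The record layout `(drug, target, affinity)`, the function name, and the 20% test fraction are assumptions for illustration, not details taken from the paper.

```python
import random


def cold_drug_split(records, test_fraction=0.2, seed=0):
    """Split (drug, target, affinity) records so no test drug appears in training."""
    drugs = sorted({drug for drug, _, _ in records})
    rng = random.Random(seed)
    rng.shuffle(drugs)
    n_test = max(1, int(len(drugs) * test_fraction))
    test_drugs = set(drugs[:n_test])
    train = [r for r in records if r[0] not in test_drugs]
    test = [r for r in records if r[0] in test_drugs]
    return train, test


# Toy data: five hypothetical drugs, each measured against two targets.
records = [
    (f"drug{i}", target, 5.0 + i)
    for i in range(5)
    for target in ("targetA", "targetB")
]
train, test = cold_drug_split(records)
```

Splitting by drug identity rather than by record is the key design choice: a random split over records would let the same drug appear on both sides and overstate generalization to unseen compounds. A cold-target split follows the same pattern with the roles of the first two fields swapped.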
Pub Date: 2025-11-13. DOI: 10.1007/s12539-025-00780-w
Dangguo Shao, Yuyang Zou, Lei Ma, Sanli Yi
Accurate prediction of protein-protein interaction (PPI) sites is fundamental to elucidating cellular mechanisms and advancing genomics. However, prevailing graph neural networks are constrained by two key limitations: they often neglect latent correlations between distinct protein graphs, and they oversimplify neighborhood feature aggregation by using rudimentary statistics, thereby discarding vital distributional information. Here, we present MED-PPIS, a novel framework that addresses these challenges through a synergistic integration of architectural innovations. Our model combines an mLSTM-based matrix memory for capturing long-range sequential dependencies with a multi-order moment GNN that faithfully characterizes complex feature distributions. This is complemented by a graph external attention mechanism to learn universal structural motifs across proteins and a dual-axis attention architecture for efficient, multi-scale feature extraction. Compared to the strongest baseline on the Test_60 dataset, it achieves significant improvements across key metrics, including increases of 2.1% in the area under the precision-recall curve (AUPRC), 1.2% in the area under the receiver operating characteristic curve (AUROC), and 2.3% in F1-score. By providing superior predictive accuracy, our model offers a powerful, transparent tool for dissecting the intricate landscapes of protein interactions, paving the way for new biological insights and therapeutic strategies.
MED-PPIS: Multi-order Moments External Graph Attention Network with Dual-Axis Attention for Protein-Protein Interaction Site Prediction. Dangguo Shao, Yuyang Zou, Lei Ma, Sanli Yi. Interdisciplinary Sciences: Computational Life Sciences. Pub Date: 2025-11-13. DOI: 10.1007/s12539-025-00780-w (https://doi.org/10.1007/s12539-025-00780-w)
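The MED-PPIS abstract argues that aggregating neighbor features with a single statistic discards distributional information. The sketch below illustrates why higher-order moments help: two neighborhoods with the same mean are indistinguishable under mean aggregation but differ in variance and skewness. The function name and toy values are assumptions, and the paper's multi-order moment GNN is a learned layer rather than this fixed summary.

```python
def moment_aggregate(neighbor_values):
    """Summarize one scalar feature over a residue's neighbors.

    Returns (mean, variance, skewness) using population moments.
    """
    n = len(neighbor_values)
    mean = sum(neighbor_values) / n
    centered = [x - mean for x in neighbor_values]
    variance = sum(c * c for c in centered) / n
    if variance == 0:
        return mean, 0.0, 0.0
    skewness = (sum(c ** 3 for c in centered) / n) / variance ** 1.5
    return mean, variance, skewness


# Two hypothetical neighborhoods with identical means.
symmetric = moment_aggregate([1.0, 2.0, 3.0])
lopsided = moment_aggregate([0.0, 0.0, 6.0])
```

Mean aggregation alone would map both neighborhoods to the same value, while the second and third moments separate them.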
Pub Date: 2025-11-06. DOI: 10.1007/s12539-025-00770-y
Yejin Kan, Dongyeon Kim, Jinkyung Yang, Gangman Yi
Advances in next-generation sequencing have led to an explosion in sequencing data, accelerating genome assembly research. However, draft genomes generated after scaffolding still contain unresolved gaps, often caused by repetitive regions and sequencing errors. These gaps may contain biologically meaningful sequences and thus require accurate resolution. Existing gap-filling tools often exhibit limited reliability, especially when applied to large and complex eukaryotic genomes, because they cannot adequately resolve repetitive regions or depend heavily on error-prone long reads. To address this challenge, we present GapSense, a robust gap-filling method that leverages similarity estimation using third-generation sequencing (TGS) reads. By quantifying pairwise similarity among candidate sequences, GapSense prioritizes informative regions and reconstructs gap sequences with higher accuracy. The proposed method introduces a novel similarity scoring mechanism that evaluates the geometric overlap of adjacent subregions to capture local structural variations and reduces noise from low-coverage and error-prone long reads. Experimental results on six representative species and three popular assemblers show that GapSense consistently outperforms existing tools in terms of gap-filling accuracy and contiguity, while maintaining low performance variability across different datasets. These findings demonstrate the effectiveness and generalizability of GapSense for accurate and scalable gap-filling.
GapSense: Similarity Estimation-Based Gap Filler with TGS-Reads for Genome Assemblies. Yejin Kan, Dongyeon Kim, Jinkyung Yang, Gangman Yi. Interdisciplinary Sciences: Computational Life Sciences. Pub Date: 2025-11-06. DOI: 10.1007/s12539-025-00770-y (https://doi.org/10.1007/s12539-025-00770-y)
Pub Date: 2025-10-29 DOI: 10.1007/s12539-025-00747-x
Chengyan Zhou, Xinliang Sun, Xiang Du, Min Zeng, Min Li
Drug repositioning is a promising strategy for accelerating drug development and reducing costs by identifying potential new indications for existing drugs. Recently, numerous graph convolutional network (GCN)-based methods have been developed for drug repositioning. However, many existing methods overlook the distinct roles of nodes within drug-disease association graphs, limiting their ability to learn effective representations. To address this limitation, we propose a subgraph neural network enhanced by global similarity for drug repositioning, termed GSESNN. Specifically, GSESNN first extracts the subgraph of each drug-disease pair from the entire drug-disease graph. Then, a GCN and a sort pooling strategy are used to learn the subgraph representation. In addition, to distinguish between different drug-disease pairs that share an identical subgraph topology, GSESNN uses a GCN to learn the similarity information of drugs and diseases and fuses it with the subgraph representation to produce the final representation. Finally, we treat drug-disease association prediction as a graph classification task. Experimental results show that GSESNN outperforms the baseline model in drug repositioning tasks. Case studies on Alzheimer's disease and gastric cancer further demonstrate that our model successfully identifies more accurate drug-disease associations, highlighting its potential for practical applications in drug discovery.
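The two structural steps named in the abstract, extracting a subgraph around a drug-disease pair and sort pooling, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the `hops` parameter, and the use of an h-hop neighbourhood are assumptions made for illustration, and the GCN that would produce the node features is omitted. Sort pooling is shown in its commonly used form (order nodes by the last feature channel, keep a fixed number of rows, zero-pad).

```python
from collections import deque


def k_hop_nodes(adj, start, hops):
    """Return all nodes within `hops` edges of `start` (breadth-first search)."""
    seen = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if seen[node] == hops:
            continue  # do not expand beyond the hop limit
        for nbr in adj.get(node, ()):
            if nbr not in seen:
                seen[nbr] = seen[node] + 1
                queue.append(nbr)
    return set(seen)


def enclosing_subgraph(edges, drug, disease, hops=1):
    """Subgraph induced by the h-hop neighbourhoods of a drug-disease pair.

    `hops` is a hypothetical parameter; the abstract does not state the radius.
    """
    adj = {}
    for u, v in edges:  # treat associations as undirected
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    nodes = k_hop_nodes(adj, drug, hops) | k_hop_nodes(adj, disease, hops)
    sub_edges = sorted((u, v) for u, v in edges if u in nodes and v in nodes)
    return sorted(nodes), sub_edges


def sort_pool(features, k):
    """Order node feature rows by last channel (descending), keep k, zero-pad."""
    ordered = sorted(features, key=lambda row: row[-1], reverse=True)[:k]
    width = len(features[0]) if features else 0
    ordered += [[0.0] * width for _ in range(k - len(ordered))]
    return ordered


# Toy association graph: drugs d1..d3, diseases s1..s3 (made-up identifiers).
edges = [("d1", "s1"), ("d1", "s2"), ("d2", "s2"), ("d3", "s3")]
nodes, sub_edges = enclosing_subgraph(edges, "d1", "s2", hops=1)
pooled = sort_pool([[0.1, 0.5], [0.2, 0.9], [0.3, 0.1]], k=2)
```

Because sort pooling always returns `k` rows, subgraphs of different sizes map to inputs of the same shape, which is what lets a downstream classifier treat each drug-disease pair as one graph classification example.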
Subgraph Neural Networks Enhanced by Global Similarity for Drug Repositioning. Chengyan Zhou, Xinliang Sun, Xiang Du, Min Zeng, Min Li. Interdisciplinary Sciences: Computational Life Sciences. Pub Date: 2025-10-29. DOI: 10.1007/s12539-025-00747-x (https://doi.org/10.1007/s12539-025-00747-x)