
Interdisciplinary Sciences: Computational Life Sciences: Latest Articles

HAFMMDA: HIN2vec-Based Attentional Factorization Machines for Predicting Microbe-Drug Associations.
IF 3.9 | Zone 2 (Biology) | Q1 MATHEMATICAL & COMPUTATIONAL BIOLOGY | Pub Date: 2025-12-01 | Epub Date: 2025-07-30 | DOI: 10.1007/s12539-025-00710-w
Bo Wang, Junqi Wang, Xiaoxin Du, Jianfei Zhang, Yang He, Fangjian Ma

Emerging research continues to reveal the fundamental contributions of microbial communities to maintaining human physiological balance and to advancing drug discovery. However, established wet-lab techniques are time-consuming and resource-intensive, so recent work has concentrated on building robust computational models to predict microbe-drug associations. We propose HAFMMDA, a neural network that combines heterogeneous biological relationships with attentional factorization machines to predict undiscovered microbe-drug associations. First, a heterogeneous network is assembled from three components: a microbe similarity network, a drug similarity network, and a network of known microbe-drug interactions. HAFMMDA then uses HIN2vec to extract feature representations of microbe-drug pairs. Finally, it combines second-order feature interactions with an attention mechanism to produce the prediction. In five-fold cross-validation, HAFMMDA achieved an AUC of 0.9805, a statistically significant improvement over five contemporary baseline methods. These findings support HAFMMDA's effectiveness both in recovering verified microbe-drug associations and in predicting novel ones.
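The final prediction step above (second-order feature interactions weighted by attention) can be sketched with the standard attentional factorization machine (AFM) formulation: every pair of feature embeddings is multiplied element-wise, a small one-hidden-layer ReLU network scores each pair, the scores are softmax-normalized, and the weighted sum is projected to a scalar logit. This is a minimal sketch under assumptions, not the authors' code: how HIN2vec pair representations are mapped onto the FM feature vector `x` and embeddings `V` is not specified in the abstract, and all weights below are hypothetical toy values.

```python
import math
from itertools import combinations


def afm_score(x, w0, w, V, W_att, b_att, h, p):
    """Attentional factorization machine (AFM) logit for one feature vector.

    x: n feature values; w0, w: bias and linear weights; V: n embeddings of length k.
    W_att (t x k), b_att (t), h (t): one-hidden-layer attention network.
    p (k): projection of the pooled interaction vector to a scalar.
    """
    k = len(p)
    linear = w0 + sum(wi * xi for wi, xi in zip(w, x))
    # Element-wise product of every embedding pair, scaled by both feature values.
    pairs = [[V[i][f] * V[j][f] * x[i] * x[j] for f in range(k)]
             for i, j in combinations(range(len(x)), 2)]
    if not pairs:
        return linear
    # Attention logit per pair: h . ReLU(W_att e + b_att).
    logits = []
    for e in pairs:
        hidden = [max(0.0, sum(W_att[t][f] * e[f] for f in range(k)) + b_att[t])
                  for t in range(len(h))]
        logits.append(sum(ht * zt for ht, zt in zip(h, hidden)))
    # Softmax over pairs, shifted by the max for numerical stability.
    m = max(logits)
    exps = [math.exp(a - m) for a in logits]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Attention-weighted pooling of the pairwise interactions, then projection.
    pooled = [sum(a * e[f] for a, e in zip(weights, pairs)) for f in range(k)]
    return linear + sum(pf * qf for pf, qf in zip(p, pooled))


def sigmoid(z):
    """Map the AFM logit to an association probability."""
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical usage: a 3-feature pair representation with 2-dimensional embeddings.
score = afm_score(x=[1.0, 0.5, 1.0], w0=0.0, w=[0.1, 0.2, 0.3],
                  V=[[1, 0], [0, 1], [1, 1]], W_att=[[1, 0], [0, 1]],
                  b_att=[0, 0], h=[1, 1], p=[1, 1])
prob = sigmoid(score)
```

Because the attention weights sum to one, pairs that the attention network scores highly dominate the pooled interaction; in a trained model these weights would be learned jointly with `V` and `p`.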

Pages: 1083-1100
Citations: 0
ViruSeg: Harnessing the Power of Large Language-Image Model for Enhanced Virus Image Segmentation.
IF 3.9 | Zone 2 (Biology) | Q1 MATHEMATICAL & COMPUTATIONAL BIOLOGY | Pub Date: 2025-12-01 | Epub Date: 2025-06-27 | DOI: 10.1007/s12539-025-00711-9
Shengxiang Wang, Xiangzheng Fu, Zhenya Du, Xinxin Liu, Qiaochu Mai, Linlin Zhuo, Boqia Xie, Quan Zou

The emergence of novel viral diseases, SARS-CoV-2 being a stark example, poses growing threats to public health and causes significant morbidity and mortality worldwide. Accurate identification and segmentation of viruses in images are crucial for tracking virus progression and mutations and for devising new treatment strategies. Virus recognition and segmentation models built on high-performance networks such as U-Net have achieved notable success. However, they still face several challenges, including scarce labeled virus images, large morphological variability, and indistinct boundaries. This study therefore introduces ViruSeg, a virus segmentation method built on the EVA-02 large language-image pre-trained model and data augmentation techniques. First, ViruSeg applies data augmentation techniques such as cutout and image fine-tuning to enrich electron microscopy virus images, improving generalization and helping the model delineate virus boundaries and varied morphologies. Second, ViruSeg uses the pre-trained EVA-02 model to learn a general representation of virus images, making it more robust to data scarcity. Finally, segmentation is performed with a Cascade Mask R-CNN (CMR) model. Comprehensive evaluations on benchmark datasets show that ViruSeg outperforms advanced virus segmentation methods. We anticipate that the proposed approach will advance virology research and the development of treatments for related diseases. All datasets and code are available at https://github.com/xiachashuanghua/project .
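Cutout, one of the augmentations named above, masks a fixed-size square of the image so the model cannot rely on any single local region. A minimal sketch on a grayscale image stored as a list of rows, assuming the common recipe of a randomly centred square clipped at the borders; the square size and fill value of 0 are illustrative choices, not parameters reported for ViruSeg. In a segmentation pipeline one must also decide how the ground-truth mask is treated; this sketch touches only the image.

```python
import random


def cutout(image, size, rng=None, center=None):
    """Return a copy of a 2D image with a size x size square set to 0.

    The square is centred at `center` (row, col) if given, otherwise at a
    uniformly random pixel; parts falling outside the image are clipped.
    """
    rng = rng or random.Random()
    h, w = len(image), len(image[0])
    cy, cx = center if center is not None else (rng.randrange(h), rng.randrange(w))
    # Clip the square to the image boundaries.
    y0, y1 = max(0, cy - size // 2), min(h, cy - size // 2 + size)
    x0, x1 = max(0, cx - size // 2), min(w, cx - size // 2 + size)
    out = [row[:] for row in image]  # leave the input image untouched
    for y in range(y0, y1):
        for x in range(x0, x1):
            out[y][x] = 0
    return out


# Hypothetical usage: augment a 4 x 4 patch with a reproducible random generator.
patch = [[1] * 4 for _ in range(4)]
augmented = cutout(patch, size=2, rng=random.Random(42))
```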

Pages: 987-997
Citations: 0
DSMR: Dual-Stream Networks with Refinement Module for Unsupervised Multi-modal Image Registration.
IF 3.9 | Zone 2 (Biology) | Q1 MATHEMATICAL & COMPUTATIONAL BIOLOGY | Pub Date: 2025-12-01 | Epub Date: 2025-04-19 | DOI: 10.1007/s12539-025-00707-5
Lei Li, Liumin Zhu, Qifu Wang, Zhuoli Dong, Tianli Liao, Peng Li

Multi-modal medical image registration aims to align images from different modalities by establishing spatial correspondences between them. Although deep learning-based methods have shown great potential, the lack of explicit reference relations keeps unsupervised multi-modal registration a challenging task. In this paper, we propose DSMR, a novel unsupervised dual-stream multi-modal registration framework that combines a dual-stream registration network with a refinement module. Unlike existing methods that use a translation network to reduce multi-modal registration to a uni-modal problem, DSMR uses the moving, fixed, and translated images to generate two deformation fields. Specifically, a translation network first converts the moving image into a translated image that resembles the fixed image. The dual-stream registration network then computes two deformation fields: an initial field generated from the fixed and moving images, and a translated field generated from the translated and fixed images. The translated field serves as a pseudo-ground truth that refines the initial field and mitigates problems such as artifacts introduced by translation. Finally, the refinement module further improves the deformation field by integrating registration errors and contextual information. Extensive experiments show that DSMR achieves excellent performance and generalizes well in learning spatial relationships between images of different modalities without supervision. The source code is available at https://github.com/raylihaut/DSMR .
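Two ingredients of the scheme above can be made concrete: warping an image with a dense deformation (displacement) field, and using the translated field as a pseudo-ground truth for the initial field. The sketch below uses nearest-neighbour backward warping and a mean-squared consistency penalty between the two fields. The penalty is one plausible form assumed for illustration; the abstract does not give DSMR's actual loss, and deep-learning frameworks would normally use differentiable bilinear sampling instead of rounding.

```python
def warp(image, flow):
    """Backward-warp a 2D image with a dense displacement field (nearest neighbour).

    flow[y][x] = (dy, dx): output pixel (y, x) samples image at (y + dy, x + dx),
    with coordinates clamped to the image border.
    """
    h, w = len(image), len(image[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            dy, dx = flow[y][x]
            sy = min(h - 1, max(0, int(round(y + dy))))
            sx = min(w - 1, max(0, int(round(x + dx))))
            out[y][x] = image[sy][sx]
    return out


def flow_consistency(initial_flow, pseudo_gt_flow):
    """Mean squared difference between two displacement fields.

    Hypothetical penalty pulling the initial field toward the translated
    (pseudo-ground-truth) field.
    """
    diffs = [(a0 - b0) ** 2 + (a1 - b1) ** 2
             for row_a, row_b in zip(initial_flow, pseudo_gt_flow)
             for (a0, a1), (b0, b1) in zip(row_a, row_b)]
    return sum(diffs) / len(diffs)
```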

Pages: 804-821
Citations: 0
A Multi-modal Drug Target Affinity Prediction Based on Graph Features and Pre-trained Sequence Embeddings.
IF 3.9 | Zone 2 (Biology) | Q1 MATHEMATICAL & COMPUTATIONAL BIOLOGY | Pub Date: 2025-12-01 | Epub Date: 2025-06-02 | DOI: 10.1007/s12539-025-00713-7
Xin Tang, Xiujuan Lei, Lian Liu

Accurate computational methods for predicting drug-target affinity (DTA) are essential, because they reduce the need for biochemical experiments and enable rapid screening of potential druggable compounds. Current deep learning-based DTA prediction methods mainly rely on single-modal information about drugs or targets. In this article, we propose MGSDTA, a new multi-modal DTA prediction method that integrates graph features and sequence features of drug molecules and target proteins. We extract graph features from drug molecular graphs and target protein graphs. In parallel, we extract sequence features from continuous embeddings of drug substructures and target subsequences, generated by the self-supervised pre-trained models Mol2vec and ProtVec, respectively. Finally, these features are combined by a weighted fusion module for DTA prediction. Experiments on benchmark datasets show that MGSDTA outperforms single-modal methods based solely on sequences or graphs.
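A weighted fusion module of the kind described above can be sketched as a softmax-weighted sum of same-length modality vectors (here, a graph-derived and a sequence-derived vector per entity). This is a hypothetical minimal form: the abstract does not say whether MGSDTA's weights are scalars, per-dimension gates, or attention scores, and the logits below would be learned parameters in practice.

```python
import math


def weighted_fusion(features, logits):
    """Fuse equal-length modality vectors with softmax-normalised scalar weights."""
    if len(features) != len(logits) or not features:
        raise ValueError("need one logit per modality vector")
    m = max(logits)
    exps = [math.exp(a - m) for a in logits]
    total = sum(exps)
    weights = [e / total for e in exps]  # non-negative, sum to 1
    return [sum(wt * vec[i] for wt, vec in zip(weights, features))
            for i in range(len(features[0]))]


# Hypothetical usage: fuse graph and sequence views for the drug and the target,
# then concatenate them as input to a downstream affinity regressor.
drug = weighted_fusion([[0.2, 0.4], [0.6, 0.0]], logits=[0.0, 0.0])
target = weighted_fusion([[1.0, 1.0], [0.0, 2.0]], logits=[1.0, 0.0])
pair_input = drug + target
```

Normalizing the weights with a softmax keeps the fused vector on the same scale as its inputs, so neither modality can be amplified arbitrarily.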

Pages: 822-843
Citations: 0
OptimDase: An Algorithm for Predicting DNA Binding Sites with Combined Feature Encoding.
IF 3.9 | Zone 2 (Biology) | Q1 MATHEMATICAL & COMPUTATIONAL BIOLOGY | Pub Date: 2025-12-01 | Epub Date: 2025-06-10 | DOI: 10.1007/s12539-025-00704-8
Zhendong Liu, Jun S Liu, Dongqing Wei, Rongjun Man, Jiamin Jiang, Bofeng Zhang, Liping Li, Zhiyong Zhao

Identifying DNA binding sites remains a critical task in bioinformatics, with applications ranging from gene regulation studies to drug design. Despite progress in computational techniques, challenges such as data complexity and limited prediction accuracy remain. In this paper, we introduce OptimDase, a new algorithm that integrates feature encoding with optimal decision-making frameworks to improve DNA binding site prediction. OptimDase combines multi-scale scanning with feature selection strategies, making it highly effective for both classification and regression tasks. In our experiments, OptimDase achieves an accuracy of 0.8943 on classification tasks and an RMSE of 0.0054 on regression tasks, outperforming existing algorithms on key evaluation metrics. These results highlight OptimDase's portability and robustness and make it an effective solution for identifying DNA binding sites and advancing drug design applications.
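Multi-scale scanning, as mentioned above, generally means sliding windows of several sizes over an encoded sequence so that downstream feature selection and classifiers see patterns at different lengths. A minimal sketch with one-hot encoding, in the spirit of multi-grained scanning; the window sizes, stride, and encoding are illustrative assumptions, since the abstract does not specify OptimDase's combined feature encoding.

```python
BASES = "ACGT"


def one_hot(seq):
    """One-hot encode a DNA string; bases other than A/C/G/T become all-zero rows."""
    return [[1 if b == base else 0 for base in BASES] for b in seq.upper()]


def multi_scale_windows(seq, window_sizes, stride=1):
    """Slide windows of several sizes over the one-hot encoded sequence.

    Returns (window_size, start, flattened_one_hot) tuples, one per window position;
    window sizes longer than the sequence yield no windows.
    """
    encoded = one_hot(seq)
    windows = []
    for size in window_sizes:
        for start in range(0, len(encoded) - size + 1, stride):
            flat = [v for row in encoded[start:start + size] for v in row]
            windows.append((size, start, flat))
    return windows


# Hypothetical usage: windows of length 3 and 5 over a short sequence.
features = multi_scale_windows("ACGTTGCA", window_sizes=[3, 5])
```

Each window's flattened vector can then be fed to a feature selector or classifier; concatenating or pooling predictions across window sizes is one common way to combine the scales.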

Pages: 791-803. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12672825/pdf/
Citations: 0
MOPSOGAT: Predicting CircRNA-Disease Associations via Improved Multi-objective Particle Swarm Optimization and Graph Attention Network.
IF 3.9 | Zone 2 (Biology) | Q1 MATHEMATICAL & COMPUTATIONAL BIOLOGY | Pub Date: 2025-12-01 | Epub Date: 2025-06-13 | DOI: 10.1007/s12539-025-00725-3
Yuehao Wang, Pengli Lu

A growing body of research has found that circRNAs are remarkably reliable in organisms and serve as crucial markers in many diseases. Although deep learning techniques have been widely applied to study circRNA-disease relationships, tuning the many parameters these techniques involve for best performance remains a challenge. We therefore present, for the first time, a multi-objective particle swarm optimization algorithm that optimizes the parameters of a graph attention network so that the model operates at peak efficiency. In addition, the uneven distribution of node types in heterogeneous graphs built from association relationships limits feature learning. To overcome these problems, we propose MOPSOGAT, a method for predicting circRNA-disease associations that uses an improved multi-objective particle swarm optimization (MOPSO) algorithm and a graph attention network. First, we construct a heterogeneous graph from multiple circRNA similarities and disease phenotypic similarities, and obtain node sequences from it via random walks that incorporate jump and stay strategies. These sequences are then processed with word2vec to derive neighbor vectors for the nodes, which serve as initial embeddings for circRNAs and diseases. Next, to account for both the convergence and the diversity of the Pareto-front solutions, an improved MOPSO algorithm iteratively searches the parameter space for optimal solutions. The parameters found by MOPSO are then fed into a graph attention network, which further refines the model embeddings. MOPSOGAT outperforms deep learning-based methods, methods based solely on multi-objective optimization, and machine learning-based methods. Moreover, case studies validate the potential associations predicted by MOPSOGAT, further demonstrating its potential for future biomedical research.
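The core of any MOPSO, including the improved variant described above, is an archive of Pareto non-dominated solutions from which swarm leaders are drawn, combined with the standard PSO velocity update. The sketch below shows only these textbook pieces under assumptions: all objectives are minimised (for example, negated validation metrics of the graph attention network), and the inertia and acceleration coefficients are illustrative. MOPSOGAT's specific improvements for convergence and diversity are not described in the abstract and are not reproduced here.

```python
import random


def dominates(a, b):
    """True if objective vector a Pareto-dominates b (all objectives minimised)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def update_archive(archive, candidates):
    """Keep only the non-dominated (params, objectives) pairs from archive + candidates."""
    pool = archive + candidates
    return [(p, f) for p, f in pool if not any(dominates(g, f) for _, g in pool)]


def pso_step(position, velocity, personal_best, leader, rng, w=0.5, c1=1.5, c2=1.5):
    """Standard PSO update toward a personal best and a leader taken from the archive."""
    new_v = [w * v + c1 * rng.random() * (pb - x) + c2 * rng.random() * (ld - x)
             for x, v, pb, ld in zip(position, velocity, personal_best, leader)]
    new_x = [x + v for x, v in zip(position, new_v)]
    return new_x, new_v


# Hypothetical usage: parameter vectors (e.g. learning rate, dropout) with two objectives.
rng = random.Random(0)
archive = update_archive([], [([0.01, 0.2], (0.10, 0.30)), ([0.02, 0.1], (0.20, 0.10)),
                              ([0.05, 0.5], (0.30, 0.40))])
leader = rng.choice(archive)[0]  # simple random leader selection
pos, vel = pso_step([0.03, 0.3], [0.0, 0.0], [0.03, 0.3], leader, rng)
```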

Pages: 1038-1055
Citations: 0
WDGBANDTI: A Deep Graph Convolutional Network-Based Bilinear Attention Network for Drug-Target Interaction Prediction with Domain Adaptation.
IF 3.9 | Zone 2 (Biology) | Q1 MATHEMATICAL & COMPUTATIONAL BIOLOGY | Pub Date: 2025-12-01 | Epub Date: 2025-05-23 | DOI: 10.1007/s12539-025-00714-6
Nianrui Wang, Shumin Zhao, Ziwei Li, Jianqiang Sun, Ming Yi

Background: During the development of new drugs, it is essential to assess their effectiveness and to examine the mechanisms that may underlie side effects. This process typically combines the analysis of drugs under development with that of relevant existing drugs to evaluate drug and target effects more precisely. Applying deep learning to this problem is currently a research hotspot, but several limitations remain: (i) how to deepen the analysis from the molecular level to the atomic level and identify, on the basis of pharmacological mechanisms, the key substructures that affect interactions; and (ii) how to integrate biomedical analysis with deep learning methods so that the results are medically sound and more interpretable.

Methods: To address these limitations, we built WDGBANDTI, an interpretable deep learning framework based on a Deep Graph Convolutional Network (Deep-GCN) and a Bilinear Attention Network (BAN). It analyzes and predicts drug-target interactions at the substructure level, and additional modules improve its predictions for previously unseen drug-target pairs.

Results: We validated the model on several widely used, high-coverage datasets representing different application scenarios. Compared with several state-of-the-art computational methods, our model shows advantages in accuracy, sensitivity, specificity, and other evaluation metrics. More importantly, through BAN the model can identify the substructures involved in drug-target interactions, highlighting its excellent interpretability.

Conclusion: We believe that our work will contribute to advances in drug development and side-effect experiments and provide meaningful guidance for drug design.
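The interpretability claimed above comes from the bilinear attention map: every (drug atom, protein residue) pair receives a normalized weight, and large weights point to the interacting substructures. A minimal single-glimpse, low-rank bilinear attention sketch in pure Python, assuming the atom features come from a graph encoder (such as Deep-GCN) and the residue features from a protein encoder; the projection matrices `U`, `V` and vector `p` are hypothetical stand-ins for learned parameters, and WDGBANDTI's exact BAN configuration is not given in the abstract.

```python
import math


def relu_vec(v):
    return [max(0.0, x) for x in v]


def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def bilinear_attention(drug_atoms, protein_residues, U, V, p):
    """Low-rank bilinear attention between atom and residue feature vectors.

    Returns (attention map indexed [atom][residue], pooled joint vector).
    """
    hd = [relu_vec(matvec(U, h)) for h in drug_atoms]        # projected atoms
    hp = [relu_vec(matvec(V, q)) for q in protein_residues]  # projected residues
    # Bilinear logit per pair: p . (hd_i * hp_j), element-wise product.
    logits = [[sum(pk * a * b for pk, a, b in zip(p, x, y)) for y in hp] for x in hd]
    # Softmax over all (atom, residue) pairs jointly.
    m = max(l for row in logits for l in row)
    z = sum(math.exp(l - m) for row in logits for l in row)
    att = [[math.exp(l - m) / z for l in row] for row in logits]
    # Attention-weighted bilinear pooling into a joint drug-target vector.
    joint = [sum(att[i][j] * hd[i][c] * hp[j][c]
                 for i in range(len(hd)) for j in range(len(hp)))
             for c in range(len(p))]
    return att, joint
```

The joint vector would feed a classifier for the interaction label, while `att` can be inspected directly to rank atom-residue pairs.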

Pages: 998-1017
Citations: 0
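The WDGBANDTI abstract credits BAN with exposing the substructures that drive a drug-target interaction. As a rough illustration of why, the sketch below implements single-glimpse, low-rank bilinear attention in the style of BAN in plain Python: every drug-atom/protein-residue pair receives a weight, and those weights double as an interaction map. This is not WDGBANDTI's code; the Deep-GCN atom encoder, the protein encoder, and the domain adaptation module are omitted, and the embeddings and the parameters `u`, `v`, and `p` are random toy values chosen for illustration.

```python
import math
import random
from typing import List, Tuple

Matrix = List[List[float]]
Vector = List[float]


def project_relu(x: Vector, w: Matrix) -> Vector:
    """Compute ReLU(W^T x) for an input of length d and a d x k weight matrix."""
    return [max(0.0, sum(x[r] * w[r][c] for r in range(len(x)))) for c in range(len(w[0]))]


def bilinear_attention(
    atoms: Matrix, residues: Matrix, u: Matrix, v: Matrix, p: Vector
) -> Tuple[Matrix, Vector]:
    """Single-glimpse low-rank bilinear attention between drug atoms and protein residues.

    Returns the (n_atoms x n_residues) attention map and the pooled k-dimensional joint vector.
    """
    hx = [project_relu(x, u) for x in atoms]     # n x k
    hy = [project_relu(y, v) for y in residues]  # m x k
    k = len(p)
    # Bilinear logit for every atom-residue pair: p^T (hx_i * hy_j), * = element-wise
    logits = [[sum(p[t] * hi[t] * hj[t] for t in range(k)) for hj in hy] for hi in hx]
    # Softmax over all n*m pairs; subtract the max for numerical stability
    peak = max(max(row) for row in logits)
    exps = [[math.exp(z - peak) for z in row] for row in logits]
    total = sum(sum(row) for row in exps)
    attn = [[e / total for e in row] for row in exps]
    # Attention-weighted bilinear pooling into the joint representation
    joint = [
        sum(attn[i][j] * hx[i][t] * hy[j][t] for i in range(len(hx)) for j in range(len(hy)))
        for t in range(k)
    ]
    return attn, joint


if __name__ == "__main__":
    rng = random.Random(0)

    def toy(rows: int, cols: int) -> Matrix:
        """Random toy matrix standing in for learned embeddings or weights."""
        return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]

    # 4 atoms (dim 6), 5 residues (dim 8), shared rank k = 3
    attn_map, joint_vec = bilinear_attention(
        toy(4, 6), toy(5, 8), toy(6, 3), toy(8, 3), [1.0, 0.5, -0.5]
    )
    best = max(
        ((i, j) for i in range(4) for j in range(5)), key=lambda ij: attn_map[ij[0]][ij[1]]
    )
    print("most attended atom-residue pair:", best)
```

Because the attention map is normalized over all atom-residue pairs, ranking its entries is one simple way to surface candidate interacting substructures; a trained model would learn `u`, `v`, and `p` instead of sampling them.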
CMedRAGBot: A Chinese Medical Chatbot Based on Graph RAG and Large Language Models.
IF 3.9 | Zone 2 (Biology) | Q1 MATHEMATICAL & COMPUTATIONAL BIOLOGY | Pub Date: 2025-12-01 | Epub Date: 2025-06-05 | DOI: 10.1007/s12539-025-00715-5
Dongfang Zhang, Haoze Du, Xiaolei Wang, Mingdong Zhu, Xiaoxiao Pang, Dongqing Wei, Xianfang Wang

In the domain of Chinese clinical medical question answering (QA), traditional Large Language Models (LLMs) face challenges on knowledge-intensive tasks, such as hallucinations and difficulty updating their knowledge. To address these issues, this research presents CMedRAGBot, a Chinese clinical medical QA model that integrates Retrieval-Augmented Generation (RAG) with a medical knowledge graph. First, a Chinese medical knowledge graph covering multiple entity types, including diseases, medications, and symptoms, is constructed. Based on this knowledge graph, a Named Entity Recognition (NER) model built on a Chinese-RoBERTa and BiGRU architecture is designed, with data augmentation strategies employed to improve its generalization. In addition, prompt engineering techniques are used to implement intent recognition for user queries, mapping them to predefined intent categories. Finally, these modules are integrated into a complete Chinese clinical medical QA system. In the experimental evaluation, CMedRAGBot is deployed on five state-of-the-art LLMs (ChatGPT-4o, ChatGPT-o1, DeepSeek-R1, Llama-3.3-70B-Instruct, and Gemini 2.0 Flash) and tested on specialized question banks derived from the Chinese Clinical Medical Qualification Examinations and Residency Standardization Training Examinations from 2000 to 2023. The results indicate that integrating CMedRAGBot significantly improves the test accuracy of all models, with gains of up to approximately 10%. Furthermore, ablation experiments show that data augmentation raises the NER model's F1 score from 95.27% to 97.55%, while adding the intent recognition module markedly improves the model's understanding of complex queries, further boosting answer accuracy. Source code is available at https://github.com/zhdongfang/CMedRAGBot.

Citations: 0
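The retrieval flow in the CMedRAGBot abstract (recognize entities, classify the user's intent, pull the matching knowledge-graph facts, then hand them to the LLM) can be sketched with toy components. In this minimal sketch, dictionary matching stands in for the Chinese-RoBERTa/BiGRU NER model and keyword rules stand in for prompt-based intent recognition; the triples, intent names, relation names, and prompt template are all hypothetical, and a real system would query a graph database rather than a Python list.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]  # (head entity, relation, tail entity)

# Illustrative toy knowledge graph; not the paper's Chinese medical KG.
TOY_KG: List[Triple] = [
    ("type 2 diabetes", "has_symptom", "polyuria"),
    ("type 2 diabetes", "has_symptom", "excessive thirst"),
    ("type 2 diabetes", "treated_by", "metformin"),
    ("hypertension", "treated_by", "amlodipine"),
]

# Hypothetical intent categories: intent -> (KG relation, trigger keywords).
INTENT_RULES: Dict[str, Tuple[str, List[str]]] = {
    "ask_symptom": ("has_symptom", ["symptom", "sign"]),
    "ask_treatment": ("treated_by", ["treat", "drug", "medication"]),
}


def recognize_entities(query: str, kg: List[Triple]) -> List[str]:
    """Match KG head entities in the query (stand-in for a trained NER model)."""
    text = query.lower()
    heads = {head for head, _, _ in kg}
    return sorted(h for h in heads if h in text)


def recognize_intents(query: str) -> List[str]:
    """Keyword rules (stand-in for LLM prompt-based intent classification)."""
    text = query.lower()
    return [name for name, (_, words) in INTENT_RULES.items() if any(w in text for w in words)]


def build_prompt(query: str, kg: List[Triple]) -> str:
    """Retrieve facts for the recognized entities/intents and prepend them to the question."""
    entities = set(recognize_entities(query, kg))
    relations = {INTENT_RULES[name][0] for name in recognize_intents(query)}
    # With no recognized intent, fall back to every fact about the entity.
    facts = [t for t in kg if t[0] in entities and (not relations or t[1] in relations)]
    context = "\n".join(f"- {h} {r.replace('_', ' ')} {t}" for h, r, t in facts)
    return f"Answer using only these facts:\n{context}\nQuestion: {query}"


if __name__ == "__main__":
    print(build_prompt("What drug is used to treat type 2 diabetes?", TOY_KG))
```

For the example query, only the `treated_by` fact about metformin is retrieved; the symptom triples are filtered out by the recognized intent. Grounding the LLM in retrieved facts like this is the mechanism RAG relies on to reduce hallucination.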
DSGCNLDA: A Multi-view Learning Model with DualScope Attention for lncRNA-Disease Association Prediction.
IF 3.9 | Zone 2 (Biology) | Q1 MATHEMATICAL & COMPUTATIONAL BIOLOGY | Pub Date: 2025-11-27 | DOI: 10.1007/s12539-025-00790-8
Dengju Yao, Zhanhe Li, Xiaojuan Zhan, Bo Zhang, Xiangkui Li

Long non-coding RNAs (lncRNAs) play key regulatory roles in biological processes, so accurately predicting lncRNA-disease associations is crucial for understanding disease pathophysiology and developing effective prevention and treatment strategies. However, existing computational prediction methods are often limited by sparse and incomplete data and by inadequate representation of node information. To mitigate these issues, a novel prediction model, DSGCNLDA, is proposed. The model uses multi-view fusion learning to integrate biological similarity features from multiple perspectives into a comprehensive similarity matrix. This similarity matrix is combined with the lncRNA-disease adjacency matrix to construct a heterogeneous lncRNA-disease association network. Features are then extracted from a randomly masked version of this heterogeneous network, using a graph convolutional network (GCN) as the encoder together with our proposed DualScope attention mechanism, which captures complex topological relationships between nodes more effectively and yields comprehensive node representations. Finally, associations are predicted with a multi-layer perceptron (MLP). Experimental results on multiple public datasets show that DSGCNLDA performs strongly in lncRNA-disease association prediction. Ablation studies confirm the contribution of its novel components, while case studies and generalization evaluations demonstrate its effectiveness in biomedical prediction tasks.

Citations: 0
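The graph-construction steps in the DSGCNLDA abstract (fuse similarity views, combine them with the association matrix into one heterogeneous network, mask edges, encode with a GCN) map onto a few matrix operations. The minimal sketch below assumes an element-wise mean for multi-view fusion (the paper's fusion is learned) and the standard symmetric-normalized GCN propagation rule; it leaves out the DualScope attention and the MLP decoder, whose details the abstract does not give. All function names are hypothetical.

```python
import math
import random
from typing import List

Matrix = List[List[float]]


def fuse_views(views: List[Matrix]) -> Matrix:
    """Element-wise mean of several same-shaped similarity matrices (simple fusion stand-in)."""
    rows, cols = len(views[0]), len(views[0][0])
    return [[sum(v[i][j] for v in views) / len(views) for j in range(cols)] for i in range(rows)]


def build_hetero_adjacency(s_lnc: Matrix, s_dis: Matrix, assoc: Matrix) -> Matrix:
    """Block matrix [[S_l, A], [A^T, S_d]]: lncRNA nodes first, then disease nodes."""
    top = [s_lnc[i] + assoc[i] for i in range(len(s_lnc))]
    assoc_t = [list(col) for col in zip(*assoc)]
    bottom = [assoc_t[j] + s_dis[j] for j in range(len(s_dis))]
    return top + bottom


def mask_associations(assoc: Matrix, rate: float, rng: random.Random) -> Matrix:
    """Zero out each known association with probability `rate` (random edge masking)."""
    return [[0.0 if x and rng.random() < rate else x for x in row] for row in assoc]


def gcn_layer(adj: Matrix, feats: Matrix, weight: Matrix) -> Matrix:
    """One GCN step: ReLU(D^-1/2 (A + I) D^-1/2 X W); assumes non-negative edge weights."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    norm = [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]
    out_dim = len(weight[0])
    xw = [[sum(x[k] * weight[k][c] for k in range(len(x))) for c in range(out_dim)] for x in feats]
    return [
        [max(0.0, sum(norm[i][j] * xw[j][c] for j in range(n))) for c in range(out_dim)]
        for i in range(n)
    ]


if __name__ == "__main__":
    rng = random.Random(0)
    s_lnc = fuse_views([[[1.0, 0.2], [0.2, 1.0]], [[1.0, 0.6], [0.6, 1.0]]])  # 2 lncRNAs, 2 views
    s_dis = [[1.0, 0.3, 0.0], [0.3, 1.0, 0.4], [0.0, 0.4, 1.0]]  # 3 diseases
    assoc = [[1.0, 0.0, 1.0], [0.0, 1.0, 0.0]]  # toy lncRNA-disease links
    adj = build_hetero_adjacency(s_lnc, s_dis, mask_associations(assoc, 0.3, rng))
    feats = [[1.0 if i == j else 0.0 for j in range(5)] for i in range(5)]  # one-hot node features
    weight = [[rng.uniform(-1.0, 1.0) for _ in range(4)] for _ in range(5)]  # 5 -> 4 dims
    embeddings = gcn_layer(adj, feats, weight)
    print(len(embeddings), len(embeddings[0]))  # 5 nodes, 4-dim embeddings
```

Masking known links before propagation forces the encoder to rely on neighborhood structure rather than memorized edges, a common motivation for masked graph training; whether DSGCNLDA masks edges, node features, or both is not specified in the abstract.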
SSLCNV: A Semi-supervised Learning Framework for Accurate Copy Number Variation Detection.
IF 3.9 | Zone 2 (Biology) | Q1 MATHEMATICAL & COMPUTATIONAL BIOLOGY | Pub Date: 2025-11-27 | DOI: 10.1007/s12539-025-00795-3
Ruchao Du, Jinxin Dong, Hua Jiang, Minyong Qi, Yuxi Zhang, Ranran Sun, Mengke Xu

Copy number variation (CNV) is a major type of structural variation (SV) that plays critical roles in genetic diversity and disease. Many CNV detection tools have been developed. Although each has advantages in specific scenarios, they still suffer from drawbacks such as suboptimal sensitivity, imprecise breakpoint resolution, and reduced robustness in complex sequencing environments. Building more effective CNV detection tools on the strengths of existing ones remains a significant challenge in the field. To fully leverage the detection results of existing tools and improve CNV detection accuracy under complex sequencing conditions, a new method, SSLCNV (a semi-supervised learning framework for CNV detection), is proposed. It combines consensus-based pseudo-labeling with density-based clustering. SSLCNV generates high-confidence pseudo-labels by intersecting the CNV predictions of four representative tools (CNVkit, GROM-RD, Matchclips2, OTSUCNV) and uses them as core seeds for clustering. In addition, SSLCNV introduces a new z-score constraint into the DBSCAN algorithm to improve clustering accuracy. By combining the improved DBSCAN with these reliable labels, SSLCNV effectively detects CNVs from partially labeled and unlabeled data. Comprehensive evaluations on both simulated and real datasets show that SSLCNV consistently achieves higher F1-scores than existing tools across diverse sequencing depths and tumor purities. Importantly, it remains robust under low-coverage conditions, yielding higher recall without a substantial loss in precision. SSLCNV offers a scalable and accurate solution for CNV detection that is particularly advantageous for samples with complex genomic backgrounds.

Citations: 0
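SSLCNV's pseudo-labels come from regions where the upstream CNV callers agree. A minimal sweep-line sketch of that consensus step follows, assuming each tool's calls are half-open `(chrom, start, end, type)` intervals that do not overlap within a single tool, and that agreement requires a matching CNV type (gain or loss). Setting `min_support` to the number of tools gives a strict intersection. The tool names in the example are placeholders, and the z-score-constrained DBSCAN stage is not reproduced here.

```python
from collections import defaultdict
from typing import DefaultDict, Dict, List, Tuple

# (chromosome, start, end, "gain" or "loss") with half-open coordinates [start, end)
Call = Tuple[str, int, int, str]


def consensus_regions(calls_by_tool: Dict[str, List[Call]], min_support: int) -> List[Call]:
    """Return maximal regions where at least `min_support` tools report the same CNV type.

    Assumes calls from a single tool do not overlap one another for a given type.
    """
    if min_support < 1:
        raise ValueError("min_support must be at least 1")
    # Net change in tool coverage at each breakpoint, per (chromosome, CNV type)
    deltas: DefaultDict[Tuple[str, str], DefaultDict[int, int]] = defaultdict(
        lambda: defaultdict(int)
    )
    for calls in calls_by_tool.values():
        for chrom, start, end, kind in calls:
            deltas[(chrom, kind)][start] += 1
            deltas[(chrom, kind)][end] -= 1

    regions: List[Call] = []
    for chrom, kind in sorted(deltas):
        depth, open_at = 0, None
        # Sweep breakpoints left to right; all events at one position are applied together
        for pos in sorted(deltas[(chrom, kind)]):
            depth += deltas[(chrom, kind)][pos]
            if depth >= min_support and open_at is None:
                open_at = pos
            elif depth < min_support and open_at is not None:
                regions.append((chrom, open_at, pos, kind))
                open_at = None
    return regions


if __name__ == "__main__":
    calls = {
        "tool_a": [("chr1", 100, 500, "loss")],
        "tool_b": [("chr1", 200, 600, "loss")],
        "tool_c": [("chr1", 150, 450, "loss"), ("chr2", 0, 100, "gain")],
        "tool_d": [("chr1", 300, 700, "gain")],
    }
    print(consensus_regions(calls, min_support=3))  # [('chr1', 200, 450, 'loss')]
```

Lowering `min_support` trades pseudo-label precision for coverage, which is the same tension the abstract describes between high-confidence seeds and recall.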
Journal
Interdisciplinary Sciences: Computational Life Sciences