Pub Date: 2025-12-01; Epub Date: 2025-07-30; DOI: 10.1007/s12539-025-00710-w
Bo Wang, Junqi Wang, Xiaoxin Du, Jianfei Zhang, Yang He, Fangjian Ma
Emerging research continues to reveal the fundamental contributions of microbial communities to maintaining human physiological balance and to advancing drug discovery. However, established wet-lab techniques are time-consuming and resource-intensive. Recent research has concentrated largely on building robust computational frameworks to predict microbe-drug associations. We propose HAFMMDA, a neural network architecture that combines heterogeneous biological relationships with attentional factorization machines to predict undiscovered microbe-drug associations. First, HAFMMDA assembles a heterogeneous network integrating three components: a microbe similarity network, a drug similarity network, and the network of known microbe-drug associations. It then uses HIN2vec to extract feature representations of microbe-drug pairs. Finally, it combines second-order feature interactions with an attention mechanism to make the prediction. Five-fold cross-validation yielded an AUC of 0.9805, a statistically significant improvement over five contemporary baseline approaches. These findings support HAFMMDA's effectiveness both in recovering verified microbe-drug associations and in predicting novel ones.
Title: HAFMMDA: HIN2vec-Based Attentional Factorization Machines for Predicting Microbe-Drug Associations. Interdisciplinary Sciences: Computational Life Sciences, pages 1083-1100.
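The abstract combines second-order feature interactions with an attention mechanism. As an illustration only, the standard attentional factorization machine (AFM) scoring function can be sketched in plain Python as below. The parameter names and toy inputs are hypothetical, and the HIN2vec features that HAFMMDA would feed into this stage are replaced by hand-written embeddings; this is not the authors' implementation.

```python
import math
from itertools import combinations


def afm_score(x, V, w0, w, W, b, h, p):
    """Attentional factorization machine score for one sample (generic AFM sketch).

    x: n feature values; V: n embedding vectors of size k;
    w0, w: global bias and per-feature linear weights;
    W (t x k), b (t), h (t): attention network parameters;
    p (k): projection of the pooled interaction vector.
    Returns (score, attention_weights).
    """
    k = len(p)
    # Second-order terms: element-wise product (v_i * v_j) scaled by x_i * x_j.
    pairs = [
        [V[i][f] * V[j][f] * x[i] * x[j] for f in range(k)]
        for i, j in combinations(range(len(x)), 2)
    ]
    # Attention logit per pair: h^T ReLU(W e + b).
    logits = []
    for e in pairs:
        hidden = [max(0.0, sum(W[r][f] * e[f] for f in range(k)) + b[r]) for r in range(len(b))]
        logits.append(sum(hr * hv for hr, hv in zip(h, hidden)))
    # Numerically stable softmax over all feature pairs.
    m = max(logits)
    exps = [math.exp(s - m) for s in logits]
    total = sum(exps)
    att = [v / total for v in exps]
    # Attention-weighted sum of interaction vectors, projected to a scalar.
    pooled = [sum(a * e[f] for a, e in zip(att, pairs)) for f in range(k)]
    linear = w0 + sum(wi * xi for wi, xi in zip(w, x))
    return linear + sum(pf * qf for pf, qf in zip(p, pooled)), att


# Toy example: three features, 2-dimensional embeddings, untrained (zero) attention.
score, att = afm_score(
    x=[1.0, 1.0, 1.0],
    V=[[1.0, 0.0], [1.0, 1.0], [0.0, 1.0]],
    w0=0.0, w=[0.0, 0.0, 0.0],
    W=[[0.0, 0.0]], b=[0.0], h=[0.0], p=[1.0, 1.0],
)
print(round(score, 4), [round(a, 4) for a in att])
```

With zero attention parameters every pair gets the same weight (1/3), so the model reduces to an averaged second-order FM; training the attention network lets it emphasize informative feature pairs instead.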
Pub Date: 2025-12-01; Epub Date: 2025-06-27; DOI: 10.1007/s12539-025-00711-9
Shengxiang Wang, Xiangzheng Fu, Zhenya Du, Xinxin Liu, Qiaochu Mai, Linlin Zhuo, Boqia Xie, Quan Zou
The emergence of novel viral diseases, with SARS-CoV-2 as a stark example, poses increasing threats to public health, causing significant global morbidity and mortality. Accurate identification and segmentation of virus images are crucial for tracking virus progression and mutations and for devising new treatment strategies. Advanced virus recognition and segmentation models built on high-performance networks such as U-Net have achieved notable success. However, these models face several challenges, including limited labeled virus images, substantial morphological variability, and indistinct boundaries. This study therefore introduces ViruSeg, a virus segmentation method based on the EVA-02 large language-image pre-trained model and data augmentation techniques. First, ViruSeg applies data augmentation techniques such as cutout and image fine-tuning to enrich electron microscope virus images, improving model generalization and helping to delineate virus boundaries and varied morphologies. Second, ViruSeg uses the EVA-02 pre-trained model to learn a universal representation of virus images, improving adaptability to data scarcity. Finally, virus segmentation is performed with the Cascade Mask R-CNN (CMR) model. Comprehensive evaluations on benchmark datasets show that ViruSeg outperforms advanced virus segmentation methods. We anticipate that the proposed solution will advance virology research and the development of treatments for related diseases. All datasets and code are available at https://github.com/xiachashuanghua/project .
Title: ViruSeg: Harnessing the Power of Large Language-Image Model for Enhanced Virus Image Segmentation. Interdisciplinary Sciences: Computational Life Sciences, pages 987-997.
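Cutout, one of the augmentations named in the abstract, masks a square patch of the input so the model cannot rely on any single local region. A minimal sketch on a grayscale image stored as nested lists is shown below; the patch size, fill value, and function names are illustrative assumptions, not ViruSeg's settings.

```python
import random


def cutout(image, cy, cx, size, fill=0):
    """Return a copy of a 2-D image (list of rows) with a size x size square
    centred near (cy, cx) set to `fill`; the square is clipped at the borders."""
    h, w = len(image), len(image[0])
    top, left = cy - size // 2, cx - size // 2
    y0, y1 = max(0, top), min(h, top + size)
    x0, x1 = max(0, left), min(w, left + size)
    out = [row[:] for row in image]  # leave the input untouched
    for y in range(y0, y1):
        for x in range(x0, x1):
            out[y][x] = fill
    return out


def random_cutout(image, size, rng):
    """Apply cutout at a uniformly random centre, as typically done per training sample."""
    cy = rng.randrange(len(image))
    cx = rng.randrange(len(image[0]))
    return cutout(image, cy, cx, size)


img = [[1] * 4 for _ in range(4)]
masked = cutout(img, cy=2, cx=2, size=2)
print(sum(map(sum, masked)))  # 12: a 2 x 2 patch was zeroed
augmented = random_cutout(img, size=2, rng=random.Random(0))
```

Because the centre is sampled inside the image but the patch may extend past the border, patches near the edges are partially clipped, which is the usual cutout behaviour.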
Pub Date: 2025-12-01; Epub Date: 2025-04-19; DOI: 10.1007/s12539-025-00707-5
Lei Li, Liumin Zhu, Qifu Wang, Zhuoli Dong, Tianli Liao, Peng Li
Multi-modal medical image registration aims to align images from different modalities to establish spatial correspondences. Although deep learning-based methods have shown great potential, the lack of explicit reference relations means that unsupervised multi-modal registration remains challenging. In this paper, we propose DSMR, a novel unsupervised dual-stream multi-modal registration framework that combines a dual-stream registration network with a refinement module. Unlike existing methods that use a translation network to reduce multi-modal registration to a uni-modal problem, DSMR leverages the moving, fixed, and translated images to generate two deformation fields. Specifically, we first use a translation network to convert the moving image into a translated image that resembles the fixed image. The dual-stream registration network then computes two deformation fields: an initial deformation field from the fixed and moving images, and a translated deformation field from the translated and fixed images. The translated deformation field serves as a pseudo-ground truth that refines the initial deformation field and mitigates issues such as artificial features introduced by translation. Finally, the refinement module enhances the deformation field by integrating registration errors and contextual information. Extensive experiments show that DSMR achieves excellent performance and generalizes well in learning spatial relationships between images of different modalities without supervision. The source code of this work is available at https://github.com/raylihaut/DSMR .
Title: DSMR: Dual-Stream Networks with Refinement Module for Unsupervised Multi-modal Image Registration. Interdisciplinary Sciences: Computational Life Sciences, pages 804-821.
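DSMR predicts dense deformation fields and uses the translated field as a pseudo-ground truth for the initial one. The sketch below shows two generic building blocks under stated assumptions: warping a 2-D image with a displacement field via bilinear interpolation (border values clamped), and a mean-squared consistency term between two fields. The consistency term is a hypothetical illustration; the abstract does not specify DSMR's actual loss.

```python
import math


def bilinear_sample(img, y, x):
    """Sample a 2-D image at a real-valued (y, x), clamping coordinates to the border."""
    h, w = len(img), len(img[0])
    y = min(max(y, 0.0), h - 1)
    x = min(max(x, 0.0), w - 1)
    y0, x0 = int(math.floor(y)), int(math.floor(x))
    y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
    wy, wx = y - y0, x - x0
    top = img[y0][x0] * (1 - wx) + img[y0][x1] * wx
    bottom = img[y1][x0] * (1 - wx) + img[y1][x1] * wx
    return top * (1 - wy) + bottom * wy


def warp(moving, dy, dx):
    """Warp `moving` with a dense displacement field: out[y][x] = moving(y + dy, x + dx)."""
    return [
        [bilinear_sample(moving, y + dy[y][x], x + dx[y][x]) for x in range(len(moving[0]))]
        for y in range(len(moving))
    ]


def field_consistency(field_a, field_b):
    """Mean squared difference between two fields, each given as a (dy, dx) pair of grids.
    Hypothetical stand-in for using one field as a pseudo-ground truth for the other."""
    total, count = 0.0, 0
    for comp_a, comp_b in zip(field_a, field_b):
        for row_a, row_b in zip(comp_a, comp_b):
            for a, b in zip(row_a, row_b):
                total += (a - b) ** 2
                count += 1
    return total / count


moving = [[0, 1, 2], [3, 4, 5]]
zero = [[0.0] * 3 for _ in range(2)]
one = [[1.0] * 3 for _ in range(2)]
shifted = warp(moving, zero, one)
print(shifted)  # [[1.0, 2.0, 2.0], [4.0, 5.0, 5.0]]
print(field_consistency((zero, one), (zero, zero)))  # 0.5
```

Bilinear sampling keeps the warp differentiable with respect to the displacements, which is why learned registration networks commonly use it rather than nearest-neighbour lookup.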
Pub Date: 2025-12-01; Epub Date: 2025-06-02; DOI: 10.1007/s12539-025-00713-7
Xin Tang, Xiujuan Lei, Lian Liu
Accurate computational methods for predicting drug-target affinity (DTA) are essential because they reduce the need for biochemical experiments and enable rapid screening of potential druggable compounds. Current deep learning-based DTA prediction methods mostly rely on single-modal information from drugs or targets. In this article, we propose MGSDTA, a multi-modal DTA prediction method that integrates graph features and sequence features of drug molecules and target proteins. We extract graph features from drug molecular graphs and target protein graphs; in parallel, we extract sequence features from continuous embeddings of drug substructures and target subsequences generated by the self-supervised pre-trained models Mol2vec and ProtVec, respectively. Finally, these features are integrated by a weighted fusion module for DTA prediction. Experiments on benchmark datasets show that MGSDTA outperforms single-modal methods based solely on sequences or graphs.
Title: A Multi-modal Drug Target Affinity Prediction Based on Graph Features and Pre-trained Sequence Embeddings. Interdisciplinary Sciences: Computational Life Sciences, pages 822-843.
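The abstract states only that graph and sequence features are combined by a weighted fusion module. One common form, assumed here and not confirmed by the source, is a softmax-weighted sum of equally sized per-modality vectors with one learnable logit per modality. The sketch shows the forward pass only; the feature vectors are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of scalars."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    s = sum(exps)
    return [e / s for e in exps]


def weighted_fusion(features, logits):
    """Fuse equally sized modality vectors with softmax-normalised weights.

    features: list of vectors (e.g. graph-derived and sequence-derived features);
    logits: one (learnable) scalar per modality.
    Returns (fused_vector, weights).
    """
    if len({len(f) for f in features}) != 1:
        raise ValueError("all modality vectors must have the same length")
    weights = softmax(logits)
    dim = len(features[0])
    fused = [sum(w * f[d] for w, f in zip(weights, features)) for d in range(dim)]
    return fused, weights


graph_feat = [1.0, 0.0]  # hypothetical graph-derived embedding
seq_feat = [0.0, 1.0]    # hypothetical sequence-derived embedding
fused, weights = weighted_fusion([graph_feat, seq_feat], [0.0, 0.0])
print(fused, weights)  # [0.5, 0.5] [0.5, 0.5]
```

Normalising the weights with a softmax keeps them positive and summing to one, so training can shift emphasis between modalities without changing the scale of the fused vector.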
Pub Date: 2025-12-01; Epub Date: 2025-06-10; DOI: 10.1007/s12539-025-00704-8
Zhendong Liu, Jun S Liu, Dongqing Wei, Rongjun Man, Jiamin Jiang, Bofeng Zhang, Liping Li, Zhiyong Zhao
Identifying DNA binding sites remains a critical task in bioinformatics, with applications ranging from gene regulation studies to drug design. Despite progress in computational techniques, challenges such as data complexity and limited prediction accuracy remain. In this paper, we introduce OptimDase, a new algorithm that integrates feature encoding with optimum decision-making frameworks to improve DNA binding site prediction. OptimDase combines multi-scale scanning and feature selection strategies, making it effective for both classification and regression tasks. In our experiments, OptimDase achieves an accuracy of 0.8943 in classification tasks and an RMSE of 0.0054 in regression tasks, outperforming existing algorithms on key evaluation metrics. These results highlight OptimDase's portability and robustness, making it an effective solution for identifying DNA binding sites and supporting drug design applications.
Title: OptimDase: An Algorithm for Predicting DNA Binding Sites with Combined Feature Encoding. Interdisciplinary Sciences: Computational Life Sciences, pages 791-803. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12672825/pdf/
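The abstract mentions multi-scale scanning combined with feature encoding but does not define either. As an assumption for illustration only, one simple way to encode a DNA sequence at several scales is to concatenate normalised k-mer frequency vectors for several window sizes k; the function names and default scales below are hypothetical and are not OptimDase's encoding.

```python
from itertools import product


def kmer_frequencies(seq, k, alphabet="ACGT"):
    """Normalised counts of every length-k word over `alphabet` (sliding window, step 1)."""
    vocab = ["".join(p) for p in product(alphabet, repeat=k)]
    counts = dict.fromkeys(vocab, 0)
    for i in range(max(len(seq) - k + 1, 0)):
        word = seq[i:i + k]
        if word in counts:  # skip windows containing N or other symbols
            counts[word] += 1
    total = sum(counts.values()) or 1  # avoid division by zero for short sequences
    return [counts[w] / total for w in vocab]


def multi_scale_encoding(seq, scales=(1, 2, 3)):
    """Concatenate k-mer frequency vectors for several window sizes into one feature vector."""
    seq = seq.upper()
    features = []
    for k in scales:
        features.extend(kmer_frequencies(seq, k))
    return features


enc = multi_scale_encoding("ACGT", scales=(1, 2))
print(len(enc))  # 20 = 4 one-mers + 16 two-mers
```

The output has a fixed length (4 + 16 + 64 for the default scales) regardless of sequence length, so it can be passed to a feature-selection step and then to any standard classifier or regressor.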
Pub Date: 2025-12-01; Epub Date: 2025-06-13; DOI: 10.1007/s12539-025-00725-3
Yuehao Wang, Pengli Lu
A growing body of research has found that circRNAs are remarkably reliable in organisms and serve as crucial markers in many diseases. Although deep learning techniques have been widely applied to investigate circRNA-disease relationships, tuning the many parameters these techniques involve for best performance remains a challenge. We therefore present, for the first time, a multi-objective particle swarm optimization algorithm that optimizes the parameters of a graph attention network so that the model operates at peak efficiency. In addition, in heterogeneous graphs built from association relationships, the uneven distribution of different node types limits feature learning. We propose MOPSOGAT to overcome these problems. MOPSOGAT predicts circRNA-disease associations using an improved multi-objective particle swarm optimization (MOPSO) algorithm and a graph attention network. First, we build a heterogeneous graph from multiple circRNA similarities and disease phenotypic similarities and obtain node sequences from it with random walks that incorporate jump and stay strategies. These sequences are then processed with word2vec to derive neighbor vectors for the nodes, providing initial embeddings for circRNAs and diseases. Next, to account for both the convergence and the diversity of the Pareto front solutions, an improved MOPSO algorithm iteratively searches the parameter space for optimal solutions. The optimized parameters are then fed into a graph attention network to further refine the model embeddings.
MOPSOGAT outperforms deep learning-based methods, methods based solely on multi-objective optimization, and machine learning-based methods. Moreover, potential associations predicted by MOPSOGAT have been validated through case studies, further demonstrating its potential for future biomedical research.
Title: MOPSOGAT: Predicting CircRNA-Disease Associations via Improved Multi-objective Particle Swarm Optimization and Graph Attention Network. Interdisciplinary Sciences: Computational Life Sciences, pages 1038-1055.
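MOPSO keeps an archive of non-dominated solutions (the Pareto front) instead of a single global best. The sketch below shows the Pareto-dominance test, the archive update, and the standard PSO velocity and position update for minimisation. It is a generic MOPSO skeleton, not the paper's improved variant; the objective vectors (imagine two validation losses per graph attention network configuration) and the coefficient defaults are hypothetical.

```python
import random


def dominates(a, b):
    """True if objective vector a Pareto-dominates b (all objectives minimised)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def update_archive(archive, candidate):
    """Insert a candidate into a non-dominated archive and drop members it dominates."""
    if any(dominates(a, candidate) or a == candidate for a in archive):
        return archive  # candidate is dominated or already present
    return [a for a in archive if not dominates(candidate, a)] + [candidate]


def pso_step(position, velocity, personal_best, leader, rng, w=0.5, c1=1.5, c2=1.5):
    """Standard PSO update pulling a particle toward its personal best and an archive leader."""
    new_velocity = [
        w * v + c1 * rng.random() * (pb - x) + c2 * rng.random() * (ld - x)
        for x, v, pb, ld in zip(position, velocity, personal_best, leader)
    ]
    new_position = [x + v for x, v in zip(position, new_velocity)]
    return new_position, new_velocity


# Hypothetical objective vectors, one per evaluated parameter configuration.
archive = []
for objectives in [(1, 2), (2, 1), (2, 2), (3, 3), (0.5, 4)]:
    archive = update_archive(archive, objectives)
print(archive)  # [(1, 2), (2, 1), (0.5, 4)]

rng = random.Random(0)
pos, vel = pso_step([0.1, 0.2], [0.0, 0.0], [0.3, 0.2], [0.5, 0.0], rng)
```

In a full MOPSO loop each particle's leader is drawn from the archive (often favouring sparsely populated regions of the front), which is how the method balances convergence toward the front with diversity along it.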
Pub Date: 2025-12-01; Epub Date: 2025-05-23; DOI: 10.1007/s12539-025-00714-6
Nianrui Wang, Shumin Zhao, Ziwei Li, Jianqiang Sun, Ming Yi
Background: During new drug development, it is essential to assess a drug's effectiveness and examine the potential mechanisms behind its side effects. This process typically analyzes drugs under development together with related existing drugs to evaluate drug and target effects more precisely. Deep learning methods for this problem are currently a research hotspot, but several limitations remain: (i) how to deepen the analysis from the molecular level to the atomic level and identify the key substructures that affect interactions on the basis of pharmaceutical mechanisms; and (ii) how to integrate biomedical analysis with deep learning methods so that the results are medically sound and more interpretable.
Methods: To address these limitations, we constructed WDGBANDTI, an interpretable deep learning framework based on a Deep Graph Convolutional Network (Deep-GCN) and a Bilinear Attention Network (BAN) that analyzes and predicts drug-target interactions at the substructure level. Additional modules, including domain adaptation, improve the model's predictions for unseen drug-target pairs.
Results: We validated the model in different application scenarios on several commonly used, high-coverage datasets. Compared with several state-of-the-art computational methods, our model shows advantages in accuracy, sensitivity, specificity, and other evaluation metrics. More importantly, through the BAN the model can identify the substructures that contribute to drug-target interactions, highlighting its strong interpretability.
Conclusion: We believe that our work will contribute to advances in drug development and side-effect experiments and provide meaningful guidance for drug design.
Title: WDGBANDTI: A Deep Graph Convolutional Network-Based Bilinear Attention Network for Drug-Target Interaction Prediction with Domain Adaptation. Interdisciplinary Sciences: Computational Life Sciences, pages 998-1017.
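The abstract attributes WDGBANDTI's interpretability to the BAN, which scores every pair of drug substructure and protein residue. The sketch below is a simplified single-glimpse bilinear attention, assuming both sides are already projected to a shared dimension K; the full BAN uses learned low-rank projections, nonlinearities, and multiple glimpses, which are omitted here. The toy feature vectors are hypothetical.

```python
import math


def bilinear_attention(drug_atoms, protein_residues, p):
    """Simplified single-glimpse bilinear attention.

    drug_atoms: m vectors and protein_residues: n vectors, both already of size K;
    p: K weights scoring each interaction. Returns (m x n attention map, joint K-vector).
    """
    K = len(p)
    # Pairwise interaction vectors x_i * y_j (element-wise).
    inter = [[[x[k] * y[k] for k in range(K)] for y in protein_residues] for x in drug_atoms]
    logits = [[sum(pk * v for pk, v in zip(p, vec)) for vec in row] for row in inter]
    # Softmax over all (atom, residue) pairs jointly.
    flat = [v for row in logits for v in row]
    m = max(flat)
    z = sum(math.exp(v - m) for v in flat)
    att = [[math.exp(v - m) / z for v in row] for row in logits]
    # Attention-weighted joint representation used for the final prediction.
    joint = [
        sum(att[i][j] * inter[i][j][k] for i in range(len(att)) for j in range(len(att[i])))
        for k in range(K)
    ]
    return att, joint


def top_pair(att):
    """(atom, residue) indices with the largest attention weight, i.e. the most attended pair."""
    return max(((i, j) for i in range(len(att)) for j in range(len(att[0]))),
               key=lambda ij: att[ij[0]][ij[1]])


atoms = [[1.0, 0.0], [0.0, 1.0]]                   # two hypothetical drug substructures
residues = [[2.0, 0.0], [0.0, 0.5], [0.0, 0.0]]    # three hypothetical protein residues
att, joint = bilinear_attention(atoms, residues, p=[1.0, 1.0])
print(top_pair(att))  # (0, 0)
```

Reading off the largest entries of the attention map is what makes this kind of model inspectable: each weight is tied to one concrete substructure-residue pair.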
Pub Date: 2025-12-01; Epub Date: 2025-06-05; DOI: 10.1007/s12539-025-00715-5
Dongfang Zhang, Haoze Du, Xiaolei Wang, Mingdong Zhu, Xiaoxiao Pang, Dongqing Wei, Xianfang Wang
In Chinese clinical medical question answering (QA), traditional Large Language Models (LLMs) face challenges such as hallucinations and difficulty updating knowledge for knowledge-intensive tasks. To address these issues, this research presents CMedRAGBot, a Chinese clinical medical QA model that integrates Retrieval-Augmented Generation (RAG) with a medical knowledge graph. First, a Chinese medical knowledge graph covering multiple entity types, including diseases, medications, and symptoms, is constructed. Based on this knowledge graph, a Named Entity Recognition (NER) model with a Chinese-RoBERTa and BiGRU architecture is designed, and data augmentation strategies are used to improve its generalization. In addition, prompt engineering techniques implement intent recognition for user queries, mapping each query to predefined intent categories. Finally, these modules are integrated into a complete Chinese clinical medical QA system. In the experimental evaluation, CMedRAGBot is deployed on five state-of-the-art LLMs (ChatGPT-4o, ChatGPT-o1, DeepSeek-R1, Llama-3.3-70B-Instruct, and Gemini 2.0 Flash) and tested on specialized question banks derived from the Chinese Clinical Medical Qualification Examinations and Residency Standardization Training Examinations from 2000 to 2023. The results indicate that integrating CMedRAGBot significantly improves the test accuracy of all models, by up to approximately 10%. Furthermore, ablation experiments show that data augmentation raises the NER model's F1 score from 95.27% to 97.55%, while the intent recognition module markedly improves the model's understanding of complex queries, further boosting answer accuracy. Source code of the research is available at https://github.com/zhdongfang/CMedRAGBot .
CMedRAGBot: A Chinese Medical Chatbot Based on Graph RAG and Large Language Models. Dongfang Zhang, Haoze Du, Xiaolei Wang, Mingdong Zhu, Xiaoxiao Pang, Dongqing Wei, Xianfang Wang. Interdisciplinary Sciences: Computational Life Sciences, pp. 844-859. Published 2025-12-01. DOI: 10.1007/s12539-025-00715-5
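The CMedRAGBot ablation result is reported as an NER F1 score, but the abstract does not say how it is computed. A common convention, assumed here, is micro-averaged entity-level F1 over BIO-tagged sequences, where a predicted entity counts as correct only if both its type and its exact span match the gold annotation. The helper names `bio_spans` and `entity_f1` and the tag labels (`DIS`, `DRUG`, `SYM`) are hypothetical.

```python
def bio_spans(tags):
    """Extract (type, start, end_exclusive) entity spans from a BIO tag sequence."""
    spans, start, etype = [], None, None
    for i, tag in enumerate(list(tags) + ["O"]):  # sentinel "O" closes a trailing span
        # Close the open span on O, on a new B-, or on an I- of a different type.
        if start is not None and (tag == "O" or tag.startswith("B-") or tag[2:] != etype):
            spans.append((etype, start, i))
            start, etype = None, None
        # Open a span on B-, or leniently on an I- that has no open span.
        if tag.startswith("B-") or (tag.startswith("I-") and start is None):
            start, etype = i, tag[2:]
    return spans


def entity_f1(gold_seqs, pred_seqs):
    """Micro-averaged entity-level precision, recall, and F1 (exact span + type match)."""
    tp = n_pred = n_gold = 0
    for gold, pred in zip(gold_seqs, pred_seqs):
        g, p = set(bio_spans(gold)), set(bio_spans(pred))
        tp += len(g & p)
        n_pred += len(p)
        n_gold += len(g)
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    gold = [["B-DIS", "I-DIS", "O", "B-DRUG"]]
    pred = [["B-DIS", "I-DIS", "O", "O"]]  # misses the drug entity
    print(entity_f1(gold, pred))  # precision 1.0, recall 0.5, F1 about 0.667
```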
Pub Date: 2025-11-27 | DOI: 10.1007/s12539-025-00790-8
Dengju Yao, Zhanhe Li, Xiaojuan Zhan, Bo Zhang, Xiangkui Li
Long non-coding RNAs (lncRNAs) play key regulatory roles in biological processes, so accurately predicting lncRNA-disease associations is crucial for understanding disease pathophysiology and developing effective prevention and treatment strategies. However, existing computational prediction methods are often limited by sparse and incomplete data and by inadequate representation of node information. To mitigate these issues, a novel prediction model, termed DSGCNLDA, is proposed. The model uses multi-view fusion learning to integrate biological similarity features from multiple perspectives into a comprehensive similarity matrix. This similarity matrix is combined with the lncRNA-disease adjacency matrix to construct a heterogeneous association network. Features are then extracted from a randomly masked version of this network: a graph convolutional network (GCN) serves as the encoder, and the proposed DualScope attention mechanism captures complex topological relationships between nodes more effectively, yielding comprehensive node representations. Finally, associations are predicted by a multi-layer perceptron (MLP). Experimental results on multiple public datasets show that DSGCNLDA performs strongly in lncRNA-disease association prediction. Ablation studies confirm the contribution of its novel components, while case studies and generalization evaluations demonstrate its effectiveness in biomedical prediction tasks.
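The network-construction step of DSGCNLDA can be illustrated with a small pure-Python sketch. The abstract does not specify the fusion rule or exactly what is masked, so this assumes (1) a weighted element-wise average of similarity views, (2) the usual block layout [[S_l, A], [A^T, S_d]] for the heterogeneous adjacency, with lncRNA similarity S_l, disease similarity S_d, and association matrix A, and (3) masking that hides a random fraction of known associations. The function names are hypothetical.

```python
import random


def fuse_views(views, weights=None):
    """Fuse same-shaped similarity matrices by weighted element-wise averaging.

    The weighted average is an illustrative assumption, not DSGCNLDA's fusion rule.
    """
    k = len(views)
    weights = weights or [1.0 / k] * k
    n, m = len(views[0]), len(views[0][0])
    return [[sum(w * v[i][j] for w, v in zip(weights, views)) for j in range(m)]
            for i in range(n)]


def transpose(mat):
    return [list(col) for col in zip(*mat)]


def hetero_adjacency(sim_l, sim_d, assoc):
    """Block matrix [[S_l, A], [A^T, S_d]] over n_l lncRNA nodes followed by n_d disease nodes."""
    top = [row_s + row_a for row_s, row_a in zip(sim_l, assoc)]
    bottom = [row_at + row_s for row_at, row_s in zip(transpose(assoc), sim_d)]
    return top + bottom


def mask_associations(assoc, ratio, seed=0):
    """Hide a random fraction of known associations (1 -> 0).

    Returns the masked copy and the hidden (lncRNA, disease) index pairs; the input is untouched.
    """
    rng = random.Random(seed)
    ones = [(i, j) for i, row in enumerate(assoc) for j, v in enumerate(row) if v == 1]
    hidden = rng.sample(ones, int(len(ones) * ratio))
    masked = [row[:] for row in assoc]
    for i, j in hidden:
        masked[i][j] = 0
    return masked, hidden


if __name__ == "__main__":
    sim_l = fuse_views([[[1, 0.4], [0.4, 1]], [[1, 0.6], [0.6, 1]]])  # two lncRNA views
    sim_d = [[1, 0.2], [0.2, 1]]
    assoc = [[1, 0], [1, 1]]
    masked, hidden = mask_associations(assoc, ratio=0.5)
    for row in hetero_adjacency(sim_l, sim_d, masked):
        print(row)
```

Keeping the hidden pairs lets a training loop ask the encoder to recover them, which is one common reason to mask a graph before feature extraction.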
DSGCNLDA: A Multi-view Learning Model with DualScope Attention for lncRNA-Disease Association Prediction. Dengju Yao, Zhanhe Li, Xiaojuan Zhan, Bo Zhang, Xiangkui Li. Interdisciplinary Sciences: Computational Life Sciences. Published 2025-11-27. DOI: 10.1007/s12539-025-00790-8
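The DSGCNLDA encoder-decoder stage can be sketched as one GCN propagation step followed by an MLP that scores an lncRNA-disease pair from the concatenation of their node embeddings. The sketch uses the standard symmetric-normalized GCN rule H' = ReLU(D^{-1/2}(A + I)D^{-1/2} H W); the DualScope attention mechanism is omitted because the abstract does not describe its form, and the weights in the usage example are random rather than trained. All function names are hypothetical.

```python
import math
import random


def normalize_adjacency(adj):
    """Return D^{-1/2} (A + I) D^{-1/2}, the standard GCN propagation matrix."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]  # positive for non-negative weights thanks to self-loops
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]


def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def relu(mat):
    return [[max(0.0, v) for v in row] for row in mat]


def gcn_layer(a_norm, features, weight):
    """One GCN layer: H' = ReLU(A_norm H W)."""
    return relu(matmul(matmul(a_norm, features), weight))


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def mlp_score(h_l, h_d, w1, w2):
    """Pair score: sigmoid(w2 . ReLU(W1 [h_l; h_d])), a one-hidden-layer MLP."""
    x = h_l + h_d  # concatenate the lncRNA and disease embeddings
    hidden = [max(0.0, sum(w * v for w, v in zip(row, x))) for row in w1]
    return sigmoid(sum(w * h for w, h in zip(w2, hidden)))


if __name__ == "__main__":
    rng = random.Random(0)
    # Heterogeneous adjacency: nodes 0-1 are lncRNAs, nodes 2-3 are diseases.
    adj = [[0, 0.5, 1, 0], [0.5, 0, 1, 1], [1, 1, 0, 0.2], [0, 1, 0.2, 0]]
    feats = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]  # one-hot features
    w = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(4)]
    h = gcn_layer(normalize_adjacency(adj), feats, w)
    w1 = [[rng.uniform(-1, 1) for _ in range(6)] for _ in range(4)]
    w2 = [rng.uniform(-1, 1) for _ in range(4)]
    print(mlp_score(h[0], h[2], w1, w2))  # untrained score for lncRNA 0 and disease node 2
```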
Copy number variation (CNV) is a major type of structural variation (SV) that plays critical roles in genetic diversity and disease. Many CNV detection tools have been developed. Although each has advantages in specific scenarios, they still suffer from suboptimal sensitivity, imprecise breakpoint resolution, and reduced robustness in complex sequencing environments. Building more effective CNV detection tools on the strengths of existing ones remains a significant challenge in the field. To fully leverage the detection results of existing tools and improve CNV detection accuracy under complex sequencing conditions, a new method called SSLCNV (a semi-supervised learning framework for CNV detection) is proposed. It combines consensus-based pseudo-labeling with density-based clustering. SSLCNV generates high-confidence pseudo-labels by intersecting the CNV predictions of four representative tools (CNVkit, GROM-RD, Matchclips2, and OTSUCNV) and uses them as core seeds for clustering. Additionally, SSLCNV introduces a z-score constraint into the DBSCAN algorithm to improve clustering accuracy. By combining this improved DBSCAN with reliable labels, SSLCNV effectively detects CNVs from partially labeled and unlabeled data. Comprehensive evaluations on both simulated and real datasets show that SSLCNV consistently achieves higher F1-scores than existing tools across diverse sequencing depths and tumor purities. Importantly, it remains robust under low-coverage conditions, yielding higher recall without a substantial loss in precision. SSLCNV offers a scalable and accurate solution for CNV detection, particularly in scenarios with complex genomic backgrounds.
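SSLCNV's consensus pseudo-labeling step, intersecting the calls of four tools, amounts to a multi-way interval intersection. This minimal sketch assumes each tool's output has already been parsed into half-open (start, end) intervals keyed by chromosome and CNV type, and that a region counts as consensus only if every tool reports the same type there; parsers for the CNVkit, GROM-RD, Matchclips2, and OTSUCNV output formats are not shown, and the helper names and example calls are hypothetical.

```python
from functools import reduce


def merge(intervals):
    """Sort and merge overlapping (or touching) half-open intervals from one tool."""
    merged = []
    for s, e in sorted(intervals):
        if merged and s <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], e))
        else:
            merged.append((s, e))
    return merged


def intersect_two(a, b):
    """Intersect two sorted, non-overlapping lists of half-open intervals."""
    out, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        start, end = max(a[i][0], b[j][0]), min(a[i][1], b[j][1])
        if start < end:
            out.append((start, end))
        # Advance whichever interval ends first.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return out


def consensus(calls_by_tool, min_len=1):
    """Regions reported with the same (chrom, cnv_type) key by every tool."""
    keys = set.intersection(*(set(c) for c in calls_by_tool))
    result = {}
    for key in sorted(keys):
        regions = reduce(intersect_two, (merge(c[key]) for c in calls_by_tool))
        regions = [r for r in regions if r[1] - r[0] >= min_len]
        if regions:
            result[key] = regions
    return result


if __name__ == "__main__":
    # Made-up calls from four tools, for illustration only.
    calls = [
        {("chr1", "loss"): [(100, 500)], ("chr2", "gain"): [(50, 90)]},
        {("chr1", "loss"): [(150, 520)]},
        {("chr1", "loss"): [(120, 480), (600, 700)]},
        {("chr1", "loss"): [(90, 450)], ("chr2", "gain"): [(60, 80)]},
    ]
    print(consensus(calls))  # {('chr1', 'loss'): [(150, 450)]}
```

Requiring agreement from all tools trades recall for precision, which suits pseudo-labels meant to seed clustering rather than to serve as final calls.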
SSLCNV: A Semi-supervised Learning Framework for Accurate Copy Number Variation Detection. Ruchao Du, Jinxin Dong, Hua Jiang, Minyong Qi, Yuxi Zhang, Ranran Sun, Mengke Xu. Interdisciplinary Sciences: Computational Life Sciences. Published 2025-11-27. DOI: 10.1007/s12539-025-00795-3
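The SSLCNV abstract states that a z-score constraint is added to DBSCAN and that consensus pseudo-labels serve as core seeds, but it does not give the exact form of the constraint. The sketch below is one plausible reading, offered as an assumption rather than the authors' algorithm: genomic bins are 1-D points indexed by position, a bin is eligible only if the z-score of its read depth satisfies |z| >= z_min, neighbors must share the sign of z (gain vs. loss), and seed bins count as core points even when their neighborhoods are small. The parameter names (`eps`, `min_pts`, `z_min`, `seeds`) and helpers are hypothetical.

```python
import statistics


def zscores(depths):
    """Standardize read depths across bins (population standard deviation)."""
    mu = statistics.fmean(depths)
    sd = statistics.pstdev(depths) or 1.0  # avoid division by zero on flat input
    return [(d - mu) / sd for d in depths]


def constrained_dbscan(depths, eps=1, min_pts=3, z_min=1.5, seeds=()):
    """DBSCAN over bin positions with a hypothetical z-score constraint.

    Returns (labels, z): labels[i] is a cluster id, or -1 for unclustered bins.
    """
    z = zscores(depths)
    n = len(depths)
    eligible = [abs(v) >= z_min for v in z]
    seed_set = set(seeds)

    def neighbors(i):
        # Eligible bins within eps positions (including i) whose z has the same sign.
        return [j for j in range(max(0, i - eps), min(n, i + eps + 1))
                if eligible[j] and (z[j] > 0) == (z[i] > 0)]

    def is_core(i):
        return i in seed_set or len(neighbors(i)) >= min_pts

    labels = [-1] * n
    cluster = 0
    # Visit pseudo-labeled seeds first so they anchor clusters.
    for i in sorted(range(n), key=lambda k: (k not in seed_set, k)):
        if labels[i] != -1 or not eligible[i] or not is_core(i):
            continue
        labels[i] = cluster
        queue = [i]
        while queue:
            p = queue.pop()
            if not is_core(p):
                continue  # border bin: joins the cluster but does not expand it
            for q in neighbors(p):
                if labels[q] == -1:
                    labels[q] = cluster
                    queue.append(q)
        cluster += 1
    return labels, z


def segments(labels, z):
    """Collapse cluster labels into sorted (start_bin, end_bin_exclusive, 'gain'|'loss')."""
    out = {}
    for i, c in enumerate(labels):
        if c >= 0:
            s, e, _ = out.get(c, (i, i + 1, None))
            out[c] = (min(s, i), max(e, i + 1), "gain" if z[i] > 0 else "loss")
    return sorted(out.values())


if __name__ == "__main__":
    depths = [10, 10, 10, 10, 2, 2, 2, 10, 10, 10, 10, 20, 20, 20, 10, 10]
    labels, z = constrained_dbscan(depths, eps=1, min_pts=3, z_min=1.2)
    print(segments(labels, z))  # [(4, 7, 'loss'), (11, 14, 'gain')]
    short = [10, 10, 10, 20, 20, 10, 10, 10]  # a two-bin gain
    for seeds in ((), (3,)):
        labels, z = constrained_dbscan(short, eps=1, min_pts=3, z_min=1.2, seeds=seeds)
        print(seeds, segments(labels, z))
```

The second example shows the intended effect of seeding: the two-bin gain is too short to produce a core point under `min_pts=3`, but one pseudo-labeled seed bin lets it surface as a cluster.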