The identification of protein homologs in large databases is critical for biological advancements. Traditional methods, such as protein sequence alignment, often miss remote homologs. To address this limitation, we present the Basic Embedding Search Tool (BEST), a fast and sensitive approach that employs protein language models to create sequence embeddings enriched with evolutionary and structural information. In addition, we introduce a segmented distillation pruning technique to accelerate sequence encoding and develop a multi-layer acceleration structure that achieves a 4290.86-fold speedup in the access and retrieval of dense vectors. Extensive experiments on real datasets demonstrate that BEST increases sensitivity by over 20% compared to prior methods while maintaining precision and recall. It runs 23.41 times faster than traditional tools like PSI-BLAST and 3.92 times faster than Foldseek, while also detecting homologous sequences that conventional methods miss. BEST and its open-access web server (http://pm2s.cpolar.top/best1/) are poised to significantly aid enzyme mining and advance biological research. The code is publicly available at https://github.com/SkyTai-W/ProteinMiningEvaluator.
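As a rough illustration of embedding-based homolog search, the sketch below ranks database entries by cosine similarity to a query vector. It is a brute-force baseline under stated assumptions, not BEST's multi-layer acceleration structure: the names `cosine_similarity`, `top_k_hits`, and `toy_database` are hypothetical, and the three-dimensional vectors are made-up stand-ins for protein language model embeddings.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as dissimilar to everything
    return dot / (norm_a * norm_b)


def top_k_hits(query, database, k=2):
    """Return the k (name, score) pairs most similar to the query."""
    scored = [(name, cosine_similarity(query, vec)) for name, vec in database.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


# Toy embeddings (illustrative values only, not real model output).
toy_database = {
    "protein_A": [1.0, 0.0, 0.0],
    "protein_B": [0.9, 0.1, 0.0],
    "protein_C": [0.0, 1.0, 0.0],
}
hits = top_k_hits([1.0, 0.05, 0.0], toy_database, k=2)
```

A linear scan like this costs one similarity computation per database entry, which is the cost that an index or acceleration structure is meant to avoid on large databases.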
{"title":"BEST: Basic Embedding Search Tool Enhancing Discovery of Novel Enzyme.","authors":"Yuxuan Wu, Xiao Yi, Yang Tan, Huiqun Yu, Guisheng Fan, Gaowei Zheng","doi":"10.1007/s12539-025-00753-z","journal":{"name":"Interdisciplinary Sciences: Computational Life Sciences","pages":"101-121"},"PeriodicalIF":3.9,"publicationDate":"2026-03-01","publicationTypes":"Journal Article"}
Pub Date: 2026-02-13 DOI: 10.1007/s12539-025-00801-8
Congcong Jiang, Wenlan Chen, Yanyan Tan, Hai Zhong, Cheng Liang
Single-cell RNA sequencing (scRNA-seq) technology has improved cellular heterogeneity resolution but faces challenges like high dimensionality, sparsity, and technical noise in downstream analysis. Existing methods often treat all negative samples equally, ignoring local structures that are essential for capturing meaningful semantic relationships within the data. In this paper, we propose scMSDA, a novel multi-view fusion framework for scRNA-seq data clustering, which leverages semantic consistency and distribution alignment to effectively learn robust representations for downstream tasks. Our model first performs data augmentation on the original data by introducing dropout regularization. Then, we perform global feature aggregation on two latent representations obtained from the encoders with non-shared parameters. To further alleviate the representation conflict problem in traditional contrastive learning, we propose a distance-guided adaptive-negative contrastive learning strategy, which dynamically adjusts the contribution of negative sample pairs through a neighborhood-aware weight matrix. In addition, our method enhances intra-cluster compactness while maximizing inter-cluster separation through an iterative centroid refinement process guided by pseudo-labels. Finally, the optimal transport (OT)-based cross-view alignment explicitly minimizes transport costs between semantically related instances and target clusters, effectively enforcing distribution alignment across views. We evaluate our model on 17 publicly available datasets and the experimental results show our model outperforms 10 baseline methods in terms of various clustering metrics. The source code of scMSDA is freely available at https://github.com/LiangSDNULab/scMSDA.
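One way to picture the adaptive-negative idea is an InfoNCE-style loss in which each negative pair carries its own weight. The sketch below is an assumption-laden illustration, not scMSDA's published loss: the function name `weighted_info_nce`, the temperature value, and the choice to down-weight a negative that is a near neighbor of the anchor are all hypothetical.

```python
import math


def weighted_info_nce(sim_row, positive_index, weights, temperature=0.5):
    """InfoNCE loss for one anchor with per-negative weights.

    sim_row[j] is the similarity between the anchor and sample j;
    weights[j] scales how much sample j counts as a negative.
    The weight at positive_index is ignored.
    """
    pos = math.exp(sim_row[positive_index] / temperature)
    neg = sum(
        w * math.exp(s / temperature)
        for j, (s, w) in enumerate(zip(sim_row, weights))
        if j != positive_index
    )
    return -math.log(pos / (pos + neg))


# Anchor vs. three samples: index 0 is the positive view, index 1 is a
# close neighbor (likely the same cell type), index 2 is a distant cell.
similarities = [0.9, 0.8, 0.1]
uniform_loss = weighted_info_nce(similarities, 0, [1.0, 1.0, 1.0])
guided_loss = weighted_info_nce(similarities, 0, [1.0, 0.2, 1.0])
```

With uniform weights this reduces to the standard InfoNCE form; lowering the weight of the close neighbor reduces the penalty for keeping it near the anchor, which is the kind of representation conflict the abstract refers to.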
{"title":"scMSDA: A Novel Multi-View Fusion Framework for Single-Cell RNA-seq Data Clustering with Semantic and Distribution Alignment.","authors":"Congcong Jiang, Wenlan Chen, Yanyan Tan, Hai Zhong, Cheng Liang","doi":"10.1007/s12539-025-00801-8","journal":{"name":"Interdisciplinary Sciences: Computational Life Sciences"},"PeriodicalIF":3.9,"publicationDate":"2026-02-13","publicationTypes":"Journal Article"}
Pub Date: 2026-02-06 DOI: 10.1007/s12539-025-00806-3
Alicia Gómez-Pascual, Araks Martirosyan, Katja Hebestreit, Andrew Kottick, Michelle Mighdoll, Victor Hanson-Smith, José Luis Mellina-Andreu, Alejandro Cisterna, Matthew G Holt, Grant Belgard, Sebastian Guelfi, Juan A Botía
{"title":"Multi-Network Co-expression Analysis Enhances Biological Insights from Single-Cell Gene Expression.","authors":"Alicia Gómez-Pascual, Araks Martirosyan, Katja Hebestreit, Andrew Kottick, Michelle Mighdoll, Victor Hanson-Smith, José Luis Mellina-Andreu, Alejandro Cisterna, Matthew G Holt, Grant Belgard, Sebastian Guelfi, Juan A Botía","doi":"10.1007/s12539-025-00806-3","journal":{"name":"Interdisciplinary Sciences: Computational Life Sciences"},"PeriodicalIF":3.9,"publicationDate":"2026-02-06","publicationTypes":"Journal Article"}
Pub Date: 2026-02-03 DOI: 10.1007/s12539-025-00799-z
Fei Wang, Dacheng Ruan, Yang Zhang, Yue Chen, Xiujuan Lei, Fang-Xiang Wu, Yansen Su, Chunhou Zheng
Purpose: Predicting drug-target interactions (DTIs) is a practical demand in drug development and drug repositioning. Therefore, developing accurate and efficient DTI prediction methods has significant application value. Current models focus on the features of either drugs or targets independently, and concatenate them together for downstream prediction. They ignore the hidden associations between drugs and targets, which may affect the implementation of DTIs.
Methods: In this work, we design IIC-DTI, a contrastive learning model that fuses intramolecular and intermolecular features of drugs and targets. The intramolecular features are derived from drug chemical structures and target amino acid sequences and are generated separately for each. The intermolecular features describe drug-target pairs and are extracted by a multi-head cross-attention network. Each drug or target therefore has one embedding per view, and a contrastive learning module updates the embedding of one view by fusing information from the other. The resulting embeddings are concatenated and fed into a neural network with three hidden layers to predict potential DTIs.
Results: Multiple comparative experiments on four benchmark datasets show that the proposed model outperforms nine state-of-the-art methods, including two pre-trained large language models, on several evaluation metrics. In a case study, 16 of 20 predicted drug-target pairs were verified by literature evidence. Moreover, IIC-DTI successfully identified related interactions for a given drug and a given target. These results indicate that IIC-DTI has the potential to identify DTIs under realistic conditions.
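The intermolecular step in the Methods paragraph relies on cross-attention between a drug and a target. The sketch below shows a single head of scaled dot-product cross-attention in plain Python, with drug tokens as queries and target residues as keys and values. It is a simplified illustration: it omits the learned projection matrices and the multiple heads of a real network, and the names `cross_attention`, `drug_tokens`, and `target_residues` are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Single-head scaled dot-product attention.

    Each query attends over all keys; the output for a query is the
    attention-weighted average of the value vectors.
    """
    d = len(keys[0])
    outputs, all_weights = [], []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        out = [sum(w * v[c] for w, v in zip(weights, values)) for c in range(len(values[0]))]
        outputs.append(out)
        all_weights.append(weights)
    return outputs, all_weights


# Toy features (illustrative values): 2 drug tokens, 3 target residues.
drug_tokens = [[1.0, 0.0], [0.0, 1.0]]
target_residues = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
fused, attention = cross_attention(drug_tokens, target_residues, target_residues)
```

Each row of `attention` says how strongly one drug token attends to each target residue, so the fused vector for a drug token depends on the specific target it is paired with.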
{"title":"IIC-DTI: A Contrastive Learning Enhanced Inter-Intra Molecular Fusing Framework for Drug-Target Interaction Prediction.","authors":"Fei Wang, Dacheng Ruan, Yang Zhang, Yue Chen, Xiujuan Lei, Fang-Xiang Wu, Yansen Su, Chunhou Zheng","doi":"10.1007/s12539-025-00799-z","journal":{"name":"Interdisciplinary Sciences: Computational Life Sciences"},"PeriodicalIF":3.9,"publicationDate":"2026-02-03","publicationTypes":"Journal Article"}
Pub Date: 2026-02-03 DOI: 10.1007/s12539-025-00805-4
Xiaofeng Xie, Peng Xue, Yihao Guo, Huijuan Chen, Li Fan, Rongnian Tang, Zhenkai Xu, Xuanqi Wang, Tao Liu, Feng Chen
Early and accurate diagnosis of mild cognitive impairment (MCI), a prodromal stage of Alzheimer's disease (AD), is critical for timely intervention and management. Nevertheless, how to integrate heterogeneous multi-modal data effectively for AD diagnosis remains an open problem. We therefore propose a supervised contrastive learning framework that integrates single nucleotide polymorphisms (SNPs), plasma proteomics, and T1-weighted structural magnetic resonance imaging (sMRI) from a biologically informed perspective: SNPs influence protein structure or gene expression levels, which ultimately alters brain structure. Through a supervised contrastive learning mechanism, we construct a cross-modal feature space and introduce a similarity-based symmetrical attention mechanism to capture intermodal interactions and mitigate modality heterogeneity. We validate the proposed method on the Alzheimer's Disease Neuroimaging Initiative dataset, where it achieves accuracies of 96.1%, 86.2%, and 86.1% on the AD-NC, MCI-NC, and AD-MCI tasks, respectively. In addition, applying explainable methods to our model identified multi-modal biomarkers related to AD diagnosis. The experimental results validate the effectiveness of our model in the diagnosis of AD and MCI.
{"title":"Multi-Modal Fusion with Supervised Contrastive Learning Model for Early Alzheimer's Disease Diagnosis and Multi-Modal Biomarker Identification.","authors":"Xiaofeng Xie, Peng Xue, Yihao Guo, Huijuan Chen, Li Fan, Rongnian Tang, Zhenkai Xu, Xuanqi Wang, Tao Liu, Feng Chen","doi":"10.1007/s12539-025-00805-4","journal":{"name":"Interdisciplinary Sciences: Computational Life Sciences"},"PeriodicalIF":3.9,"publicationDate":"2026-02-03","publicationTypes":"Journal Article"}
Pub Date: 2026-02-03 DOI: 10.1007/s12539-025-00802-7
Xiaolin Ju, Tao Liu, Bowen Luo, Heling Cao, Zhan Gao, Haiyan Pan
Automated electrocardiogram (ECG) classification plays a critical role in arrhythmia diagnosis. However, current deep learning-based methods frequently fail to account for physiological rhythms and clinical diagnostic reasoning, which compromises their reliability and interpretability. This study proposes a clinically inspired multi-lead oscillatory Transformer framework, named FHGNet, to enhance the precision and interpretability of classifying ventricular tachycardia (VT) and supraventricular tachycardia (SVT). The proposed architecture integrates R-peak detection for heartbeat segmentation and adaptive-length patch extraction with R-wave positional encoding to enhance temporal awareness. It employs a convolutional neural network (CNN) to capture intra-beat morphological features (QRS morphology), a Transformer with FANLayer to model inter-beat rhythmic patterns, and a graph attention network (GAT) to fuse multi-lead dependencies. Additionally, a two-stage classifier is designed to enhance the detection of rare arrhythmia classes. Experimental evaluations on the MIT-BIH Supraventricular Arrhythmia dataset demonstrate that FHGNet achieves a macro F1-score of 91.35%, outperforming the baselines. Ablation studies reveal that removing the GAT reduces F1 by 2.42% in multi-lead scenarios, while the two-stage design improves minority-class recall by 5.82%. Attention visualization confirms that the model focuses on clinically relevant features, such as ST-T segment energy ratios and inter-lead phase differences, in line with established diagnostic criteria. 
Additionally, the interpretability of FHGNet is further enhanced by two aspects: 1) Explicit integration of physiological priors (e.g., RR interval variability, intra-beat positional information) in dynamic feature engineering, which enables the model to align with clinicians' rhythm analysis logic; 2) The two-stage classifier strictly follows the clinical diagnostic workflow (first screening abnormalities, then subclassifying), making the decision-making process traceable. This work provides an interpretable, clinically adaptive framework for high-accuracy ECG classification, potentially reducing reliance on invasive electrophysiological studies.
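RR interval variability, named above as one of the physiological priors, can be derived directly from detected R-peak positions. The sketch below is a minimal illustration, assuming the R peaks are already available as sample indices; the function names, the 250 Hz sampling rate, and the peak positions are hypothetical and not taken from FHGNet or from the dataset.

```python
import statistics


def rr_intervals(r_peak_samples, sampling_rate_hz):
    """Seconds between consecutive R peaks given their sample indices."""
    return [
        (later - earlier) / sampling_rate_hz
        for earlier, later in zip(r_peak_samples, r_peak_samples[1:])
    ]


def rr_features(r_peak_samples, sampling_rate_hz):
    """Summary rhythm features from a short run of beats."""
    rr = rr_intervals(r_peak_samples, sampling_rate_hz)
    if len(rr) < 2:
        raise ValueError("need at least three R peaks to estimate variability")
    mean_rr = statistics.mean(rr)
    return {
        "mean_rr_s": mean_rr,
        "sdnn_s": statistics.stdev(rr),  # sample standard deviation of RR intervals
        "heart_rate_bpm": 60.0 / mean_rr,
    }


# Illustrative peaks at an assumed 250 Hz sampling rate.
toy_peaks = [0, 250, 500, 800]
features = rr_features(toy_peaks, 250)
```

Features of this kind are what a clinician reads off a rhythm strip, which is why feeding them to a model explicitly can make its decisions easier to trace.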
{"title":"FHGNet: A Feature-Centric Hierarchical Network with Graph Attention Layer for Supraventricular Tachycardia Classification.","authors":"Xiaolin Ju, Tao Liu, Bowen Luo, Heling Cao, Zhan Gao, Haiyan Pan","doi":"10.1007/s12539-025-00802-7","journal":{"name":"Interdisciplinary Sciences: Computational Life Sciences"},"PeriodicalIF":3.9,"publicationDate":"2026-02-03","publicationTypes":"Journal Article"}
Pub Date: 2026-02-02 DOI: 10.1007/s12539-025-00807-2
Tong Wang, Zhendong Liu
{"title":"A Deep Learning Framework with Multi-perspective Feature Fusion for Transcription Factor Binding Site Prediction.","authors":"Tong Wang, Zhendong Liu","doi":"10.1007/s12539-025-00807-2","journal":{"name":"Interdisciplinary Sciences: Computational Life Sciences"},"PeriodicalIF":3.9,"publicationDate":"2026-02-02","publicationTypes":"Journal Article"}
Psychrophilic proteins, which maintain high activity and stability in low-temperature environments, hold significant potential for industrial and ecological research. However, existing predictive tools predominantly focus on thermophilic proteins, while psychrophilic protein prediction models remain constrained by data scarcity and subtle sequence variations, resulting in suboptimal performance. To overcome these barriers, this study introduces ESM-PsyPred, a computational framework that integrates the evolutionary-scale protein language model ESM-2 with a support vector machine (SVM). By extracting high-dimensional semantic features from protein sequences via ESM-2 and employing an SVM classifier, the model achieves independent test accuracies of 88.9% and 83.9% in binary (psychrophilic vs. mesophilic) and ternary (psychrophilic, mesophilic, thermophilic) classification tasks, respectively, significantly outperforming existing methods. Visualization analyses demonstrate the model's ability to identify critical cold-adaptation signatures. Furthermore, the construction of high-quality datasets, PMTTer and PNPBin, alongside cross-dataset validation, underscores the framework's robust generalization capabilities. The open-source availability of the code (accessible at https://github.com/tust-lamee/ESM-PsyPred ) establishes ESM-PsyPred as an efficient tool for the rational design and industrial development of cold-adapted proteins.
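A language model such as ESM-2 produces one vector per residue, whereas an SVM needs one fixed-length vector per protein. The abstract does not say how that reduction is done; the sketch below assumes mean pooling, a common choice, and uses made-up two-dimensional vectors in place of real model output. The names `mean_pool` and `toy_residue_embeddings` are hypothetical.

```python
def mean_pool(residue_embeddings):
    """Average per-residue vectors into one fixed-length protein vector."""
    if not residue_embeddings:
        raise ValueError("sequence has no residues")
    length = len(residue_embeddings)
    dim = len(residue_embeddings[0])
    return [sum(vec[c] for vec in residue_embeddings) / length for c in range(dim)]


# Three residues with 2-dimensional embeddings (illustrative values only).
toy_residue_embeddings = [[0.0, 2.0], [2.0, 4.0], [4.0, 0.0]]
protein_vector = mean_pool(toy_residue_embeddings)
```

Because the pooled vector has the same length for any sequence length, proteins of different sizes can be passed to the same downstream classifier.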
{"title":"ESM-PsyPred: Leveraging Protein Language Models for Accurate Prediction of Psychrophilic Proteins.","authors":"Chong Peng, Yarui Bian, Chengwu Yuan, Yuying Chen, Dingkuo Liu, Fuping Lu, Fufeng Liu, Yihan Liu","doi":"10.1007/s12539-025-00810-7","journal":{"name":"Interdisciplinary Sciences: Computational Life Sciences"},"PeriodicalIF":3.9,"publicationDate":"2026-01-29","publicationTypes":"Journal Article"}
Pub Date: 2026-01-29 DOI: 10.1007/s12539-025-00809-0
Zijian Yang, Liujin Zhang, Luyuan Li, Yan Song, Jie Sun, Fanling Meng
Homologous recombination deficiency (HRD) is a critical biomarker of clinical benefit from platinum-based chemotherapy and poly(ADP-ribose) polymerase (PARP) inhibitors in high-grade serous ovarian cancer, but molecular testing is costly, time-consuming, and limited by tissue requirements. In this study, we introduce dPathHRD (digital Pathological assessment of Homologous Recombination Deficiency), a deep learning model designed to predict HRD status and platinum chemotherapy response directly from routine hematoxylin and eosin-stained whole-slide images. By integrating a pre-trained transformer-based pathology foundation model with an attention-based multiple-instance learning architecture, dPathHRD predicts HRD status with an area under the curve of 0.920 in the discovery cohort and 0.766 in the validation cohort. The digital scores generated by dPathHRD were significantly correlated with established HRD-related genomic and transcriptomic features. Furthermore, dPathHRD predicted therapeutic response to platinum chemotherapy: the HRD-like group showed higher complete response rates and longer progression-free and recurrence-free survival than the homologous recombination proficiency (HRP)-like group across all three cohorts. Interpretation analysis via attention mapping confirmed the model's reliance on biologically relevant histopathological features in tumor and stromal regions. In conclusion, dPathHRD offers a promising, cost-effective alternative to molecular testing, leveraging widely available digital pathology images to inform personalized treatment strategies. Further prospective validation is warranted to confirm its clinical applicability and predictive power.
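Attention-based multiple-instance learning treats a whole-slide image as a bag of patch features and learns how much each patch should contribute to the slide-level representation. The sketch below shows one widely used pooling form, a softmax over scores of the form w · tanh(V h). Whether dPathHRD uses exactly this form is an assumption; the parameter values, patch features, and the name `attention_mil_pool` are illustrative only.

```python
import math


def attention_mil_pool(instances, V, w):
    """Attention pooling over a bag of instance feature vectors.

    score_k = w . tanh(V h_k); weights = softmax(scores);
    the bag embedding is the weighted sum of the instances.
    """
    scores = []
    for h in instances:
        hidden = [math.tanh(sum(v_ij * h_j for v_ij, h_j in zip(row, h))) for row in V]
        scores.append(sum(w_i * t for w_i, t in zip(w, hidden)))
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(instances[0])
    bag = [sum(a * h[c] for a, h in zip(weights, instances)) for c in range(dim)]
    return bag, weights


# Three toy patch features and toy attention parameters (not learned values).
patches = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
V = [[1.0, 0.0], [0.0, 1.0]]
w = [1.0, 1.0]
slide_embedding, patch_weights = attention_mil_pool(patches, V, w)
```

The per-patch weights are also what an attention map visualizes, which is how this kind of model can point to the tissue regions that drove a prediction.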
{"title":"HRD-Informed Digital Histology Model for Predicting Platinum Chemo-Response and Prognosis in High-Grade Serous Ovarian Cancer.","authors":"Zijian Yang, Liujin Zhang, Luyuan Li, Yan Song, Jie Sun, Fanling Meng","doi":"10.1007/s12539-025-00809-0","journal":{"name":"Interdisciplinary Sciences: Computational Life Sciences"},"PeriodicalIF":3.9,"publicationDate":"2026-01-29","publicationTypes":"Journal Article"}