Interdisciplinary Sciences: Computational Life Sciences: Latest Articles

hERG-MFFGNN: An Explainable Deep Learning Model for Predicting Cardiotoxicity Using Multi-feature Fusion and Graph Neural Networks.

IF 3.9, Zone 2 (Biology), Q1 MATHEMATICAL & COMPUTATIONAL BIOLOGY

Interdisciplinary Sciences: Computational Life Sciences

Pub Date : 2025-09-22 DOI: 10.1007/s12539-025-00768-6

Bingyu Jin, Jiarun Wang, Xin Yang, Lijie Na, Qi Zhao

Drug-related cardiotoxicity, most notably arrhythmia, represents a major challenge in drug development. Inhibition of the hERG potassium channel by certain compounds can delay cardiac repolarization, which manifests as QT interval prolongation and raises the risk of severe cardiac arrhythmias such as torsades de pointes (TdP). Accurately assessing a compound's effect on hERG channels is therefore crucial, but traditional methods are costly and too inefficient for large-scale screening, which makes efficient and accurate computational methods for predicting hERG inhibition critical. In this study, we present hERG-MFFGNN, a deep learning framework that accurately predicts hERG channel blockers while providing model interpretability. To improve both accuracy and generalizability, we implement a multi-feature fusion strategy that systematically integrates molecular structural information. First, multiple molecular fingerprints and molecular descriptors are fused to construct an initial feature representation. Next, graph neural networks extract molecular topological features. These two sets of features are then weighted and fused by an attention mechanism to form the final compound representation, giving a more comprehensive description of each molecule. We assess the performance of hERG-MFFGNN using fivefold cross-validation on the benchmark dataset and on external validation datasets. hERG-MFFGNN achieves an AUROC of 0.909 and an accuracy (ACC) of 0.854, highlighting its robust predictive capability for hERG activity across diverse datasets. We believe that hERG-MFFGNN can serve as an effective tool for the early prediction of hERG channel blockers during drug discovery and development. The complete source code is publicly available at https://github.com/zhaoqi106/hERG-MFFGNN.
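
A minimal sketch of attention-weighted fusion of two feature views is shown below. It assumes both views have already been projected to a common dimension and that each view receives one scalar weight from a softmax over tanh scores. The scoring vector `w_att`, the dimensions, and the feature values are hypothetical placeholders; hERG-MFFGNN's actual attention layer and training procedure may differ.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_fuse(views, w_att):
    """Fuse equal-length feature vectors using one attention weight per view.

    Each view is scored as tanh(w_att . v); the softmax of the scores gives
    the view weights, and the fused vector is the weighted sum of the views.
    """
    scores = [math.tanh(sum(w * x for w, x in zip(w_att, v))) for v in views]
    alphas = softmax(scores)
    dim = len(views[0])
    fused = [sum(a * v[i] for a, v in zip(alphas, views)) for i in range(dim)]
    return fused, alphas


# Hypothetical 4-dimensional projections of the two feature sets.
fp_desc = [0.2, 0.8, 0.1, 0.5]   # fingerprints + descriptors (initial representation)
gnn_emb = [0.6, 0.1, 0.9, 0.3]   # graph-level GNN embedding (topological features)
w_att = [0.5, -0.2, 0.3, 0.1]    # stand-in for a learned scoring vector

fused, alphas = attention_fuse([fp_desc, gnn_emb], w_att)
print("view weights:", [round(a, 3) for a in alphas])
print("fused representation:", [round(x, 3) for x in fused])
```

In this sketch the per-view weights `alphas` show how much each feature set contributed to a compound's fused representation, which is one simple hook for interpretability.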

Citations: 0

MKLNID: Identifying Melanoma-related Pathogenic Genes Through Multiple Kernel Learning and Network Impulsive Dynamics.

IF 3.9, Zone 2 (Biology), Q1 MATHEMATICAL & COMPUTATIONAL BIOLOGY

Interdisciplinary Sciences: Computational Life Sciences

Pub Date : 2025-09-22 DOI: 10.1007/s12539-025-00755-x

Linconghua Wang, Ju Xiang, Zihao Guo, Kaixin Zeng, Min Li

Melanoma is a highly malignant skin cancer, and identifying its pathogenic genes is crucial for understanding its pathogenesis and developing treatment strategies. Network-based approaches effectively capture the synergistic interactions among genes and their products within biological systems, yet extracting functional insights from these complex networks remains challenging. Here, we propose a novel approach that combines multiple kernel learning and network impulsive dynamics (MKLNID) to predict melanoma-related pathogenic genes. Specifically, we construct similarity kernels of diseases and genes from the original disease-gene heterogeneous network and melanoma expression profiles. These kernels are integrated via multiple kernel learning to generate enhanced similarity networks for diseases and genes, respectively. Impulsive signals are then applied to specific nodes in the enhanced heterogeneous network, and the resulting dynamical response signatures are used to infer potential pathogenic genes. Comprehensive experiments and case analyses demonstrate the effectiveness of MKLNID in identifying melanoma-related genes. By deeply integrating heterogeneous disease networks with omics data and introducing network dynamics to simulate gene responses, MKLNID offers a new strategy for identifying melanoma-related genes, with potential implications for precision diagnosis and therapy.
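
The two ingredients named above, kernel integration and an impulse applied to selected nodes, can be sketched on a toy gene-similarity network. This is an illustration under loud assumptions: the kernel weights are fixed by hand (real multiple kernel learning optimizes them), only a gene-gene network is used instead of the full disease-gene heterogeneous network, and simple heat diffusion dx/dt = -Lx stands in for MKLNID's impulsive dynamics, whose exact equations are not given here. All kernel values are invented.

```python
def combine_kernels(kernels, weights):
    """Weighted average of same-shape similarity kernels (weights fixed here, learned in real MKL)."""
    n = len(kernels[0])
    total = sum(weights)
    return [[sum(w * k[i][j] for w, k in zip(weights, kernels)) / total
             for j in range(n)] for i in range(n)]


def impulse_response(adj, seeds, steps=20, dt=0.1):
    """Apply a unit impulse to the seed nodes and Euler-integrate dx/dt = -Lx.

    L = D - A is the graph Laplacian of the symmetric weighted adjacency `adj`.
    Returns each node's response accumulated over time, used here as a
    hypothetical response signature for ranking candidates.
    """
    n = len(adj)
    x = [1.0 if i in seeds else 0.0 for i in range(n)]
    acc = [0.0] * n
    for _ in range(steps):
        # -(Lx)_i = sum_j A_ij * (x_j - x_i)
        flow = [sum(adj[i][j] * (x[j] - x[i]) for j in range(n)) for i in range(n)]
        x = [xi + dt * fi for xi, fi in zip(x, flow)]
        acc = [a + xi for a, xi in zip(acc, x)]
    return acc


# Hypothetical gene-gene kernels, e.g. from network topology and from expression profiles.
k_net = [[1.0, 0.8, 0.1, 0.0],
         [0.8, 1.0, 0.2, 0.1],
         [0.1, 0.2, 1.0, 0.7],
         [0.0, 0.1, 0.7, 1.0]]
k_expr = [[1.0, 0.6, 0.3, 0.1],
          [0.6, 1.0, 0.4, 0.2],
          [0.3, 0.4, 1.0, 0.5],
          [0.1, 0.2, 0.5, 1.0]]

fused = combine_kernels([k_net, k_expr], [0.6, 0.4])
# Zero the diagonal so the fused kernel acts as a weighted adjacency matrix.
adj = [[0.0 if i == j else v for j, v in enumerate(row)] for i, row in enumerate(fused)]
scores = impulse_response(adj, seeds={0})  # gene 0 plays the role of a known melanoma gene
ranking = sorted((i for i in range(len(adj)) if i != 0), key=lambda i: -scores[i])
print("candidate ranking:", ranking)
```

Genes that respond strongly to the impulse are ranked as candidates; in the method described above, the impulse is applied within the enhanced disease-gene heterogeneous network rather than on a gene-only graph.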

Citations: 0

Causal Transformer for Learning Embeddings from Structured Medical History Records and Multi-Source Data Integration for Complex Disease Risk Prediction.

IF 3.9, Zone 2 (Biology), Q1 MATHEMATICAL & COMPUTATIONAL BIOLOGY

Interdisciplinary Sciences: Computational Life Sciences

Pub Date : 2025-09-17 DOI: 10.1007/s12539-025-00749-9

Zeming Li, Yu Xu, Debajyoti Chowdhury, Hip Fung Yip, Chonghao Wang, Lu Zhang

Traditional disease risk prediction models rely predominantly on statistical algorithms and often consider only genetic factors or a limited set of lifestyle factors when estimating the risk of disease onset. More recently, more comprehensive approaches have emerged that combine genetic factors with additional lifestyle factors (e.g., alcohol intake) and physical features (e.g., body mass index, age) to increase predictive accuracy. Because the onset of complex diseases is often accompanied by comorbidities, incorporating medical history records is a critical yet underexplored avenue for improving risk prediction. In this study, we propose a novel framework, MIDRP (Multi-source Integration for Disease Risk Prediction), which incorporates genetic variants, lifestyle factors, physical attributes, and medical history records to achieve more robust and accurate predictions. At the core of our approach is a causal Transformer architecture designed to extract and interpret nuanced patterns from medical history records. In our experiments, we compared MIDRP with several baselines, including LDPred2, random forest, multilayer perceptron, logistic regression, AdaBoost, DiseaseCapsule, EIR, and Med-Bert, on three complex diseases (coronary artery disease, type 2 diabetes, and breast cancer) using data from the UK Biobank. MIDRP achieved state-of-the-art performance, with AUROC scores of 0.783, 0.841, and 0.784, respectively, demonstrating its potential for complex disease risk prediction.
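
The defining property of a causal Transformer, that each position in a chronologically ordered medical history can attend only to itself and earlier events, can be shown with a single attention head. This is a minimal sketch: the event embeddings are invented, queries, keys, and values are the embeddings themselves instead of learned projections, and it does not reproduce MIDRP's architecture or how its history embeddings are combined with genetic, lifestyle, and physical features.

```python
import math


def causal_self_attention(seq):
    """Single-head scaled dot-product self-attention with a causal mask.

    seq: chronologically ordered event embeddings (one vector per record).
    Queries, keys and values are the embeddings themselves (identity
    projections), a simplification of a learned Transformer layer.
    """
    d = len(seq[0])
    out = []
    for t, q in enumerate(seq):
        # Causal mask: position t only scores positions 0..t, never later events.
        scores = [sum(a * b for a, b in zip(q, seq[s])) / math.sqrt(d) for s in range(t + 1)]
        m = max(scores)
        weights = [math.exp(sc - m) for sc in scores]
        z = sum(weights)
        out.append([sum(weights[s] / z * seq[s][i] for s in range(t + 1)) for i in range(d)])
    return out


# Invented 3-dimensional embeddings for three diagnosis events, oldest first.
history = [[1.0, 0.0, 0.5], [0.2, 0.9, 0.1], [0.4, 0.4, 0.8]]
out = causal_self_attention(history)
for t, vec in enumerate(out):
    print(t, [round(v, 3) for v in vec])
```

Because later events are masked, the output at the last position summarizes the whole history without leaking future information; whether MIDRP reads out its history embedding this way is not stated in the abstract.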

Citations: 0

ORMCKB: A Knowledge Database for Personalized Medicine in Deciphering the Oral Microbiome-Disease Axis.

IF 3.9, Zone 2 (Biology), Q1 MATHEMATICAL & COMPUTATIONAL BIOLOGY

Interdisciplinary Sciences: Computational Life Sciences

Pub Date : 2025-09-17 DOI: 10.1007/s12539-025-00769-5

Yutao Wu, Yi Zhou, Wenjing Shi, Siyu Zhou, Min Jiang, Ke Shen, Xingyun Liu, Xiaoyu Li, Jiao Wang, Chi Zhang, Bairong Shen, Weidong Tian

The oral microbiome plays a crucial role in the development and progression of diseases. The complex interactions between the oral microbiome and diseases pose challenges for clinicians, both in clinical decision-making and in scientific research. To address this challenge, we developed ORMCKB, an oral microbiome knowledge database that provides evidence for personalized medicine and scientific research on the oral microbiome-disease axis. The current version of ORMCKB contains 11,554 data entries, covering 6941 oral microbial taxa, 234 diseases, 220 interventions, and 175 bacteriostats extracted from 818 publications. Compared with ChatGPT-4o, ORMCKB performs better at matching questions with responses (10 vs. 9.6), presenting research article details (10 vs. 5.80), and the authenticity ratio of recommended scientific articles (100% vs. 33.63%). Its System Usability Scale (SUS) score and Net Promoter Score (NPS) were 86.07 and 85.71, respectively. As the first knowledge database focused on the oral microbiome-disease axis, ORMCKB provides a comprehensive, accurate, and user-friendly online resource for identifying key microbial players and their associations with oral diseases in personalized medicine. ORMCKB is expected to remain prominent in cutting-edge research on the oral microbiome-disease axis and to pave the way for future artificial intelligence applications in both scientific research and clinical practice. ORMCKB is publicly available at http://sysbio.org.cn/ormckb.
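
The two usability metrics reported above follow standard definitions, sketched below. SUS maps ten 1-5 Likert answers to a 0-100 score (odd-numbered items contribute answer - 1, even-numbered items contribute 5 - answer, and the sum is multiplied by 2.5); NPS is the percentage of promoters (ratings 9-10) minus the percentage of detractors (ratings 0-6) on a 0-10 scale. The survey responses here are invented for illustration; the abstract does not report ORMCKB's raw responses or sample size.

```python
def sus_score(answers):
    """System Usability Scale score (0-100) for one respondent's ten 1-5 answers."""
    if len(answers) != 10 or not all(1 <= a <= 5 for a in answers):
        raise ValueError("SUS needs exactly ten answers in the range 1-5")
    # Indices 0, 2, ... are items 1, 3, ... (positively worded): contribute answer - 1.
    # Indices 1, 3, ... are items 2, 4, ... (negatively worded): contribute 5 - answer.
    total = sum(a - 1 if i % 2 == 0 else 5 - a for i, a in enumerate(answers))
    return total * 2.5


def net_promoter_score(ratings):
    """NPS in [-100, 100]: % promoters (9-10) minus % detractors (0-6) on a 0-10 scale."""
    promoters = sum(1 for r in ratings if r >= 9)
    detractors = sum(1 for r in ratings if r <= 6)
    return 100.0 * (promoters - detractors) / len(ratings)


# Invented SUS answers from three hypothetical users.
survey = [
    [5, 1, 5, 2, 4, 1, 5, 1, 4, 2],
    [4, 2, 5, 1, 5, 2, 4, 1, 5, 1],
    [5, 1, 4, 1, 5, 1, 5, 2, 4, 2],
]
mean_sus = sum(sus_score(s) for s in survey) / len(survey)
print("mean SUS:", round(mean_sus, 2))
print("NPS:", net_promoter_score([10, 9, 9, 8, 10]))
```

A study-level SUS is usually reported as the mean of the per-respondent scores, as in the `mean_sus` line.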

Citations: 0

Accurately Predicting Cell Type Abundance from Spatial Histology Image Through HPCell.

IF 3.9, Zone 2 (Biology), Q1 MATHEMATICAL & COMPUTATIONAL BIOLOGY

Interdisciplinary Sciences: Computational Life Sciences

Pub Date : 2025-09-03 DOI: 10.1007/s12539-025-00757-9

Yongkang Zhao, Youyang Li, Weijiang Yu, Hongyu Zhang, Zheng Wang, Yuedong Yang, Yuansong Zeng

Recent advancements in spatial transcriptomics (ST) have revolutionized our ability to simultaneously profile gene expression, spatial location, and tissue morphology, enabling the precise mapping of cell types and signaling pathways within their native tissue context. However, the high cost of sequencing remains a significant barrier to its widespread adoption. Although existing methods often leverage histopathological images to predict transcriptomic profiles and identify cellular heterogeneity, few approaches directly estimate cell-type abundance from these images. To address this gap, we propose HPCell, a deep learning framework for inferring cell-type abundance directly from H&E-stained histology images. HPCell comprises three key modules: a pathology foundation module, a hypergraph module, and a Transformer module. It begins by dividing whole-slide images (WSIs) into patches, which are processed by the pathology foundation module using a teacher-student framework to extract robust morphological features. These features are used to construct a hypergraph, where each patch (node) connects to its spatial neighbors to model complex many-to-many relationships. The Transformer module applies attention to the hypergraph features to capture long-range dependencies. Finally, features from all modules are integrated to estimate cell-type abundance. Extensive experiments show that HPCell consistently outperforms state-of-the-art methods across multiple spatial transcriptomics datasets, offering a scalable and cost-effective approach for investigating tissue structure and cellular interactions.
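
The hypergraph step can be sketched as one hyperedge per patch, containing the patch and its k nearest patches by slide coordinates, followed by one round of node-to-hyperedge-to-node mean aggregation. The choice of k, the Euclidean neighbour rule, the toy 2-D features, and the parameter-free propagation (X' = Dv^-1 H De^-1 H^T X for incidence matrix H) are assumptions for illustration; HPCell's actual hyperedge construction and learned layers are not specified in the abstract, and the pathology foundation and Transformer modules are omitted.

```python
import math


def knn_hyperedges(coords, k):
    """One hyperedge per patch: the patch itself plus its k nearest spatial neighbours."""
    edges = []
    for i, ci in enumerate(coords):
        others = sorted((j for j in range(len(coords)) if j != i),
                        key=lambda j: math.dist(ci, coords[j]))
        edges.append([i] + others[:k])
    return edges


def hypergraph_propagate(features, edges):
    """Average node features into each hyperedge, then average incident hyperedges back into nodes.

    Equivalent to X' = Dv^-1 H De^-1 H^T X for the node-by-hyperedge incidence
    matrix H, with no learnable weights or nonlinearity.
    """
    dim = len(features[0])
    edge_feats = [[sum(features[v][d] for v in e) / len(e) for d in range(dim)]
                  for e in edges]
    out = []
    for v in range(len(features)):
        # Never empty: every node belongs to its own hyperedge.
        incident = [ef for ef, e in zip(edge_feats, edges) if v in e]
        out.append([sum(ef[d] for ef in incident) / len(incident) for d in range(dim)])
    return out


# A 2 x 3 grid of patches (row, col) with invented 2-D morphology features.
coords = [(0, 0), (0, 1), (0, 2), (1, 0), (1, 1), (1, 2)]
features = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.7, 0.3], [0.5, 0.5], [0.2, 0.8]]
edges = knn_hyperedges(coords, k=2)
smoothed = hypergraph_propagate(features, edges)
print("hyperedges:", edges)
print("smoothed features:", [[round(x, 3) for x in f] for f in smoothed])
```

Because each patch belongs to its own hyperedge and to the hyperedges of nearby patches, one propagation step mixes information across overlapping neighbourhoods, which is the many-to-many relationship a hypergraph captures and a pairwise graph does not.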

Citations: 0

Hot-Spot-Guided Generative Deep Learning for Drug-Like PPI Inhibitor Design.

IF 3.9, Zone 2 (Biology), Q1 MATHEMATICAL & COMPUTATIONAL BIOLOGY

Interdisciplinary Sciences: Computational Life Sciences

Pub Date : 2025-09-02 DOI: 10.1007/s12539-025-00756-w

Heqi Sun, Jiayi Li, Yufang Zhang, Shenggeng Lin, Junwei Chen, Hong Tan, Ruixuan Wang, Xueying Mao, Jianwei Zhao, Rongpei Li, Dong-Qing Wei

Protein-protein interactions (PPIs) are essential therapeutic targets, yet their large and relatively flat interfaces hinder the development of small-molecule inhibitors. Traditional computational approaches rely heavily on existing chemical libraries or expert heuristics, restricting exploration of novel chemical space. To address these challenges, we present Hot2Mol, a generative deep learning framework for the de novo design of target-specific and drug-like PPI inhibitors. Hot2Mol captures crucial pharmacophoric features from hot-spot residues, allowing precise targeting of PPI interfaces while eliminating the need for known bioactive ligands. The framework integrates three main components: a conditional transformer for pharmacophore-guided, property-constrained molecular generation; an E(n)-equivariant graph neural network to ensure accurate spatial alignment with PPI hot-spot pharmacophores; and a variational autoencoder to sample novel and diverse molecular structures. Comprehensive assessments demonstrate that Hot2Mol outperforms state-of-the-art models in binding affinity, drug-likeness, synthetic accessibility, novelty, and uniqueness. Molecular dynamics simulations further confirm the strong binding stability of the generated compounds. Case studies underscore Hot2Mol's ability to design high-affinity and selective PPI inhibitors, highlighting its potential to accelerate rational PPI-targeted drug discovery.
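
The property that motivates an E(n)-equivariant graph neural network is that rotating or translating the input coordinates rotates or translates the predicted coordinates in the same way, while scalar features stay unchanged. The sketch below follows the general message-passing form of an EGNN-style layer (messages built from invariant squared distances, coordinate updates along relative position vectors) but replaces the learned MLPs with fixed toy functions and uses scalar node features. It is not Hot2Mol's network, and how Hot2Mol uses such a layer to align molecules with hot-spot pharmacophores is not shown.

```python
import math


def egnn_layer(h, x):
    """One EGNN-style layer on a fully connected graph.

    h: scalar node features (E(n)-invariant); x: 3-D coordinates (equivariant).
    The three toy functions below stand in for learned MLPs phi_e, phi_x, phi_h.
    """
    n = len(h)
    new_h, new_x = [], []
    for i in range(n):
        m_i = 0.0
        shift = [0.0, 0.0, 0.0]
        for j in range(n):
            if i == j:
                continue
            diff = [a - b for a, b in zip(x[i], x[j])]
            d2 = sum(c * c for c in diff)          # invariant squared distance
            m_ij = math.tanh(h[i] + h[j] - d2)     # phi_e: message from invariants only
            w = 0.1 * m_ij                         # phi_x: scalar weight on the relative vector
            shift = [s + w * c for s, c in zip(shift, diff)]
            m_i += m_ij
        new_x.append([xi + s / (n - 1) for xi, s in zip(x[i], shift)])
        new_h.append(h[i] + m_i)                   # phi_h: feature update
    return new_h, new_x


def rotate_z(p, angle):
    """Rotate a 3-D point about the z-axis."""
    c, s = math.cos(angle), math.sin(angle)
    return [c * p[0] - s * p[1], s * p[0] + c * p[1], p[2]]


h = [0.5, -0.2, 0.1]
x = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.5]]
shift_t = [2.0, -1.0, 0.5]
angle = 0.7

h_out, x_out = egnn_layer(h, x)
# Transform the input first, then apply the layer.
x_moved = [[a + b for a, b in zip(rotate_z(p, angle), shift_t)] for p in x]
h_out2, x_out2 = egnn_layer(h, x_moved)
# Equivariance: this must match applying the layer first, then the same transform.
expected = [[a + b for a, b in zip(rotate_z(p, angle), shift_t)] for p in x_out]
err = max(abs(a - b) for p, q in zip(x_out2, expected) for a, b in zip(p, q))
print("max coordinate error:", err)
print("max feature change:", max(abs(a - b) for a, b in zip(h_out, h_out2)))
```

Both printed values should be at floating-point noise level, which is the guarantee that lets the network reason about 3-D pharmacophore geometry independently of how the complex is oriented in space.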

Citations: 0

Clustering Single-Cell RNA-Seq Data with Low-Rank Matrix Factorization and Local Graph Regularization.

IF 3.9, Zone 2 (Biology), Q1 MATHEMATICAL & COMPUTATIONAL BIOLOGY

Interdisciplinary Sciences: Computational Life Sciences

Pub Date : 2025-09-02 DOI: 10.1007/s12539-025-00762-y

Yue Yu, Wei Zhang, Xiaoying Zheng, Juan Shen, Yuanyuan Li

Single-cell RNA sequencing (scRNA-seq) offers significant opportunities to reveal cellular heterogeneity and diversity. Accurate cell type identification is critical for downstream analyses and for understanding the mechanisms of heterogeneity. However, the high dimensionality, sparsity, and noise of scRNA-seq data pose challenges. While various low-rank representation (LRR)-based clustering methods have been developed, many existing approaches may inaccurately capture relationships or conflate true patterns with noise. To address these limitations, we introduce LRMGC, a novel clustering algorithm that integrates low-rank matrix decomposition with local graph regularization. This approach applies a tri-decomposition strategy to the representation matrix to derive an aligned core matrix, and characterizes the "distance" between cells in a lower-dimensional space through a local manifold regularization term. Rather than relying on the nuclear norm of the representation matrix, LRMGC applies the Schatten p-norm to the core matrix to learn a similarity matrix that is robust to noise and outliers, while preserving the underlying subspace structure of the high-dimensional noisy data for accurate and robust clustering. The final similarity matrix is then obtained by applying an angular alignment strategy to the learned similarity matrix. Comprehensive experiments and comparisons with advanced methods on scRNA-seq datasets demonstrate LRMGC's superior performance and reliability in uncovering cell type composition. Furthermore, a variety of downstream analyses, such as marker gene identification, functional enrichment analysis, rare cell recognition, and cell-cell communication analysis, also demonstrate the effectiveness of LRMGC.
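
The two regularizers named above can be made concrete. The Schatten p-norm of a matrix is (sum_i sigma_i^p)^(1/p) over its singular values sigma_i; it equals the nuclear norm at p = 1 and is a non-convex quasi-norm for 0 < p < 1. A local graph (manifold) regularizer is commonly written as tr(Z L Z^T) = 0.5 * sum_ij w_ij ||z_i - z_j||^2 with L = D - W. The sketch below evaluates both with the standard library only, computing singular values from the eigenvalues of M^T M via Jacobi rotations, which is adequate for tiny matrices but loses precision for small singular values (a real implementation would call an SVD routine such as `numpy.linalg.svd`). That LRMGC uses exactly this form of the graph term is an assumption, the inputs are invented, and the full objective and its optimization are not reproduced.

```python
import math


def sym_eigvals(a, sweeps=50, tol=1e-12):
    """Eigenvalues of a small symmetric matrix via cyclic Jacobi rotations."""
    n = len(a)
    a = [row[:] for row in a]
    for _ in range(sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # columns p, q:  A <- A J
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # rows p, q:     A <- J^T A
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
    return [a[i][i] for i in range(n)]


def singular_values(m):
    """Singular values of m as square roots of the eigenvalues of m^T m."""
    cols = len(m[0])
    mtm = [[sum(m[r][i] * m[r][j] for r in range(len(m))) for j in range(cols)]
           for i in range(cols)]
    return sorted((math.sqrt(max(ev, 0.0)) for ev in sym_eigvals(mtm)), reverse=True)


def schatten_p(m, p):
    """Schatten p-(quasi-)norm: (sum_i sigma_i^p)^(1/p); p = 1 gives the nuclear norm."""
    return sum(s ** p for s in singular_values(m)) ** (1.0 / p)


def laplacian_penalty(z_cols, w):
    """Graph regularizer 0.5 * sum_ij w_ij ||z_i - z_j||^2, which equals tr(Z L Z^T) for L = D - W."""
    n = len(z_cols)
    return 0.5 * sum(w[i][j] * sum((a - b) ** 2 for a, b in zip(z_cols[i], z_cols[j]))
                     for i in range(n) for j in range(n))


core = [[2.0, 0.5, 0.0], [0.5, 1.0, 0.3], [0.0, 0.3, 0.2]]  # hypothetical core matrix
for p in (1.0, 0.5):
    print(f"Schatten-{p} norm of core:", round(schatten_p(core, p), 4))
cells = [[0.1, 0.2], [0.15, 0.25], [0.9, 0.8]]  # hypothetical low-dimensional cell embeddings
w = [[0, 1, 0], [1, 0, 0.1], [0, 0.1, 0]]       # hypothetical local affinity graph
print("graph penalty:", round(laplacian_penalty(cells, w), 4))
```

Smaller p penalizes small singular values relatively more than the nuclear norm does, pushing the core matrix toward lower rank, while the graph term penalizes embeddings that pull apart cells the affinity graph marks as neighbours.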

Citation: Yue Yu, Wei Zhang, Xiaoying Zheng, Juan Shen, Yuanyuan Li. Clustering Single-Cell RNA-Seq Data with Low-Rank Matrix Factorization and Local Graph Regularization. Interdisciplinary Sciences: Computational Life Sciences, 2025-09-02. DOI: 10.1007/s12539-025-00762-y

Citations: 0

EnDM-CPP: A Multi-view Explainable Framework Based on Deep Learning and Machine Learning for Identifying Cell-Penetrating Peptides with Transformers and Analyzing Sequence Information.

IF 3.9, Zone 2 (Biology), Q1 MATHEMATICAL & COMPUTATIONAL BIOLOGY

Interdisciplinary Sciences: Computational Life Sciences

Pub Date: 2025-09-01, Epub Date: 2024-12-23, DOI: 10.1007/s12539-024-00673-4

Lun Zhu, Zehua Chen, Sen Yang

Cell-penetrating peptides (CPPs) are crucial carriers for drug delivery. Synthesizing new CPPs in the laboratory is time- and resource-consuming, so computational methods that predict candidate CPPs can accelerate their development for therapeutic use. In this study, we propose EnDM-CPP, which combines machine learning algorithms (SVM and CatBoost) with convolutional neural networks (CNN and TextCNN). For dataset construction, three existing CPP benchmark datasets, CPPsite 2.0, MLCPP 2.0, and CPP924, are merged to increase diversity and reduce homology. For feature generation, embeddings from two Transformer-based protein language models, ProtT5 and ESM-2, are fed to the CNN and TextCNN. Sequence features such as CPRS, Hybrid PseAAC, and KSC are fed to the SVM and CatBoost. A logistic regression (LR) model then combines the outputs of the individual predictors to make the final decision. The experimental results indicate that the fused ProtT5 and ESM-2 features contribute significantly to CPP prediction, and that combining these features and models improves performance. On an independent test dataset, EnDM-CPP achieved an accuracy of 0.9495 and a Matthews correlation coefficient of 0.9008. These are improvements of 2.23%-9.48% and 4.32%-19.02%, respectively, over other state-of-the-art methods. Code and data are available at https://github.com/tudou1231/EnDM-CPP.git .
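
The final LR stage described above is a stacking ensemble. Each base predictor outputs a probability per peptide, and a logistic regression meta-learner is fitted on those outputs. The following stdlib-only Python sketch illustrates the idea. It makes several assumptions. The base-model probabilities are hypothetical numbers, not real SVM, CatBoost, CNN, or TextCNN outputs. Plain batch gradient descent stands in for whatever solver the authors used. The sketch also computes the two reported metrics, accuracy and the Matthews correlation coefficient (MCC).

```python
import math


def sigmoid(x):
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def fit_meta_lr(features, labels, lr=0.5, epochs=2000):
    """Logistic regression on base-model probabilities, fitted by batch gradient descent."""
    n, d = len(features), len(features[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for x, y in zip(features, labels):
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            grad_w = [g + err * xi for g, xi in zip(grad_w, x)]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(w, b, x, threshold=0.5):
    return int(sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) >= threshold)


def accuracy_and_mcc(y_true, y_pred):
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    acc = (tp + tn) / len(pairs)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0  # 0 by convention when undefined
    return acc, mcc


# Hypothetical P(CPP) from four base predictors, one row per peptide.
base_probs = [
    [0.90, 0.80, 0.85, 0.70], [0.80, 0.90, 0.75, 0.80],
    [0.70, 0.60, 0.90, 0.85], [0.95, 0.85, 0.80, 0.90],
    [0.20, 0.10, 0.30, 0.25], [0.10, 0.30, 0.20, 0.15],
    [0.30, 0.25, 0.10, 0.20], [0.15, 0.20, 0.35, 0.10],
]
y = [1, 1, 1, 1, 0, 0, 0, 0]
w, b = fit_meta_lr(base_probs, y)
preds = [predict(w, b, x) for x in base_probs]
print(accuracy_and_mcc(y, preds))
```

In a real stacking setup, the meta-learner is usually fitted on out-of-fold base-model predictions rather than on predictions for the base models' own training data. The abstract does not say how EnDM-CPP handles this.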

Citation: Lun Zhu, Zehua Chen, Sen Yang. EnDM-CPP: A Multi-view Explainable Framework Based on Deep Learning and Machine Learning for Identifying Cell-Penetrating Peptides with Transformers and Analyzing Sequence Information. Interdisciplinary Sciences: Computational Life Sciences, 2025-09-01, pp. 744-769. DOI: 10.1007/s12539-024-00673-4

Citations: 0

Advancing the Accuracy of Anti-MRSA Peptide Prediction Through Integrating Multi-Source Protein Language Models.

IF 3.9, Zone 2 (Biology), Q1 MATHEMATICAL & COMPUTATIONAL BIOLOGY

Interdisciplinary Sciences: Computational Life Sciences

Pub Date: 2025-09-01, Epub Date: 2025-03-11, DOI: 10.1007/s12539-025-00696-5

Watshara Shoombuatong, Pakpoom Mookdarsanit, Lawankorn Mookdarsanit, Nalini Schaduangrat, Saeed Ahmed, Muhammad Kabir, Pramote Chumnanpuen

The emergence of methicillin-resistant Staphylococcus aureus (MRSA) as a recognized cause of community-acquired and hospital infections has created a need to identify peptides with anti-MRSA properties efficiently and accurately in drug discovery and development pipelines. However, current experimental methods are often labor- and resource-intensive, so practical computational tools that identify anti-MRSA peptides from their sequences are urgently needed. Recently, pre-trained protein language models (pLMs) have emerged as a powerful way to encode peptide sequences as discriminative feature embeddings. These embeddings capture rich protein-level information that can be repurposed for in silico prediction of peptide properties. In this study, we present pLM4MRSA, a pLM-based framework designed to improve the accuracy of anti-MRSA peptide prediction. The framework combines feature embeddings from multiple pLMs, such as ProtTrans and Evolutionary Scale Modeling (ESM-2), which provide complementary information for prediction. The embeddings from the individual pLMs are integrated into hybrid feature embeddings, which are then processed with principal component analysis (PCA). The resulting PCA-transformed feature vectors are used as inputs for constructing the predictive model. On the independent test dataset, pLM4MRSA achieved a balanced accuracy of 0.983 and a Matthews correlation coefficient of 0.980. These are improvements of 2.53%-4.83% and 7.73%-13.23%, respectively, over state-of-the-art methods, indicating that pLM4MRSA is a high-performance prediction model with broad applicability. In addition, comparison with well-known hand-crafted features showed that the proposed hybrid feature embeddings complement each other effectively and capture discriminative patterns for more accurate anti-MRSA peptide prediction. We anticipate that pLM4MRSA will serve as an effective solution for accurate, large-scale prediction of anti-MRSA peptides from their sequences.
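
The fusion pipeline above concatenates per-model embeddings and then reduces them with PCA. The following stdlib-only Python sketch illustrates that pipeline. The tiny vectors are hypothetical stand-ins for per-peptide ProtTrans and ESM-2 embeddings, which are far higher-dimensional in practice. The sketch computes PCA by power iteration with deflation on the sample covariance matrix. This is an illustrative substitute for a library routine such as scikit-learn's `PCA`, not the authors' code. The downstream classifier is omitted.

```python
import math
import random


def concat_embeddings(*per_model):
    """Fuse per-model embeddings by row-wise concatenation (one vector per peptide)."""
    return [[v for part in parts for v in part] for parts in zip(*per_model)]


def pca(x, k, iters=500, seed=0):
    """Top-k principal components via power iteration with deflation on the covariance."""
    n, d = len(x), len(x[0])
    mean = [sum(row[j] for row in x) / n for j in range(d)]
    xc = [[row[j] - mean[j] for j in range(d)] for row in x]
    cov = [[sum(r[i] * r[j] for r in xc) / (n - 1) for j in range(d)] for i in range(d)]
    rng = random.Random(seed)
    comps, variances = [], []
    for _ in range(k):
        v = [rng.uniform(-1.0, 1.0) for _ in range(d)]
        norm = math.sqrt(sum(t * t for t in v))
        v = [t / norm for t in v]
        for _ in range(iters):
            w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
            norm = math.sqrt(sum(t * t for t in w))
            if norm == 0.0:  # remaining variance is exactly zero
                break
            v = [t / norm for t in w]
        lam = sum(v[i] * sum(cov[i][j] * v[j] for j in range(d)) for i in range(d))
        comps.append(v)
        variances.append(lam)
        # Deflate: remove the found component so the next pass finds the next one.
        cov = [[cov[i][j] - lam * v[i] * v[j] for j in range(d)] for i in range(d)]
    scores = [[sum(r[j] * c[j] for j in range(d)) for c in comps] for r in xc]
    return scores, comps, variances


# Hypothetical 2-D "ProtTrans" and 2-D "ESM-2" embeddings for four peptides.
emb_a = [[1, 2], [2, 4], [3, 6], [4, 8]]
emb_b = [[0, 2], [0, 4], [0, 6], [0, 8]]
fused = concat_embeddings(emb_a, emb_b)  # 4 x 4 hybrid embeddings
scores, comps, variances = pca(fused, k=2)
print(fused[1], variances)
```

The toy data lie on a single line, so the first component carries essentially all of the variance. With real embeddings, k would be chosen from the explained-variance profile.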

Citation: Watshara Shoombuatong, Pakpoom Mookdarsanit, Lawankorn Mookdarsanit, Nalini Schaduangrat, Saeed Ahmed, Muhammad Kabir, Pramote Chumnanpuen. Advancing the Accuracy of Anti-MRSA Peptide Prediction Through Integrating Multi-Source Protein Language Models. Interdisciplinary Sciences: Computational Life Sciences, 2025-09-01, pp. 716-729. DOI: 10.1007/s12539-025-00696-5

Citations: 0

Self-Supervised Graph Representation Learning for Single-Cell Classification.

IF 3.9, Zone 2 (Biology), Q1 MATHEMATICAL & COMPUTATIONAL BIOLOGY

Interdisciplinary Sciences: Computational Life Sciences

Pub Date: 2025-09-01, Epub Date: 2025-04-03, DOI: 10.1007/s12539-025-00700-y

Qiguo Dai, Wuhao Liu, Xianhai Yu, Xiaodong Duan, Ziqiang Liu

Accurately identifying cell types in single-cell RNA sequencing data is critical for understanding cellular differentiation and pathological mechanisms in downstream analysis. Traditional biological approaches are laborious and time-consuming, so computational methods for cell classification are needed. However, existing methods struggle to fully exploit the latent gene expression information in the vast amount of unlabeled cell data, which limits their classification and generalization performance. We therefore propose scSSGC, a novel self-supervised graph representation learning framework for single-cell classification. Specifically, in the self-supervised pre-training stage, multiple K-means clustering tasks performed on unlabeled cell data are used jointly to train the model, which mitigates the scarcity of labeled data. To capture potential interactions among cells effectively, we introduce a locally augmented graph neural network that strengthens information aggregation for nodes with few neighbors in the cell graph. A range of benchmark experiments demonstrates that scSSGC outperforms existing state-of-the-art cell classification methods. More importantly, scSSGC maintains stable performance in cross-dataset evaluation, indicating better generalization ability.
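
The pre-training idea above runs several K-means clusterings on unlabeled cells and treats each clustering as a pseudo-label task. The following stdlib-only Python sketch illustrates that idea. The toy expression vectors, the set of k values, and the random seeding of centroids are hypothetical choices. The graph neural network that would be trained on these pseudo-labels is omitted, and so is its locally augmented aggregation. The closing comment about per-task heads describes one plausible setup and is an assumption, not a detail taken from the paper.

```python
import random


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iters=100, seed=0):
    """Plain Lloyd's k-means; returns one cluster id per point."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]  # k distinct points as seeds
    labels = None
    for _ in range(iters):
        new_labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points]
        if new_labels == labels:  # assignments are stable: converged
            break
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the previous centroid if a cluster empties
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def multi_kmeans_pseudo_labels(cells, ks=(2, 3, 4), seed=0):
    """One pseudo-label task per k (hypothetical setup: each task would get its own
    classification head, and the per-task losses would be summed during pre-training)."""
    return {k: kmeans(cells, k, seed=seed + k) for k in ks}


# Hypothetical toy "expression profiles": two obvious groups of three cells each.
cells = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
tasks = multi_kmeans_pseudo_labels(cells, ks=(2, 3))
for k, labels in tasks.items():
    print(k, labels)
```

Using several values of k gives the encoder supervision at more than one granularity without any real cell type labels. Those labels are needed only for the later fine-tuning stage.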

Citation: Qiguo Dai, Wuhao Liu, Xianhai Yu, Xiaodong Duan, Ziqiang Liu. Self-Supervised Graph Representation Learning for Single-Cell Classification. Interdisciplinary Sciences: Computational Life Sciences, 2025-09-01, pp. 566-575. DOI: 10.1007/s12539-025-00700-y

Citations: 0