Pub Date: 2026-03-01 | Epub Date: 2025-06-18 | DOI: 10.1007/s12539-025-00708-4
T Thanya, T Jeslin
Brain tumor classification using Magnetic Resonance Imaging (MRI) images is an important and emerging field at the intersection of medical imaging and artificial intelligence. With advancements in deep learning and machine learning, researchers and clinicians are leveraging these tools to create models that can reliably detect and classify brain tumors from MRI data. However, the task faces a number of challenges, including the intricacy of tumor types and grades, intensity variations in MRI data, and tumors of varying severity. This paper proposes a Multi-Grade Hierarchical Classification Network Model (MGHCN) for the hierarchical classification of tumor grades in MRI images. The model's distinctive feature lies in its ability to categorize tumors into multiple grades, thereby capturing the hierarchical nature of tumor severity. To address variations in intensity levels across different MRI samples, an Improved Adaptive Intensity Normalization (IAIN) pre-processing step is employed. This step standardizes intensity values, mitigating the impact of intensity variations and ensuring more consistent analyses. The model utilizes the Dual Tree Complex Wavelet Transform with Enhanced Trigonometric Features (DTCWT-ETF) for efficient feature extraction. DTCWT-ETF captures both spatial and frequency characteristics, allowing the model to distinguish between different tumor types more effectively. In the classification stage, the framework introduces the Adaptive Hierarchical Optimized Horse Herd BiLSTM Fusion Network (AHOHH-BiLSTM). This multi-grade classification model is designed with a comprehensive architecture, including distinct layers that enhance the learning process and adaptively refine parameters. The purpose of this study is to improve the precision of distinguishing different grades of tumors in MRI images.
To evaluate the proposed MGHCN framework, a set of evaluation metrics is incorporated, including precision, recall, and the F1-score. The framework employs the BraTS Challenge 2021, Br35H, and BraTS Challenge 2023 datasets, a combination that ensures comprehensive training and evaluation. By utilizing these datasets along with a comprehensive set of evaluation metrics, the MGHCN framework aims to enhance brain tumor classification in MRI images and provide a more thorough understanding of its capabilities and performance.
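The three reported metrics follow directly from confusion-matrix counts. A minimal one-vs-rest sketch (toy labels for illustration, not the paper's data):

```python
from typing import Sequence

def precision_recall_f1(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1):
    """Compute precision, recall, and F1 for one class treated one-vs-rest."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Toy multi-grade labels; evaluate grade 2 one-vs-rest.
y_true = [2, 0, 2, 2, 1]
y_pred = [2, 2, 2, 0, 1]
p, r, f = precision_recall_f1(y_true, y_pred, positive=2)  # each 2/3 here
```

Per-grade scores computed this way can then be macro-averaged across grades to summarize multi-grade performance.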
"Automated Multi-grade Brain Tumor Classification Using Adaptive Hierarchical Optimized Horse Herd BiLSTM Fusion Network in MRI Images." T Thanya, T Jeslin. Interdisciplinary Sciences: Computational Life Sciences, pp. 77-100. DOI: 10.1007/s12539-025-00708-4
Survival prediction involves multiple factors, such as histopathological image data and omics data, making it a typical multimodal task. In this work, we introduce semantic annotations for genes in different cell types based on cell biology knowledge, enabling the model to achieve interpretability at the cellular level. Since these cell type annotations are derived from the unique sites of origin for each cancer type, they can be more closely aligned with morphological features in whole slide images (WSIs) and address the issue of genomic annotation ambiguity. We then propose a multimodal fusion model, SurvTransformer, with multi-layer attention to fuse cell type tags (CTTs) and WSIs for survival prediction. Finally, through attention and integrated gradient attribution, the model provides biologically meaningful interpretable analysis at three different levels: cell type, gene, and histopathology image. Comparative experiments show that SurvTransformer achieves the highest concordance index across four cancer datasets. The separation between the generated survival curves is also statistically significant. Ablation experiments show that SurvTransformer outperforms models based on different labeling methods and attention representations. In terms of interpretability, case studies validate the effectiveness of SurvTransformer at three levels: cell type, gene, and histopathological image.
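The attention-based fusion of CTTs and WSIs can be illustrated with plain scaled dot-product cross-attention, in which hypothetical WSI patch embeddings attend over cell-type tag embeddings. This is a generic sketch of the mechanism, not the authors' SurvTransformer implementation:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))  # stabilized softmax
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(queries, keys, values):
    """Scaled dot-product attention: each query token attends over all key/value tokens."""
    d = queries.shape[-1]
    scores = queries @ keys.T / np.sqrt(d)   # (n_q, n_kv) similarity logits
    weights = softmax(scores, axis=-1)       # each row is a distribution over tags
    return weights @ values, weights

rng = np.random.default_rng(0)
wsi_patches = rng.normal(size=(6, 16))   # hypothetical WSI patch embeddings
ctt_tokens = rng.normal(size=(4, 16))    # hypothetical cell-type tag embeddings
fused, attn = cross_attention(wsi_patches, ctt_tokens, ctt_tokens)
```

The attention weights themselves are what enables the cell-type-level interpretability described above: each row of `attn` shows which tags a patch relied on.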
"Interpretable Cancer Survival Prediction by Fusing Semantic Labelling of Cell Types and Whole Slide Images." Jinchao Chen, Pei Liu, Chen Chen, Ying Su, Jiajia Wang, Cheng Chen, Xiantao Ai, Xiaoyi Lv. Interdisciplinary Sciences: Computational Life Sciences, pp. 46-59. Pub Date: 2026-03-01. DOI: 10.1007/s12539-025-00744-0
Pub Date: 2026-03-01 | Epub Date: 2025-02-21 | DOI: 10.1007/s12539-025-00688-5
Shengze Dong, Zhuorui Cui, Ding Liu, Jinzhi Lei
Single-cell RNA sequencing (scRNA-seq) is a groundbreaking technology extensively utilized in biological research, facilitating the examination of gene expression at the individual cell level within a given tissue sample. While numerous tools have been developed for scRNA-seq data analysis, the challenge persists in capturing the distinct features of such data and replicating virtual datasets that share analogous statistical properties. Our study introduces a generative approach termed scRNA-seq Diffusion Transformer (scRDiT). This method generates virtual scRNA-seq data by leveraging a real dataset. The method is a neural network built on Denoising Diffusion Probabilistic Models (DDPMs) and Diffusion Transformers (DiTs). Training involves gradually corrupting real samples with Gaussian noise through iterative noise-adding steps; the network then learns to reverse this process, denoising random noise into realistic scRNA-seq samples. This scheme allows the model to learn data features from actual scRNA-seq samples during training. Our experiments, conducted on two distinct scRNA-seq datasets, demonstrate superior performance. Additionally, the model sampling process is expedited by incorporating Denoising Diffusion Implicit Models (DDIMs). scRDiT presents a unified methodology empowering users to train neural network models with their unique scRNA-seq datasets, enabling the generation of numerous high-quality scRNA-seq samples.
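The iterative noise-adding step has a well-known closed form in DDPMs: x_t = sqrt(a_bar_t) * x_0 + sqrt(1 - a_bar_t) * eps, with a_bar_t the cumulative product of (1 - beta). A minimal sketch using a standard linear beta schedule and a hypothetical cells-by-genes matrix (not the scRDiT code):

```python
import numpy as np

def forward_diffuse(x0, t, betas, rng):
    """Closed-form DDPM forward step: jump straight from x_0 to x_t."""
    alphas = 1.0 - betas
    alpha_bar = np.cumprod(alphas)[t]          # cumulative signal-retention factor
    eps = rng.normal(size=x0.shape)            # fresh Gaussian noise
    return np.sqrt(alpha_bar) * x0 + np.sqrt(1.0 - alpha_bar) * eps

rng = np.random.default_rng(1)
betas = np.linspace(1e-4, 0.02, 1000)          # common linear schedule, T = 1000
x0 = rng.normal(size=(8, 2000))                # hypothetical normalized profiles (cells x genes)
x_noisy = forward_diffuse(x0, 999, betas, rng) # at t = T-1 this is near pure noise
```

At the final step almost no signal remains, which is why sampling can start from pure Gaussian noise; DDIM accelerates the reverse pass by skipping timesteps deterministically.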
"scRDiT: Generating Single-cell RNA-seq Data by Diffusion Transformers and Accelerating Sampling." Shengze Dong, Zhuorui Cui, Ding Liu, Jinzhi Lei. Interdisciplinary Sciences: Computational Life Sciences, pp. 314-325. DOI: 10.1007/s12539-025-00688-5
Accurate pulmonary nodule detection in CT imaging remains challenging due to fragmented feature integration in conventional deep learning models. This paper proposes SPCF-YOLO, a real-time detection framework that synergizes hierarchical feature fusion with anatomical context modeling. First, the space-to-depth convolution (SPDConv) module preserves fine-grained features in low-resolution images through spatial dimension reorganization. Second, the shared feature pyramid convolution (SFPConv) module dynamically extracts multi-scale contextual information using multi-dilation-rate convolutional layers. A small-object detection layer is incorporated to improve sensitivity to small nodules, in combination with the improved pyramid squeeze attention (PSA) module and the improved contextual transformer (CoTB) module, which enhance global channel dependencies and reduce feature loss. The model achieves 82.8% mean average precision (mAP) and an 82.9% F1 score on LUNA16 at 151 frames per second (improvements of 17.5% and 82.9%, respectively, over YOLOv8), demonstrating real-time clinical viability. Cross-modality validation on SIIM-COVID-19 shows a 1.5% improvement, confirming robust generalization.
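The spatial dimension reorganization behind SPDConv is the standard space-to-depth rearrangement: resolution is traded for channels so that downsampling discards no pixels, after which a stride-1 convolution mixes the stacked channels. A minimal sketch of the rearrangement itself (not the SPCF-YOLO implementation):

```python
import numpy as np

def space_to_depth(x, block=2):
    """Rearrange (C, H, W) -> (C*block*block, H//block, W//block), keeping every pixel."""
    c, h, w = x.shape
    assert h % block == 0 and w % block == 0
    x = x.reshape(c, h // block, block, w // block, block)
    x = x.transpose(0, 2, 4, 1, 3)                 # (C, b, b, H/b, W/b)
    return x.reshape(c * block * block, h // block, w // block)

feat = np.arange(2 * 4 * 4, dtype=float).reshape(2, 4, 4)  # toy 2-channel 4x4 feature map
out = space_to_depth(feat, block=2)                        # shape (8, 2, 2)
```

Unlike strided convolution or pooling, this is a lossless permutation of the tensor, which is why it helps preserve the fine-grained evidence small nodules leave in low-resolution feature maps.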
"SPCF-YOLO: An Efficient Feature Optimization Model for Real-Time Lung Nodule Detection." Yawen Ren, Chenyang Shi, Donglin Zhu, Changjun Zhou. Interdisciplinary Sciences: Computational Life Sciences, pp. 231-252. Pub Date: 2026-03-01. DOI: 10.1007/s12539-025-00720-8
Pub Date: 2026-03-01 | Epub Date: 2025-07-11 | DOI: 10.1007/s12539-025-00727-1
Lei Li, Miaosen Xue, Songyang Li, Zhuoli Dong, Tianli Liao, Peng Li
Semi-supervised medical image segmentation techniques have demonstrated significant potential and effectiveness in clinical diagnosis. The prevailing approaches using the mean-teacher (MT) framework achieve promising image segmentation results. However, due to the unreliability of the pseudo labels generated by the teacher model, existing methods still have inherent limitations that must be considered and addressed. In this paper, we propose an innovative semi-supervised method for medical image segmentation that combines a heterogeneous complementary correction network with confidence contrastive learning (HC-CCL). Specifically, we develop a triple-branch framework by integrating a heterogeneous complementary correction (HCC) network into the MT framework. HCC serves as an auxiliary branch that corrects prediction errors in the student model and provides complementary information. To improve the capacity for feature learning in our proposed model, we introduce a confidence contrastive learning (CCL) approach with a novel sampling strategy. Furthermore, we develop a momentum style transfer (MST) method to narrow the gap between labeled and unlabeled data distributions. In addition, we introduce a Cutout-style augmentation for unsupervised learning to enhance performance. Three medical image datasets (the left atrial (LA) dataset, the NIH pancreas dataset, and the BraTS 2019 dataset) were employed to rigorously evaluate HC-CCL. Quantitative results demonstrate significant performance advantages over existing approaches, achieving state-of-the-art performance across all metrics. The implementation will be released at https://github.com/xxmmss/HC-CCL .
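In the MT framework the teacher is conventionally an exponential moving average (EMA) of the student, which is what makes its pseudo labels smoother but not fully reliable. A minimal sketch of that update (the decay value is hypothetical):

```python
def ema_update(teacher_params, student_params, decay=0.99):
    """Mean-teacher update: teacher <- decay * teacher + (1 - decay) * student."""
    return [decay * t + (1.0 - decay) * s
            for t, s in zip(teacher_params, student_params)]

# Toy two-parameter model: the teacher drifts slowly toward the student.
teacher = [1.0, 0.0]
student = [0.0, 1.0]
teacher = ema_update(teacher, student, decay=0.9)  # -> approximately [0.9, 0.1]
```

Because the teacher lags the student, its errors persist across iterations; the HCC auxiliary branch described above is one way to inject an independent signal that corrects them.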
"Semi-supervised Medical Image Segmentation Using Heterogeneous Complementary Correction Network and Confidence Contrastive Learning." Lei Li, Miaosen Xue, Songyang Li, Zhuoli Dong, Tianli Liao, Peng Li. Interdisciplinary Sciences: Computational Life Sciences, pp. 211-230. DOI: 10.1007/s12539-025-00727-1
Protein structures are fundamental to understanding their functions and interactions. With the continuous advancement of protein structure prediction methods, structure databases are rapidly expanding. Identifying the origin of protein structures is crucial for assessing the reliability of experimental structure determination and computational prediction methods, as well as for guiding downstream biological research. Existing protein representation approaches often fail to capture subtle yet critical structural differences, posing challenges for precise structural traceability. To address this, we propose a structure-sensitive supervised deep learning model, Crystal vs Predicted Evaluator for Protein Structure (CPE-Pro), for the representation and origin evaluation of protein structures. CPE-Pro integrates a pre-trained protein Structural Sequence Language Model (SSLM) and a Geometric Vector Perceptron-Graph Neural Network (GVP-GNN) to learn structure-aware protein representations and capture structural differences, enabling accurate classification across four origins of structural data. Preliminary results indicate that, compared to large-scale protein language models trained on extensive amino acid sequences, structural sequences enriched with local structural features enable the model to capture more informative protein characteristics, thereby enhancing and refining protein representations. Future research directions include extending the architecture to additional protein structure paradigms and developing evaluation methodologies for low-pLDDT predicted structures, providing more effective tools for protein structure analysis. The code, model weights, and all relevant materials are available at https://github.com/wr1102/CPE-Pro .
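The GVP-GNN component builds on the geometric vector perceptron, which processes scalar and 3D vector features jointly: vectors enter the scalar channel only through their rotation-invariant norms, and vector outputs are gated by norm-derived scalars, keeping the layer rotation-equivariant. A toy single-layer sketch with random weights (all shapes hypothetical; not CPE-Pro's code):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gvp(s, V, Wh, Wm, Wmu, bm):
    """One geometric-vector-perceptron-style layer on scalar features s and vector features V."""
    Vh = Wh @ V                          # mix vector channels: (nu_h, 3)
    norms = np.linalg.norm(Vh, axis=1)   # rotation-invariant vector summaries
    s_out = np.maximum(0.0, Wm @ np.concatenate([s, norms]) + bm)  # ReLU scalar update
    Vmu = Wmu @ Vh                       # vector outputs: (mu, 3)
    gate = sigmoid(np.linalg.norm(Vmu, axis=1))  # norm-based per-vector gate
    return s_out, Vmu * gate[:, None]

rng = np.random.default_rng(2)
s = rng.normal(size=4)                   # 4 scalar features
V = rng.normal(size=(3, 3))              # 3 vector features in R^3
Wh = rng.normal(size=(5, 3)); Wmu = rng.normal(size=(2, 5))
Wm = rng.normal(size=(6, 4 + 5)); bm = rng.normal(size=6)
s_out, V_out = gvp(s, V, Wh, Wm, Wmu, bm)
```

Rotating the input vectors rotates `V_out` identically while leaving `s_out` unchanged, which is the property that lets such layers compare geometries of crystal versus predicted structures independent of orientation.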
"CPE-Pro: A Structure-Sensitive Deep Learning Method for Protein Representation and Origin Evaluation." Wenrui Gou, Wenhui Ge, Yang Tan, Mingchen Li, Guisheng Fan, Huiqun Yu. Interdisciplinary Sciences: Computational Life Sciences, pp. 195-210. Pub Date: 2026-03-01. DOI: 10.1007/s12539-025-00732-4
The discovery of neuropeptides offers numerous opportunities for identifying novel drugs and targets to treat a variety of diseases. While various computational methods have been proposed, there remains potential for further performance improvement. In this work, we introduce NeuroPpred-MSN, an innovative and efficient neuropeptide prediction model that leverages multi-feature fusion and Siamese networks. To comprehensively represent the information of neuropeptides, the peptide sequences are encoded by four encoding schemes (token embedding, word2vec embedding, protein language embedding, and handcrafted features). The token embedding and word2vec embedding are fed to a Siamese network channel. In the other channel of the model, peptide sequences and their secondary structure sequences are fed into the ProtT5-XL-UniRef50 model to generate embedding features, while handcrafted encoding techniques are used to extract physicochemical information. The two kinds of features are then fused and fed into a bidirectional gated recurrent unit (Bi-GRU) network for further processing. Ultimately, the outputs of the two channels are integrated into a fully connected layer, thereby facilitating the generation of the final prediction. The results on the independent test set indicate that NeuroPpred-MSN exhibits superior predictive performance, with an area under the receiver operating characteristic curve (AUROC) of 98.3%, exceeding the performance of other state-of-the-art predictors. Specifically, compared to the next-best results, this model exhibits improvements of 1.52% in accuracy (ACC), 1.52% in F1 score (F1), 3.2% in Matthews correlation coefficient (MCC), and 1.55% in AUROC. The model was further evaluated on imbalanced datasets, where it achieved the highest values in AUROC, ACC, MCC, sensitivity (SN), and F1, further demonstrating its robustness and generalization.
The model can be accessed at the following GitHub repository: https://github.com/wenjean/NeuroPpred-MSN .
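The core Siamese idea, two branches sharing one encoder whose outputs are compared, can be sketched as follows (the encoder, weights, and cosine-similarity readout are hypothetical illustrations, not the NeuroPpred-MSN architecture):

```python
import numpy as np

def encode(x, W):
    """Shared-weight encoder: both Siamese branches use the SAME W."""
    return np.tanh(W @ x)

def siamese_score(x1, x2, W):
    """Cosine similarity between twin embeddings produced by the shared encoder."""
    e1, e2 = encode(x1, W), encode(x2, W)
    return float(e1 @ e2 / (np.linalg.norm(e1) * np.linalg.norm(e2)))

rng = np.random.default_rng(3)
W = rng.normal(size=(8, 20))              # one weight matrix shared by both branches
a = rng.normal(size=20)                   # hypothetical peptide feature vector
sim_self = siamese_score(a, a, W)         # identical inputs -> similarity 1
sim_other = siamese_score(a, rng.normal(size=20), W)
```

Weight sharing is the key design choice: it forces both inputs into one embedding space, so similarity reflects the peptides rather than branch-specific quirks.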
"NeuroPpred-MSN: A Neuropeptide Prediction Model Based on Multi-feature Fusion and Siamese Networks." Jian Wen, Minyu Chen, Yongqi Shen, Honghong Wang, Zhuoyu Wei, Lichuan Gu, Xiaolei Zhu. Interdisciplinary Sciences: Computational Life Sciences, pp. 326-340. Pub Date: 2026-03-01. DOI: 10.1007/s12539-025-00730-6
Dysregulation of microRNAs (miRNAs) is a cause of progression in numerous diseases. Uncovering miRNA-disease associations (MDAs) is essential for discovering new biomarkers. In contrast to conventional biological approaches, advanced computational approaches are typically more rapid and cost-effective. However, most computational methods still face several challenges: (i) integrating multi-source information (MSI); (ii) optimizing feature fusion; (iii) mitigating over-smoothing in graph-based models. This paper introduces a novel model, AMFCL. To encapsulate the miRNA-disease relationships, three types of networks are first constructed. After that, the node representations are learned via multi-layer graph sample and aggregate (GraphSAGE). An adaptive fusion mechanism (AFM) dynamically assigns weights to feature representations to optimize the fusion process. Additionally, a residual connection is used to combat the over-smoothing effect that occurs in graph-based models. The robustness of miRNA and disease embeddings is improved by contrastive learning (CL). Lastly, all feature embeddings are fed into a multi-layer perceptron (MLP) to compute MDA scores. Experimental results show remarkable improvements for AMFCL compared to advanced models. Moreover, relevant case studies systematically validate the approach's effectiveness in identifying unknown MDAs.
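The GraphSAGE aggregation step and a softmax-style adaptive fusion can be sketched on a toy graph. All weights are hypothetical, and AMFCL's actual AFM may differ in form:

```python
import numpy as np

def sage_mean_layer(H, adj, W_self, W_neigh):
    """GraphSAGE mean aggregation: combine each node with the mean of its neighbors."""
    deg = adj.sum(axis=1, keepdims=True)
    neigh_mean = (adj @ H) / np.maximum(deg, 1)            # average neighbor features
    return np.maximum(0.0, H @ W_self + neigh_mean @ W_neigh)  # ReLU update

def adaptive_fuse(feats, logits):
    """Softmax-weighted fusion: learnable logits decide each source's contribution."""
    w = np.exp(logits - logits.max())
    w /= w.sum()
    return sum(wi * f for wi, f in zip(w, feats))

rng = np.random.default_rng(4)
adj = np.array([[0, 1, 1], [1, 0, 0], [1, 0, 0]], dtype=float)  # toy 3-node graph
H = rng.normal(size=(3, 5))                                     # initial node features
W_self = rng.normal(size=(5, 4)); W_neigh = rng.normal(size=(5, 4))
H1 = sage_mean_layer(H, adj, W_self, W_neigh)
fused = adaptive_fuse([H1, rng.normal(size=(3, 4))], np.array([0.5, -0.5]))
```

Stacking several such layers deepens the receptive field but drives representations toward each other, which is the over-smoothing that a residual connection (adding `H`'s projection back into the output) counteracts.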
AMFCL: Predicting miRNA-Disease Associations Through Adaptive Multi-source Modality Fusion and Contrastive Learning
Yanfang Yang, Shuang Wang, Wenyue Kang, Cuina Jiao, Yinglian Gao, Jinxing Liu
DOI: 10.1007/s12539-025-00724-4
Interdisciplinary Sciences: Computational Life Sciences, pages 165-179, Pub Date: 2026-03-01
Identification of drug-target interactions (DTIs) is critical for drug discovery and drug repositioning. However, most DTI methods that extract features from drug molecules and protein entities neglect substructure information specific to pharmacological responses, which leads to poor predictive performance. Moreover, most existing methods rely on either molecular graphs or molecular descriptors to obtain abstract molecular representations, but combining the two feature-learning approaches for DTI prediction remains unexplored. Therefore, a new framework for DTI prediction, ASCS-DTI, is proposed. It uses a substructure attention mechanism to flexibly capture compound substructures at different granularities, allowing the important substructure information of each molecule to be learned. Additionally, the framework combines three different types of molecular fingerprints to comprehensively characterize molecular representations. A stacked convolutional encoding module processes the sequence information of target proteins in a multi-scale, multi-level view. Finally, a feature fusion module performs multi-modal fusion of molecular graph features and molecular fingerprint features, along with multi-modal encoding of DTIs. The method outperforms six advanced baseline models on the Biosnap, BindingDB, and Human benchmark datasets, with significant performance gains that hold across different experimental settings.
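The two drug-side ideas above, attention-pooling over a molecule's substructure embeddings and concatenating fingerprint features, can be illustrated with a small numpy sketch; the dot-product scoring rule, embedding sizes, and fingerprint lengths are assumptions for illustration only:

```python
import numpy as np

def substructure_attention(subs, query):
    """Attention-pool a variable number of substructure embeddings into one vector."""
    scores = subs @ query                 # one relevance score per substructure
    a = np.exp(scores - scores.max())
    a /= a.sum()                          # softmax attention weights
    return a @ subs                       # weighted sum over substructures

rng = np.random.default_rng(1)
subs = rng.normal(size=(7, 16))           # 7 substructure embeddings for one molecule
query = rng.normal(size=16)               # learned query vector (random here)
mol_vec = substructure_attention(subs, query)

# Three concatenated binary fingerprints stand in for the three fingerprint types.
fingerprints = rng.integers(0, 2, size=3 * 64)
drug_repr = np.concatenate([mol_vec, fingerprints])
print(drug_repr.shape)                    # (208,)
```

The attention weights let each molecule emphasize its pharmacologically important fragments, while the fingerprints preserve global descriptor information that a graph view alone may miss.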
Sensing Compound Substructures Combined with Molecular Fingerprinting to Predict Drug-Target Interactions
Wanhua Huang, Xuecong Tian, Ying Su, Sizhe Zhang, Chen Chen, Cheng Chen
DOI: 10.1007/s12539-025-00698-3
Interdisciplinary Sciences: Computational Life Sciences, pages 357-371, Pub Date: 2026-03-01
Pub Date : 2026-03-01 Epub Date: 2025-06-05 DOI: 10.1007/s12539-025-00721-7
Lei Shi, Ranran Gui, Li Wang, Peng Li, Qunfeng Niu
A Multi-Task Deep Learning Approach for Simultaneous Sleep Staging and Apnea Detection for Elderly People
Interdisciplinary Sciences: Computational Life Sciences, pages 341-356