Interdisciplinary Sciences: Computational Life Sciences: Latest Articles

Transformative Deep Neural Network Approaches in Kidney Ultrasound Segmentation: Empirical Validation with an Annotated Dataset.
IF 3.9, Zone 2 (Biology), Q1 MATHEMATICAL & COMPUTATIONAL BIOLOGY. Pub Date: 2024-06-01. Epub Date: 2024-02-27. DOI: 10.1007/s12539-024-00620-3
Rashid Khan, Chuda Xiao, Yang Liu, Jinyu Tian, Zhuo Chen, Liyilei Su, Dan Li, Haseeb Hassan, Haoyu Li, Weiguo Xie, Wen Zhong, Bingding Huang

Kidney ultrasound (US) images are primarily employed for diagnosing different renal diseases. One such task is renal localization and detection, which can be carried out by segmenting kidney US images. However, kidney segmentation from US images is challenging due to low contrast, speckle noise, fluid, variations in kidney shape, and modality artifacts. Moreover, well-annotated US datasets for renal segmentation and detection are scarce. This study aims to build a novel, well-annotated dataset containing 44,880 US images. In addition, we propose a novel training scheme that utilizes the encoder and decoder parts of a state-of-the-art segmentation algorithm. In the pre-processing step, pixel intensity normalization improves contrast and facilitates model convergence. The modified encoder-decoder architecture improves pyramid-shaped hole pooling, cascaded multiple-hole convolutions, and batch normalization. The pre-processing step gradually reconstructs spatial information, including the capture of complete object boundaries, and the post-processing module with a concave curvature reduces the false positive rate of the results. We present benchmark findings to validate the quality of the proposed training scheme and dataset. We applied six evaluation metrics and several baseline segmentation approaches to our novel kidney US dataset. Among the evaluated models, DeepLabv3+ performed best, achieving Dice, 95% Hausdorff distance, accuracy, specificity, average symmetric surface distance, and recall scores of 89.76%, 9.91, 98.14%, 98.83%, 3.03, and 90.68%, respectively. The proposed training strategy aids state-of-the-art segmentation models, resulting in better-segmented predictions. Furthermore, the large, well-annotated public kidney US dataset will serve as a valuable baseline source for future medical image analysis research.
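For readers unfamiliar with the overlap metrics named in this abstract, the following is a minimal sketch of how Dice, accuracy, specificity, and recall can be computed from a predicted and a reference binary mask. It is not the authors' evaluation code; the function names `confusion_counts` and `segmentation_scores` are hypothetical, masks are assumed to be nested lists of 0/1, and the distance-based metrics (Hausdorff distance, average symmetric surface distance) are left out.

```python
def confusion_counts(pred, truth):
    """Count TP, FP, TN, FN for two equal-sized binary masks (nested lists of 0/1)."""
    tp = fp = tn = fn = 0
    for row_p, row_t in zip(pred, truth):
        for p, t in zip(row_p, row_t):
            if p and t:
                tp += 1
            elif p and not t:
                fp += 1
            elif not p and t:
                fn += 1
            else:
                tn += 1
    return tp, fp, tn, fn


def segmentation_scores(pred, truth):
    """Return overlap metrics as fractions in [0, 1] (hypothetical helper)."""
    tp, fp, tn, fn = confusion_counts(pred, truth)
    total = tp + fp + tn + fn
    return {
        # Dice = 2|A ∩ B| / (|A| + |B|); defined as 1.0 when both masks are empty
        "dice": 2 * tp / (2 * tp + fp + fn) if (2 * tp + fp + fn) else 1.0,
        "accuracy": (tp + tn) / total,
        # Specificity: fraction of background pixels correctly left unlabelled
        "specificity": tn / (tn + fp) if (tn + fp) else 1.0,
        # Recall (sensitivity): fraction of kidney pixels that were found
        "recall": tp / (tp + fn) if (tp + fn) else 1.0,
    }


pred = [[1, 1, 0], [0, 0, 0]]
truth = [[1, 0, 0], [1, 0, 0]]
scores = segmentation_scores(pred, truth)
```

Accuracy and specificity tend to be high on ultrasound frames because background pixels dominate, which is one reason Dice and recall are usually reported alongside them.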

Pages: 439-454. Citations: 0
Inference of Gene Regulatory Networks Based on Multi-view Hierarchical Hypergraphs.
IF 3.9, Zone 2 (Biology), Q1 MATHEMATICAL & COMPUTATIONAL BIOLOGY. Pub Date: 2024-06-01. Epub Date: 2024-02-11. DOI: 10.1007/s12539-024-00604-3
Songyang Wu, Kui Jin, Mingjing Tang, Yuelong Xia, Wei Gao

Since gene regulation is a complex process in which multiple genes act simultaneously, accurately inferring gene regulatory networks (GRNs) is a long-standing challenge in systems biology. Although graph neural networks can formally describe intricate gene expression mechanisms, current GRN inference methods based on graph learning treat transcription factor (TF)-target gene interactions only as pairwise relationships and cannot model the many-to-many high-order regulatory patterns that prevail among genes. Moreover, these methods often rely on limited prior regulatory knowledge, ignoring the structural information of GRNs in gene expression profiles. Therefore, we propose a multi-view hierarchical hypergraph GRN (MHHGRN) inference model. Specifically, multiple types of heterogeneous biological information are integrated to construct multi-view hierarchical hypergraphs of TFs and target genes, and hypergraph convolution networks are used to model higher-order complex regulatory relationships. Meanwhile, the coupled information diffusion mechanism and the cross-domain messaging mechanism facilitate information sharing between genes to optimise gene embedding representations. Finally, a unique channel attention mechanism is used to adaptively learn feature representations from multiple views for GRN inference. Experimental results show that MHHGRN achieves better results than the baseline methods on the E. coli and S. cerevisiae benchmark datasets of the DREAM5 challenge, and it has excellent cross-species generalization, achieving comparable or better performance on scRNA-seq datasets from five mouse and two human cell lines.
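To illustrate why a hypergraph can express many-to-many regulation where an ordinary graph cannot, here is a simplified sketch of one message-passing step over a hypergraph: node features are averaged into each hyperedge, then averaged back to the nodes. This is an unweighted stand-in for hypergraph convolution with no learnable parameters, not the MHHGRN model; the function name `hypergraph_propagate` and the toy incidence matrix are assumptions made for illustration.

```python
def hypergraph_propagate(incidence, features):
    """One unweighted step: nodes -> hyperedges -> nodes (hypothetical helper).

    incidence[v][e] is 1 if node (gene) v belongs to hyperedge e, else 0.
    features[v] is the feature vector of node v.
    """
    n_nodes, n_edges, dim = len(incidence), len(incidence[0]), len(features[0])

    # Step 1: each hyperedge takes the mean feature of its member nodes.
    edge_feats = []
    for e in range(n_edges):
        members = [v for v in range(n_nodes) if incidence[v][e]]
        if members:
            edge_feats.append(
                [sum(features[v][d] for v in members) / len(members) for d in range(dim)]
            )
        else:
            edge_feats.append([0.0] * dim)  # empty hyperedge carries no signal

    # Step 2: each node takes the mean feature of the hyperedges it belongs to.
    out = []
    for v in range(n_nodes):
        edges = [e for e in range(n_edges) if incidence[v][e]]
        if edges:
            out.append(
                [sum(edge_feats[e][d] for e in edges) / len(edges) for d in range(dim)]
            )
        else:
            out.append(list(features[v]))  # isolated node keeps its own features
    return out


# Three genes, two hyperedges: e0 = {gene0, gene1}, e1 = {gene1, gene2}
incidence = [[1, 0], [1, 1], [0, 1]]
features = [[1.0], [3.0], [5.0]]
updated = hypergraph_propagate(incidence, features)
```

A single hyperedge can hold any number of genes, so one regulatory module with several TFs and several targets is represented as one unit rather than as many separate pairwise edges.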

Pages: 318-332. Citations: 0
MetaV: A Pioneer in feature Augmented Meta-Learning Based Vision Transformer for Medical Image Classification.
IF 3.9, Zone 2 (Biology), Q1 MATHEMATICAL & COMPUTATIONAL BIOLOGY. Pub Date: 2024-06-01. Epub Date: 2024-06-29. DOI: 10.1007/s12539-024-00630-1
Shaharyar Alam Ansari, Arun Prakash Agrawal, Mohd Anas Wajid, Mohammad Saif Wajid, Aasim Zafar

Image classification, a fundamental task in computer vision, faces challenges concerning limited data handling, interpretability, improved feature representation, efficiency across diverse image types, and processing noisy data. Conventional architectural approaches have made insufficient progress in addressing these challenges, necessitating architectures capable of fine-grained classification, enhanced accuracy, and superior generalization. Among these, the vision transformer emerges as a noteworthy computer vision architecture. However, its complexity means that it relies on substantial training data, which is a drawback. To surmount these challenges, this paper proposes an innovative approach, MetaV, integrating meta-learning into a vision transformer for medical image classification. N-way K-shot learning is employed to train the model, drawing inspiration from human learning mechanisms utilizing past knowledge. Additionally, deformational convolution and patch merging techniques are incorporated into the vision transformer model to mitigate complexity and overfitting while enhancing feature representation. Augmentation methods such as perturbation and Grid Mask are introduced to address the scarcity and noise in medical images, particularly for rare diseases. The proposed model is evaluated using diverse datasets including BreakHis, ISIC 2019, SIPaKMeD, and STARE. The achieved accuracies of 89.89%, 87.33%, 94.55%, and 80.22% for BreakHis, ISIC 2019, SIPaKMeD, and STARE, respectively, present evidence validating the superior performance of the proposed model in comparison to conventional models, setting a new benchmark for meta-vision image classification models.
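N-way K-shot training, mentioned in this abstract, organises data into small episodes: N classes are drawn, and each contributes K labelled support examples plus a few held-out query examples. The sketch below shows only that episode-sampling step under simple assumptions; `sample_episode` and its parameters are hypothetical names, and it does not reproduce the MetaV training loop.

```python
import random


def sample_episode(labels_to_items, n_way, k_shot, n_query, rng=None):
    """Draw one N-way K-shot episode (hypothetical helper).

    labels_to_items maps each class label to a list of examples; every chosen
    class must contain at least k_shot + n_query examples.
    """
    rng = rng or random.Random()
    classes = rng.sample(sorted(labels_to_items), n_way)  # pick N classes
    support, query = [], []
    for label in classes:
        # Sample without replacement so support and query never overlap.
        items = rng.sample(labels_to_items[label], k_shot + n_query)
        support += [(x, label) for x in items[:k_shot]]
        query += [(x, label) for x in items[k_shot:]]
    return support, query


# Toy data: four classes with six image identifiers each
data = {c: [f"{c}{i}" for i in range(6)] for c in "abcd"}
support, query = sample_episode(data, n_way=3, k_shot=2, n_query=1, rng=random.Random(0))
```

The model adapts on the support set and is scored on the query set, so each episode mimics the low-data situation expected at test time.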

Pages: 469-488. Citations: 0
scEM: A New Ensemble Framework for Predicting Cell Type Composition Based on scRNA-Seq Data.
IF 3.9, Zone 2 (Biology), Q1 MATHEMATICAL & COMPUTATIONAL BIOLOGY. Pub Date: 2024-06-01. Epub Date: 2024-02-18. DOI: 10.1007/s12539-023-00601-y
Xianxian Cai, Wei Zhang, Xiaoying Zheng, Yaxin Xu, Yuanyuan Li

With the advent of single-cell RNA sequencing (scRNA-seq) technology, a large amount of scRNA-seq data has become available, providing an unprecedented opportunity to explore cellular composition and heterogeneity. Recently, many computational algorithms for predicting cell type composition have been developed, and these methods are typically evaluated on different datasets and performance metrics using diverse techniques. Consequently, the lack of comprehensive and standardized comparative analysis makes it difficult to gain a clear understanding of the strengths and weaknesses of these methods. To address this gap, we reviewed 20 cutting-edge unsupervised cell type identification methods and evaluated them comprehensively using 24 real scRNA-seq datasets of varying scales. In addition, we proposed a new ensemble cell-type identification method, named scEM, which learns a consensus similarity matrix by applying the entropy weight method to four selected representative methods. The Louvain algorithm is then applied to the consensus matrix to obtain the final classification of individual cells. Extensive evaluation and comparison with 11 other similarity-based methods on real scRNA-seq datasets demonstrate that the newly developed ensemble algorithm scEM is effective in predicting cell type composition.
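The entropy weight method gives more weight to information sources whose values vary more. The sketch below applies the textbook form of the method to a set of cell-cell similarity matrices and combines them into a weighted consensus. How scEM normalises its inputs is not stated in the abstract, so treating each flattened matrix as one criterion is an assumption, and `entropy_weights` and `consensus_matrix` are hypothetical names.

```python
import math


def entropy_weights(columns):
    """Textbook entropy weight method (hypothetical helper).

    columns: one list of non-negative scores per criterion, all of length m >= 2.
    Returns one weight per criterion; the weights sum to 1.
    """
    m = len(columns[0])
    divergences = []
    for col in columns:
        total = sum(col)
        entropy = 1.0  # an all-zero column is treated as carrying no information
        if total > 0:
            entropy = -sum((x / total) * math.log(x / total) for x in col if x > 0)
            entropy /= math.log(m)  # scale to [0, 1]
        divergences.append(1.0 - entropy)
    spread = sum(divergences)
    if spread == 0:
        return [1.0 / len(columns)] * len(columns)  # fall back to equal weights
    return [d / spread for d in divergences]


def consensus_matrix(similarities):
    """Weighted sum of square similarity matrices, weighted by entropy (assumed scheme)."""
    flat = [[v for row in s for v in row] for s in similarities]
    weights = entropy_weights(flat)
    n = len(similarities[0])
    return [
        [sum(w * s[i][j] for w, s in zip(weights, similarities)) for j in range(n)]
        for i in range(n)
    ]


weights = entropy_weights([[1, 1, 1, 1], [1, 0, 0, 0]])
same = [[1.0, 0.2], [0.2, 1.0]]
consensus = consensus_matrix([same, same])
```

A perfectly uniform column has maximal entropy and therefore receives a weight near zero, which is the behaviour the first example checks.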

Pages: 304-317. Citations: 0
LPI-SKMSC: Predicting LncRNA-Protein Interactions with Segmented k-mer Frequencies and Multi-space Clustering.
IF 3.9, Zone 2 (Biology), Q1 MATHEMATICAL & COMPUTATIONAL BIOLOGY. Pub Date: 2024-06-01. Epub Date: 2024-01-11. DOI: 10.1007/s12539-023-00598-4
Dian-Zheng Sun, Zhan-Li Sun, Mengya Liu, Shuang-Hao Yong

Long noncoding RNAs (lncRNAs) have significant regulatory roles in gene expression, and interacting with proteins is one of the ways in which they act. Since experiments to determine lncRNA-protein interactions (LPIs) are expensive and time-consuming, many computational methods for predicting LPIs have been proposed as alternatives. In LPI prediction, the positive and negative samples are commonly imbalanced, yet few existing methods give specific consideration to this problem. In this paper, we propose a new clustering-based LPI prediction method using segmented k-mer frequencies and multi-space clustering (LPI-SKMSC), which is dedicated to handling the imbalance of positive and negative samples. We constructed segmented k-mer frequencies to obtain global and local features of lncRNA and protein sequences. Multi-space clustering was then applied: convolutional neural network (CNN)-based encoders map different features of a sample to different spaces, and the multiple spaces jointly constrain the classification of samples. Finally, the distances between the output features of the encoder and the cluster center in each space were calculated, and the sum of distances over all spaces was compared with the cluster radius to predict LPIs. We performed cross-validation on 3 public datasets, and LPI-SKMSC showed the best performance compared to other existing methods. Experimental results showed that LPI-SKMSC could predict LPIs more effectively when faced with imbalanced positive and negative samples. In addition, we illustrated that our model was better at uncovering potential lncRNA-protein interaction pairs.
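As a rough illustration of how segmented k-mer frequencies can capture both global and local sequence composition, the sketch below concatenates the k-mer frequency vector of a whole sequence with the vectors of its equal-length segments. The abstract does not give the exact segmentation scheme, so the equal split, the RNA alphabet default, and the function names `kmer_frequencies` and `segmented_kmer_frequencies` are all assumptions.

```python
from itertools import product


def kmer_frequencies(seq, k, alphabet="ACGU"):
    """Relative frequency of every possible k-mer, in lexicographic alphabet order."""
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if kmer in counts:  # skip windows with symbols outside the alphabet
            counts[kmer] += 1
    total = sum(counts.values())
    return [counts[m] / total if total else 0.0 for m in kmers]


def segmented_kmer_frequencies(seq, k, n_segments, alphabet="ACGU"):
    """Global k-mer vector followed by one vector per segment (assumed scheme)."""
    features = kmer_frequencies(seq, k, alphabet)  # global composition
    bounds = [i * len(seq) // n_segments for i in range(n_segments + 1)]
    for start, end in zip(bounds, bounds[1:]):
        features += kmer_frequencies(seq[start:end], k, alphabet)  # local composition
    return features


vector = segmented_kmer_frequencies("AACCGGUU", k=1, n_segments=2)
```

In this toy sequence the global vector is flat while the two segment vectors differ sharply, which is the kind of positional signal a whole-sequence k-mer count would miss. For protein sequences the `alphabet` argument would be replaced by an amino-acid alphabet.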

Pages: 378-391. Citations: 0
Predicting Microbe-Disease Associations Based on a Linear Neighborhood Label Propagation Method with Multi-order Similarity Fusion Learning.
IF 3.9, Zone 2 (Biology), Q1 MATHEMATICAL & COMPUTATIONAL BIOLOGY. Pub Date: 2024-06-01. Epub Date: 2024-03-04. DOI: 10.1007/s12539-024-00607-0
Ruibin Chen, Guobo Xie, Zhiyi Lin, Guosheng Gu, Yi Yu, Junrui Yu, Zhenguo Liu

Computational approaches employed for predicting potential microbe-disease associations often rely on similarity information between microbes and diseases. Therefore, it is important to obtain reliable similarity information by integrating multiple types of similarity information. However, existing similarity fusion methods do not consider multi-order fusion of similarity networks. To address this problem, a novel method of linear neighborhood label propagation with multi-order similarity fusion learning (MOSFL-LNP) is proposed to predict potential microbe-disease associations. Multi-order fusion learning comprises two parts: low-order global learning and high-order feature learning. Low-order global learning is used to obtain common latent features from multiple similarity sources. High-order feature learning relies on the interactions between neighboring nodes to identify high-order similarities and learn deeper interactive network structures. Coefficients are assigned to different high-order feature learning modules to balance the similarities learned from different orders and enhance the robustness of the fusion network. Overall, by combining low-order global learning with high-order feature learning, multi-order fusion learning can capture both the shared and unique features of different similarity networks, leading to more accurate predictions of microbe-disease associations. In comparison to six other advanced methods, MOSFL-LNP exhibits superior prediction performance in the leave-one-out cross-validation and 5-fold validation frameworks. In the case study, the predicted 10 microbes associated with asthma and type 1 diabetes have an accuracy rate of up to 90% and 100%, respectively.
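The label propagation step named in this abstract can be sketched generically: known associations are spread over a similarity network by iterating F ← αWF + (1 − α)Y until the scores settle. The version below assumes a row-normalised similarity matrix W is already available; it does not include the linear-neighborhood weight estimation or the multi-order fusion of MOSFL-LNP, and `propagate_labels` and its parameters are hypothetical names.

```python
def propagate_labels(weights, labels, alpha=0.5, iterations=200):
    """Iterative label propagation (hypothetical helper).

    weights: n x n row-normalised similarity matrix W.
    labels:  n x m matrix Y of known associations (1 = known, 0 = unknown).
    alpha:   share of each score taken from neighbours; 1 - alpha stays with Y.
    """
    n, m = len(weights), len(labels[0])
    scores = [row[:] for row in labels]
    for _ in range(iterations):
        scores = [
            [
                alpha * sum(weights[i][k] * scores[k][j] for k in range(n))
                + (1 - alpha) * labels[i][j]
                for j in range(m)
            ]
            for i in range(n)
        ]
    return scores


# Two microbes that are each other's only neighbour; only the first has a known link.
W = [[0.0, 1.0], [1.0, 0.0]]
Y = [[1.0], [0.0]]
F = propagate_labels(W, Y, alpha=0.5)
```

For 0 < α < 1 and a row-normalised W, the iteration converges to the closed form (1 − α)(I − αW)⁻¹Y, so the loop can be replaced by a linear solve when the network is small. In the example, the unlabelled microbe ends up with a score of 1/3 purely because of its similarity to the labelled one.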

Pages: 345-360. Citations: 0
ResDeepSurv: A Survival Model for Deep Neural Networks Based on Residual Blocks and Self-attention Mechanism.
IF 3.9, Zone 2 (Biology), Q1 MATHEMATICAL & COMPUTATIONAL BIOLOGY. Pub Date: 2024-06-01. Epub Date: 2024-03-15. DOI: 10.1007/s12539-024-00617-y
Yuchen Wang, Xianchun Kong, Xiao Bi, Lizhen Cui, Hong Yu, Hao Wu

Survival analysis, a widely used method for analyzing and predicting the timing of event occurrence, plays a crucial role in medicine. Medical professionals utilize survival models to gain insight into the effects of patient covariates on the disease, and the correlation with the effectiveness of different treatment strategies. This knowledge is essential for the development of treatment plans and the enhancement of treatment approaches. Conventional survival models, such as the Cox proportional hazards model, require a significant amount of feature engineering or prior knowledge to facilitate personalized modeling. To address these limitations, we propose a novel residual-based self-attention deep neural network for survival modeling, called ResDeepSurv, which combines the benefits of neural networks and the Cox proportional hazards regression model. The model proposed in our study simulates the distribution of survival time and the correlation between covariates and outcomes, but does not impose strict assumptions on the basic distribution of survival data. This approach effectively accounts for both linear and nonlinear risk functions in survival data analysis. The performance of our model in analyzing survival data with various risk functions is on par with or even superior to that of other existing survival analysis methods. Furthermore, we validate the superior performance of our model in comparison to existing methods by evaluating it on multiple publicly available clinical datasets. Through this study, we demonstrate the effectiveness of our proposed model in survival analysis, providing a promising alternative to traditional approaches. The application of deep learning techniques and the ability to capture complex relationships between covariates and survival outcomes without relying on extensive feature engineering make our model a valuable tool for personalized medicine and decision-making in clinical practice.
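Deep survival models that build on Cox regression are commonly trained by minimising the negative log partial likelihood of the predicted risk scores. The sketch below shows that loss in plain Python under the usual Cox setup with no special handling of tied event times. Whether ResDeepSurv uses exactly this form is not stated in the abstract, and `cox_negative_log_partial_likelihood` is a hypothetical name.

```python
import math


def cox_negative_log_partial_likelihood(risk_scores, times, events):
    """Average negative log partial likelihood (hypothetical helper).

    risk_scores: predicted log-risk h(x_i) for each subject (e.g. a network output).
    times:       observed follow-up time for each subject.
    events:      1 if the event was observed, 0 if the subject was censored.
    """
    loss, n_events = 0.0, 0
    for i, (t_i, observed) in enumerate(zip(times, events)):
        if not observed:
            continue  # censored subjects only appear in other subjects' risk sets
        # Risk set: everyone still under observation at time t_i.
        log_denominator = math.log(
            sum(math.exp(risk_scores[j]) for j in range(len(times)) if times[j] >= t_i)
        )
        loss -= risk_scores[i] - log_denominator
        n_events += 1
    return loss / n_events if n_events else 0.0


# Three subjects with equal predicted risk; the third is censored.
loss = cox_negative_log_partial_likelihood([0.0, 0.0, 0.0], [1, 2, 3], [1, 1, 0])
```

The loss depends only on the ordering of risk scores within each risk set, not on a baseline hazard, which is why no distributional assumption on survival time is needed to train against it.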

A Combined Manual Annotation and Deep-Learning Natural Language Processing Study on Accurate Entity Extraction in Hereditary Disease Related Biomedical Literature.
IF 3.9 Zone 2 Biology Q1 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date: 2024-06-01 Epub Date: 2024-02-10 DOI: 10.1007/s12539-024-00605-2
Dao-Ling Huang, Quanlei Zeng, Yun Xiong, Shuixia Liu, Chaoqun Pang, Menglei Xia, Ting Fang, Yanli Ma, Cuicui Qiang, Yi Zhang, Yu Zhang, Hong Li, Yuying Yuan

We report a study combining manual annotation with deep-learning natural language processing for accurate entity extraction from biomedical literature on hereditary diseases. A total of 400 full-text articles were manually annotated, following published guidelines, by experienced genetic interpreters at Beijing Genomics Institute (BGI). The quality of our manual annotations was assessed by comparing our re-annotated results with publicly available annotations. The overall Jaccard index was 0.866 across the four entity types: gene, variant, disease and species. Both a BERT-based large named entity recognition (NER) model and a DistilBERT-based simplified NER model were trained, validated and tested. Because the manually annotated corpus is limited, these NER models were fine-tuned in two phases. The F1-scores of the BERT-based NER model for gene, variant, disease and species are 97.28%, 93.52%, 92.54% and 95.76%, respectively, while those of the DistilBERT-based NER model are 95.14%, 86.26%, 91.37% and 89.92%, respectively. Most importantly, the variant entity type has been extracted by a large language model for the first time, with an F1-score comparable to that of the state-of-the-art variant extraction model tmVar.
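
The abstract reports a Jaccard index for annotation agreement and F1-scores for the NER models without stating how entities were matched. As a rough illustration only, the sketch below assumes exact matching on hypothetical `(start, end, type)` tuples; the function names and the toy annotations are invented for this example and are not taken from the paper.

```python
def jaccard_index(annotations_a, annotations_b):
    """Jaccard index |A ∩ B| / |A ∪ B| between two sets of annotations."""
    set_a, set_b = set(annotations_a), set(annotations_b)
    union = set_a | set_b
    if not union:
        return 1.0  # two empty annotation sets agree trivially
    return len(set_a & set_b) / len(union)

def f1_score(gold, predicted):
    """F1-score: harmonic mean of precision and recall under exact matching."""
    gold, predicted = set(gold), set(predicted)
    true_positives = len(gold & predicted)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)

# Hypothetical annotations: (start offset, end offset, entity type).
ours = {(0, 5, "gene"), (10, 18, "variant"), (30, 42, "disease")}
public = {(0, 5, "gene"), (10, 18, "variant"), (50, 55, "species")}

print(jaccard_index(ours, public))  # 2 shared out of 4 distinct -> 0.5
print(f1_score(public, ours))       # precision = recall = 2/3
```

Under exact matching, an annotation whose boundaries are off by a single character counts as a miss; a more lenient overlap-based match would give higher scores on the same data.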

Correction to: Machine Learning Accelerates De Novo Design of Antimicrobial Peptides.
IF 3.9 Zone 2 Biology Q1 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date: 2024-06-01 DOI: 10.1007/s12539-024-00631-0
Kedong Yin, Wen Xu, Shiming Ren, Qingpeng Xu, Shaojie Zhang, Ruiling Zhang, Mengwan Jiang, Yuhong Zhang, Degang Xu, Ruifang Li
Review and Comparative Analysis of Methods and Advancements in Predicting Protein Complex Structure.
IF 3.9 Zone 2 Biology Q1 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date: 2024-06-01 Epub Date: 2024-07-02 DOI: 10.1007/s12539-024-00626-x
Nan Zhao, Tong Wu, Wenda Wang, Lunchuan Zhang, Xinqi Gong

Protein complexes perform diverse biological functions, and obtaining their three-dimensional structures is critical to understanding those functions. In many cases, it is not just two proteins that interact to form a dimer; instead, multiple proteins interact to form a multimer. Experimentally resolving protein complex structures can be quite challenging. Recently, methods have been developed that build upon prior predictions of dimer structures to predict multimer structures. However, compared with monomeric protein structure prediction, the accuracy of protein complex structure prediction remains relatively low. This paper provides an overview of recent advances in efficient computational models for predicting protein complex structures. We introduce protein-protein docking methods in detail and summarize their main ideas, applicable modes, and related information. To enhance prediction accuracy, other critical protein-related information is also integrated, such as predicted interchain residue contacts, experimental data such as cryo-EM, and information on protein interactions and non-interactions. In addition, we comprehensively review computational approaches for end-to-end prediction of protein complex structures based on artificial intelligence (AI) technology and describe commonly used datasets and representative evaluation metrics for protein complexes. Finally, we analyze the formidable challenges in current protein complex structure prediction, including the prediction of heteromeric complexes, disordered regions in complexes, antibody-antigen complexes, and RNA-related complexes, as well as the evaluation metrics for complex assessment. We hope that this work will provide comprehensive knowledge of complex structure prediction and contribute to future advances.
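
To make "interchain residue contact" concrete, here is a minimal sketch that lists residue pairs from two chains whose representative atoms lie within a distance cutoff. The 8 Å default, the use of one coordinate per residue, and the toy coordinates are all assumptions for illustration; the review itself may rely on different definitions.

```python
import math

def interchain_contacts(chain_a, chain_b, cutoff=8.0):
    """Return (i, j) index pairs where residue i of chain A and residue j
    of chain B are within `cutoff` angstroms of each other.

    Each chain is a list of (x, y, z) coordinates, one per residue
    (assumed here to be a single representative atom per residue).
    """
    contacts = []
    for i, coord_a in enumerate(chain_a):
        for j, coord_b in enumerate(chain_b):
            if math.dist(coord_a, coord_b) <= cutoff:
                contacts.append((i, j))
    return contacts

# Hypothetical coordinates in angstroms, two residues per chain.
chain_a = [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0)]
chain_b = [(0.0, 0.0, 5.0), (30.0, 0.0, 0.0)]

print(interchain_contacts(chain_a, chain_b))  # [(0, 0)]
```

A predicted contact list of this form can be compared against the contacts of an experimentally resolved complex, or used as a restraint when ranking docking poses.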
